
LMs: retry with exponential backoff for a limited set of error codes #1753

Merged: 6 commits merged into stanfordnlp:main on Nov 5, 2024

Conversation

dbczumar (Collaborator) commented on Nov 5, 2024

LMs: retry with exponential backoff for a limited set of error codes

Signed-off-by: dbczumar <[email protected]>
@@ -32,7 +33,7 @@ def __init__(
         cache: bool = True,
         launch_kwargs: Optional[Dict[str, Any]] = None,
         callbacks: Optional[List[BaseCallback]] = None,
-        num_retries: int = 3,
dbczumar (Collaborator, Author) commented on Nov 5, 2024:

3 retries (equivalent to < 5 seconds of total backoff) is insufficient to overcome rate limiting in many high-traffic production environments

@@ -32,7 +33,7 @@ def __init__(
         cache: bool = True,
         launch_kwargs: Optional[Dict[str, Any]] = None,
         callbacks: Optional[List[BaseCallback]] = None,
-        num_retries: int = 3,
+        num_retries: int = 8,
dbczumar (Collaborator, Author) commented:

Empirically, 8 retries translates to ~1 minute of wall-clock time, which should be sufficient to overcome rate limiting in most cases
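
For intuition about these figures, here is a rough back-of-the-envelope sketch (not part of the PR): assuming an exponential backoff schedule with a 0.25-second base delay that doubles on each attempt, 3 retries add up to under 2 seconds while 8 retries add up to roughly a minute. LiteLLM's actual retry timing (including jitter) may differ.

# Back-of-the-envelope estimate of cumulative backoff time.
# The 0.25s base delay and pure doubling are illustrative assumptions;
# LiteLLM's real schedule (with jitter) may differ.
def total_backoff_seconds(num_retries: int, base_delay: float = 0.25) -> float:
    """Sum the delays base_delay * 2**i for attempts i = 0..num_retries-1."""
    return sum(base_delay * (2**i) for i in range(num_retries))

print(total_backoff_seconds(3))  # 1.75  -> only a few seconds of waiting
print(total_backoff_seconds(8))  # 63.75 -> roughly a minute of wall-clock time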

Signed-off-by: dbczumar <[email protected]>
Comment on lines +199 to +224
retry_policy = RetryPolicy(
    TimeoutErrorRetries=num_retries,
    RateLimitErrorRetries=num_retries,
    InternalServerErrorRetries=num_retries,
    # We don't retry on errors that are unlikely to be transient
    # (e.g. bad request, invalid auth credentials)
    BadRequestErrorRetries=0,
    AuthenticationErrorRetries=0,
    ContentPolicyViolationErrorRetries=0,
)

return Router(
    # LiteLLM routers must specify a `model_list`, which maps model names passed
    # to `completions()` into actual LiteLLM model names. For our purposes, the
    # model name is the same as the LiteLLM model name, so we add a single
    # entry to the `model_list` that maps the model name to itself
    model_list=[
        {
            "model_name": model,
            "litellm_params": {
                "model": model,
            },
        }
    ],
    retry_policy=retry_policy,
)
dbczumar (Collaborator, Author) commented:

LiteLLM Routers appear to be the only mechanism that supports exponential backoff with a configurable set of retryable error codes. Docs: https://docs.litellm.ai/docs/routing
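
For reference, here is a minimal usage sketch of a Router configured this way (not part of the PR: the model string and message are placeholders, and the RetryPolicy import path may vary across LiteLLM versions):

from litellm import Router
from litellm.router import RetryPolicy  # import path may vary by LiteLLM version

model = "openai/gpt-4o-mini"  # placeholder model name

router = Router(
    model_list=[{"model_name": model, "litellm_params": {"model": model}}],
    retry_policy=RetryPolicy(
        RateLimitErrorRetries=8,   # retried with exponential backoff
        BadRequestErrorRetries=0,  # fail fast on non-transient errors
    ),
)

# Calls go through the router, which applies the retry policy internally.
response = router.completion(
    model=model,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)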

@dbczumar dbczumar requested a review from okhat November 5, 2024 05:03
@okhat okhat merged commit cadd619 into stanfordnlp:main Nov 5, 2024
4 checks passed
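
For readers configuring this from the DSPy side, a short sketch of how the new default plays out (the model string is a placeholder; per the diff above, num_retries now defaults to 8 and can still be overridden):

import dspy

# num_retries defaults to 8 after this change; raise it for very bursty
# workloads, or lower it to fail faster. The model string is a placeholder.
lm = dspy.LM("openai/gpt-4o-mini", num_retries=8)
dspy.configure(lm=lm)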