LMs: retry with exponential backoff for a limited set of error codes #1753
Signed-off-by: dbczumar <[email protected]>
@@ -32,7 +33,7 @@ def __init__(
         cache: bool = True,
         launch_kwargs: Optional[Dict[str, Any]] = None,
         callbacks: Optional[List[BaseCallback]] = None,
-        num_retries: int = 3,
3 retries (equivalent to less than 5 seconds of total wait) are not enough to overcome rate limiting in many high-traffic production environments.
@@ -32,7 +33,7 @@ def __init__(
         cache: bool = True,
         launch_kwargs: Optional[Dict[str, Any]] = None,
         callbacks: Optional[List[BaseCallback]] = None,
-        num_retries: int = 3,
+        num_retries: int = 8,
Empirically, 8 retries translate to roughly 1 minute of wall-clock time, which should be sufficient to overcome rate limiting in most cases.
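
To make the wall-clock estimates in these two review comments concrete, here is a rough sketch of how total wait time grows with the retry count under plain exponential backoff. The function name `total_backoff_seconds`, the 0.25 s initial delay, and the doubling factor are assumptions chosen for illustration; they are not taken from LiteLLM, whose actual schedule may add jitter or cap the delay.

```python
def total_backoff_seconds(
    num_retries: int,
    initial_delay: float = 0.25,  # assumed value, not LiteLLM's
    factor: float = 2.0,          # assumed value, not LiteLLM's
) -> float:
    """Sum the sleeps before each retry under plain exponential backoff.

    Assumption: retry i (0-based) waits initial_delay * factor**i seconds,
    with no jitter and no cap. Request latency itself is not counted.
    """
    return sum(initial_delay * factor**i for i in range(num_retries))


if __name__ == "__main__":
    for n in (3, 8):
        print(f"{n} retries -> {total_backoff_seconds(n):.2f} s of waiting")
```

Under these assumed parameters, 3 retries wait 1.75 s in total and 8 retries wait 63.75 s, which is in the same range as the figures quoted in the comments. The real numbers depend on the library's schedule and on request latency.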
retry_policy = RetryPolicy(
    TimeoutErrorRetries=num_retries,
    RateLimitErrorRetries=num_retries,
    InternalServerErrorRetries=num_retries,
    # We don't retry on errors that are unlikely to be transient
    # (e.g. bad request, invalid auth credentials)
    BadRequestErrorRetries=0,
    AuthenticationErrorRetries=0,
    ContentPolicyViolationErrorRetries=0,
)

return Router(
    # LiteLLM routers must specify a `model_list`, which maps model names passed
    # to `completions()` into actual LiteLLM model names. For our purposes, the
    # model name is the same as the LiteLLM model name, so we add a single
    # entry to the `model_list` that maps the model name to itself
    model_list=[
        {
            "model_name": model,
            "litellm_params": {
                "model": model,
            },
        }
    ],
    retry_policy=retry_policy,
)
LiteLLM Routers appear to be the only mechanism allowing exponential backoff and configurable retry codes. Docs: https://docs.litellm.ai/docs/routing
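
The policy in the diff retries only errors that are likely to be transient and fails fast on the rest. The same idea can be sketched without LiteLLM as a small helper. Everything here (`call_with_retries`, the stand-in exception classes, the delay parameters) is hypothetical and only illustrates the technique; it does not describe how `Router` is implemented.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


# Stand-in exception types for illustration; a real client library defines its own.
class RateLimitError(Exception):
    pass


class BadRequestError(Exception):
    pass


def call_with_retries(
    fn: Callable[[], T],
    num_retries: int = 8,
    retryable: Tuple[Type[BaseException], ...] = (RateLimitError, TimeoutError),
    initial_delay: float = 0.25,  # assumed value
    factor: float = 2.0,          # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying only on `retryable` errors with exponential backoff."""
    delay = initial_delay
    for attempt in range(num_retries + 1):
        try:
            return fn()
        except retryable:
            # Out of retries: surface the last transient error to the caller.
            if attempt == num_retries:
                raise
            sleep(delay)
            delay *= factor
    raise AssertionError("unreachable for num_retries >= 0")
```

Errors outside `retryable` (for example `BadRequestError`) are not caught, so they propagate on the first attempt, mirroring the `...Retries=0` entries in the policy. Passing a no-op `sleep` makes the helper easy to exercise in tests without real delays.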