LMs: retry with exponential backoff for a limited set of error codes #1753

dbczumar · 2024-11-05T05:00:00Z

LMs: retry with exponential backoff for a limited set of error codes

Signed-off-by: dbczumar <[email protected]>

dbczumar · 2024-11-05T05:00:41Z

dspy/clients/lm.py

@@ -32,7 +33,7 @@ def __init__(
        cache: bool = True,
        launch_kwargs: Optional[Dict[str, Any]] = None,
        callbacks: Optional[List[BaseCallback]] = None,
-        num_retries: int = 3,


3 retries (equivalent to less than 5 seconds of wall-clock time) are insufficient to overcome rate limiting in many high-traffic production environments.

dbczumar · 2024-11-05T05:01:16Z

dspy/clients/lm.py

@@ -32,7 +33,7 @@ def __init__(
        cache: bool = True,
        launch_kwargs: Optional[Dict[str, Any]] = None,
        callbacks: Optional[List[BaseCallback]] = None,
-        num_retries: int = 3,
+        num_retries: int = 8,


Empirically, 8 retries translate to roughly 1 minute of wall-clock time, which should be sufficient to overcome rate limiting in most cases.

Signed-off-by: dbczumar <[email protected]>
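
As a rough sanity check on the retry counts discussed above, the total wait under capped exponential backoff can be estimated with a few lines of Python. This is only a sketch: the 0.5 second base delay, the doubling factor, and the 8 second cap are illustrative assumptions, not values confirmed by this PR or by LiteLLM, and jitter and request latency are ignored.

```python
def total_backoff_seconds(
    num_retries: int,
    base_delay: float = 0.5,  # assumed initial delay in seconds (illustrative)
    max_delay: float = 8.0,   # assumed per-retry cap in seconds (illustrative)
) -> float:
    """Sum the waits before each retry under capped exponential backoff, without jitter."""
    return sum(min(base_delay * 2**attempt, max_delay) for attempt in range(num_retries))


print(total_backoff_seconds(3))  # 3.5
print(total_backoff_seconds(8))  # 39.5
```

Under these assumptions, 3 retries wait about 3.5 seconds in total, while 8 retries wait about 40 seconds before request latency is added, which is in line with the rough figures quoted in the comments.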

dbczumar · 2024-11-05T05:02:57Z

dspy/clients/lm.py

+    retry_policy = RetryPolicy(
+        TimeoutErrorRetries=num_retries,
+        RateLimitErrorRetries=num_retries,
+        InternalServerErrorRetries=num_retries,
+        # We don't retry on errors that are unlikely to be transient
+        # (e.g. bad request, invalid auth credentials)
+        BadRequestErrorRetries=0,
+        AuthenticationErrorRetries=0,
+        ContentPolicyViolationErrorRetries=0,
+    )
+
+    return Router(
+        # LiteLLM routers must specify a `model_list`, which maps model names passed
+        # to `completions()` into actual LiteLLM model names. For our purposes, the
+        # model name is the same as the LiteLLM model name, so we add a single
+        # entry to the `model_list` that maps the model name to itself
+        model_list=[
+            {
+                "model_name": model,
+                "litellm_params": {
+                    "model": model,
+                },
+            }
+        ],
+        retry_policy=retry_policy,
+    )


LiteLLM Routers appear to be the only LiteLLM mechanism that supports both exponential backoff and a configurable set of retryable error codes. Docs: https://docs.litellm.ai/docs/routing
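
The behavior the retry policy aims for (retry transient errors with exponential backoff, fail fast on everything else) can be illustrated with a small library-free sketch. This is not how LiteLLM implements it: `call_with_retries`, the two exception classes, and the delay parameters are all hypothetical names and values introduced only for illustration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a transient failure that is worth retrying."""


class BadRequestError(Exception):
    """Hypothetical stand-in for a non-transient failure that is never retried."""


def call_with_retries(
    fn: Callable[[], T],
    num_retries: int = 8,
    retry_on: Tuple[Type[Exception], ...] = (RateLimitError,),
    base_delay: float = 0.5,  # assumed initial delay in seconds
    max_delay: float = 8.0,   # assumed per-retry cap in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying only the exception types listed in `retry_on`."""
    for attempt in range(num_retries + 1):
        try:
            return fn()
        except retry_on:
            # Out of retries: surface the last transient error to the caller.
            if attempt == num_retries:
                raise
            # Double the delay on each attempt, capped at max_delay,
            # with a little jitter so concurrent clients do not retry in lockstep.
            delay = min(base_delay * 2**attempt, max_delay)
            sleep(delay * random.uniform(0.75, 1.0))
    raise AssertionError("unreachable")
```

Exceptions outside `retry_on`, such as the hypothetical `BadRequestError`, propagate on the first failure, mirroring the zero-retry entries in the policy above. The `sleep` parameter exists only so the waits can be stubbed out in tests.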

dbczumar added 5 commits October 30, 2024 16:26

fix

355a26f

Signed-off-by: dbczumar <[email protected]>

fix

ed18540

Signed-off-by: dbczumar <[email protected]>

progress

ce046d8

Signed-off-by: dbczumar <[email protected]>

fix

f42303a

Signed-off-by: dbczumar <[email protected]>

fix

c0de9c8

Signed-off-by: dbczumar <[email protected]>

dbczumar commented Nov 5, 2024

fix

23fc5e5

Signed-off-by: dbczumar <[email protected]>

dbczumar commented Nov 5, 2024

dbczumar requested a review from okhat November 5, 2024 05:03

okhat merged commit cadd619 into stanfordnlp:main Nov 5, 2024
4 checks passed

dbczumar mentioned this pull request Nov 6, 2024

Add test coverage for caching against a LiteLLM test server #1769

Merged
