Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue #4480: '[Bug]: Being blocked by cloudflare results in futile retries #4482

Merged
merged 2 commits into from
Oct 18, 2024

Conversation

neubig
Copy link
Contributor

@neubig neubig commented Oct 18, 2024

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Fixed unlimited retries when queries are blocked by CloudFlare.

Give a summary of what the PR does, explaining any non-trivial design decisions

This PR fixes the fact that OpenHands will perform unlimited retries when a query is blocked by CloudFlare.


Link of any specific issues this addresses

Fixes #4480

@neubig neubig marked this pull request as ready for review October 18, 2024 15:56
@neubig neubig requested a review from tofarr October 18, 2024 15:56
@tobitege
Copy link
Collaborator

@neubig maybe this might be worthwhile looking into: an idea on how to handle this with the help of the existing RetryMixin class and the retry_decorator decorator (used llama-3.1-nemotron-70b-instruct for this):

Q: How could the "retry_decorator" be enhanced to specify a list of exception types, that should NOT be retried, even if the base exception is part of one in the "retry_exceptions" list.
Example: see LLM_RETRY_EXCEPTIONS, where "APIError" is a generic exception that could contain text in the error message that would qualify to NOT be retried?

Enhancing retry_decorator to Exclude Specific Exception Types

To achieve this, we'll modify the retry_decorator to accept an additional parameter, exclude_exceptions, which will contain a list of exception types that should not be retried, even if their base exception is in retry_exceptions.

Updated Code

class RetryMixin:
   ...

    def retry_decorator(self, **kwargs):
       ...
        retry_exceptions = kwargs.get('retry_exceptions')
        exclude_exceptions = kwargs.get('exclude_exceptions', [])  # New parameter
       ...

        def _should_retry(exception):
            """Check if an exception should be retried."""
            return (
                isinstance(exception, tuple(retry_exceptions))
                and not isinstance(exception, tuple(exclude_exceptions))
            )

        return retry(
           ...
            retry=(retry_if_exception_type(_should_retry)),  # Update retry condition
           ...
        )

Explanation

  1. We added a new parameter exclude_exceptions to retry_decorator, defaulting to an empty list.
  2. We defined a new inner function _should_retry that takes an exception as input.
  3. _should_retry checks if the exception is an instance of any type in retry_exceptions but not an instance of any type in exclude_exceptions.
  4. We updated the retry decorator to use _should_retry as the condition for retrying exceptions.

Example Usage

LLM_RETRY_EXCEPTIONS = (
    APIConnectionError,
    APIError,  # Generic exception that might contain non-retryable errors
   ...
)

LLM_EXCLUDE_EXCEPTIONS = (  # New tuple for exceptions to exclude
    APIErrorIfContainsText,  # Custom exception that checks for specific text in APIError
)

...

@self.retry_decorator(
    num_retries=self.config.num_retries,
    retry_exceptions=LLM_RETRY_EXCEPTIONS,
    exclude_exceptions=LLM_EXCLUDE_EXCEPTIONS,  # Pass the exclude exceptions
   ...
)
def wrapper(*args, **kwargs):
   ...

Custom Exception Example (APIErrorIfContainsText)

class APIErrorIfContainsText(APIError):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.text_to_check = "specific error text that should not be retried"

    def __instancecheck__(self, instance):
        return (
            super().__instancecheck__(instance)
            and self.text_to_check in instance.response.text
        )

In this example, APIErrorIfContainsText is a custom exception that checks if the APIError response text contains a specific string. If it does, the exception will be excluded from retries.

@neubig
Copy link
Contributor Author

neubig commented Oct 18, 2024

Thanks @tobitege! That makes sense, but do you think it should be done in this PR or saved for a future one? My sense is that the current fix is reasonably good, and we can always improve it later.

@tobitege
Copy link
Collaborator

Thanks @tobitege! That makes sense, but do you think it should be done in this PR or saved for a future one? My sense is that the current fix is reasonably good, and we can always improve it later.

Sure thing, let me move the idea into a new issue for potential tracking.

Copy link
Collaborator

@tofarr tofarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - Toby's suggestion may be worth adding a second PR for.

@neubig neubig merged commit b660aa9 into main Oct 18, 2024
@neubig neubig deleted the openhands-fix-issue-4480 branch October 18, 2024 17:04
@enyst
Copy link
Collaborator

enyst commented Oct 18, 2024

Argh, this was going to happen. We have re-introduced retries on APIError here:

  • it became necessary quickly because of litellm proxy on All Hands, which raises API Error as a catch-all for issues, like 502, that can be temporary
  • but retrying on a litellm catch-all, which includes permanent exceptions, is a bad idea, since well, it retries on permanent exceptions.

Other things we can consider IMHO:

  • revert that commit
  • talk to the litellm project to treat 502 separately from the catch-all. I meant to do this and it slipped from my todo list, sorry
  • fix or address it in the proxy. IIRC we were trying to debug and eval the CoAct PR, and it was a bit difficult to use then, but I don't know, IME it has become practically impossible to use the proxy in the last - I forget exactly - for weeks, anyway.

@neubig

@neubig
Copy link
Contributor Author

neubig commented Oct 19, 2024

I think

talk to the litellm project to treat 502 separately from the catch-all

Send like a good first step!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Being blocked by cloudflare results in futile retries
5 participants