Different retry logic for Validation error vs Rate limits #1444
Replies: 2 comments
**Use tenacity for rate limits, instructor for validation**

Instructor's built-in `max_retries` only handles validation re-asks, so let tenacity own the rate-limit backoff:

```python
import instructor
from openai import OpenAI, RateLimitError
from pydantic import BaseModel
from tenacity import retry, retry_if_exception_type, wait_exponential, stop_after_attempt

client = instructor.from_openai(OpenAI())

class UserProfile(BaseModel):
    name: str
    age: int
    email: str

@retry(
    retry=retry_if_exception_type(RateLimitError),
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(10),
)
def extract_profile(text: str) -> UserProfile:
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Extract profile: {text}"}],
        response_model=UserProfile,
        max_retries=2,  # Only for validation errors
    )
```

How this works: tenacity catches `RateLimitError` and backs off exponentially (4–60 s, up to 10 attempts), while instructor's `max_retries=2` re-asks the model with the validation errors whenever the response fails the `UserProfile` schema.
**More granular control with a custom retry callback**

```python
import logging

from tenacity import before_sleep_log

logger = logging.getLogger(__name__)

@retry(
    retry=retry_if_exception_type((RateLimitError, ConnectionError)),
    wait=wait_exponential(multiplier=1, min=2, max=120),
    stop=stop_after_attempt(8),
    before_sleep=before_sleep_log(logger, logging.WARNING),
)
def robust_extract(text: str) -> UserProfile:
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Extract: {text}"}],
        response_model=UserProfile,
        max_retries=3,
    )
```

**For Anthropic**

```python
from anthropic import Anthropic, RateLimitError as AnthropicRateLimit

anthropic_client = instructor.from_anthropic(Anthropic())

@retry(
    retry=retry_if_exception_type(AnthropicRateLimit),
    wait=wait_exponential(multiplier=2, min=5, max=120),
    stop=stop_after_attempt(6),
)
def extract_with_claude(text: str) -> UserProfile:
    return anthropic_client.chat.completions.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,  # Anthropic requires max_tokens
        messages=[{"role": "user", "content": text}],
        response_model=UserProfile,
        max_retries=2,
    )
```

The key principle: tenacity wraps the outside (transport-level retries), instructor handles the inside (LLM-level retries with validation feedback).
Different retry logic for validation vs rate limits is crucial! At RevolutionAI (https://revolutionai.io) we handle these very differently. Our approach:

```python
from instructor import InstructorRetryException
from openai import RateLimitError
from tenacity import retry, retry_if_exception_type, wait_exponential, stop_after_attempt

# Validation errors: retry with better prompting
@retry(
    retry=retry_if_exception_type(InstructorRetryException),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=0.5, max=2),
)
async def extract_with_validation_retry(...):
    pass

# Rate limits: exponential backoff, more attempts
@retry(
    retry=retry_if_exception_type(RateLimitError),
    stop=stop_after_attempt(10),
    wait=wait_exponential(multiplier=2, max=60),
)
async def extract_with_rate_limit_retry(...):
    pass
```

Key differences: validation retries are few and fast (3 attempts, waits capped at 2 s) because bad output rarely fixes itself, while rate-limit retries are patient (10 attempts, waits up to 60 s).
Would be great to have this built into instructor directly with configurable strategies!
For validation errors I would like:
Whereas for openai.RateLimitError I would like:
How do I make this happen?
I originally had this:
However, I realise this doesn't work: if a validation error is followed by a rate-limit retry, it wipes the accumulated validation errors and starts again from scratch. I don't want to keep retrying on validation errors, since they are unlikely to recover, but for rate limits I'm happy to wait until a retry can succeed.
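The desired semantics can be sketched in plain Python, with no tenacity: retry only rate-limit-style errors with backoff, and let validation-style errors surface on the first occurrence so their feedback is never wiped. All names here are stand-ins, not the real openai/pydantic classes:

```python
import time

class RateLimited(Exception):
    """Stand-in for openai.RateLimitError."""

class ValidationFailed(Exception):
    """Stand-in for a pydantic/instructor validation failure."""

def rate_limit_retry(attempts=5, base_delay=0.0):
    """Retry only rate limits; anything else (e.g. validation) propagates at once."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except RateLimited:
                    if attempt == attempts - 1:
                        raise
                    time.sleep(base_delay * 2 ** attempt)  # exponential backoff
        return wrapper
    return decorator

state = {"calls": 0}

@rate_limit_retry(base_delay=0.0)  # zero delay so the demo runs instantly
def extract():
    state["calls"] += 1
    if state["calls"] == 1:
        raise RateLimited
    raise ValidationFailed  # not retried: surfaces on the second call

try:
    extract()
except ValidationFailed:
    print("validation error surfaced after", state["calls"], "calls")  # 2
```

Because only `RateLimited` is caught inside the loop, any validation state held by the inner call (e.g. instructor's re-ask messages) is never reset by an outer retry.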