AI Agent Reliability Patterns: Retries, Fallbacks, and Circuit Breakers for Production Agents

How to build reliable AI agents using battle-tested distributed systems patterns: retry strategies, fallback chains, circuit breakers, and graceful degradation.

Agents Fail. The Question Is How Gracefully.

AI agents in production face a constant stream of failures: API rate limits, tool execution errors, malformed LLM outputs, timeouts on external services, and model hallucinations that derail multi-step plans. The difference between a demo agent and a production agent is not capability -- it is reliability engineering.

The good news is that decades of distributed systems engineering have produced patterns that apply directly to agent systems.

Pattern 1: Structured Retries

Not all failures are equal. Your retry strategy should match the failure type:

import anthropic
from anthropic import RateLimitError, APITimeoutError
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

client = anthropic.AsyncAnthropic()

def log_retry_attempt(retry_state):
    # Called by tenacity before each sleep; surfaces retry activity in logs
    print(f"Retry {retry_state.attempt_number}: {retry_state.outcome.exception()}")

@retry(
    retry=retry_if_exception_type((RateLimitError, APITimeoutError)),
    wait=wait_exponential(multiplier=1, min=1, max=60),  # 1s, 2s, 4s, ... capped at 60s
    stop=stop_after_attempt(5),
    before_sleep=log_retry_attempt
)
async def call_llm(messages, tools):
    return await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,  # required by the Messages API
        messages=messages,
        tools=tools
    )

Key principles:

  • Exponential backoff: Prevents a thundering herd when a rate limit lifts
  • Jitter: Randomize delays (e.g. tenacity's wait_random_exponential) so multiple agents do not retry in lockstep
  • Selective retry: Retry only transient errors (rate limits, timeouts); never retry invalid requests or authentication failures
  • Maximum attempts: Always cap retries to prevent infinite loops

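The principles above can be sketched without a retry library. The helper below implements "full jitter" backoff with capped attempts; the function and parameter names are illustrative, not a standard API:

```python
import random
import time

def retry_with_jitter(func, max_attempts=5, base=1.0, cap=60.0):
    """Retry `func` on transient errors with exponential backoff plus full jitter.

    Each sleep is drawn uniformly from [0, min(cap, base * 2**attempt)] --
    the "full jitter" strategy -- so concurrent agents desynchronize.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # cap reached: surface the error instead of looping forever
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Note the selective `except` clause: a `ValueError` from a malformed request propagates immediately rather than being retried.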
Pattern 2: Model Fallback Chains

When your primary model is unavailable or degraded, fall back to alternatives:

import logging

logger = logging.getLogger(__name__)

MODEL_CHAIN = [
    {"model": "claude-sonnet-4-20250514", "provider": "anthropic"},
    {"model": "gpt-4o", "provider": "openai"},
    {"model": "claude-haiku-4-20250514", "provider": "anthropic"},  # Cheaper, faster, less capable
]

async def resilient_llm_call(messages, tools):
    # call_provider is a thin provider-agnostic wrapper that normalizes
    # each vendor SDK to a common request/response shape
    for model_config in MODEL_CHAIN:
        try:
            return await call_provider(
                model=model_config["model"],
                provider=model_config["provider"],
                messages=messages,
                tools=tools
            )
        except (ServiceUnavailableError, RateLimitError) as e:
            logger.warning(f"Fallback from {model_config['model']}: {e}")
            continue
    raise AllModelsUnavailableError("Exhausted all model fallbacks")

Important considerations:

  • Prompts may need adjustment for different models (tool schemas, system prompt format)
  • Track which model actually served each request for quality monitoring
  • Quality may degrade with fallback models -- alert when the primary model has been unavailable for extended periods
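One way to act on the second point is a thin wrapper that records which model actually served each request. This is a sketch: `providers` and `metrics` are illustrative stand-ins, not a real SDK:

```python
def call_with_tracking(messages, providers, metrics):
    """Try providers in order; record which one actually served the request.

    `providers` maps model name -> callable; `metrics` is any mutable
    counter store (here a plain dict, in production a metrics client).
    """
    errors = []
    for model, call in providers.items():
        try:
            result = call(messages)
            metrics[model] = metrics.get(model, 0) + 1  # served-by counter
            return result, model  # caller logs the model for quality monitoring
        except (ConnectionError, TimeoutError) as e:
            errors.append((model, e))
    raise RuntimeError(f"all providers failed: {errors}")
```

Returning the serving model alongside the result lets downstream evaluation attribute quality regressions to fallback traffic.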

Pattern 3: Circuit Breakers

Prevent cascading failures by stopping calls to a failing service:

import time

class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = "CLOSED"  # CLOSED = normal, OPEN = blocking, HALF_OPEN = testing
        self.last_failure_time = None

    async def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = "HALF_OPEN"  # allow one probe call through
            else:
                raise CircuitOpenError("Circuit breaker is open")

        try:
            result = await func(*args, **kwargs)
            if self.state == "HALF_OPEN":
                self.state = "CLOSED"  # probe succeeded: resume normal traffic
                self.failure_count = 0
            return result
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            if self.failure_count >= self.failure_threshold:
                self.state = "OPEN"  # a failed HALF_OPEN probe lands here too, re-tripping the breaker
            raise

Use separate circuit breakers for each external dependency (LLM provider, tool APIs, databases).
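A small registry makes that per-dependency rule easy to enforce. `BreakerRegistry` is a hypothetical helper that lazily creates one breaker per named dependency from a factory (in practice, the `CircuitBreaker` class above):

```python
class BreakerRegistry:
    """One circuit breaker per named dependency (sketch)."""

    def __init__(self, factory):
        self._factory = factory   # e.g. lambda: CircuitBreaker(failure_threshold=5)
        self._breakers = {}

    def get(self, dependency):
        # Lazily create one breaker per dependency so an outage in the
        # search API never blocks calls to the LLM provider or the database.
        if dependency not in self._breakers:
            self._breakers[dependency] = self._factory()
        return self._breakers[dependency]
```

Callers then wrap each outbound call with `registry.get("anthropic")`, `registry.get("search_api")`, and so on, keeping failure domains isolated.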

Pattern 4: Idempotent Tool Execution

Agent tools must be safe to retry. If a tool call times out, the agent (or retry logic) may call it again. Non-idempotent tools can cause double-charges, duplicate records, or other side effects.

Design principles:

  • Use idempotency keys for operations that create or modify resources
  • Make read operations naturally idempotent
  • Log tool execution results and check for existing results before re-executing
  • Use database transactions with unique constraints to prevent duplicates
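The bullets above combine into a small execute-once wrapper. This is a sketch: the key is a hash of the tool name and arguments, and `ledger` stands in for a durable store with a unique constraint:

```python
import hashlib
import json

def idempotency_key(tool_name, args):
    """Derive a stable key from the tool name and its arguments."""
    payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def execute_once(tool_name, args, run, ledger):
    """Execute `run` at most once per (tool, args) pair.

    `ledger` is any dict-like store; a real system would use a database
    table with the key as a unique constraint, written transactionally.
    """
    key = idempotency_key(tool_name, args)
    if key in ledger:
        return ledger[key]  # retry after a timeout: return the recorded result
    result = run(**args)
    ledger[key] = result
    return result
```

With this in place, a retried `charge_customer` call returns the original result instead of billing twice.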

Pattern 5: Graceful Degradation

When full functionality is unavailable, provide reduced but useful service:

  • Tool failure: If a search tool fails, the agent can still answer from its parametric knowledge (with appropriate caveats)
  • Context retrieval failure: If RAG retrieval fails, fall back to a general response with a disclaimer
  • Timeout: If the agent cannot complete a complex task within the time budget, return partial results with an explanation
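As a sketch of the first two bullets, the helper below tries the tool-augmented path and degrades to the model's parametric knowledge with an explicit caveat; `search` and `llm` are illustrative callables, not a real API:

```python
def answer_with_degradation(question, search, llm):
    """Full tool-augmented path if possible; degraded-but-useful otherwise."""
    try:
        context = search(question)
        return llm(f"Context: {context}\n\nQuestion: {question}")
    except (ConnectionError, TimeoutError):
        # Reduced service: answer from training data, with a disclaimer
        caveat = "Note: live search was unavailable; answering from training data."
        return caveat + "\n" + llm(question)
```

The caveat matters: a degraded answer presented as a fully-grounded one erodes user trust faster than an honest partial answer.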

Pattern 6: Checkpointing for Long-Running Agents

Agents that run for minutes or hours should checkpoint their state:

class CheckpointedAgent:
    async def run(self, task):
        # Resume from the last saved step, if any
        checkpoint = await self.load_checkpoint(task.id)

        for step in self.plan(task, resume_from=checkpoint):
            result = await self.execute_step(step)
            # Persist progress before moving on: a crash loses at most one step
            await self.save_checkpoint(task.id, step, result)

            if result.failed and not result.retryable:
                return self.partial_result(task.id)

        return self.final_result(task.id)

If the agent crashes or the process restarts, it resumes from the last checkpoint instead of starting over.
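The `load_checkpoint`/`save_checkpoint` pair needs a durable store behind it. Below is a minimal file-backed sketch, one JSON file per task with an atomic write-then-rename; a database row keyed by task id works the same way:

```python
import json
from pathlib import Path

class FileCheckpointStore:
    """Durable per-task checkpoint store (sketch; names are illustrative)."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def save(self, task_id, step, result):
        path = self.directory / f"{task_id}.json"
        state = json.loads(path.read_text()) if path.exists() else {"steps": []}
        state["steps"].append({"step": step, "result": result})
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(path)  # atomic rename: a crash never leaves a half-written file

    def load(self, task_id):
        path = self.directory / f"{task_id}.json"
        if not path.exists():
            return None  # no checkpoint: start the task from scratch
        return json.loads(path.read_text())
```

The write-then-rename step is the important detail: checkpoints that can themselves be corrupted by a crash defeat the purpose of checkpointing.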

Measuring Reliability

Track these metrics to quantify agent reliability:

  • Task completion rate: Percentage of tasks completed successfully
  • Mean time to completion: Average wall-clock time per task
  • Retry rate: How often retries are needed (high rates indicate systemic issues)
  • Fallback rate: How often the primary model/tool is unavailable
  • Error categorization: Breakdown of failures by type (rate limit, timeout, parsing, tool error)
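A minimal in-memory tally of these metrics might look like the sketch below; a real deployment would export the counters to a metrics backend, and the event names here are illustrative:

```python
from collections import Counter

class ReliabilityMetrics:
    """In-memory reliability counters for an agent process."""

    def __init__(self):
        self.events = Counter()

    def record(self, event):
        # e.g. "task_ok", "task_failed", "retry", "fallback", "error:timeout"
        self.events[event] += 1

    def task_completion_rate(self):
        finished = self.events["task_ok"] + self.events["task_failed"]
        return self.events["task_ok"] / finished if finished else 0.0

    def retry_rate(self):
        # Retries per finished task; a rising value points at systemic issues
        finished = self.events["task_ok"] + self.events["task_failed"]
        return self.events["retry"] / finished if finished else 0.0
```

Prefixed event names like `error:timeout` give the error-categorization breakdown for free from the same counter.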

Sources: Release It! (Michael Nygard) | Anthropic Agent Reliability | AWS Well-Architected Framework
