AI Agent Failure Modes: 6 Categories, 24 Checks, All Catchable Before You Write Code
Most AI agent failures are not model failures -- they are architecture failures that were detectable at the design stage. Tool loops, permission overreach, silent error swallowing, and scope bleed are structural problems that a 15-30 minute pre-build checklist can catch. Learning about them in production costs days; catching them at design time costs minutes.
Get the full 24-check pre-build audit -- $17
Failure Mode 1: Tool Loop / Infinite Retry
What it looks like: the agent calls the same tool repeatedly (often because the tool returns an error or an unexpected format) and never exits. In production this burns API tokens and blocks other tasks. In the worst cases it triggers real-world side effects, such as sending duplicate emails, creating duplicate records, or triggering multiple payment attempts.
Detection at design time:
- Does your design specify a maximum tool-call count per session? If not, there is no exit.
- Is the max call limit enforced in code, not just in the prompt? A prompt instruction can be ignored by the model or overridden by injected input; a check in code cannot.
- What does the agent do when a tool returns an error three times in a row? If the answer is "retry again," you have a loop.
Fix: Add a hard code-level limit (e.g. max 10 tool calls per session) and a cap on consecutive failures. When either limit is hit, return a structured error to the caller rather than retrying. Log the failure with full context so a human can diagnose it.
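A minimal Python sketch of this fix, assuming a hypothetical harness in which `choose_action` stands in for the model and `tools` maps tool names to plain functions (neither comes from any specific agent framework). Both exits live in code: a per-session call budget and a consecutive-failure cap, each of which returns a structured error instead of another retry.

```python
import logging
from typing import Any, Callable, Dict, List

log = logging.getLogger("agent")

MAX_TOOL_CALLS = 10           # hard per-session budget, enforced by the loop bound
MAX_CONSECUTIVE_FAILURES = 3  # give up after three failed calls in a row


def run_session(
    choose_action: Callable[[List[Dict[str, Any]]], Dict[str, Any]],
    tools: Dict[str, Callable[..., Any]],
) -> Dict[str, Any]:
    """Run a hypothetical agent loop with code-level exit conditions.

    choose_action stands in for the model: given the call history it returns
    either {"final": value} or {"tool": name, "args": {...}}.
    """
    history: List[Dict[str, Any]] = []
    consecutive_failures = 0

    for _ in range(MAX_TOOL_CALLS):
        action = choose_action(history)
        if "final" in action:
            return {"status": "ok", "result": action["final"], "calls": len(history)}

        name, args = action["tool"], action.get("args", {})
        try:
            output = tools[name](**args)  # an unknown tool name raises KeyError
        except Exception as exc:
            consecutive_failures += 1
            history.append({"tool": name, "args": args, "error": repr(exc)})
            if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                # Structured error to the caller, full history to the log.
                log.error("aborting after %d consecutive failures: %r",
                          consecutive_failures, history)
                return {"status": "error", "reason": "consecutive_tool_failures",
                        "calls": len(history)}
            continue

        consecutive_failures = 0
        history.append({"tool": name, "args": args, "output": output})

    log.error("tool-call budget of %d exhausted: %r", MAX_TOOL_CALLS, history)
    return {"status": "error", "reason": "max_tool_calls_exceeded", "calls": len(history)}
```

Because the limits are loop bounds rather than prompt text, nothing the model outputs can extend them. The values 10 and 3 mirror the examples above and would need tuning per task.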
Failure Mode 2: Silent Error Swallowing
What it looks like: the agent encounters an error (tool call fails, API returns 500, output does not match expected schema) and returns a success message to the caller anyway. The caller believes the task is done; the task is not done and there is no record of the failure.
Why this happens: LLMs are trained to be helpful. When a tool fails, the model often generates a plausible-sounding response rather than admitting failure -- especially if the system prompt does not explicitly require failure reporting.
Detection at design time:
- Does the system prompt explicitly instruct the agent to return a structured error format when a tool call fails, rather than proceeding with the task?
- Does the output schema distinguish "task_completed" (the agent finished running) from "task_completed_successfully" (the work was actually done)?
- Is there a logging layer that captures every tool call response, not just the final agent output?
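One way to make the schema check concrete is an output record that stores "finished" and "succeeded" as separate fields and refuses contradictory combinations. A minimal Python sketch; the two field names follow the checklist above, while the `AgentOutput` class and its validation rules are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class AgentOutput:
    """Hypothetical output schema that separates 'finished' from 'succeeded'."""

    task_completed: bool               # the agent stopped running normally
    task_completed_successfully: bool  # the requested work was actually done
    summary: str = ""
    errors: Tuple[Dict[str, Any], ...] = ()  # every tool failure seen during the run

    def __post_init__(self) -> None:
        # A success claim is only valid if the run finished and no tool failed.
        if self.task_completed_successfully and not self.task_completed:
            raise ValueError("cannot report success for a run that did not complete")
        if self.task_completed_successfully and self.errors:
            raise ValueError("cannot report success while tool errors are recorded")
```

Callers then gate downstream actions on `task_completed_successfully`, never on `task_completed` alone, so a run that finished after a failed tool call cannot be mistaken for a done task.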
Fix: Add an explicit failure reporting requirement to the system prompt. Add a logging wrapper around every tool call that captures input, output, and status regardless of the final agent output.
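A minimal sketch of the logging wrapper, assuming tools are plain Python functions called with keyword arguments. The decorator name `logged_tool` and the record layout are hypothetical. Every call is logged with its input, output or error, status, and duration, and a failure comes back as a structured `{"status": "error", ...}` result rather than disappearing.

```python
import functools
import json
import logging
import time
from typing import Any, Callable, Dict

log = logging.getLogger("agent.tools")


def logged_tool(fn: Callable[..., Any]) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool so every call is logged and returns a structured result."""

    @functools.wraps(fn)
    def wrapper(**kwargs: Any) -> Dict[str, Any]:
        start = time.monotonic()
        try:
            result: Dict[str, Any] = {"status": "ok", "output": fn(**kwargs)}
        except Exception as exc:
            # The failure becomes data the agent must handle, not a silent gap.
            result = {"status": "error", "error": f"{type(exc).__name__}: {exc}"}
        record = {
            "tool": fn.__name__,
            "input": kwargs,
            **result,
            "duration_ms": round((time.monotonic() - start) * 1000, 1),
        }
        # Logged for every call, regardless of what the agent later reports.
        level = logging.INFO if result["status"] == "ok" else logging.WARNING
        log.log(level, json.dumps(record, default=str))
        return result

    return wrapper
```

Configure logging (for example `logging.basicConfig(level=logging.INFO)`) so the INFO records for successful calls are actually emitted; without configuration, Python only prints WARNING and above.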
Failure Mode 3: Permission Overreach
What it looks like: the agent is granted access to tools or resources that its task does not require. When a prompt injection or unexpected edge case occurs, the agent takes actions outside its intended scope using the excess permissions it was granted.
Real example: an email-drafting agent that is given full inbox read/write access (convenient during development) can, when given a crafted input, read emails it was not supposed to see or send emails it was not supposed to send. A read-only credential for the drafting function would have contained the blast radius.
Detection at design time:
- List every tool and permission the agent has. For each one, ask: does the core task require this, or was it added for convenience?
- For file system access: is the agent scoped to a specific directory, or does it have broader filesystem access?
- For database access: is the credential read-only or read-write? Is it scoped to specific tables?
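The three questions above can be run mechanically against a permission inventory. A minimal Python sketch, assuming a hypothetical `Grant` record per tool or credential; the fields are illustrative, not a standard format. An empty result means every check passed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Grant:
    """One tool or credential the agent holds (hypothetical inventory format)."""

    name: str
    needed_for_core_task: bool
    fs_root: Optional[str] = None        # directory the tool is confined to, if filesystem
    db_read_only: Optional[bool] = None  # None means this is not a database grant
    db_tables: List[str] = field(default_factory=list)


def audit(grants: List[Grant]) -> List[str]:
    """Return one finding per failed check; an empty list means everything passed."""
    findings: List[str] = []
    for g in grants:
        if not g.needed_for_core_task:
            findings.append(f"{g.name}: not required by the core task, remove it")
        if g.fs_root is not None and g.fs_root.strip() in ("", "/"):
            findings.append(f"{g.name}: filesystem access is not scoped to a directory")
        if g.db_read_only is False:
            findings.append(f"{g.name}: read-write database credential, confirm writes are needed")
        if g.db_read_only is not None and not g.db_tables:
            findings.append(f"{g.name}: database access is not scoped to specific tables")
    return findings
```

Writing the inventory down is most of the value; the function only makes the "added for convenience" grants impossible to overlook.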
Fix: Principle of least privilege. Grant the minimum permission required for the specific task. Accept the inconvenience of more restricted credentials -- it contains failures to the task scope.
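Least privilege holds most reliably when it is enforced in the tool itself rather than in the prompt. A minimal sketch of two such tools: a hypothetical `ScopedFileReader` confined to one directory, and SQLite's read-only URI mode standing in for a read-only database credential. SQLite has no per-table grants, so table scoping on a server database would use that database's own permission system.

```python
import sqlite3
from pathlib import Path


class ScopedFileReader:
    """Read-only file tool confined to a single directory (hypothetical tool)."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()

    def read(self, relative_path: str) -> str:
        # resolve() collapses ".." and follows symlinks before the scope check.
        target = (self.root / relative_path).resolve()
        if not target.is_relative_to(self.root):
            raise PermissionError(f"{relative_path!r} is outside {self.root}")
        return target.read_text()


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open a SQLite database in read-only mode; any write raises OperationalError."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

If injected input asks the agent to read `../../.ssh/id_rsa` or run an `INSERT`, the tool fails with an error instead of relying on the model to refuse. (`Path.is_relative_to` requires Python 3.9 or later.)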
What the Full 24-Check Audit Covers
These three failure modes cover 3 of the 6 categories in the full 24-check pre-build audit. The other three categories are:
- Scope bleed: checks that catch agents taking actions outside their defined domain, especially when new tools are added mid-build
- Memory mismanagement: checks for context window overflow behavior, summarization quality loss, and state persistence between sessions
- Human handoff gaps: checks for defined escalation triggers -- when does the agent stop and ask a human, and is that trigger tested?
The full audit has 4 checks per category (24 total), each written as a binary pass/fail question with one line of fix guidance for failures. It is designed to run in 15-30 minutes against a design document, before any code is written.
FAQ
Do these failure modes apply to simple single-tool agents, or only to complex multi-agent systems?
Tool loops and silent errors occur in even the simplest single-tool agents. Permission overreach is a risk at any complexity level. Scope bleed and memory issues become more severe in multi-agent systems, but the detection checks are still worth running on single agents.
I already have an agent running in production. Is it too late to run this audit?
No. The permission and tool-scope checks are worth running on live agents, since overreach in production is an active security risk. The loop and silent-error checks help you find latent failure modes in your production agent that have not yet triggered.
What is the format of the full 24-check audit?
A structured document with 24 pass/fail questions, each with a severity rating (P0/P1/P2), one line of fix guidance, and a notes column for context-specific decisions. It is designed to be printed and completed in a single sitting before a build starts.