Agent Failure Modes - Albert Masoliver's learning site

## Definition **Agent failure modes** are the distinct ways an agent can go wrong, enumerated as a checklist. To evaluate an agent you do not ask "is it good?" — you ask "how often does it fail, and in which of these specific ways?" Huyen's framing in *[[AI Engineering - Chip Huyen]]* is that you make agents measurable by naming their failure modes and counting each. ## The taxonomy | Mode | What happens | Example | |------|--------------|---------| | Tool-use failure | Wrong tool, bad params, or wrong argument values | calls `delete` instead of `archive`; passes a malformed date | | Goal failure | Completes steps but violates a constraint | finishes the task but blows the time/budget limit | | Tool-output failure | Right tool, right call, *wrong result* | the search API returns stale or empty data | | Reflection failure | Declares success when it isn't done | a false "done" that stops the loop early | ### Tool-use failures The model picks an invalid tool, supplies parameters that don't match the schema, or supplies plausible-but-wrong values. This is the most directly preventable class — schema validation and good tool descriptions (see [[Function Calling]]) catch much of it before execution. ### Goal failures The agent honours the *letter* of the task but misses a constraint — a deadline, a budget cap, a "don't touch production" rule. These are insidious because every individual step looks fine; only the aggregate violates intent. Encode constraints explicitly and check them, don't hope the model remembers. ### Tool-output failures Here the agent did everything right and the *world* lied — a flaky API, a stale index, a tool that silently truncates. The agent isn't at fault but the trajectory still derails, which is why you measure tool reliability separately from agent reliability. ### Reflection failures The agent's self-assessment is wrong: it believes it finished, or believes a wrong answer is right, and exits the [[Agentic Loop]]. Because [[Reflection]] is itself a model judgement, it inherits the model's blind spots — a strong argument for [[Verifier Independence]]. ## The boring defence: log everything The most effective debugging tool is unglamorous — **print every tool call and every tool output.** A full trace of name, arguments, raw result, and the agent's next decision lets you classify each failure into the table above and pinpoint where the trajectory broke. Without it you are guessing; with it, evaluation becomes counting. ## Why this matters Failure modes are the diagnostic complement to [[Compounding Error]]: the multiplication law says long chains are fragile, and this taxonomy tells you *which* step type is bleeding reliability. Note too that some "failures" are adversarially induced — a retrieved document carrying a [[Prompt Injection]] manifests as a tool-output failure that cascades into a goal failure, so security and reliability analysis overlap. ## Related - [[Compounding Error]] - [[Tool Use]] - [[Reflection]] - [[Prompt Injection]] - [[Agentic Loop]] - [[Verifier Independence]] - [[Function Calling]] - [[AI Engineering - Chip Huyen]]