Compounding Error - Albert Masoliver's learning site

## Definition **Compounding error** is the defining risk of multi-step agents: because the agent's steps are chained, per-step reliability *multiplies*, so a model that is individually quite accurate becomes unreliable over a long trajectory. It is the mathematical reason that long agentic chains are hard. ## The arithmetic If each step succeeds independently with probability *p*, a chain of *n* steps succeeds with probability *pⁿ*. Even a strong per-step accuracy decays brutally: | Per-step accuracy | 10 steps | 100 steps | |-------------------|----------|-----------| | 95% | ~60% | ~0.6% | | 99% | ~90% | ~37% | Huyen uses exactly the 95% example in *[[AI Engineering - Chip Huyen]]*: a 95%-reliable step yields only about 60% success over ten steps and a catastrophic ~0.6% over a hundred. The lesson is stark — small per-step gaps that look like rounding noise become the dominant failure once the agent loops many times. ## Why agents are especially exposed A single prompt is one step; an agent is the [[Agentic Loop]] run many times. Every iteration adds a multiplicand. Worse, errors are not always independent — a wrong observation early on poisons every downstream decision, so real agents often do *worse* than the independent-multiplication model predicts. And the steps are not just reasoning: they include [[Tool Use]], where a single bad write action can be irreversible. **Tool access raises the stakes of each error** from "wrong sentence" to "wrong transaction." ## What it forces in design Compounding error is the budget you are spending; everything below is how to spend less of it: - **Stronger models / fewer steps.** Push *p* toward 1 and shrink *n*. Shorter plans beat longer ones; collapse steps where you can. - **[[Reflection]].** Insert checkpoints where the agent critiques its own progress and catches drift before it propagates. - **Verification.** Check intermediate results — and prefer [[Verifier Independence]], because a verifier that shares the actor's blind spots multiplies the error instead of catching it. - **Bounded horizons.** Cap the number of steps and fail loudly rather than wandering. ## Relation to failure analysis Compounding error explains *why* agents fail at scale; cataloguing *how* they fail is the job of [[Agent Failure Modes]]. The two are complementary: the multiplication law tells you long chains are fragile, and the failure-mode taxonomy tells you which step types are eating your reliability budget so you know where to intervene. ## Related - [[Agentic Loop]] - [[Agent Failure Modes]] - [[Verifier Independence]] - [[Reflection]] - [[Tool Use]] - [[AI Engineering - Chip Huyen]]