## Definition
**Hierarchical retrieval** is a multi-stage strategy for loading information into context: cheap scouts surface candidates, mid-tier models skim and rank, and only the top-ranked items get deep reading by the reasoning model. It counters the [[Lost in the Middle Effect]] and reduces token cost.
## The Pipeline
1. **Scout (Haiku).** Reads memory titles, file names, or short descriptions. Returns ~10 candidate IDs.
2. **Skim (Sonnet).** Pulls one-line summaries for those candidates and ranks them.
3. **Reason (Sonnet, think hard / Opus).** Loads full text of the top 3 only and answers the question.
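
The three stages above can be sketched as a small function that takes each stage as a callable. This is a minimal sketch, not a reference implementation: `Item`, `hierarchical_retrieve`, and the keyword-overlap stand-ins are hypothetical names invented for illustration, and the stand-ins replace what would be real Haiku, Sonnet, and Opus calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    id: str
    title: str    # all the scout stage sees
    summary: str  # all the skim stage sees
    body: str     # only the reasoning stage loads this


# Each stage is a plain callable so a real model call can be swapped in later.
Scout = Callable[[str, List[Item]], List[str]]   # question, all items -> candidate IDs
Skim = Callable[[str, List[Item]], List[str]]    # question, candidates -> IDs, best first
Reason = Callable[[str, List[Item]], str]        # question, top items -> answer


def hierarchical_retrieve(question: str, items: List[Item], scout: Scout, skim: Skim,
                          reason: Reason, max_candidates: int = 10, top_k: int = 3) -> str:
    by_id = {item.id: item for item in items}
    # Stage 1: cheap pass over titles, capped at max_candidates.
    candidate_ids = scout(question, items)[:max_candidates]
    candidates = [by_id[i] for i in candidate_ids if i in by_id]
    # Stage 2: rank the candidates by their summaries.
    ranked_ids = skim(question, candidates)
    top = [by_id[i] for i in ranked_ids[:top_k] if i in by_id]
    # Stage 3: only the survivors are passed on with their full text.
    return reason(question, top)


# Stand-in stages (assumption: keyword overlap instead of model calls).
def _overlap(question: str, text: str) -> int:
    return len(set(question.lower().split()) & set(text.lower().split()))


def keyword_scout(question: str, items: List[Item]) -> List[str]:
    return [it.id for it in items if _overlap(question, it.title) > 0]


def keyword_skim(question: str, candidates: List[Item]) -> List[str]:
    ranked = sorted(candidates, key=lambda it: _overlap(question, it.summary), reverse=True)
    return [it.id for it in ranked]


def list_ids_reason(question: str, top: List[Item]) -> str:
    # A real reasoning stage would read each item's body here.
    return "answer based on: " + ", ".join(it.id for it in top)


DEMO_ITEMS = [
    Item("a", "billing retry notes", "retry policy for failed billing jobs", "full text A"),
    Item("b", "billing dashboard", "chart colours and layout", "full text B"),
    Item("c", "onboarding checklist", "first-week tasks", "full text C"),
    Item("d", "retry queue", "queue retry backoff settings", "full text D"),
]

answer = hierarchical_retrieve("billing retry policy", DEMO_ITEMS,
                               keyword_scout, keyword_skim, list_ids_reason, top_k=2)
print(answer)  # answer based on: a, d
```

Passing the stages in as callables keeps the control flow testable without any model access; the caps (`max_candidates`, `top_k`) are the only place where the funnel narrows.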
## Why It Beats Blanket Loading
- A model with 30k tokens of relevant content tends to outperform one with the same 30k plus 100k of "just in case".
- Each stage discards noise before it reaches the next, so the reasoning stage sees only signal.
- The pipeline pays for itself even in raw cost terms: Haiku is ~15× cheaper per token than Opus (the exact ratio depends on the model versions compared).
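
The cost argument can be made concrete with back-of-the-envelope arithmetic. In this sketch, prices are relative units rather than real rates: the 15× Haiku-to-Opus ratio and the 30k / 130k token counts come from the bullets above, while the Sonnet price of 3 units and the 5k / 2k token counts for the scout and skim stages are assumptions chosen for illustration. Only input tokens are counted.

```python
# Relative input-token prices (Haiku = 1 unit). The 15x figure is from the
# text above; the Sonnet value is an assumption for illustration.
RELATIVE_PRICE = {"haiku": 1.0, "sonnet": 3.0, "opus": 15.0}


def pipeline_cost(stages):
    """Sum the cost of (model, input_tokens) stages in relative units."""
    return sum(RELATIVE_PRICE[model] * tokens for model, tokens in stages)


# Blanket loading: 30k relevant + 100k "just in case", all read by Opus.
blanket = pipeline_cost([("opus", 130_000)])

# Hierarchical: assumed 5k tokens of titles, 2k of summaries, 30k of full text.
hierarchical = pipeline_cost([
    ("haiku", 5_000),
    ("sonnet", 2_000),
    ("opus", 30_000),
])

print(blanket, hierarchical, round(blanket / hierarchical, 2))
# 1950000.0 461000.0 4.23
```

Under these assumptions the two cheap stages add under 3% to the bill, so almost all of the saving comes from what the reasoning model no longer has to read.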
## Worked Question
> *"Where do we depend on the old `created_at_unix` field?"*
- Blanket: load all of `src/` and hope. Hallucinated paths are likely.
- Hierarchical: Haiku greps for the pattern, returns 12 paths; Sonnet skims and keeps 4; Sonnet+think-hard reads those 4 and produces the answer.
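
For a question like this one, the scout step is close to a literal search, so it can be sketched without a model at all. The function below is a hypothetical helper (`grep_scout` and its suffix filter are assumptions, not part of any tool named in this note); it returns the candidate paths that the skim stage would then narrow down.

```python
from pathlib import Path
from typing import Iterable, List


def grep_scout(root, pattern: str,
               suffixes: Iterable[str] = (".py", ".ts", ".sql")) -> List[str]:
    """Return relative paths of source files under root that contain pattern."""
    allowed = set(suffixes)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in allowed:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than fail the whole scan
        if pattern in text:
            hits.append(path.relative_to(root).as_posix())
    return hits


# Usage (assuming the repository root is the current directory):
# candidates = grep_scout(".", "created_at_unix")
```

The output is only a list of paths, which is the point: nothing but identifiers reaches the next stage, and file contents are loaded later for the few paths that survive the skim.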
## When It Underperforms
- The question is *narrow and well-localised* — overhead exceeds the saving.
- File names are ambiguous (`utils.ts`, `helpers.py`) — the scout step can't discriminate.
## Composing It
The pipeline becomes a [[Specialized Agent]] you invoke once, instead of three sub-runs you orchestrate.
## Related
- [[Context vs Memory]]
- [[Lost in the Middle Effect]]
- [[Model Selection Strategy]]
- [[Effective Context Engineering for AI Agents]]