Hierarchical Retrieval - Albert Masoliver's learning site

## Definition **Hierarchical retrieval** is a multi-stage strategy for loading information into context: cheap scouts surface candidates, mid-tier models skim and rank, and only the top-ranked items get deep reading by the reasoning model. Counters [[Lost in the Middle Effect]] and reduces token cost. ## The Pipeline 1. **Scout (Haiku).** Reads memory titles, file names, or short descriptions. Returns ~10 candidate IDs. 2. **Skim (Sonnet).** Pulls one-line summaries for those candidates. 3. **Reason (Sonnet, think hard / Opus).** Loads full text of the top 3 only and answers the question. ## Why It Beats Blanket Loading - A model with 30k tokens of relevant content outperforms one with 30k of relevant + 100k of "just in case". - Each stage discards noise before it reaches the next, so the reasoning stage sees only signal. - The pipeline pays for itself even in raw cost terms — Haiku is ~15× cheaper per token than Opus. ## Worked Question > *"Where do we depend on the old `created_at_unix` field?"* - Blanket: load all of `src/`, hope. Hallucinated paths likely. - Hierarchical: Haiku greps for the pattern, returns 12 paths; Sonnet skims and keeps 4; Sonnet+think-hard reads those 4 and produces the answer. ## When It Underperforms - The question is *narrow and well-localised* — overhead exceeds the saving. - File names are ambiguous (`utils.ts`, `helpers.py`) — the scout step can't discriminate. ## Composing It The pipeline becomes a [[Specialized Agent]] you invoke once, instead of three sub-runs you orchestrate. ## Related - [[Context vs Memory]] - [[Lost in the Middle Effect]] - [[Model Selection Strategy]] - [[Effective Context Engineering for AI Agents]]