Contextual Retrieval - Albert Masoliver's learning site

## Definition

**Contextual retrieval** is a [[RAG Chunking Strategy|chunking augmentation]] technique that prepends a short, AI-generated context to each document chunk before indexing, so that the chunk carries enough information to be retrieved even when its meaning depends on the surrounding document. Introduced by Anthropic (2024).

## The Problem It Solves

When a document is split into chunks, individual chunks often lose their context. A chunk that says "the second stage requires a temperature above 400°C" is retrievable by keyword, but a retriever cannot know which process or experiment this refers to without the surrounding document. After embedding, this ambiguity becomes even harder to resolve.

## The Technique

Before indexing, each chunk is augmented with a short description (typically 50–100 tokens) generated by an LLM using the whole document and the chunk as input:

```
<document>
{{WHOLE_DOCUMENT}}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{{CHUNK_CONTENT}}
</chunk>
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.
```

The generated context is prepended to the chunk. The augmented chunk is then indexed by the retrieval algorithm (term-based, embedding-based, or both).

## Example

Raw chunk: `"the gross margin improved by 3 points year-over-year"`

With contextual prefix: `"This chunk is from Acme Corp's Q3 2024 earnings report. It describes profitability improvements versus Q3 2023. The gross margin improved by 3 points year-over-year."`

The augmented chunk can now be retrieved by queries about Acme's profitability, Q3 results, or year-over-year comparisons. The company name, the quarter, and the word "profitability" do not appear in the raw chunk, so without the prefix it would match only the last kind of query.
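The technique above can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's implementation: `build_prompt`, `contextualize_chunks`, and the `generate_context` callable are hypothetical names, and the LLM call is replaced by a stub so the sketch runs without any API client. In practice `generate_context` would wrap whatever LLM client is in use.

```python
from typing import Callable, List

# Prompt template from the section above, with the two placeholders
# renamed to str.format fields.
PROMPT_TEMPLATE = """<document>
{whole_document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk_content}
</chunk>
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else."""


def build_prompt(whole_document: str, chunk: str) -> str:
    """Fill the template with the full document and a single chunk."""
    return PROMPT_TEMPLATE.format(whole_document=whole_document, chunk_content=chunk)


def contextualize_chunks(
    whole_document: str,
    chunks: List[str],
    generate_context: Callable[[str], str],  # hypothetical: prompt in, context out
) -> List[str]:
    """Prepend an LLM-generated context to every chunk (one call per chunk)."""
    augmented = []
    for chunk in chunks:
        context = generate_context(build_prompt(whole_document, chunk)).strip()
        augmented.append(f"{context} {chunk}")
    return augmented


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; always returns the same context."""
    return "This chunk is from Acme Corp's Q3 2024 earnings report."


document = (
    "Acme Corp Q3 2024 earnings report. "
    "The gross margin improved by 3 points year-over-year."
)
chunks = ["The gross margin improved by 3 points year-over-year."]

for text in contextualize_chunks(document, chunks, stub_llm):
    print(text)  # this augmented text is what gets indexed
```

Passing the LLM as a callable keeps the augmentation step independent of any particular provider and makes it easy to test with a stub. The augmented strings, not the raw chunks, are what the term-based or embedding-based index receives.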
## Complementary Augmentation Tactics

Beyond AI-generated context, other metadata can be appended to chunks:

- **Extracted entities**: product codes, error codes, and proper nouns that should be keyword-searchable even after embedding.
- **Answerable questions**: for customer support corpora, each article can be augmented with questions it answers ("How do I reset my password?", "I can't log in"). This aligns the index with how users actually query.
- **Document-level metadata**: title, author, date, and tags for filtering at retrieval time.

## Cost Considerations

Contextual retrieval requires one LLM call per chunk during the indexing phase. For large corpora this can be expensive. [[Prompt Caching]] can substantially reduce cost if many chunks come from the same document (the `WHOLE_DOCUMENT` portion of the prompt is shared and cacheable).

## Related

- [[Retrieval-Augmented Generation]]
- [[RAG Chunking Strategy]]
- [[Embedding-Based Retrieval]]
- [[Hybrid Search]]
- [[Prompt Caching]]

## Sources

- [[AI Engineering - Chip Huyen]]