## Definition
**Contextual retrieval** is a [[RAG Chunking Strategy|chunking augmentation]] technique that prepends a short, LLM-generated context to each document chunk before indexing, so that the chunk can still be retrieved when its meaning depends on the surrounding document. Anthropic introduced the technique in 2024.
## The Problem It Solves
When a document is split into chunks, individual chunks often lose their context. A chunk that says "the second stage requires a temperature above 400°C" can match a keyword query on its own terms, but nothing in it tells the retriever which process or experiment it refers to. After embedding, this ambiguity is even harder to resolve, because the vector encodes only what the chunk itself says.
## The Technique
Before indexing, each chunk is augmented with a short description (typically 50–100 tokens) that an LLM generates from the whole document and the chunk, using a prompt such as:
```
<document>
{{WHOLE_DOCUMENT}}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{{CHUNK_CONTENT}}
</chunk>
Please give a short succinct context to situate this chunk within the overall
document for the purposes of improving search retrieval of the chunk.
Answer only with the succinct context and nothing else.
```
The generated context is prepended to the chunk. The augmented chunk is then indexed by the retrieval algorithm (term-based, embedding-based, or both).
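The indexing step above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: `contextualize_chunks` and the `generate` callable are hypothetical names, and `generate` stands in for whatever LLM client is used (it takes a prompt string and returns the model's text).

```python
from typing import Callable, List

# Prompt template from the section above, with Python format placeholders.
PROMPT_TEMPLATE = """<document>
{document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>
Please give a short succinct context to situate this chunk within the overall
document for the purposes of improving search retrieval of the chunk.
Answer only with the succinct context and nothing else."""


def contextualize_chunks(
    document: str,
    chunks: List[str],
    generate: Callable[[str], str],  # hypothetical: wraps an LLM call
) -> List[str]:
    """Return each chunk with its generated context prepended."""
    augmented = []
    for chunk in chunks:
        prompt = PROMPT_TEMPLATE.format(document=document, chunk=chunk)
        context = generate(prompt).strip()
        # The augmented text, not the raw chunk, is what gets indexed.
        augmented.append(f"{context}\n\n{chunk}")
    return augmented
```

Passing the LLM call in as a function keeps the sketch independent of any particular provider and makes it easy to test with a stub.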
## Example
Raw chunk: `"the gross margin improved by 3 points year-over-year"`
With contextual prefix: `"This chunk is from Acme Corp's Q3 2024 earnings report. It describes profitability improvements versus Q3 2023. The gross margin improved by 3 points year-over-year."`
The augmented chunk can now be retrieved for queries about Acme's profitability or its Q3 results, whose key terms appear nowhere in the raw chunk. Year-over-year queries, which could already match the raw text, now also carry the company and the reporting period.
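A toy term-overlap score makes the difference concrete. This is only an illustration, assuming a crude lowercase tokenizer and a shared-term count; the `tokens` and `overlap` helpers are hypothetical and are not BM25 or an embedding model.

```python
import re


def tokens(text: str) -> set:
    """Lowercase alphanumeric terms; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, chunk: str) -> int:
    """Number of distinct query terms that also occur in the chunk."""
    return len(tokens(query) & tokens(chunk))


raw = "the gross margin improved by 3 points year-over-year"
augmented = (
    "This chunk is from Acme Corp's Q3 2024 earnings report. "
    "It describes profitability improvements versus Q3 2023. "
    "The gross margin improved by 3 points year-over-year."
)
query = "Acme Q3 profitability"

print(overlap(query, raw))        # 0: no query term occurs in the raw chunk
print(overlap(query, augmented))  # 3: "acme", "q3", "profitability" all match
```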
## Complementary Augmentation Tactics
Beyond AI-generated context, other metadata can be appended to chunks:
- **Extracted entities** — product codes, error codes, proper nouns that should be keyword-searchable even after embedding.
- **Answerable questions** — for customer support corpora, each article can be augmented with questions it answers ("How do I reset my password?", "I can't log in"). This aligns the index with how users actually query.
- **Document-level metadata** — title, author, date, tags for filtering at retrieval time.
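
One way to carry these augmentations alongside a chunk is sketched below. The `IndexedChunk` record, the `extract_codes` helper, and the code pattern (strings shaped like `ERR-1042`) are all hypothetical; a real corpus would need its own entity extraction.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed format for product/error codes, e.g. "ERR-1042" or "SKU-88A".
CODE_PATTERN = re.compile(r"\b[A-Z]{2,5}-\d+[A-Z]?\b")


def extract_codes(text: str) -> List[str]:
    """Return the distinct codes found in the text, sorted."""
    return sorted(set(CODE_PATTERN.findall(text)))


@dataclass
class IndexedChunk:
    text: str
    entities: List[str] = field(default_factory=list)
    questions: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)  # used for filtering

    def searchable_text(self) -> str:
        """Text handed to the index: chunk plus entities and questions."""
        parts = [self.text]
        if self.entities:
            parts.append("Entities: " + ", ".join(self.entities))
        if self.questions:
            parts.append("Questions: " + " ".join(self.questions))
        return "\n".join(parts)
```

In this sketch the metadata is kept out of `searchable_text()` so that it can be applied as a filter at retrieval time instead of influencing the match score.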
## Cost Considerations
Contextual retrieval requires one LLM call per chunk at indexing time, and each call includes the whole document in its prompt, so cost grows quickly on large corpora. [[Prompt Caching]] can substantially reduce cost when many chunks come from the same document, because the `WHOLE_DOCUMENT` portion of the prompt is shared across those calls and is cacheable.
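A rough estimate shows why the shared document dominates the bill. The function below is a back-of-the-envelope sketch: `indexing_input_cost` is a hypothetical helper, the multipliers in the example are placeholder values and not any provider's published pricing, and it ignores output tokens, the instruction text, and cache expiry.

```python
def indexing_input_cost(
    doc_tokens: int,
    n_chunks: int,
    chunk_tokens: int,
    price_per_token: float,
    cache_write_multiplier: float = 1.0,  # 1.0 means no caching surcharge
    cache_read_multiplier: float = 1.0,   # 1.0 means no caching discount
) -> float:
    """Estimated input cost of contextualizing every chunk of one document."""
    # Each call sends its own chunk at the normal rate.
    chunk_cost = n_chunks * chunk_tokens * price_per_token
    # The first call pays to process (and, with caching, write) the document.
    first_doc_cost = doc_tokens * price_per_token * cache_write_multiplier
    # Every later call re-sends the document, at the cached rate if caching is on.
    later_doc_cost = (
        (n_chunks - 1) * doc_tokens * price_per_token * cache_read_multiplier
    )
    return chunk_cost + first_doc_cost + later_doc_cost


# Illustrative numbers only: an 8,000-token document split into 10 chunks.
no_cache = indexing_input_cost(8000, 10, 800, price_per_token=1.0)
with_cache = indexing_input_cost(
    8000, 10, 800, price_per_token=1.0,
    cache_write_multiplier=1.25,  # assumed surcharge for writing the cache
    cache_read_multiplier=0.1,    # assumed discount for reading the cache
)
print(no_cache, with_cache)
```

Under these assumed multipliers the cached run costs well under a third of the uncached one, and the gap widens as the number of chunks per document grows.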
## Related
- [[Retrieval-Augmented Generation]]
- [[RAG Chunking Strategy]]
- [[Embedding-Based Retrieval]]
- [[Hybrid Search]]
- [[Prompt Caching]]
## Sources
- [[AI Engineering - Chip Huyen]]