## Definition
**Retrieval-Augmented Generation (RAG)** is the pattern of conditioning an LLM's response on documents retrieved at query time from an external knowledge base, typically a [[Vector Database]] of embeddings. It was introduced by Lewis et al. (2020); see [[Retrieval-Augmented Generation (Lewis et al.)]].
## Why RAG
- **Knowledge that changes faster than training.** A model's training cutoff typically lags current events by months or more; RAG injects fresh content per query.
- **Knowledge that doesn't belong in weights.** Internal docs, customer data, regulatory filings — too private or specific to bake into a model.
- **Mitigates [[Hallucination]].** Generation is grounded in retrieved text, which reduces (but does not eliminate) fabricated answers; the model can cite sources.
- **Cheaper than fine-tuning.** Update the index, not the model.
## The Canonical Pipeline
```
User query
│
▼
[Query embedding] ──→ [Vector DB] ──→ Top-k chunks
│
▼
[Prompt template with chunks + query]
│
▼
[LLM]
│
▼
Generated response
(optionally with citations)
```
## Indexing Side
1. **Chunk** documents (typically 200–1000 tokens, sometimes with overlap).
2. **Embed** each chunk via an embedding model — see [[Embedding]].
3. **Store** in a vector database with metadata for filtering — see [[Vector Database]].
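
The three indexing steps above can be sketched in a few lines. This is a minimal illustration, not a production indexer: `toy_embed` is a hypothetical stand-in for a real embedding model (it hashes words into buckets), tokens are approximated by whitespace splitting, and a plain Python list plays the role of the vector database.

```python
import hashlib
import math
from dataclasses import dataclass, field


def chunk_tokens(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of `size` whitespace tokens that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    tokens = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Assumption: hashed bag-of-words, L2-normalised. Replace with a real embedding model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class Record:
    chunk_id: str
    text: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


def build_index(docs: dict[str, str], size: int = 200, overlap: int = 20) -> list[Record]:
    """Chunk, embed, and store every document; metadata keeps the source for later filtering."""
    index = []
    for doc_id, text in docs.items():
        for i, chunk in enumerate(chunk_tokens(text, size, overlap)):
            index.append(Record(f"{doc_id}#{i}", chunk, toy_embed(chunk), {"source": doc_id}))
    return index
```

The chunk ID format (`doc_id#i`) is an arbitrary choice made here so that later stages have something stable to cite.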
## Retrieval Side
1. **Embed the query** in the same vector space.
2. **Search** for nearest neighbours (cosine or dot product).
3. **Optionally rerank** the top-N with a cross-encoder for higher precision.
4. **Filter** by metadata (date, source, tenant). Many vector databases can apply the filter during the search itself rather than afterwards.
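
A rough sketch of the search and filter steps, assuming the query has already been embedded. It uses exact brute-force cosine similarity over an in-memory list, applies the metadata filter before scoring (one possible ordering), and omits reranking. The record layout (`id`, `vector`, `metadata`) and the `where` parameter are illustrative names, not any particular database's API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, records, k=3, where=None):
    """Return the top-k (score, id) pairs among records whose metadata matches `where`."""
    where = where or {}
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in where.items())
    ]
    scored = [(cosine(query_vec, r["vector"]), r["id"]) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


records = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"tenant": "acme"}},
    {"id": "b", "vector": [0.0, 1.0], "metadata": {"tenant": "acme"}},
    {"id": "c", "vector": [1.0, 0.1], "metadata": {"tenant": "other"}},
]

unfiltered = [rid for _, rid in search([1.0, 0.0], records, k=2)]
# ['a', 'c']
acme_only = [rid for _, rid in search([1.0, 0.0], records, k=2, where={"tenant": "acme"})]
# ['a', 'b']
```

When the stored vectors are already L2-normalised, the dot product gives the same ranking as cosine similarity and skips the norm computation.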
## Generation Side
1. **Compose a prompt** with the retrieved chunks as context plus the user query.
2. **Generate** the response.
3. **Cite** the sources (chunk IDs, page numbers, URLs) the response drew on.
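
Prompt composition might look like the following sketch. The instruction wording and the bracketed `[chunk_id]` citation convention are assumptions made for illustration; the actual LLM call is left out because it depends on the provider.

```python
def build_prompt(query: str, chunks: list[tuple[str, str]]) -> str:
    """Compose a prompt from (chunk_id, text) pairs and the user query."""
    # Prefix each chunk with its ID so the model has something concrete to cite.
    context = "\n\n".join(f"[{chunk_id}] {text}" for chunk_id, text in chunks)
    return (
        "Answer the question using only the context below. "
        "Cite the chunk ID in square brackets after each claim. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


prompt = build_prompt(
    "How long do refunds take?",
    [("faq#2", "Refunds take 5 days."), ("faq#3", "Shipping is free.")],
)
```

Carrying the chunk IDs into the prompt is what makes the citation step checkable afterwards.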
## Common Pitfalls
- **Wrong chunking.** Too large → noisy context; too small → a single idea is split across chunks and loses its meaning.
- **Stale index.** Documents updated but not re-indexed; the LLM cites stale info.
- **Embedding-model mismatch.** Index built with one model, queries embedded with another; the vectors live in different spaces, so similarity scores are meaningless.
- **No reranking.** Top-k by vector similarity isn't always relevance-ordered; a reranker helps.
- **No citations enforced.** The model claims to draw on retrieved docs but doesn't — a [[Hallucination]] dressed in RAG clothing.
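
One lightweight guard against the last pitfall is to check that every citation in the response points at a chunk that was actually retrieved. The sketch below assumes citations appear as bracketed chunk IDs such as `[faq#2]`; that format and the function name are hypothetical. It only verifies that the cited IDs exist, not that the cited text supports the claim.

```python
import re

# Matches anything in square brackets, so other bracketed text would also be picked up.
CITATION = re.compile(r"\[([^\[\]]+)\]")


def check_citations(response: str, retrieved_ids: set[str]) -> dict:
    """Compare IDs cited in the response against the IDs that were retrieved."""
    cited = set(CITATION.findall(response))
    return {
        "valid": sorted(cited & retrieved_ids),     # cited and retrieved
        "unknown": sorted(cited - retrieved_ids),   # cited but never retrieved
        "has_citation": bool(cited & retrieved_ids),
    }


report = check_citations(
    "Refunds take 5 days [faq#2]. Shipping is free [blog#9].",
    {"faq#2", "faq#3"},
)
```

A response with an empty `valid` list or a non-empty `unknown` list can be rejected or regenerated before it reaches the user.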
## Evolution Toward Agentic Retrieval
In modern agentic systems, retrieval is increasingly **invoked as a tool** rather than as a preprocessing step. The agent decides *when* to retrieve, *what* to retrieve, and how to iterate — see [[Tool Use]].
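
The control flow of retrieval-as-a-tool can be sketched as a small loop. Everything here is illustrative: `decide` is a hypothetical stand-in for the LLM's tool-calling decision (real tool-use APIs differ by provider), and the keyword lookup stands in for an actual retriever.

```python
def run_agent(question, decide, tools, max_steps=3):
    """Loop until `decide` answers or the step budget runs out.

    decide(question, notes) returns ("call", tool_name, argument) or ("answer", text).
    """
    notes = []  # (tool_name, argument, result) for each tool call so far
    for _ in range(max_steps):
        action = decide(question, notes)
        if action[0] == "answer":
            return action[1], notes
        _, tool_name, argument = action
        notes.append((tool_name, argument, tools[tool_name](argument)))
    return "No answer within step budget.", notes


# Toy knowledge base and retriever, standing in for a vector search.
kb = {"refund": "Refunds take 5 days."}
tools = {"retrieve": lambda q: [text for key, text in kb.items() if key in q.lower()]}


def scripted_decide(question, notes):
    """Scripted policy in place of an LLM: retrieve once, then answer from the result."""
    if not notes:
        return ("call", "retrieve", question)
    hits = notes[-1][2]
    return ("answer", hits[0] if hits else "I could not find that.")


answer, notes = run_agent("What is the refund policy?", scripted_decide, tools)
```

The difference from the fixed pipeline is that retrieval happens only when the policy asks for it, and it can be repeated with a refined argument on a later step.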
## Related
- [[Vector Database]]
- [[Embedding-Based Retrieval]]
- [[Semantic Search]]
- [[Embedding]]
- [[Hallucination]]
- [[Retrieval-Augmented Generation (Lewis et al.)]]