RAG Chunking Strategy - Albert Masoliver's learning site

## Definition

**RAG chunking strategy** is the set of decisions that determine how documents are split into retrievable units before indexing. Because the chunk is the atomic unit of retrieval, chunking choices directly determine what information can be surfaced for any given query.

## Why Chunking Matters

A retriever can only return what is in the index. If important information straddles a chunk boundary, neither chunk will surface it cleanly. If chunks are too large, the model's context fills with irrelevant content. Chunking is therefore as consequential as the choice of retrieval algorithm.

Constraints that bound chunk size:

- The **generative model's context limit**: retrieved chunks must fit alongside the query.
- The **embedding model's context limit**: chunks that exceed it are truncated, often silently, so the text past the limit never contributes to the embedding.

## Strategy Spectrum

### Fixed-size chunking

Split on a fixed unit: characters, words, sentences, or paragraphs. Simple to implement. Common sizes: 512–2048 tokens.

### Recursive chunking

Split at the coarsest granularity first (sections), then progressively finer (paragraphs, sentences) until each chunk fits the target size. This reduces the chance of cutting through a semantically coherent block.

### Structural chunking

Exploit document structure: split code by function, Q&A documents by question-answer pair, and CSV files by row. This keeps logically related content together.

### Token-based chunking

Tokenise with the generative model's own tokeniser, then split on token boundaries. This makes downstream context management exact, because chunk lengths are counted in the same units as the model's context limit. Downside: if the model (and thus the tokeniser) changes, the corpus must be re-chunked and re-indexed.

## Overlap

Non-overlapping chunks risk cutting important context at boundaries. For example, if "I left my wife a note." is split into "I left my wife" / "a note", neither half conveys the meaning of the original sentence. A small overlap (e.g., 10–20% of chunk size, or 20–50 tokens) means that content near a boundary appears intact in at least one chunk, provided it is no longer than the overlap. The trade-off is a larger index and more redundant retrieval candidates.
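A minimal sketch of fixed-size chunking with overlap, assuming whitespace-separated words as a stand-in for model tokens (a real pipeline would count tokens with the model's tokeniser). The function name `chunk_with_overlap` and its `size` and `overlap` parameters are hypothetical, chosen for illustration rather than taken from any library.

```python
def chunk_with_overlap(text: str, size: int, overlap: int) -> list[str]:
    """Split `text` into windows of `size` words; consecutive windows share
    `overlap` words. Whitespace-separated words stand in for model tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    stride = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


text = "We ate. I left my wife a note. Then we slept."
print(chunk_with_overlap(text, size=6, overlap=0))
# ['We ate. I left my wife', 'a note. Then we slept.']
print(chunk_with_overlap(text, size=6, overlap=4))
# ['We ate. I left my wife', 'I left my wife a note.', 'my wife a note. Then we', 'a note. Then we slept.']
```

With no overlap, the toy text reproduces the "I left my wife" / "a note" split; with a four-word overlap, one chunk holds the whole sentence, at the cost of twice as many chunks. The overlap is far above 10–20% only because the example is tiny.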
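The recursive strategy described under Strategy Spectrum can be sketched as follows, measuring size in characters. The separator hierarchy (paragraph breaks, line breaks, sentence ends, then any whitespace) and the names `recursive_chunk`, `SEPARATORS` and `max_chars` are assumptions made for this example, not a specific library's API.

```python
import re

# Assumed hierarchy, coarsest to finest: paragraph breaks, line breaks,
# sentence ends (the lookbehind keeps the punctuation), then any whitespace.
SEPARATORS = (r"\n\s*\n", r"\n", r"(?<=[.!?])\s+", r"\s+")


def recursive_chunk(text: str, max_chars: int,
                    separators: tuple[str, ...] = SEPARATORS) -> list[str]:
    """Split `text` into pieces of at most `max_chars` characters, trying
    coarse separators first and finer ones only for pieces that are too long."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # Nothing finer to split on: fall back to a hard character cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    coarsest, finer = separators[0], separators[1:]
    chunks = []
    for piece in re.split(coarsest, text):
        # Pieces that fit are kept whole; oversized ones recurse one level finer.
        chunks.extend(recursive_chunk(piece, max_chars, finer))
    return chunks


doc = "Intro paragraph.\n\nFirst sentence of a long paragraph. Second sentence here."
print(recursive_chunk(doc, max_chars=40))
# ['Intro paragraph.', 'First sentence of a long paragraph.', 'Second sentence here.']
```

The short first paragraph stays whole, while the 57-character second paragraph falls through to the sentence level. Unlike some production splitters, this sketch does not merge small neighbouring pieces back up toward `max_chars`, so it can return many short chunks.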
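For structural chunking, a CSV file can be split one row per chunk. This sketch uses Python's standard `csv` module and, as an assumption of this example rather than a rule from the source, repeats the column names in every chunk so each row still makes sense when retrieved on its own. `chunk_csv_by_row` and the sample data are hypothetical.

```python
import csv
import io


def chunk_csv_by_row(csv_text: str) -> list[str]:
    """Return one chunk per data row, rendered as 'column: value' pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first row is the header
    return ["; ".join(f"{col}: {val}" for col, val in row.items()) for row in reader]


data = "product,price\nlamp,25\ndesk,140\n"
for chunk in chunk_csv_by_row(data):
    print(chunk)
# product: lamp; price: 25
# product: desk; price: 140
```

Because `csv` handles quoting, a field such as `"bright, warm"` stays in one value instead of being cut at its comma, which a naive split on `,` would not guarantee.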
## Size Trade-offs

| Chunk size | Advantage | Disadvantage |
|---|---|---|
| Smaller | More chunks fit in context; finer-grained retrieval | Information spread across the document may be missed; more chunks to embed and store, so higher indexing cost |
| Larger | Captures broader context per chunk | Context diluted with irrelevant content; fewer chunks fit in the model's window |

There is no universal optimum. The right size is application-specific and must be measured empirically on representative queries.

## Relationship to Contextual Retrieval

Chunking quality can be augmented after splitting via [[Contextual Retrieval]], which prepends AI-generated context to each chunk before indexing. This compensates for the loss of surrounding context that splitting causes.

## Related

- [[Retrieval-Augmented Generation]]
- [[Embedding-Based Retrieval]]
- [[Contextual Retrieval]]
- [[Context Window]]

## Sources

- [[AI Engineering - Chip Huyen]]