## Definition

**Hybrid search** is the practice of combining [[Term-Based Retrieval]] and [[Embedding-Based Retrieval]] to get the complementary strengths of both: the precision of lexical keyword matching and the semantic flexibility of dense vector search. It is a widely used architecture for production RAG retrieval pipelines.

## Why Neither Alone Is Enough

- **Term-based retrieval** excels at exact matches (product codes, error strings, proper nouns) but fails when the query and the document use synonyms or paraphrases.
- **Embedding-based retrieval** captures meaning but can fail on specific identifiers: an error code like `EADDRNOTAVAIL (99)` may be obscured after embedding.

Combining the two hedges against both failure modes.

## Two Combination Patterns

### Sequential cascade (cheap-then-precise)

A fast, coarse retriever (typically BM25) fetches a large candidate set. A more precise but more expensive mechanism, such as vector search or a cross-encoder reranker, then re-scores only those candidates.

```
Query
  │
  ▼
[BM25] → top-100 candidates
  │
  ▼
[Vector search / cross-encoder] → top-k re-ranked results
```

This is also called **reranking** when the second stage re-scores existing candidates rather than retrieving new ones. The reranking stage may also weight by recency for time-sensitive applications.

### Parallel ensemble (fusion)

Multiple retrievers run in parallel and their ranked lists are merged. **Reciprocal Rank Fusion (RRF)** (Cormack et al., 2009) is a standard merging algorithm. Each document receives a contribution from every retriever based on its rank:

$$
\text{Score}(D) = \sum_{i=1}^{n} \frac{1}{k + r_i(D)}
$$

where $r_i(D)$ is the rank given by retriever $i$, $n$ is the number of retrievers, and $k$ is a smoothing constant (typically 60) that limits how much a single very high rank can dominate the fused score. A document ranked first by one retriever and second by another scores $\frac{1}{61} + \frac{1}{62} \approx 0.0325$, higher than any document found by only one retriever, which can score at most $\frac{1}{61} \approx 0.0164$.

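
The RRF formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `reciprocal_rank_fusion`, the document IDs, and the two result lists are hypothetical, and the sketch assumes that a retriever which does not return a document contributes nothing to that document's score.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse ranked lists of document IDs (best first) into one ranking.

    Assumption of this sketch: a document absent from a retriever's list
    gets no contribution from that retriever.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, matching r_i(D) in the formula.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a lexical and a dense retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

In this example `doc_b` (ranks 2 and 1) edges out `doc_a` (ranks 1 and 3), and both outrank `doc_d` and `doc_c`, which only one retriever returned. Only ranks enter the computation, so the raw BM25 and similarity scores never need to be put on a common scale.
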
## Reranking in Context

Context reranking differs from traditional search reranking: the *absolute rank* matters less than *inclusion*. What matters most is that truly relevant documents are not dropped before the model sees them. The [[Lost in the Middle Effect]] means position within the context still matters, but inclusion is the primary objective.

## Implementation Considerations

- Many search engines and vector databases (for example Weaviate, Qdrant, and Elasticsearch with dense vectors) support hybrid search natively.
- RRF uses only ranks, so it requires no score normalisation across retrievers and is robust to the different scales of BM25 scores and cosine similarity.
- When the corpus has entity-rich content (code, product catalogues), starting with BM25 as the coarse stage and semantic search as the precision pass tends to work well.

## Related

- [[Term-Based Retrieval]]
- [[Embedding-Based Retrieval]]
- [[Retrieval-Augmented Generation]]
- [[Semantic Search]]
- [[Lost in the Middle Effect]]

## Sources

- [[AI Engineering - Chip Huyen]]