## Definition
**Hybrid search** is the practice of combining [[Term-Based Retrieval]] and [[Embedding-Based Retrieval]] to get the complementary strengths of both: the precision of lexical keyword matching and the semantic flexibility of dense vector search. It is a common architecture for production RAG retrieval pipelines.
## Why Neither Alone Is Enough
- **Term-based retrieval** excels at exact matches (product codes, error strings, proper nouns) but fails when query and document use synonyms or paraphrases.
- **Embedding-based retrieval** captures meaning but can fail on specific identifiers: an error code like `EADDRNOTAVAIL (99)` may be obscured after embedding.
Combining the two hedges against both failure modes.
## Two Combination Patterns
### Sequential cascade (cheap-then-precise)
A fast, coarse retriever (typically BM25) fetches a large candidate set. A more precise but expensive mechanism — vector search or a cross-encoder reranker — then re-scores just those candidates.
```
Query
│
▼
[BM25] → top-100 candidates
│
▼
[Vector search / cross-encoder] → top-k re-ranked results
```
This is also called **reranking** when the second stage re-scores rather than retrieves. The reranking stage may also weight by recency for time-sensitive applications.
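The cascade can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: `bm25_scores` is a simplified in-memory BM25, and `toy_overlap_score` is a hypothetical stand-in for the expensive second stage (in practice a vector similarity or cross-encoder model would go there). All function names and parameter defaults are assumptions made for this example.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def toy_overlap_score(query, doc):
    """Hypothetical placeholder for a cross-encoder: share of query terms in doc."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / max(len(q), 1)

def cascade_search(query, docs, rerank_score, n_candidates=100, top_k=3):
    """Stage 1: cheap BM25 shortlist. Stage 2: re-score only the shortlist."""
    first = bm25_scores(query, docs)
    shortlist = sorted(range(len(docs)), key=lambda i: first[i], reverse=True)
    shortlist = shortlist[:n_candidates]
    reranked = sorted(shortlist, key=lambda i: rerank_score(query, docs[i]),
                      reverse=True)
    return [docs[i] for i in reranked[:top_k]]

docs = [
    "bind failed with EADDRNOTAVAIL",
    "how to bake bread",
    "socket bind error on restart",
]
results = cascade_search("bind error", docs, toy_overlap_score,
                         n_candidates=2, top_k=2)
print(results)
```

The point of the design is that `rerank_score` is called only on the shortlist, so its cost scales with `n_candidates` rather than with the corpus size.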
### Parallel ensemble (fusion)
Multiple retrievers run simultaneously and their ranked lists are merged.
**Reciprocal Rank Fusion (RRF)** (Cormack et al., 2009) is the standard merging algorithm. Each document receives a score from every retriever based on its rank:
$$
\text{Score}(D) = \sum_{i=1}^{n} \frac{1}{k + r_i(D)}
$$
where $r_i(D)$ is the rank given by retriever $i$ (1 for the top result), $n$ is the number of retrievers, and $k$ is a smoothing constant (typically 60) that keeps any single top-ranked result from dominating the fused score. A document ranked first by one retriever and second by another scores $\frac{1}{61} + \frac{1}{62} \approx 0.0325$, about twice the maximum of $\frac{1}{61} \approx 0.0164$ available to a document that only one retriever finds.
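The formula translates almost directly into code. The sketch below assumes each retriever returns a list of document IDs ordered best first, and that a document missing from a retriever's list simply contributes nothing for that retriever (a common convention, but an assumption here). The function name and the example IDs are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first rankings of doc IDs into one scored list."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, matching r_i(D) in the formula.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_a", "doc_d", "doc_b"]

for doc_id, score in reciprocal_rank_fusion([bm25_ranking, vector_ranking]):
    print(f"{doc_id}: {score:.4f}")
```

Here `doc_a` tops both lists and so ranks first in the fused result, while `doc_b`, found by both retrievers at lower ranks, still outscores `doc_c` and `doc_d`, which only one retriever returned.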
## Reranking in Context
Context reranking differs from traditional search reranking: the *absolute rank* matters less than *inclusion*. What matters most is that truly relevant documents are not dropped before the model sees them. [[Lost in the Middle Effect]] means position within the context still matters, but inclusion is the primary objective.
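One way to make "inclusion" measurable is recall at a cutoff: the share of known-relevant documents that survive into the top $k$ passed to the model. The sketch below is an assumption about how one might check this, not a method taken from the source; `recall_at_k` and the document IDs are hypothetical.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top-k retrieved list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)

retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

print(recall_at_k(retrieved, relevant, k=3))  # d2 falls outside the cutoff
print(recall_at_k(retrieved, relevant, k=4))  # both relevant docs included
```

The metric ignores order inside the cutoff, which mirrors the priority described above: a relevant document at position 3 counts the same as one at position 1, but one at position 4 with `k=3` is lost entirely.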
## Implementation Considerations
- Many vector databases and search engines (Weaviate, Qdrant, Elasticsearch with dense vectors) support hybrid search natively.
- RRF requires no score normalisation across retrievers — only ranks — making it robust to the different scales used by BM25 and cosine similarity.
- When the corpus has entity-rich content (code, product catalogues), starting with BM25 as the coarse stage and semantic search as the precision pass tends to work well.
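The scale-robustness point can be illustrated with a short sketch. The scores below are made-up values chosen only to show the mismatch in scale between BM25 (unbounded) and cosine similarity (bounded by 1); `to_ranking` is a hypothetical helper.

```python
def to_ranking(scores):
    """Turn {doc_id: score} into a list of doc IDs, best first."""
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative scores on very different scales.
bm25 = {"doc_a": 14.2, "doc_b": 9.7, "doc_c": 3.1}
cosine = {"doc_b": 0.83, "doc_a": 0.79, "doc_c": 0.41}

# Rescaling the scores changes their magnitude but not the ranking,
# so any rank-based fusion sees exactly the same input.
rescaled_bm25 = {doc: s * 1000 for doc, s in bm25.items()}

print(to_ranking(bm25))
print(to_ranking(rescaled_bm25))
print(to_ranking(cosine))
```

A weighted sum of the raw scores would be dominated by the BM25 values unless both were normalised first; converting to ranks sidesteps that step.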
## Related
- [[Term-Based Retrieval]]
- [[Embedding-Based Retrieval]]
- [[Retrieval-Augmented Generation]]
- [[Semantic Search]]
- [[Lost in the Middle Effect]]
## Sources
- [[AI Engineering - Chip Huyen]]