## Definition
The **No Free Lunch (NFL) theorem** (Wolpert, 1996) states that *averaged over all possible problems*, no learning algorithm outperforms any other. There is no universally best learner; the value of an algorithm depends on the match between its [[Inductive Bias]] and the structure of the problems you actually encounter.
## The Formal Claim
Averaged over the uniform distribution of all possible target functions $f: X \to Y$, every learning algorithm has the same expected generalisation performance — including random guessing.
The proof is essentially combinatorial: a learner that does better than chance on one set of targets must do worse on a complementary set.
## What This Means
- **There is no "best" algorithm.** "What's the best ML algorithm for X?" has an answer only if you specify X.
- **Algorithm performance is contingent.** An algorithm wins on real problems because it carries inductive biases aligned with how real problems work — not because of intrinsic superiority.
- **Comparisons must be empirical.** Theoretical performance guarantees apply to *some* distributions, never to all.
## What It Doesn't Mean
The NFL is sometimes misread as "all algorithms are equally good in practice." That is **wrong**. The averages are over a fictional, uniform distribution of problems; *real* problems are not uniformly distributed. Some inductive biases (smoothness, hierarchical structure, sparsity) match real data far better than others — and those algorithms win consistently.
## Practical Implications
- **Choose algorithms whose inductive biases match your domain.**
- Spatial structure → convolutional networks.
- Sequential structure → recurrent networks, transformers.
- Tree-like decisions → gradient-boosted trees.
- Few examples + high dimensionality → kernel methods.
- **Use ensembles** ([[Bagging]], [[Boosting]]) to hedge across inductive biases.
- **Empirical benchmarking** within your domain beats general claims.
## NFL for Optimisation
A parallel NFL theorem holds for black-box optimisation: averaged over all objective functions, every optimiser performs the same. Again, in practice we don't face uniform distributions over objectives; we face structured problems where some optimisers excel.
## Related
- [[Inductive Bias]]
- [[Hypothesis Space]]
- [[Bias-Variance Tradeoff]]
- [[Machine Learning]]