## Definition The **No Free Lunch (NFL) theorem** (Wolpert, 1996) states that *averaged over all possible problems*, no learning algorithm outperforms any other. There is no universally best learner; the value of an algorithm depends on the match between its [[Inductive Bias]] and the structure of the problems you actually encounter. ## The Formal Claim Averaged over the uniform distribution of all possible target functions $f: X \to Y$, every learning algorithm has the same expected generalisation performance — including random guessing. The proof is essentially combinatorial: a learner that does better than chance on one set of targets must do worse on a complementary set. ## What This Means - **There is no "best" algorithm.** "What's the best ML algorithm for X?" has an answer only if you specify X. - **Algorithm performance is contingent.** An algorithm wins on real problems because it carries inductive biases aligned with how real problems work — not because of intrinsic superiority. - **Comparisons must be empirical.** Theoretical performance guarantees apply to *some* distributions, never to all. ## What It Doesn't Mean The NFL is sometimes misread as "all algorithms are equally good in practice." That is **wrong**. The averages are over a fictional, uniform distribution of problems; *real* problems are not uniformly distributed. Some inductive biases (smoothness, hierarchical structure, sparsity) match real data far better than others — and those algorithms win consistently. ## Practical Implications - **Choose algorithms whose inductive biases match your domain.** - Spatial structure → convolutional networks. - Sequential structure → recurrent networks, transformers. - Tree-like decisions → gradient-boosted trees. - Few examples + high dimensionality → kernel methods. - **Use ensembles** ([[Bagging]], [[Boosting]]) to hedge across inductive biases. - **Empirical benchmarking** within your domain beats general claims. ## NFL for Optimisation A parallel NFL theorem holds for black-box optimisation: averaged over all objective functions, every optimiser performs the same. Again, in practice we don't face uniform distributions over objectives; we face structured problems where some optimisers excel. ## Related - [[Inductive Bias]] - [[Hypothesis Space]] - [[Bias-Variance Tradeoff]] - [[Machine Learning]]