## Definition

**Inductive bias** is the set of assumptions a learning algorithm makes in order to predict outputs for inputs it has not seen. Without an inductive bias, generalisation is impossible: the [[No Free Lunch Theorem]] shows that, averaged over all possible target functions, no learner outperforms any other.

## Two Forms

### Restriction bias

A limit imposed by the algorithm's [[Hypothesis Space]] itself: anything outside that space cannot be learned. Linear regression cannot represent relationships that are non-linear in its input features. Decision trees represent piecewise-constant (step) functions and can only approximate smooth ones.

### Preference bias

Within the hypothesis space, the algorithm *prefers* some hypotheses over others. [[L2 Regularization]] prefers small-weight solutions. Greedy decision tree growth prefers shorter trees (Occam's razor). SGD prefers the solutions its search trajectory reaches, so initialisation and step size shape which hypothesis is selected.

## Examples

| Algorithm | Inductive bias |
|---|---|
| Linear regression | Linearity in features |
| k-NN | Smooth, locally constant target function |
| Decision trees | Axis-aligned, hierarchical decision rules |
| Convolutional networks | Spatial locality, translation equivariance |
| Recurrent networks | Sequential dependence, temporal locality |
| Transformers | Permutation-equivariant attention, with order supplied by positional encoding |
| Random Forest | Averaging many decorrelated trees yields a stable predictor |
| L2 regularisation | Smooth, low-norm solutions |

## Why It Matters

- **No generalisation without bias.** A learner that makes no assumptions about unseen data has no way to label it.
- **Right bias → fast learning.** Less data is needed when the bias matches reality.
- **Wrong bias → fails silently.** A linear model on non-linear data simply cannot learn the truth, regardless of data volume.

## Choosing the Right Bias

The art of ML engineering is matching inductive bias to the problem:

- **Spatial data (images)** → convolutional bias.
- **Sequential data (text, audio)** → recurrence or attention.
- **Tabular with mixed types** → tree-based models.
- **Strong domain priors** → handcrafted features, kernels, or architecture.

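
The "wrong bias fails silently" point and the handcrafted-features option above can be illustrated with a small experiment. This is only a minimal sketch under stated assumptions: the target `y = x^2`, the input range `[-1, 1]`, and the helper names `fit_line`, `mse`, and `run_experiment` are all hypothetical choices made for illustration. The same one-feature least-squares learner is fitted twice, once on the raw input `x` (mismatched bias) and once on the handcrafted feature `x^2` (matched bias).

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y ~ a * x + b (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def mse(xs, ys, a, b):
    """Mean squared error of the line a * x + b on (xs, ys)."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def run_experiment(n_train, seed=0):
    """Return (test MSE with raw feature x, test MSE with feature x**2)."""
    rng = random.Random(seed)
    train_x = [rng.uniform(-1.0, 1.0) for _ in range(n_train)]
    test_x = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    # Hypothetical noiseless target, chosen for illustration: y = x^2.
    train_y = [x * x for x in train_x]
    test_y = [x * x for x in test_x]

    # Mismatched bias: the hypotheses are lines in x.
    a, b = fit_line(train_x, train_y)
    err_raw = mse(test_x, test_y, a, b)

    # Matched bias: same linear learner, handcrafted feature z = x^2.
    train_z = [x * x for x in train_x]
    test_z = [x * x for x in test_x]
    a, b = fit_line(train_z, train_y)
    err_feat = mse(test_z, test_y, a, b)
    return err_raw, err_feat


if __name__ == "__main__":
    for n in (100, 10_000):
        err_raw, err_feat = run_experiment(n)
        print(f"n={n:>6}  raw x: {err_raw:.4f}   feature x^2: {err_feat:.2e}")
```

For `x` uniform on `[-1, 1]`, the best possible line has a mean squared error of 4/45 ≈ 0.089, so the raw-feature error should stay near that value at both training sizes, while the `x^2` feature drives the error to essentially zero. More data does not rescue the mismatched bias, but changing the representation does.
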
## The Bitter Lesson

Rich Sutton's "Bitter Lesson" (2019): historically, AI methods that scale with compute and data tend to beat those relying on hard-coded knowledge. Translation: weaker inductive biases combined with vast scale often win over stronger biases with limited scale. The frontier models of 2026 (LLMs, diffusion) reflect this: relatively general architectures, scaled aggressively, dominate hand-crafted competitors.

## Related

- [[No Free Lunch Theorem]]
- [[Hypothesis Space]]
- [[Bias-Variance Tradeoff]]
- [[Machine Learning]]