Polynomial Regression - Albert Masoliver's learning site

## Definition

**Polynomial regression** fits a polynomial of degree $k$ in the input features:

$$
\hat y = w_0 + w_1 x + w_2 x^2 + \dots + w_k x^k
$$

It is still **linear regression** (linear in the *parameters*) applied to polynomial features. It is the classical example of basis-function expansion.

## Multivariate Form

For multiple inputs, include all cross-terms up to degree $k$:

$$
\hat y = w_0 + \sum_i w_i x_i + \sum_{i \leq j} w_{ij} x_i x_j + \dots
$$

The feature vector grows combinatorially: degree-$k$ polynomial features in $d$ variables yield $\binom{d + k}{k}$ features, counting the constant term.

## Why It Matters

- Bridges linear and non-linear modelling: fit non-linear relationships with linear regression machinery.
- Pedagogical workhorse: the minimal model that exhibits the [[Bias-Variance Tradeoff]] visually as $k$ grows.

## The Bias-Variance Story

- **Low $k$** (1, 2): underfitting; high bias.
- **High $k$** (10+): overfitting; high variance. Wildly oscillating curves that pass through training points but generalise poorly.

A classic illustration is the Runge phenomenon: high-degree polynomials oscillate near the boundaries of the data.

## Regularisation Helps

A degree-10 polynomial with [[L2 Regularization]] (Ridge) can outperform a degree-3 unregularised one. Regularisation tames the high-degree wiggle.

## Modern Status

Polynomial regression remains useful for:

- Pedagogy.
- Small problems where you know the relationship is smooth and low-degree.
- Engineered interaction features (specific products, not full expansions).

For non-linear relationships in general, kernel methods, splines, or neural networks are usually better.

## Splines

A modern variant: piecewise polynomials with continuity constraints at "knots". Less prone to global oscillation than monolithic polynomial regression. The basis of generalised additive models (GAMs).

## Related

- [[Linear Regression]]
- [[Ridge Regression]]
- [[Bias-Variance Tradeoff]]
- [[Feature Engineering]]
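
## A Minimal Sketch

To make the feature count and the bias-variance story above concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not a reference implementation: the helper names (`n_poly_features`, `poly_features`, `fit_poly`, `mse`), the noisy sine target, the sample size of 15, and the penalty `lam = 1e-3` are all hypothetical choices made for this example. The ridge fit is obtained by solving an augmented least-squares problem, which is one common way to do it.

```python
import math
import numpy as np

def n_poly_features(d, k):
    """Number of monomials of total degree <= k in d variables (constant included)."""
    return math.comb(d + k, k)

def poly_features(x, k):
    """Design matrix with columns 1, x, x^2, ..., x^k for a 1-D input array."""
    return np.vander(x, k + 1, increasing=True)

def fit_poly(x, y, k, lam=0.0):
    """Minimise ||Xw - y||^2 + lam * sum_{j>=1} w_j^2 for a degree-k polynomial.

    The penalty is expressed as extra rows sqrt(lam) * I appended to X, so a
    single least-squares solve handles both the plain and the ridge case.
    The intercept w_0 is left unpenalised (an assumption of this sketch).
    """
    X = poly_features(x, k)
    reg = np.sqrt(lam) * np.eye(k + 1)
    reg[0, 0] = 0.0
    A = np.vstack([X, reg])
    b = np.concatenate([y, np.zeros(k + 1)])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

def mse(w, x, y):
    """Mean squared error of the polynomial with weights w on (x, y)."""
    return float(np.mean((poly_features(x, len(w) - 1) @ w - y) ** 2))

# Synthetic data (assumed for illustration): noisy samples of sin(pi * x).
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(-1.0, 1.0, size=15))
y_train = np.sin(np.pi * x_train) + rng.normal(scale=0.2, size=15)
x_test = np.linspace(-1.0, 1.0, 200)
y_test = np.sin(np.pi * x_test)  # noiseless target on a dense grid

print("features for d=3, k=2:", n_poly_features(3, 2))
for k, lam in [(1, 0.0), (3, 0.0), (10, 0.0), (10, 1e-3)]:
    w = fit_poly(x_train, y_train, k, lam)
    print(f"k={k:2d} lam={lam:g} "
          f"train={mse(w, x_train, y_train):.4f} "
          f"test={mse(w, x_test, y_test):.4f}")
```

Because the models are nested, the unregularised training error can only fall as $k$ grows; whether the test error follows depends on the data, which is the point of the comparison. The exact numbers printed depend on the random seed and noise level assumed here.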