## Definition
**Polynomial regression** fits a polynomial of degree $k$ in the input features:
$$
\hat y = w_0 + w_1 x + w_2 x^2 + \dots + w_k x^k
$$
It is still **linear regression** (linear in the *parameters*, not in $x$) applied to polynomial features, and it is the classical example of basis-function expansion.
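To make the "linear in the parameters" point concrete, here is a minimal NumPy sketch: it builds the columns $1, x, \dots, x^k$ and solves an ordinary least-squares problem on them. The data, the degree `k = 2`, and the helper name `fit_polynomial` are illustrative assumptions, not part of any library.

```python
import numpy as np

def fit_polynomial(x, y, k):
    """Least-squares fit of a degree-k polynomial; returns w_0..w_k."""
    # Design matrix with columns 1, x, x^2, ..., x^k.
    X = np.vander(x, k + 1, increasing=True)
    # Ordinary linear least squares on the expanded features.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Synthetic, noise-free data from y = 1 + 2x - 3x^2 (made up for illustration).
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2

w = fit_polynomial(x, y, k=2)
y_hat = np.vander(x, 3, increasing=True) @ w
```

The only non-linear step is building the feature columns; the fit itself is the same solver used for plain linear regression.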
## Multivariate Form
For multiple inputs, include all cross-terms up to degree $k$:
$$
\hat y = w_0 + \sum_i w_i x_i + \sum_{i \leq j} w_{ij} x_i x_j + \dots
$$
The feature vector grows combinatorially: a full degree-$k$ expansion in $d$ variables yields $\binom{d + k}{k}$ features, counting the constant term.
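The count can be checked by brute force. The sketch below enumerates every monomial of total degree at most $k$ and compares the tally with the binomial formula; the helper name `count_monomials` and the values `d = 5`, `k = 3` are chosen only for illustration.

```python
from itertools import combinations_with_replacement
from math import comb

def count_monomials(d, k):
    """Count monomials of total degree <= k in d variables by enumeration."""
    # A degree-m monomial is a multiset of m variable indices;
    # m = 0 gives the constant term.
    return sum(
        1
        for m in range(k + 1)
        for _ in combinations_with_replacement(range(d), m)
    )

d, k = 5, 3
n_enumerated = count_monomials(d, k)  # 1 + 5 + 15 + 35
n_formula = comb(d + k, k)            # C(8, 3)
```

Both counts come out to 56 here. For `d = 100` and `k = 3` the same formula gives 176,851 features, which is why full expansions are rarely practical beyond a handful of inputs.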
## Why It Matters
- Bridges linear and non-linear modelling — fit non-linear relationships with linear regression machinery.
- Pedagogical workhorse: minimal model that exhibits the [[Bias-Variance Tradeoff]] visually as $k$ grows.
## The Bias-Variance Story
- **Low $k$** (1, 2): underfitting; high bias.
- **High $k$** (10+): overfitting; high variance. Wildly oscillating curves that pass through training points but generalise poorly.
This echoes the classic Runge phenomenon, in which high-degree polynomial interpolants on equally spaced points oscillate near the ends of the data range.
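A small experiment illustrates the two regimes. The sketch below fits degrees 1, 3, and 12 to the same noisy sample and reports training and held-out error. The target function, noise level, sample sizes, and seed are all assumptions made for the demonstration. Training error cannot increase with $k$ because the models are nested; held-out error typically falls and then rises, though the exact numbers depend on the seed.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(2 * np.pi * x)

# Small noisy training set and a larger held-out set (both synthetic).
x_train = np.sort(rng.uniform(0.0, 1.0, 20))
y_train = true_fn(x_train) + rng.normal(0.0, 0.2, x_train.size)
x_test = rng.uniform(0.0, 1.0, 500)
y_test = true_fn(x_test) + rng.normal(0.0, 0.2, x_test.size)

results = {}
for k in (1, 3, 12):
    # Polynomial.fit rescales x internally, which keeps the fit well conditioned.
    p = Polynomial.fit(x_train, y_train, deg=k)
    train_mse = np.mean((p(x_train) - y_train) ** 2)
    test_mse = np.mean((p(x_test) - y_test) ** 2)
    results[k] = (train_mse, test_mse)
    print(f"k={k:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```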
## Regularisation Helps
A degree-10 polynomial with [[L2 Regularization]] (Ridge) can outperform a degree-3 unregularised one. Regularisation tames the high-degree wiggle.
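As a rough illustration of the taming effect, the sketch below fits the same degree-10 feature matrix with and without an L2 penalty, using the closed-form ridge solution. The data, the seed, and the penalty strength `lam` are assumptions; in practice `lam` would be chosen by cross-validation, and the intercept is left unpenalised here as a common convention.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, x.size)  # synthetic data

k = 10
X = np.vander(x, k + 1, increasing=True)  # columns 1, x, ..., x^10

# Unregularised least squares.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge: minimise ||Xw - y||^2 + lam * ||w[1:]||^2 (intercept not penalised).
lam = 1e-2  # hypothetical value; normally chosen by cross-validation
penalty = np.eye(k + 1)
penalty[0, 0] = 0.0
w_ridge = np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)

print("OLS   coefficient norm:", np.linalg.norm(w_ols[1:]))
print("ridge coefficient norm:", np.linalg.norm(w_ridge[1:]))
```

The penalised coefficients have a smaller norm than the unregularised ones, which is what suppresses the large, mutually cancelling high-degree terms behind the wiggle. Whether the ridge fit also generalises better on a given dataset has to be checked on held-out data.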
## Modern Status
Polynomial regression remains useful for:
- Pedagogy.
- Small problems where you know the relationship is smooth and low-degree.
- Engineered interaction features (specific products, not full expansions).
For non-linear relationships in general, kernel methods, splines, or neural networks are usually better.
## Splines
Splines are a modern variant: piecewise polynomials joined with continuity constraints at "knots". They are less prone to global oscillation than a single high-degree polynomial, and they are a common basis for the smooth terms in generalised additive models (GAMs).
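One simple way to build such a model is the truncated-power basis for a cubic regression spline: the usual cubic columns plus one $(x - \xi)_+^3$ column per knot $\xi$, which keeps the curve continuous up to its second derivative at each knot. The sketch below is a minimal version of that idea; the helper name `cubic_spline_basis`, the knot positions, and the data are assumptions, and production code would more likely use a B-spline basis for numerical stability.

```python
import numpy as np

def cubic_spline_basis(x, knots):
    """Truncated-power basis: 1, x, x^2, x^3, then (x - knot)_+^3 per knot."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - t, 0.0, None) ** 3 for t in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(4 * np.pi * x) + rng.normal(0.0, 0.1, x.size)  # synthetic data

knots = np.array([0.25, 0.5, 0.75])  # hypothetical knot positions
B = cubic_spline_basis(x, knots)

# The spline fit is again ordinary least squares on the basis columns.
w_spline, *_ = np.linalg.lstsq(B, y, rcond=None)
w_cubic, *_ = np.linalg.lstsq(B[:, :4], y, rcond=None)

rss_spline = np.sum((B @ w_spline - y) ** 2)
rss_cubic = np.sum((B[:, :4] @ w_cubic - y) ** 2)
```

Because the plain cubic is the special case with all knot coefficients set to zero, `rss_spline` can never exceed `rss_cubic`; the extra flexibility comes from local knot terms rather than from raising the global degree.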
## Related
- [[Linear Regression]]
- [[Ridge Regression]]
- [[Bias-Variance Tradeoff]]
- [[Feature Engineering]]