## Definition
**Ridge regression** is linear regression with an [[L2 Regularization]] penalty:
$$
\hat w = \arg\min_w \|Xw - y\|_2^2 + \lambda \|w\|_2^2
$$
The added term shrinks coefficients toward zero, trading a small increase in bias for lower variance. This often improves generalisation when features are correlated or numerous.
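The penalised objective can also be read as ordinary least squares on augmented data. Append the rows $\sqrt{\lambda} I$ to $X$ and matching zeros to $y$, and the squared residual of the augmented problem equals the Ridge objective. The minimal NumPy sketch below checks this on synthetic data chosen only for illustration. It compares against scikit-learn's `Ridge` with `fit_intercept=False`, which minimises the same unscaled objective. The helper name `ridge_via_augmentation` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge

def ridge_via_augmentation(X, y, lam):
    """Hypothetical helper: solve the ridge objective as OLS on augmented data."""
    n_features = X.shape[1]
    # ||X w - y||^2 + lam * ||w||^2 == ||X_aug w - y_aug||^2
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(n_features)])
    y_aug = np.concatenate([y, np.zeros(n_features)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=30)

lam = 2.0
w_aug = ridge_via_augmentation(X, y, lam)
# scikit-learn's Ridge minimises the same objective when no intercept is fitted
w_sk = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_
print(np.allclose(w_aug, w_sk))
```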
## Closed-Form Solution
Unlike many regularised models (Lasso, for example), Ridge has a closed-form solution. Setting the gradient $2X^\top(Xw - y) + 2\lambda w$ to zero gives:
$$
\hat w = (X^\top X + \lambda I)^{-1} X^\top y
$$
For any $\lambda > 0$, $X^\top X + \lambda I$ is positive definite and therefore invertible. This holds even when $X^\top X$ is singular (e.g. more features than samples, or exactly collinear features) or badly conditioned (strongly correlated features). This *stabilisation* was the original motivation (Tikhonov 1943; Hoerl & Kennard 1970).
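A direct NumPy sketch of the closed form follows, assuming dense inputs that fit in memory. It calls `np.linalg.solve` on the normal equations rather than forming the inverse explicitly. It uses a synthetic problem with more features than samples, where $X^\top X$ is singular but $X^\top X + \lambda I$ is not. `ridge_closed_form` is a hypothetical helper name.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Hypothetical helper: ridge coefficients from (X^T X + lam I) w = X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    # Solving the linear system is cheaper and more accurate than forming the inverse
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 25))   # more features than samples: X^T X is singular
y = rng.normal(size=10)
lam = 0.5

print(np.linalg.matrix_rank(X.T @ X))   # at most 10, below the 25 features
w = ridge_closed_form(X, y, lam)

# The gradient of the objective, 2 X^T (X w - y) + 2 lam w, vanishes at the solution
grad = 2 * X.T @ (X @ w - y) + 2 * lam * w
print(np.max(np.abs(grad)))
```

The gradient check is a cheap sanity test that the solve found the minimiser of the objective in the definition.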
## Effect on Coefficients
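- Coefficients shrink toward zero as $\lambda$ grows but, for finite $\lambda$, are **not set exactly to zero**. The shrinkage is a uniform factor of $1/(1+\lambda)$ only when the columns of $X$ are orthonormal. In general, directions of low variance in $X$ are shrunk the most.
- Correlated features share coefficient mass.
- The penalty treats all coefficients **equally**, so its effect depends on each feature's scale: standardise features first.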
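The shrinkage pattern is easiest to see through the singular value decomposition $X = U \operatorname{diag}(d) V^\top$. Under it, the closed form becomes $\hat w = V \operatorname{diag}\big(d_j / (d_j^2 + \lambda)\big) U^\top y$. Each principal direction of $X$ is therefore shrunk relative to OLS by $d_j^2 / (d_j^2 + \lambda)$, so low-variance directions shrink most. The sketch below checks the SVD form against the direct solve on synthetic data, using a hypothetical helper `ridge_svd`. It also shows two identical columns receiving equal shares of the weight.

```python
import numpy as np

def ridge_svd(X, y, lam):
    """Hypothetical helper: ridge via the SVD X = U diag(d) V^T."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt.T @ (d / (d**2 + lam) * (U.T @ y))

rng = np.random.default_rng(2)
# Synthetic columns with very different variances
X = rng.normal(size=(50, 3)) * np.array([5.0, 1.0, 0.2])
y = rng.normal(size=50)
lam = 5.0

# Per-direction shrinkage relative to OLS; singular values come sorted descending
d = np.linalg.svd(X, compute_uv=False)
factors = d**2 / (d**2 + lam)
print(factors)   # near 1 for high-variance directions, smaller for low-variance ones

# The SVD form agrees with the closed form (X^T X + lam I)^{-1} X^T y
w_direct = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(np.allclose(ridge_svd(X, y, lam), w_direct))

# Two identical columns with y = 2x: by symmetry both coefficients are equal,
# each 2s / (2s + lam) where s = x . x
x = rng.normal(size=50)
X_dup = np.column_stack([x, x])
w_dup = np.linalg.solve(X_dup.T @ X_dup + lam * np.eye(2), X_dup.T @ (2 * x))
print(w_dup)
```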
## Choosing $\lambda$
- $\lambda = 0$: ordinary least squares.
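- $\lambda \to \infty$: all coefficients shrink to zero, so the model predicts the mean of $y$ (assuming the intercept is not penalised).
- The best value lies in between and is usually found by cross-validation. In scikit-learn, `RidgeCV` selects it automatically from a grid of candidates (scikit-learn calls $\lambda$ `alpha`).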
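A sketch of both extremes and of automatic selection follows, on synthetic data. The coefficients, noise level, and candidate grid are arbitrary choices. With scikit-learn's default `fit_intercept=True`, the intercept is not penalised. That is why a huge `alpha` leaves a model that predicts the mean of `y`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, RidgeCV

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -1.0, 0.0, 2.0, 0.5]) + rng.normal(size=100) + 10.0

ols = LinearRegression().fit(X, y)
for alpha in [1e-8, 1.0, 100.0, 1e8]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.round(ridge.coef_, 3), round(ridge.intercept_, 3))
# alpha -> 0 reproduces OLS; alpha -> infinity drives coef_ to ~0 and intercept_ to mean(y)

# RidgeCV picks alpha from a candidate grid (efficient leave-one-out CV by default)
alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)
print(ridge_cv.alpha_)
```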
## When Ridge Wins
- **Correlated features.** Ordinary least squares coefficients become unstable (large, offsetting values that change a lot between samples). Ridge spreads the weight across the correlated group.
- **Many features relative to samples.** Reduces variance without dropping any feature.
- **Closed-form solution.** For a moderate number of features $p$, solving the $p \times p$ linear system directly is fast and exact, with no iterative optimiser or convergence tolerance to tune.
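The sketch below illustrates the stability claim. It repeatedly draws two nearly collinear features, fits OLS and Ridge, and compares how much the first coefficient varies across draws. The data-generating process and `alpha=1.0` are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(4)
n, n_repeats = 50, 200
ols_coefs, ridge_coefs = [], []

for _ in range(n_repeats):
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
    X = np.column_stack([x1, x2])
    y = x1 + x2 + rng.normal(size=n)
    ols_coefs.append(LinearRegression().fit(X, y).coef_)
    ridge_coefs.append(Ridge(alpha=1.0).fit(X, y).coef_)

ols_coefs, ridge_coefs = np.array(ols_coefs), np.array(ridge_coefs)
# Spread of the first coefficient across independently drawn datasets
print("OLS std:  ", ols_coefs[:, 0].std())
print("Ridge std:", ridge_coefs[:, 0].std())
```

Ridge barely touches the well-determined direction (the sum of the two features). It heavily shrinks the poorly determined one (their difference), which is where the OLS instability lives.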
## When Lasso Beats Ridge
- **Many irrelevant features.** [[L1 Regularization]] zeros them out; Ridge keeps small non-zero coefficients for all.
- **Need for interpretability.** A sparse Lasso model is easier to explain than a dense Ridge model.
When both correlated features *and* many irrelevant features exist, [[Elastic Net]] combines the strengths.
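The sparsity contrast can be checked by counting coefficients that are exactly zero. The synthetic data below has only 3 of 20 features carrying signal. The `alpha` and `l1_ratio` values are illustrative, not tuned. Note that scikit-learn's `Lasso` and `ElasticNet` scale their squared loss by $1/(2n)$, so their `alpha` is not on the same scale as `Ridge`'s.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(5)
n, p = 100, 20
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:3] = [2.0, -1.5, 1.0]          # only the first 3 features matter
y = X @ true_w + rng.normal(scale=0.5, size=n)

models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
n_zeros = {}
for name, model in models.items():
    coef = model.fit(X, y).coef_
    n_zeros[name] = int(np.sum(coef == 0.0))
    print(f"{name:12s} exact zeros: {n_zeros[name]} / {p}")
```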
## Standardisation Reminder
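Ridge penalises every coefficient equally, so the penalty depends on units. A feature recorded on a large scale needs only a small coefficient and is barely penalised, while the same feature on a small scale is shrunk heavily. Always standardise first, ideally inside a pipeline so the scaler is fitted only on training data: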
```python
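from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ('scaler', StandardScaler()),    # zero mean, unit variance per feature
    ('ridge', Ridge(alpha=1.0)),     # scikit-learn's alpha plays the role of lambda
])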
```
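The unit dependence can be made concrete. Re-express one feature in units 1000 times smaller (say metres to millimetres) and refit. A plain `Ridge` changes its predictions because the rescaled feature now needs a tiny coefficient that is barely penalised. A standardising pipeline like the one above gives the same predictions either way. The data and `alpha=10.0` are assumptions chosen to make the effect visible.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, 1.0]) + rng.normal(scale=0.1, size=50)

# Same data with the first feature expressed in units 1000x smaller
X_mm = X.copy()
X_mm[:, 0] *= 1000.0

# Plain Ridge: the penalty sees raw coefficient sizes, so units matter
raw = [Ridge(alpha=10.0).fit(A, y).predict(A) for A in (X, X_mm)]

# Standardised Ridge: both versions map to the same scaled features
scaled = [
    Pipeline([("scaler", StandardScaler()), ("ridge", Ridge(alpha=10.0))]).fit(A, y).predict(A)
    for A in (X, X_mm)
]
print("raw Ridge unchanged by units:   ", np.allclose(raw[0], raw[1]))
print("scaled Ridge unchanged by units:", np.allclose(scaled[0], scaled[1]))
```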
## Related
- [[Linear Regression]]
- [[L2 Regularization]]
- [[Lasso Regression]]
- [[Elastic Net]]