Ridge Regression - Albert Masoliver's learning site

## Definition

**Ridge regression** is linear regression with an [[L2 Regularization]] penalty:

$$
\hat w = \arg\min_w \|Xw - y\|_2^2 + \lambda \|w\|_2^2
$$

The added term shrinks coefficients toward zero, trading a little bias for lower variance and improving generalisation when features are correlated or numerous.

## Closed-Form Solution

Unlike many regularised models, Ridge has an exact solution:

$$
\hat w = (X^\top X + \lambda I)^{-1} X^\top y
$$

For any $\lambda > 0$, the $\lambda I$ term makes the matrix invertible even when $X^\top X$ is singular (e.g., more features than samples, or multicollinear features). This *stabilisation* was the original motivation (Tikhonov 1943, Hoerl & Kennard 1970).

## Effect on Coefficients

- All coefficients shrink toward zero, but in general **never to exactly zero**. The shrinkage is strongest along low-variance directions of $X$, so it is exactly proportional only when the features are orthonormal.
- Correlated features share coefficient mass.
- The penalty applies **uniformly to all coefficients**, regardless of each feature's scale, so always standardise features first.

## Choosing $\lambda$

- $\lambda = 0$: ordinary least squares.
- $\lambda \to \infty$: all coefficients shrink to zero, so the model predicts the mean of $y$ (assuming an unpenalised intercept).
- Sweet spot found by cross-validation. Use `RidgeCV` in scikit-learn for automatic selection (scikit-learn calls $\lambda$ `alpha`).

## When Ridge Wins

- **Correlated features.** Ordinary least squares produces unstable coefficients; Ridge distributes the load.
- **Many features relative to samples.** Reduces variance without dropping any feature.
- **Need for closed-form solution.** For moderate-size problems, a direct solve is often faster than iterative methods.

## When Lasso Beats Ridge

- **Many irrelevant features.** [[L1 Regularization]] zeros them out; Ridge keeps small non-zero coefficients for all.
- **Need for interpretability.** A sparse Lasso model is easier to explain than a dense Ridge model.

When both correlated features *and* many irrelevant features exist, [[Elastic Net]] combines the strengths.

## Standardisation Reminder

Ridge penalises coefficients equally. If features are on different scales, the penalty is unfair: a feature recorded in small numbers needs a large coefficient to have the same effect, so it is shrunk harder.
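To make the scale problem concrete, here is a minimal sketch on synthetic data. The data, the seed, `alpha=10.0`, and the factor of 0.001 are all assumptions chosen for illustration. Two features carry equally strong signal, but the second is stored in much smaller numbers. Without scaling, Ridge should all but erase the second feature's contribution. With scaling, both should be treated alike.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
n = 500
z = rng.normal(size=(n, 2))     # two equally important signals
y = z[:, 0] + z[:, 1] + 0.1 * rng.normal(size=n)

# Same information, but the second feature is recorded in much smaller numbers
X = z * np.array([1.0, 0.001])

raw = Ridge(alpha=10.0).fit(X, y)
scaled = make_pipeline(StandardScaler(), Ridge(alpha=10.0)).fit(X, y)

# Change in prediction for a one-standard-deviation change in each feature
raw_effect = raw.coef_ * X.std(axis=0)
# After StandardScaler every feature has unit variance, so coef_ is already on that scale
scaled_effect = scaled.named_steps["ridge"].coef_

print("unscaled:", raw_effect)
print("scaled:  ", scaled_effect)
```

Comparing the two printed vectors shows how much of each signal survives the penalty in each setup.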
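Always standardise before fitting:

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ('scaler', StandardScaler()),   # zero mean, unit variance per feature
    ('ridge', Ridge(alpha=1.0)),    # alpha plays the role of lambda
])
```

## Related

- [[Linear Regression]]
- [[L2 Regularization]]
- [[Lasso Regression]]
- [[Elastic Net]]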
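## Worked Example

As a rough check of the closed-form solution and of `RidgeCV`, the sketch below uses synthetic data with more features than samples. The data, the seed, `lam = 1.0`, and the `alphas` grid are assumptions chosen for illustration. It solves the regularised normal equations with NumPy, compares the result with scikit-learn's `Ridge` (intercept switched off so that both minimise the same objective), and then lets `RidgeCV` pick a value from a grid.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV

rng = np.random.default_rng(0)   # fixed seed, arbitrary choice
n, p = 20, 50                    # more features than samples
X = rng.normal(size=(n, p))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=n)

lam = 1.0
gram = X.T @ X
# With p > n the Gram matrix is rank-deficient, so plain OLS has no unique solution
rank = np.linalg.matrix_rank(gram)

# Closed form: solve (X^T X + lam I) w = X^T y instead of forming the inverse
w_closed = np.linalg.solve(gram + lam * np.eye(p), X.T @ y)

# scikit-learn minimises the same objective when the intercept is switched off
w_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_
print("rank of X^T X:", rank, "of", p)
print("closed form matches Ridge:", np.allclose(w_closed, w_sklearn, atol=1e-6))

# Pick lambda from a log-spaced grid by cross-validation
alphas = np.logspace(-3, 3, 13)
cv_model = RidgeCV(alphas=alphas).fit(X, y)
print("selected alpha:", cv_model.alpha_)
```

Using `np.linalg.solve` instead of an explicit inverse is a common numerical choice, not something specific to Ridge.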