## Definition
**Lasso regression** is linear regression with an [[L1 Regularization]] penalty:
$$
\hat w = \arg\min_w \|Xw - y\|_2^2 + \lambda \|w\|_1
$$
The L1 penalty drives many coefficients to **exactly zero**, performing implicit [[Feature Selection]]. Introduced by Tibshirani (1996).
## Why Coefficients Hit Zero
The L1 constraint region $\|w\|_1 \leq t$ is a diamond in two dimensions (a cross-polytope in general), with corners on the coordinate axes. The optimum of the constrained problem often lies at a corner or along an edge, where some coefficients are exactly zero.
By contrast, Ridge's L2 constraint region is a ball with no corners, so its solutions shrink toward zero but generally do not reach it exactly.
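A one-dimensional worked case makes the contrast concrete. This is a sketch under an illustrative assumption that is not part of the definition above: a single feature $x$ scaled so that $x^\top x = 1$, with $z = x^\top y$ denoting the unpenalised least-squares coefficient. Up to a constant, the squared error is then $(w - z)^2$, and the two penalised solutions are:

$$
\hat w_{\text{lasso}} = \operatorname{sign}(z)\,\max\!\left(|z| - \tfrac{\lambda}{2},\, 0\right),
\qquad
\hat w_{\text{ridge}} = \frac{z}{1 + \lambda}
$$

The Lasso solution is exactly zero whenever $|z| \leq \lambda/2$, while the Ridge solution is zero only when $z = 0$.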
## No Closed-Form Solution
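The L1 norm is non-differentiable wherever a coefficient is zero, so in general there is no closed-form solution like Ridge's $(X^\top X + \lambda I)^{-1} X^\top y$. Optimisation uses iterative methods:
- **Coordinate descent.** Update one coefficient at a time via the soft-threshold operator, holding the others fixed. This is the solver behind scikit-learn's `Lasso`.
- **LARS** (Least Angle Regression). Computes the entire regularisation path efficiently.
- **Proximal gradient methods** (e.g. ISTA, FISTA).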
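The coordinate-descent update can be sketched in a few lines of NumPy. This is a minimal illustration, not scikit-learn's implementation: the names `soft_threshold` and `lasso_coordinate_descent` are made up for this note, and the threshold $\lambda/2$ is an assumption that follows from the objective in the Definition section, which has no $1/2$ factor on the squared error.

```python
import numpy as np


def soft_threshold(rho, threshold):
    """Shrink rho toward zero; anything within the threshold becomes exactly 0."""
    return np.sign(rho) * max(abs(rho) - threshold, 0.0)


def lasso_coordinate_descent(X, y, lam, max_sweeps=1000, tol=1e-10):
    """Minimise ||Xw - y||_2^2 + lam * ||w||_1 by cyclic coordinate descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    w = np.zeros(n_features)
    col_sq_norm = (X ** 2).sum(axis=0)  # ||x_j||^2 for each column
    residual = y.copy()                 # y - Xw, with w = 0 initially

    for _ in range(max_sweeps):
        largest_change = 0.0
        for j in range(n_features):
            if col_sq_norm[j] == 0.0:
                continue  # an all-zero column cannot improve the fit
            # Correlation of column j with the residual that excludes feature j.
            rho = X[:, j] @ residual + col_sq_norm[j] * w[j]
            # No 1/2 factor on the squared error, so the threshold is lam / 2.
            w_new = soft_threshold(rho, lam / 2.0) / col_sq_norm[j]
            change = w_new - w[j]
            if change != 0.0:
                residual -= change * X[:, j]  # keep the residual in sync with w
                w[j] = w_new
                largest_change = max(largest_change, abs(change))
        if largest_change < tol:
            break  # a full sweep barely moved any coefficient
    return w


# Toy check: with an identity design, each coefficient is the soft-thresholded target.
X_demo = np.eye(3)
y_demo = np.array([3.0, 0.2, -2.0])
w_demo = lasso_coordinate_descent(X_demo, y_demo, lam=1.0)
print(w_demo)
```

In the toy check the middle target, 0.2, is smaller than the threshold $\lambda/2 = 0.5$, so its coefficient is set to exactly zero rather than merely shrunk.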
## Properties
- **Sparse solutions.** Many coefficients exactly zero.
- **Embedded feature selection.** Train once; selection comes for free.
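- **Sensitive to correlated features.** Lasso tends to keep one feature from a highly correlated group and zero the rest. Which one it keeps can change with small perturbations of the data, so the selection is unstable.
- **At most $n$ non-zero coefficients when $p > n$** (where $n$ = samples, $p$ = features). With more features than samples, Lasso saturates after selecting $n$ of them.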
## When Lasso Wins
- **High-dimensional sparse problems.** Many features; few are relevant. Genomics, text features.
- **Interpretability requirements.** A model using 10 of 1000 features is far more communicable than a Ridge model using small coefficients on all 1000.
- **Implicit feature selection.** Avoids a separate selection step.
## When Lasso Loses
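- **Many correlated features.** Lasso keeps one feature from a correlated group more or less arbitrarily, whereas Ridge spreads coefficient mass across the group. [[Elastic Net]], which combines both penalties, is the usual remedy.
- **More than $n$ truly relevant features.** When $p > n$, Lasso can select at most $n$ features, so some relevant ones are necessarily dropped.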
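The correlated-feature behaviour can be seen on synthetic data. The sketch below assumes scikit-learn is available and uses an extreme, made-up case: two identical columns plus one irrelevant column. The `alpha` and `l1_ratio` values are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
n_samples = 200
signal = rng.normal(size=n_samples)
irrelevant = rng.normal(size=n_samples)

# Columns 0 and 1 are exact copies of the same signal; column 2 is unrelated to y.
X = np.column_stack([signal, signal, irrelevant])
y = 3.0 * signal + rng.normal(scale=0.5, size=n_samples)

models = {
    "lasso": Lasso(alpha=0.1),
    "ridge": Ridge(alpha=1.0),
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

# Fit each model on the same data and collect its coefficient vector.
coefs = {name: model.fit(X, y).coef_ for name, model in models.items()}
for name, coef in coefs.items():
    print(f"{name:12s}", np.round(coef, 3))
```

Comparing the first two entries of each coefficient vector shows how each penalty distributes weight between the duplicated columns. Real data rarely contains exact duplicates, so the contrast is usually less clean than in this toy case.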
## Regularisation Path
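A standard analysis: plot coefficient values as $\lambda$ varies from large to small. At large $\lambda$ every coefficient is zero, and features enter the model one by one as $\lambda$ decreases. Features that enter early are typically the strongest predictors, which gives a rough visual ranking. The ranking is only a guide: with correlated features the entry order can be misleading, and a coefficient can return to zero further along the path.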
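A path can be computed with scikit-learn's `lasso_path`. The sketch below is a minimal example on synthetic data (the data and the true coefficients are made up for illustration). Note that scikit-learn calls the penalty strength `alpha` and scales the squared-error term by $1/(2n)$, so its values are not numerically the same as $\lambda$ in the definition. `lasso_path` does not fit an intercept, so the target is centred by hand.

```python
import numpy as np
from sklearn.linear_model import lasso_path
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples, n_features = 100, 8
X = rng.normal(size=(n_samples, n_features))
true_w = np.array([4.0, 0.0, -2.0, 0.0, 0.0, 1.0, 0.0, 0.0])  # illustrative only
y = X @ true_w + rng.normal(scale=0.5, size=n_samples)

# Standardise features and centre the target before computing the path.
X_std = StandardScaler().fit_transform(X)
y_centred = y - y.mean()

# alphas runs from large to small; coefs has shape (n_features, n_alphas).
alphas, coefs, _ = lasso_path(X_std, y_centred)

# Record the largest alpha on the grid at which each feature is non-zero.
entry_alpha = {}
for j in range(n_features):
    nonzero = np.flatnonzero(coefs[j])
    if nonzero.size:
        entry_alpha[j] = alphas[nonzero[0]]

for j, a in sorted(entry_alpha.items(), key=lambda item: -item[1]):
    print(f"feature {j} enters at alpha = {a:.4f}")
```

Plotting each row of `coefs` against `alphas` on a logarithmic axis gives the usual path figure.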
## Practical Notes
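- **Always standardise features.** The L1 penalty treats all coefficients alike, but a coefficient's size depends on the units of its feature, so unscaled features are penalised unevenly.
- **Use `LassoCV`** for automatic $\lambda$ selection via cross-validation. scikit-learn calls $\lambda$ `alpha` and scales the squared-error term by $1/(2n)$, so its `alpha` is not numerically identical to the $\lambda$ in the definition above.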
- **Inspect the path** to understand which features matter and at what regularisation strength.
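The first two notes can be combined in one scikit-learn pipeline. This is a hedged sketch on synthetic data: the feature scales, the true coefficients, and the choice of 5 folds are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples, n_features = 200, 20

# Latent unit-scale features; only the first three influence y (illustrative).
Z = rng.normal(size=(n_samples, n_features))
y = Z[:, :3] @ np.array([3.0, -2.0, 1.5]) + rng.normal(scale=0.5, size=n_samples)

# Observed features carry wildly different units, which is why scaling matters.
scales = rng.uniform(0.1, 100.0, size=n_features)
X = Z * scales

# Standardise inside the pipeline so the scaler is fitted together with the model.
model = make_pipeline(StandardScaler(), LassoCV(cv=5))
model.fit(X, y)

lasso = model.named_steps["lassocv"]
selected = np.flatnonzero(lasso.coef_)
print("chosen alpha:", lasso.alpha_)
print("selected feature indices:", selected)
```

Because scaling happens inside the pipeline, the same transformation is applied automatically when `model.predict` is called on new data.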
## Related
- [[Linear Regression]]
- [[L1 Regularization]]
- [[Ridge Regression]]
- [[Elastic Net]]
- [[Feature Selection]]