## Definition

**Elastic Net** combines [[L1 Regularization]] and [[L2 Regularization]] in a single regulariser:

$$
\mathcal{L}(\theta) = \mathcal{L}_{\text{data}}(\theta) + \lambda_1 \|\theta\|_1 + \lambda_2 \|\theta\|_2^2
$$

It is often parameterised as a mixture:

$$
\Omega(\theta) = \lambda \left[ \alpha \|\theta\|_1 + (1 - \alpha) \|\theta\|_2^2 \right]
$$

with $\alpha \in [0, 1]$, so that $\lambda_1 = \lambda \alpha$ and $\lambda_2 = \lambda (1 - \alpha)$. $\alpha = 0$ recovers Ridge; $\alpha = 1$ recovers Lasso.

## Why Combine

Lasso alone:
- Selects features (sparsity).
- But with **highly correlated features**, picks one arbitrarily and ignores the rest.
- Performs poorly when the number of features exceeds the number of samples and features are grouped.

Ridge alone:
- Stable with correlated features.
- But never sets coefficients exactly to zero, so it performs no feature selection.

Elastic Net gets both:
- L1 component encourages sparsity.
- L2 component encourages grouped selection: correlated features tend to be selected (or dropped) together.

## When to Use Elastic Net

- **Many correlated features.** L1 alone is unstable; Elastic Net selects groups.
- **More features than samples.** L1 selects at most $n$ features in this regime, where $n$ is the number of samples; Elastic Net can select more.
- **Genomics and text classification.** Classic use cases, with thousands of correlated features.

## Hyperparameters

Two:
- $\lambda$: overall regularisation strength.
- $\alpha$: mixing ratio between L1 and L2.

**Cross-validate over a 2D grid.** `ElasticNetCV` in scikit-learn does this when given a list of mixing ratios. Note that scikit-learn's naming differs from the notation here: its `alpha` is the overall strength ($\lambda$ above) and its `l1_ratio` is the mixing ratio ($\alpha$ above).

## Computational Considerations

The L1 term is not differentiable at zero, which prevents a closed-form solution; coordinate descent is the standard optimiser. Convergence is fast in practice.

## Performance Notes

Empirically, Elastic Net often gives a small but consistent improvement over Lasso when correlated features are present, sometimes on the order of 1-3% in held-out performance. The right $\alpha$ is workload-dependent; interior values, typically $\alpha \in [0.1, 0.9]$, tend to work best, while the endpoints $\alpha = 0$ and $\alpha = 1$ reduce to pure Ridge and pure Lasso.

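
The coordinate-descent approach mentioned under Computational Considerations, and the grouping effect described under Why Combine, can be illustrated with a minimal sketch. It assumes a squared-error data loss scaled by $1/(2n)$, no intercept, and the mixture penalty $\lambda[\alpha\|\theta\|_1 + (1-\alpha)\|\theta\|_2^2]$ from the Definition. The function names `soft_threshold` and `elastic_net_cd` and the toy data are illustrative, not taken from any library, and a fixed number of sweeps stands in for a proper convergence check.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t; anything within [-t, t] becomes exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def elastic_net_cd(X, y, lam, alpha, n_iters=200):
    """Coordinate descent sketch (assumed objective, no intercept):

        (1 / (2n)) * ||y - X theta||^2
        + lam * (alpha * ||theta||_1 + (1 - alpha) * ||theta||_2^2)

    X is a list of rows, y a list of targets.
    """
    n, p = len(X), len(X[0])
    theta = [0.0] * p
    residual = list(y)  # equals y - X theta, since theta starts at zero
    # Per-feature curvature (1/n) * x_j . x_j; constant across sweeps.
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    for _ in range(n_iters):
        for j in range(p):
            # Correlation of feature j with the residual that excludes theta_j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * theta[j]) for i in range(n)
            ) / n
            denom = z[j] + 2.0 * lam * (1.0 - alpha)
            # L1 part: soft-thresholding. L2 part: extra shrinkage via denom.
            new = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
            delta = new - theta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                theta[j] = new
    return theta


# Toy data with two identical (perfectly correlated) features.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [2.0, 4.0, 6.0]

lasso_theta = elastic_net_cd(X, y, lam=1.0, alpha=1.0)  # pure L1
enet_theta = elastic_net_cd(X, y, lam=1.0, alpha=0.5)   # L1 + L2 mix

print("lasso      :", lasso_theta)
print("elastic net:", enet_theta)
```

On this toy problem the pure-L1 run concentrates essentially all of the weight on whichever duplicated feature is visited first, whereas the mixed penalty splits the weight evenly between the two copies. The exact numbers depend on the assumed loss scaling; scikit-learn, for instance, places a factor of $1/2$ on the L2 term, so its coefficients for the same nominal settings would differ.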
## Related

- [[L1 Regularization]]
- [[L2 Regularization]]
- [[Regularization]]
- [[Lasso Regression]]
- [[Ridge Regression]]