## Definition
**Elastic Net** combines [[L1 Regularization]] and [[L2 Regularization]] in a single regulariser:
$
\mathcal{L}(\theta) = \mathcal{L}_{\text{data}}(\theta) + \lambda_1 \|\theta\|_1 + \lambda_2 \|\theta\|_2^2
$
Often parameterised as a mixture:
$
\Omega(\theta) = \lambda \left[ \alpha \|\theta\|_1 + (1 - \alpha) \|\theta\|_2^2 \right]
$
with $\alpha \in [0, 1]$. $\alpha = 0$ recovers Ridge; $\alpha = 1$ recovers Lasso.
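The two forms describe the same penalty under the mapping $\lambda_1 = \lambda\alpha$, $\lambda_2 = \lambda(1 - \alpha)$, or equivalently $\lambda = \lambda_1 + \lambda_2$, $\alpha = \lambda_1 / (\lambda_1 + \lambda_2)$. A minimal Python sketch that checks this numerically; the helper names are made up for illustration and are not from any library:

```python
def penalty_separate(theta, lam1, lam2):
    """lam1 * ||theta||_1 + lam2 * ||theta||_2^2 (first form above)."""
    l1 = sum(abs(t) for t in theta)
    l2_sq = sum(t * t for t in theta)
    return lam1 * l1 + lam2 * l2_sq


def penalty_mixture(theta, lam, alpha):
    """lam * [alpha * ||theta||_1 + (1 - alpha) * ||theta||_2^2] (mixture form)."""
    l1 = sum(abs(t) for t in theta)
    l2_sq = sum(t * t for t in theta)
    return lam * (alpha * l1 + (1 - alpha) * l2_sq)


def to_mixture(lam1, lam2):
    """Convert (lam1, lam2) to (lam, alpha); assumes lam1 + lam2 > 0."""
    lam = lam1 + lam2
    return lam, lam1 / lam


theta = [0.5, -2.0, 0.0, 1.5]            # illustrative coefficients
lam, alpha = to_mixture(0.3, 0.1)

separate = penalty_separate(theta, 0.3, 0.1)
mixture = penalty_mixture(theta, lam, alpha)

pure_lasso = penalty_mixture(theta, 1.0, 1.0)   # only the L1 norm remains
pure_ridge = penalty_mixture(theta, 1.0, 0.0)   # only the squared L2 norm remains
```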
## Why Combine
Lasso alone:
- Selects features (sparsity).
- But with **highly correlated features**, picks one arbitrarily and ignores the rest.
- Selects at most $n$ features when the number of features exceeds the number of samples $n$, which hurts when the relevant features come in groups.
Ridge alone:
- Stable with correlated features.
- But shrinks coefficients without ever setting them exactly to zero, so it performs no feature selection.
Elastic Net gets both:
- L1 component encourages sparsity.
- L2 component encourages grouped selection — correlated features tend to be selected (or dropped) together.
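One way to see the grouping effect: if two features are identical, the data loss depends only on the sum of their two coefficients, so the penalty alone decides how that sum is split between them. The sketch below is an illustration with made-up numbers, not a fitted model. It evaluates the mixture penalty $\lambda[\alpha\|\theta\|_1 + (1 - \alpha)\|\theta\|_2^2]$ for three ways of splitting a combined coefficient of 2:

```python
def elastic_net_penalty(theta, lam=1.0, alpha=0.5):
    """lam * [alpha * ||theta||_1 + (1 - alpha) * ||theta||_2^2]."""
    l1 = sum(abs(t) for t in theta)
    l2_sq = sum(t * t for t in theta)
    return lam * (alpha * l1 + (1 - alpha) * l2_sq)


# Three ways to split a total coefficient of 2.0 across two identical features.
splits = [(2.0, 0.0), (1.5, 0.5), (1.0, 1.0)]

lasso = [elastic_net_penalty(s, alpha=1.0) for s in splits]  # pure L1
ridge = [elastic_net_penalty(s, alpha=0.0) for s in splits]  # pure L2
enet = [elastic_net_penalty(s, alpha=0.5) for s in splits]   # equal mix

# The split that each penalty prefers (smallest penalty value).
best_enet_split = splits[enet.index(min(enet))]
```

The L1 penalty is the same for every split, so Lasso has no reason to prefer one over another; any amount of L2 makes the equal split strictly cheapest, which is why correlated features tend to share weight.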
## When to Use Elastic Net
- **Many correlated features.** L1 alone is unstable; Elastic Net selects groups.
- **More features than samples.** With $n$ samples, L1 alone selects at most $n$ features in this regime; Elastic Net can select more.
- **Genomics and text classification** — classic use cases (thousands of correlated features).
## Hyperparameters
Two:
- $\lambda$ — overall regularisation strength.
- $\alpha$ — mixing ratio between L1 and L2.
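**Cross-validate over a 2D grid.** `ElasticNetCV` in scikit-learn does this when `l1_ratio` is given as a list of candidates; with the default single `l1_ratio` it searches over the strength only. Mind the naming clash: scikit-learn's `alpha` is the overall strength ($\lambda$ here) and its `l1_ratio` is the mixing ratio ($\alpha$ here). Its objective also puts a factor of $\tfrac{1}{2}$ on the L2 term, so values are not directly interchangeable with the formula above.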
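A minimal sketch of such a grid search, assuming scikit-learn and NumPy are installed. The synthetic dataset and both candidate grids are arbitrary choices for illustration, not recommendations:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

# Synthetic regression problem (illustrative only).
X, y = make_regression(
    n_samples=100, n_features=30, n_informative=5, noise=5.0, random_state=0
)

# In scikit-learn, `l1_ratio` is the mixing ratio and `alpha` the overall strength.
l1_ratios = [0.1, 0.5, 0.7, 0.9, 0.95, 1.0]
strengths = np.logspace(-3, 1, 30)

# Passing a list for `l1_ratio` makes the search two-dimensional:
# every strength is cross-validated for every mixing ratio.
model = ElasticNetCV(l1_ratio=l1_ratios, alphas=strengths, cv=5, max_iter=10_000)
model.fit(X, y)

print("selected strength:", model.alpha_)
print("selected mixing ratio:", model.l1_ratio_)
print("non-zero coefficients:", np.count_nonzero(model.coef_))
```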
## Computational Considerations
The L1 term is non-differentiable at zero, so there is no closed-form solution in general; coordinate descent, which updates one coefficient at a time, is the standard optimiser. Convergence is fast in practice.
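For illustration, here is a minimal NumPy sketch of cyclic coordinate descent. It assumes a squared-error data loss $\frac{1}{2n}\|y - X\theta\|_2^2$ (this note does not fix the data loss) together with the mixture penalty above. Under that assumption each coordinate update is $\theta_j \leftarrow S(\rho_j, \lambda\alpha) / (z_j + 2\lambda(1 - \alpha))$, where $S$ is soft-thresholding, $\rho_j$ is the correlation of feature $j$ with the partial residual, and $z_j = \|x_j\|_2^2 / n$. The function names are hypothetical, and production solvers add refinements such as screening rules and duality-gap stopping criteria.

```python
import numpy as np


def soft_threshold(rho, t):
    """S(rho, t) = sign(rho) * max(|rho| - t, 0)."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)


def elastic_net_cd(X, y, lam, alpha, n_iters=500, tol=1e-10):
    """Cyclic coordinate descent for
    (1 / 2n) * ||y - X theta||^2 + lam * [alpha * ||theta||_1 + (1 - alpha) * ||theta||_2^2].
    Assumes X has no all-zero columns.
    """
    n, p = X.shape
    theta = np.zeros(p)
    z = (X ** 2).sum(axis=0) / n           # per-feature curvature of the data loss
    residual = y.astype(float)             # equals y - X @ theta for theta = 0
    for _ in range(n_iters):
        max_change = 0.0
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = X[:, j] @ residual / n + z[j] * theta[j]
            new = soft_threshold(rho, lam * alpha) / (z[j] + 2 * lam * (1 - alpha))
            delta = new - theta[j]
            if delta != 0.0:
                residual -= delta * X[:, j]   # keep the residual in sync
                theta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return theta


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + 0.1 * rng.normal(size=50)
n = X.shape[0]

# Sanity check 1: alpha = 0 is Ridge, which does have a closed form.
theta_ridge = elastic_net_cd(X, y, lam=0.1, alpha=0.0)
closed_form = np.linalg.solve(X.T @ X / n + 2 * 0.1 * np.eye(5), X.T @ y / n)

# Sanity check 2: a strong enough pure-L1 penalty zeroes every coefficient.
lam_big = 1.1 * np.max(np.abs(X.T @ y)) / n
theta_zero = elastic_net_cd(X, y, lam=lam_big, alpha=1.0)
```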
## Performance Notes
Empirically, Elastic Net often gives a small but consistent improvement over Lasso when correlated features are present, sometimes 1-3% in held-out performance. The right $\alpha$ is dataset-dependent; intermediate values, typically $\alpha \in [0.1, 0.9]$, tend to work best, with pure Ridge ($\alpha = 0$) and pure Lasso ($\alpha = 1$) as the limiting cases.
## Related
- [[L1 Regularization]]
- [[L2 Regularization]]
- [[Regularization]]
- [[Lasso Regression]]
- [[Ridge Regression]]