## Definition
**Early stopping** is a regularisation technique that halts training when validation performance stops improving. Stops before training loss reaches its minimum; trades some training fit for better generalisation.
## How It Works
```
best_val_loss ← ∞
patience_counter ← 0
for epoch = 1, 2, ...:
train one epoch
val_loss ← evaluate on validation set
if val_loss < best_val_loss:
best_val_loss ← val_loss
save model
patience_counter ← 0
else:
patience_counter ← patience_counter + 1
if patience_counter ≥ patience:
stop
restore best model
```
Two key hyperparameters:
- **Patience** — how many epochs of no improvement before stopping (typically 5-20).
- **Min delta** — minimum improvement to count as "better" (avoid stopping on noise).
## Why It Works
Training loss decreases monotonically by design. Validation loss typically:
1. **Decreases** while the model learns general patterns.
2. **Reaches a minimum.**
3. **Increases** as the model overfits training-specific noise.
Early stopping catches point 2 and freezes the model there.
## A Hidden Regulariser
Early stopping has the same effect as L2 regularisation in the linear setting — both prevent parameters from drifting too far from their initialisation. The connection is mathematically deep (the "implicit regularisation" of gradient descent).
## When to Use
- **Almost always** in practice. It's nearly free (just track validation, save the best).
- Combined with [[L2 Regularization]] / [[Dropout]] / [[Batch Normalization]] — they're complementary.
- Critical for boosting algorithms ([[XGBoost]], LightGBM) where overfitting can be sharp.
## Practical Notes
- **Patience matters.** Too small → stop on noise; too large → wasted compute. 5-10 epochs is a reasonable default.
- **Save the *best* model**, not the last one. The best validation model is the one to deploy.
- **Don't peek at the test set.** Validation set is the early-stopping signal; test set is for final evaluation only.
## Cautions
- **Distribution shift** between validation and deployment: early stopping optimises validation performance, not deployment performance.
- **Long-tail behaviour.** Sometimes validation loss bottoms out very slowly; aggressive patience saves compute, lenient patience squeezes the last gains.
## Connection to Iterative Models
Boosting algorithms use early stopping intrinsically — `early_stopping_rounds` in XGBoost, `early_stopping_round` in LightGBM. The pattern is identical: monitor validation; halt when it plateaus.
## Related
- [[Regularization]]
- [[Overfitting and Underfitting]]
- [[Cross-Validation]]