Early Stopping - Albert Masoliver's learning site

## Definition

**Early stopping** is a regularisation technique that halts training when validation performance stops improving. Training stops before the training loss reaches its minimum, trading some training fit for better generalisation.

## How It Works

```
best_val_loss ← ∞
patience_counter ← 0
for epoch = 1, 2, ...:
    train one epoch
    val_loss ← evaluate on validation set
    if val_loss < best_val_loss − min_delta:
        best_val_loss ← val_loss
        save model
        patience_counter ← 0
    else:
        patience_counter ← patience_counter + 1
        if patience_counter ≥ patience:
            stop
restore best model
```

Two key hyperparameters:

- **Patience**: how many consecutive epochs without improvement are tolerated before stopping (typically 5-20).
- **Min delta**: the minimum decrease in validation loss that counts as "better", so that noise-level fluctuations do not reset the counter. Setting it to 0 accepts any improvement.

## Why It Works

Training loss generally keeps decreasing, because it is the quantity the optimiser minimises directly (with mini-batch noise the decrease is not strictly monotonic). Validation loss typically:

1. **Decreases** while the model learns general patterns.
2. **Reaches a minimum.**
3. **Increases** as the model overfits training-specific noise.

Early stopping aims to catch point 2 and keep the model from that epoch.

## A Hidden Regulariser

For linear models with a quadratic loss trained by gradient descent, early stopping behaves approximately like L2 regularisation. Both restrict how far the parameters can move: early stopping limits the distance from the initialisation, L2 limits the distance from the origin, and the two coincide when the weights start near zero. This connection is usually discussed as the "implicit regularisation" of gradient descent.

## When to Use

- **Almost always** in practice. It's nearly free (just track validation, save the best).
- Combine it with [[L2 Regularization]] / [[Dropout]] / [[Batch Normalization]]; they are complementary.
- Critical for boosting algorithms ([[XGBoost]], LightGBM), where overfitting can set in sharply.

## Practical Notes

- **Patience matters.** Too small → stop on noise; too large → wasted compute. 5-10 epochs is a reasonable default.
- **Save the *best* model**, not the last one. The model with the best validation score is the one to deploy.
- **Don't peek at the test set.** The validation set is the early-stopping signal; the test set is for final evaluation only.
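
The patience loop from How It Works can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a library API: the names `EarlyStopper`, `step`, and `run` are hypothetical, and a list of precomputed validation losses stands in for real training and evaluation. A real loop would save and restore model weights where the comments indicate.

```python
import math
from dataclasses import dataclass


@dataclass
class EarlyStopper:
    """Hypothetical helper: tracks validation loss and signals when to stop."""
    patience: int = 5          # epochs without improvement tolerated
    min_delta: float = 0.0     # minimum decrease that counts as improvement
    best_loss: float = math.inf
    best_epoch: int = -1
    bad_epochs: int = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
            # A real training loop would checkpoint the model here.
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run(val_losses, patience=3, min_delta=0.01):
    """Replay a sequence of validation losses (a stand-in for training).

    Returns (best_epoch, best_loss, last_epoch_run).
    """
    stopper = EarlyStopper(patience=patience, min_delta=min_delta)
    last_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        last_epoch = epoch
        if stopper.step(epoch, loss):
            break
    # A real loop would now restore the checkpoint from stopper.best_epoch.
    return stopper.best_epoch, stopper.best_loss, last_epoch


if __name__ == "__main__":
    losses = [1.0, 0.8, 0.7, 0.695, 0.72, 0.75, 0.9, 1.0]
    print(run(losses, patience=3, min_delta=0.01))
```

With these illustrative losses, the 0.695 at epoch 4 is within `min_delta` of the previous best (0.7) and so does not reset the counter. The loop stops at epoch 6 and epoch 3 remains the best model.
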
## Cautions

- **Distribution shift** between validation and deployment: early stopping optimises validation performance, not deployment performance.
- **Long-tail behaviour.** Sometimes validation loss bottoms out very slowly. A small patience saves compute; a large patience squeezes out the last gains.

## Connection to Iterative Models

Boosting libraries have early stopping built in: `early_stopping_rounds` in XGBoost, `early_stopping_round` in LightGBM. The pattern is identical: monitor a validation metric and halt when it plateaus.

## Related

- [[Regularization]]
- [[Overfitting and Underfitting]]
- [[Cross-Validation]]