# Overfitting and Underfitting - Albert Masoliver's learning site

## Definition

**Overfitting** is when a model fits the training data too closely, including its noise, and generalises poorly. **Underfitting** is when a model is too constrained to capture the underlying pattern, so it performs poorly on both training and test data. These are the two opposite failure modes of supervised learning.

## Diagnostic Signatures

| Symptom | Likely cause |
|---|---|
| Training error low, test error high | Overfitting (high variance) |
| Training error high, test error similarly high | Underfitting (high bias) |
| Both errors decreasing with more data | Healthy training |
| Both errors plateaued at high values | Underfitting |
| Gap grows as model complexity grows | Overfitting risk increasing |

## Causes

**Overfitting**: the model has too much capacity relative to the signal available. Common drivers:

- Too many parameters for too few examples.
- Training too long with no early stopping.
- Insufficient regularisation.
- Noisy or unrepresentative training data.

**Underfitting**: the model is structurally unable to capture the pattern. Common drivers:

- Hypothesis class too restrictive (e.g., linear model on non-linear data).
- Excessive regularisation.
- Insufficient training time.
- Wrong features.

## Mitigations

**Against overfitting:**

- More data.
- Simpler model.
- [[Regularization]] (L1, L2, dropout).
- Early stopping.
- Data augmentation.
- Ensembling.

**Against underfitting:**

- Richer model class.
- Better features (see [[Feature Engineering]]).
- Less regularisation.
- Train longer.

## Validation as the Compass

You diagnose by *comparing training and validation performance*, not by looking at either number in isolation. See [[Cross-Validation]].

## Related

- [[Bias-Variance Tradeoff]]
- [[Generalization]]
- [[Regularization]]
- [[Cross-Validation]]
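
## Example: Reading the Signatures

The diagnostic signatures above can be turned into a small rule of thumb. The sketch below is only illustrative: the function name `diagnose` and both thresholds (`acceptable_error`, `max_gap`) are hypothetical and task-dependent, not standard values. The case where training error is high *and* the gap is large does not appear in the table and is added here as an assumption.

```python
def diagnose(train_error: float, val_error: float,
             acceptable_error: float = 0.10,
             max_gap: float = 0.05) -> str:
    """Map a (training error, validation error) pair to a likely failure mode.

    Both thresholds are hypothetical assumptions; choose them per task.
    """
    # "High" training error: the model does not even fit the data it has seen.
    high_train = train_error > acceptable_error
    # Large gap: validation error is much worse than training error.
    large_gap = (val_error - train_error) > max_gap

    if high_train and large_gap:
        # Not a row in the table above; included as an assumption.
        return "underfitting with a large gap"
    if high_train:
        return "underfitting"
    if large_gap:
        return "overfitting"
    return "no clear problem"


if __name__ == "__main__":
    # Made-up error pairs, purely for illustration.
    for train, val in [(0.02, 0.30), (0.35, 0.37), (0.04, 0.06)]:
        print(f"train={train:.2f} val={val:.2f} -> {diagnose(train, val)}")
```

The point of the sketch is that the decision depends on the pair of numbers: a validation error of 0.30 points to overfitting when the training error is 0.02, but the same validation error with a training error of 0.29 would point to underfitting.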