## Definition
**AdaBoost** (Adaptive Boosting) is the original boosting algorithm (Freund & Schapire, 1995, Gödel Prize 2003). Trains a sequence of weak learners, each on a reweighted version of the training data emphasising previous errors.
## Algorithm
```
initialise sample weights w_i = 1/n for all i
for t = 1 to T:
train weak learner h_t on the weighted data
compute weighted error ε_t = Σ_i w_i · [h_t(x_i) ≠ y_i] / Σ_i w_i
if ε_t > 0.5: stop (weak learner is worse than chance)
compute learner weight α_t = 0.5 · log((1 - ε_t) / ε_t)
update sample weights: w_i ← w_i · exp(-α_t · y_i · h_t(x_i))
normalise w_i so they sum to 1
final prediction: sign(Σ_t α_t · h_t(x))
```
## Three Key Properties
1. **Weak learners' weights $\alpha_t$ depend on accuracy.** Better learners get more influence in the final vote.
2. **Misclassified examples get reweighted up.** The next learner focuses on what previous ones missed.
3. **Correctly-classified examples get reweighted down.** Confident regions get less attention.
## Theoretical Guarantee
If each weak learner has accuracy > 50% (better than random), the training error of the ensemble decreases *exponentially* with $T$. A surprising and influential result — boosting can drive training error to zero with simple components.
## Why It Doesn't Always Overfit
Despite increasing capacity with each round, AdaBoost often *continues to generalise* after training error reaches zero. The intuition: increasing margins on correctly-classified examples continues to improve test performance.
## Common Weak Learner: Decision Stumps
A *decision stump* is a one-level decision tree. AdaBoost with stumps:
- Each stump tests one feature against one threshold.
- Hundreds of stumps combined yield a complex non-linear classifier.
- Implicit feature selection via which features get used.
## Strengths
- **Simple to implement.**
- **Provably reduces training error.**
- **Often generalises well** despite intuition about overfitting.
- **Feature importance** falls out naturally.
## Weaknesses
- **Sensitive to noisy data and outliers.** Misclassified noisy examples get reweighted up; AdaBoost spends capacity chasing noise.
- **Less performant than gradient boosting** on most modern benchmarks.
## Historical Significance
AdaBoost was the first practical boosting algorithm. It launched the entire boosting research programme that ultimately produced [[Gradient Boosting Machines]] and [[XGBoost]] — which now dominate the field.
## When to Use
In 2026, AdaBoost is largely a pedagogical tool — gradient boosting variants outperform it on virtually all real problems. Worth knowing for theoretical grounding; rarely worth choosing in production.
## Related
- [[Boosting]]
- [[Gradient Boosting Machines]]
- [[XGBoost]]
- [[Decision Trees]]