Boosting - Albert Masoliver's learning site

## Definition

**Boosting** is an ensemble technique that trains models **sequentially**, each one focused on correcting the errors of its predecessors. Unlike [[Bagging]], which trains models in parallel and mainly reduces variance, boosting primarily reduces bias and, with regularisation, can reduce variance as well. Boosted tree ensembles are a consistently strong choice in tabular ML competitions and production benchmarks.

## The Core Idea

Combine many *weak learners* (models only slightly better than random guessing) into a *strong learner* by:

1. Training the first weak learner on the data.
2. Identifying examples it gets wrong.
3. Training the next weak learner with more emphasis on those examples.
4. Repeating, building up an additive ensemble.

The final prediction is a weighted combination of the weak learners.

## The Two Families

### Adaptive Boosting (AdaBoost)

- Reweights training examples after each round: misclassified examples get higher weight.
- Each weak learner is trained on the reweighted distribution.
- Final prediction is a weighted vote.

See [[AdaBoost]].

### Gradient Boosting

- Treats boosting as gradient descent in function space.
- Each new weak learner fits the *negative gradient* of the loss with respect to the current predictions (the pseudo-residuals). For squared-error loss these are exactly the ordinary residuals.
- General framework: any differentiable loss works.

See [[Gradient Boosting Machines]], [[XGBoost]].

## Why It Wins

- **Captures complex non-linearities** without hand-crafted features.
- **Robust to feature scaling** (when using trees as weak learners).
- **Handles mixed feature types** (numeric and categorical) with little preprocessing; native categorical support depends on the library.
- **Strong out-of-the-box performance** with modest tuning.
- **Provides feature importance** scores.

## Risks

- **Sensitive to noise.** Boosting attends hardest to misclassified examples, including outliers and mislabelled data. Regularisation (shrinkage, subsampling) helps.
- **Overfits if unchecked.** More rounds mean more capacity. Use early stopping on a validation set.
- **Sequential across rounds.** Each round depends on the previous one, so training cannot be parallelised across learners the way bagging can. (Within each round, parallel split-finding is possible.)
- **Slower at inference** with many weak learners.

## Practical Stack (2026)

- **[[XGBoost]]**: long-standing default; mature, fast, well-tuned.
- **LightGBM**: Microsoft's variant; faster on large data, similar accuracy.
- **CatBoost**: Yandex's variant; excellent categorical handling, less tuning needed.

For most tabular problems, one of these three is the right starting point.

## Related

- [[Bagging]]
- [[AdaBoost]]
- [[Gradient Boosting Machines]]
- [[XGBoost]]
- [[Random Forest]]
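
## Sketch: Gradient Boosting by Hand

The gradient boosting loop described under The Two Families can be illustrated with a minimal, dependency-free sketch. It assumes squared-error loss (so the negative gradient is just the residual), a single numeric feature, and one-split regression stumps as weak learners. The function names (`fit_stump`, `fit_gradient_boosting`, `predict`), the toy sine data, and the default `n_rounds` and `learning_rate` values are illustrative assumptions, not the API of any library mentioned in this note.

```python
import math


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def fit_stump(xs, targets):
    """Fit a one-split regression stump on a single feature.

    Tries each distinct x value except the largest as a threshold and keeps
    the split with the lowest squared error.
    Returns (threshold, left_value, right_value).
    """
    values = sorted(set(xs))
    if len(values) < 2:
        # No split is possible: predict the mean on both sides.
        mean = sum(targets) / len(targets)
        return values[0], mean, mean
    best = None
    for threshold in values[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    """Evaluate a stump returned by fit_stump at a single point."""
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Additive ensemble of stumps, each fitted to the current residuals."""
    base = sum(ys) / len(ys)          # round 0: constant prediction
    preds = [base] * len(ys)
    stumps = []
    history = [mse(ys, preds)]        # training MSE before any stump is added
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - p)^2 with respect to p is (y - p).
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrinkage: add only a fraction of each new learner's output.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
        history.append(mse(ys, preds))
    model = {"base": base, "learning_rate": learning_rate, "stumps": stumps}
    return model, history


def predict(model, x):
    """Sum the base value and the shrunken output of every stump."""
    total = model["base"]
    for stump in model["stumps"]:
        total += model["learning_rate"] * predict_stump(stump, x)
    return total


# Toy data (an assumption for illustration): noiseless sine on a regular grid.
xs = [i * 0.15 for i in range(40)]
ys = [math.sin(x) for x in xs]
model, history = fit_gradient_boosting(xs, ys)
print(f"training MSE: {history[0]:.4f} -> {history[-1]:.4f}")
```

The `learning_rate` factor is the shrinkage mentioned under Risks: smaller values need more rounds but make each step less aggressive. Only training error is tracked here. In practice, early stopping would monitor error on a held-out validation set instead.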