## Definition

**Bagging** (Bootstrap Aggregating) is an ensemble technique that trains multiple instances of the same model on different bootstrap samples of the data, then aggregates their predictions. It was introduced by Breiman (1996). It reduces variance and is the core mechanism behind [[Random Forest]].

## Algorithm

```
for b = 1 to B:
    sample n examples WITH REPLACEMENT from the training set → D_b
    train model M_b on D_b

predict(x):
    classification: majority vote over M_1(x), ..., M_B(x)
    regression:     average over M_1(x), ..., M_B(x)
```

Each $D_b$ contains some duplicates and omits about 37% of the original examples, since the chance that a given example is never drawn is $(1 - 1/n)^n \approx 1/e \approx 0.37$. The omitted examples are the **out-of-bag** examples, useful for free validation.

## Why It Works

Averaging $B$ independent models with the same expected output reduces variance by a factor of $B$, but only if they are truly independent. They are not (they are trained on overlapping data), so the reduction is partial but real. With pairwise correlation $\rho$ between models of variance $\sigma^2$, the variance of the average is $\rho\sigma^2 + (1-\rho)\sigma^2/B$, so it cannot fall below $\rho\sigma^2$ however large $B$ is. Bias is essentially unchanged.

So bagging helps **high-variance, low-bias** models such as full decision trees and complex neural networks. It does not help high-bias models like underfitted linear regression.

## Out-of-Bag Evaluation

Each $M_b$ was not trained on about 37% of the examples: its **OOB samples**. For each example, average the predictions (or take the majority vote, for classification) over the models that did not see it. This gives a free estimate of generalisation error without cross-validation.

## Bagged Trees vs Random Forest

Bagging full decision trees works. [[Random Forest]] adds *random feature subset selection* at each split, which further decorrelates the trees and yields more variance reduction.

## When to Use

- Underlying model has high variance.
- Variance dominates over bias in the bias-variance budget.
- Training is parallelisable: bagged models train independently.

## When NOT to Use

- Underlying model has high bias (bagging won't help).
- Limited compute: bagging multiplies training cost by $B$.
- Stable estimators such as linear models: bagging adds little.

## Variants

- **Pasting**: sample *without* replacement.
- **Random subspace method**: bag over features rather than examples.
- **Random patches**: bag over both examples and features.

## Related

- [[Random Forest]]
- [[Boosting]]
- [[Decision Trees]]
- [[Bias-Variance Tradeoff]]
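
## Example Sketch

The algorithm and out-of-bag evaluation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the base learner `fit_1nn` (a 1-nearest-neighbour regressor on scalar inputs), the synthetic noisy sine data in `demo`, and all function names are hypothetical choices made for this note. Any high-variance learner could be swapped in for `fit_1nn`.

```python
import math
import random
from collections import Counter
from statistics import mean


def bootstrap_indices(n, rng):
    """Draw n indices from range(n) WITH replacement (one bootstrap sample D_b)."""
    return [rng.randrange(n) for _ in range(n)]


def fit_1nn(xs, ys):
    """Hypothetical base learner: 1-nearest-neighbour regression on scalar inputs."""
    points = list(zip(xs, ys))

    def predict(x):
        # Return the target of the closest training point.
        return min(points, key=lambda p: abs(p[0] - x))[1]

    return predict


def bagging_fit(xs, ys, n_models, fit, rng):
    """Train n_models base learners, each on its own bootstrap sample."""
    n = len(xs)
    models, in_bag = [], []
    for _ in range(n_models):
        idx = bootstrap_indices(n, rng)
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
        in_bag.append(set(idx))  # remembered so OOB examples can be identified
    return models, in_bag


def bagging_predict(models, x):
    """Regression aggregate: average the base predictions."""
    return mean(m(x) for m in models)


def majority_vote(labels):
    """Classification aggregate: most common label (ties go to the first seen)."""
    return Counter(labels).most_common(1)[0][0]


def oob_mse(xs, ys, models, in_bag):
    """Out-of-bag mean squared error of the bagged regressor."""
    errors = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Only models whose bootstrap sample omitted example i may predict it.
        preds = [m(x) for m, seen in zip(models, in_bag) if i not in seen]
        if preds:  # skip an example that happened to land in every bag
            errors.append((mean(preds) - y) ** 2)
    return mean(errors)


def demo(seed=0, n=200, n_models=50):
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.5) for x in xs]  # synthetic data
    models, in_bag = bagging_fit(xs, ys, n_models, fit_1nn, rng)
    oob_fraction = mean(1 - len(seen) / n for seen in in_bag)
    print(f"mean OOB fraction: {oob_fraction:.3f}")  # expected to sit near 0.37
    print(f"OOB MSE: {oob_mse(xs, ys, models, in_bag):.3f}")
    print(f"bagged prediction at x=2.0: {bagging_predict(models, 2.0):.3f}")


if __name__ == "__main__":
    demo()
```

Storing the set of in-bag indices for each model is the only bookkeeping that out-of-bag evaluation needs. The loop in `bagging_fit` has no dependency between iterations, which is why bagged models can be trained in parallel.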