Random Forest - Albert Masoliver's learning site

## Definition **Random Forest** is an ensemble of decision trees combining two sources of randomness — bootstrap-sampled training data ([[Bagging]]) and random feature subset selection at each split. Introduced by Breiman (2001). Remains one of the strongest tabular ML algorithms. ## Algorithm ``` for b = 1 to B: D_b ← bootstrap sample of training set grow a decision tree on D_b with this modification: at each split, randomly sample m features from the d available find the best split using only those m features predict(x): classification: majority vote across trees regression: average across trees ``` The "extra randomness" — feature subset at each split — is what distinguishes Random Forest from plain bagged trees. ## Hyperparameters - **$B$** (number of trees): typically 100-500. More trees = lower variance, never hurts (just slower). Diminishing returns past ~200 for most problems. - **$m$** (features per split): typically $\sqrt{d}$ for classification, $d/3$ for regression. Lower $m$ = more decorrelation = higher variance reduction. - **Max depth, min samples per leaf:** control individual tree complexity. Often left at default (trees grown deep) — bagging handles the variance. ## Why It Works - **Trees alone** are high-variance: small data changes → very different trees. - **Bagging** averages over $B$ samples → variance ÷ $B$ if trees were independent. - **Trees aren't independent** (shared data) — random feature selection reduces correlation, recovering most of the theoretical variance reduction. ## Strengths - **Strong out-of-the-box performance** with minimal tuning. - **Handles mixed feature types**, missing values (in some implementations), high dimensions. - **No feature scaling needed.** - **Built-in feature importance** (Gini importance or permutation importance). - **Out-of-bag (OOB) error** — free validation. ## Weaknesses - **Less accurate than gradient boosting** on most tabular benchmarks. - **Memory-heavy** with many deep trees. - **Slower inference** than a single tree. - **Feature importance can mislead** for correlated features (one is picked; the other looks unimportant). ## Random Forest vs Gradient Boosting | Property | Random Forest | Gradient Boosting | |---|---|---| | Training | Parallel (independent trees) | Sequential | | Tuning sensitivity | Low | High | | Out-of-the-box accuracy | Good | Better with tuning | | Overfitting risk | Low | Higher (use early stopping) | | Inference speed | Slower (deeper trees) | Faster (shallow trees) | Random Forest is the safer default; gradient boosting is the higher ceiling. ## When to Use - Quick strong baseline for tabular ML. - Small to medium datasets (~10k to 1M rows). - Production where consistency matters more than squeezing the last 1% of accuracy. - Feature importance analysis. ## Related - [[Decision Trees]] - [[Bagging]] - [[Gradient Boosting Machines]] - [[XGBoost]]