## Definition
**Random Forest** is an ensemble of decision trees combining two sources of randomness — bootstrap-sampled training data ([[Bagging]]) and random feature subset selection at each split. Introduced by Breiman (2001). Remains one of the strongest tabular ML algorithms.
## Algorithm
```
for b = 1 to B:
D_b ← bootstrap sample of training set
grow a decision tree on D_b with this modification:
at each split, randomly sample m features from the d available
find the best split using only those m features
predict(x):
classification: majority vote across trees
regression: average across trees
```
The "extra randomness" — feature subset at each split — is what distinguishes Random Forest from plain bagged trees.
## Hyperparameters
- **$B$** (number of trees): typically 100-500. More trees = lower variance, never hurts (just slower). Diminishing returns past ~200 for most problems.
- **$m$** (features per split): typically $\sqrt{d}$ for classification, $d/3$ for regression. Lower $m$ = more decorrelation = higher variance reduction.
- **Max depth, min samples per leaf:** control individual tree complexity. Often left at default (trees grown deep) — bagging handles the variance.
## Why It Works
- **Trees alone** are high-variance: small data changes → very different trees.
- **Bagging** averages over $B$ samples → variance ÷ $B$ if trees were independent.
- **Trees aren't independent** (shared data) — random feature selection reduces correlation, recovering most of the theoretical variance reduction.
## Strengths
- **Strong out-of-the-box performance** with minimal tuning.
- **Handles mixed feature types**, missing values (in some implementations), high dimensions.
- **No feature scaling needed.**
- **Built-in feature importance** (Gini importance or permutation importance).
- **Out-of-bag (OOB) error** — free validation.
## Weaknesses
- **Less accurate than gradient boosting** on most tabular benchmarks.
- **Memory-heavy** with many deep trees.
- **Slower inference** than a single tree.
- **Feature importance can mislead** for correlated features (one is picked; the other looks unimportant).
## Random Forest vs Gradient Boosting
| Property | Random Forest | Gradient Boosting |
|---|---|---|
| Training | Parallel (independent trees) | Sequential |
| Tuning sensitivity | Low | High |
| Out-of-the-box accuracy | Good | Better with tuning |
| Overfitting risk | Low | Higher (use early stopping) |
| Inference speed | Slower (deeper trees) | Faster (shallow trees) |
Random Forest is the safer default; gradient boosting is the higher ceiling.
## When to Use
- Quick strong baseline for tabular ML.
- Small to medium datasets (~10k to 1M rows).
- Production where consistency matters more than squeezing the last 1% of accuracy.
- Feature importance analysis.
## Related
- [[Decision Trees]]
- [[Bagging]]
- [[Gradient Boosting Machines]]
- [[XGBoost]]