## Definition
**XGBoost** (eXtreme Gradient Boosting, Chen & Guestrin 2016) is the most widely-deployed gradient boosting library. Combines a regularised objective, engineering optimisations, and excellent default behaviour. Dominated Kaggle competitions for years and remains a top-3 default for tabular ML in 2026.
## The Regularised Objective
XGBoost adds explicit regularisation to the loss:
$
\mathcal{L} = \sum_i L(y_i, \hat y_i) + \sum_t \Omega(f_t)
$
with
$
\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_j w_j^2
$
- $T$ — number of leaves in tree $f$.
- $w_j$ — leaf weights.
- $\gamma$ — penalty per leaf (encourages simpler trees).
- $\lambda$ — L2 penalty on leaf weights.
This explicit regularisation is part of why XGBoost generalises better than vanilla GBM.
## Second-Order Approximation
XGBoost expands the loss to second order in current predictions:
$
\mathcal{L} \approx \sum_i \left[ g_i \, f_t(x_i) + \frac{1}{2} h_i \, f_t(x_i)^2 \right] + \Omega(f_t)
$
with $g_i$ and $h_i$ the first and second derivatives. The optimal leaf weight and split gain have closed forms — much faster than gradient-only approaches.
## Engineering Optimisations
- **Histogram-based split finding.** Bin features into a fixed number of bins; find splits over bins, not raw values. Much faster on large data.
- **Sparsity-aware splits.** Missing values handled by default direction at each split, learned during training.
- **Cache-friendly data layout.**
- **Parallel split-finding** within each tree.
- **GPU support** for very large datasets.
## Key Hyperparameters
| Hyperparameter | Typical range | Notes |
|---|---|---|
| `learning_rate` (`eta`) | 0.01-0.3 | Smaller + more rounds = better generalisation |
| `n_estimators` | 100-2000 | Use early stopping |
| `max_depth` | 3-10 | Higher = more complex |
| `min_child_weight` | 1-10 | Higher = more conservative |
| `subsample` | 0.5-1.0 | Row sampling |
| `colsample_bytree` | 0.5-1.0 | Feature sampling per tree |
| `reg_alpha` | 0-1 | L1 on weights |
| `reg_lambda` | 0-1 | L2 on weights |
| `gamma` | 0-1 | Min loss reduction for split |
## Workflow
```python
import xgboost as xgb
model = xgb.XGBClassifier(
n_estimators=1000,
learning_rate=0.05,
max_depth=6,
early_stopping_rounds=50,
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
```
The combination `early_stopping_rounds + eval_set` is standard practice: train many rounds; stop when validation stops improving.
## Strengths
- **Production-grade.** Years of bug fixes, distributed training, GPU support, ONNX/PMML export.
- **Strong defaults.** Even with light tuning, competitive results.
- **Documentation and ecosystem.** Most tutorials and Kaggle kernels use it.
## When to Use
- First reach for any tabular ML problem in 2026.
- When you need the squeeze of every percentage point of accuracy.
- When ensembling with Random Forest, LightGBM, CatBoost for the strongest possible tabular baseline.
## Related
- [[Gradient Boosting Machines]]
- [[Random Forest]]
- [[Boosting]]
- [[Decision Trees]]