XGBoost - Albert Masoliver's learning site

## Definition

**XGBoost** (eXtreme Gradient Boosting, Chen & Guestrin 2016) is one of the most widely deployed gradient boosting libraries. It combines a regularised objective, engineering optimisations, and excellent default behaviour. It dominated Kaggle competitions for years and remains a top-3 default for tabular ML in 2026.

## The Regularised Objective

XGBoost adds explicit regularisation to the loss:

$ \mathcal{L} = \sum_i L(y_i, \hat y_i) + \sum_t \Omega(f_t) $

with

$ \Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_j w_j^2 $

- $T$: number of leaves in tree $f$.
- $w_j$: leaf weights.
- $\gamma$: penalty per leaf (encourages simpler trees).
- $\lambda$: L2 penalty on leaf weights.

This explicit regularisation is part of why XGBoost often generalises better than a vanilla GBM.

## Second-Order Approximation

At boosting round $t$, XGBoost expands the loss to second order around the current predictions $\hat y_i^{(t-1)}$, dropping terms that do not depend on the new tree $f_t$:

$ \mathcal{L} \approx \sum_i \left[ g_i \, f_t(x_i) + \frac{1}{2} h_i \, f_t(x_i)^2 \right] + \Omega(f_t) $

with $g_i$ and $h_i$ the first and second derivatives of $L$ with respect to the current prediction. The optimal leaf weight and split gain then have closed forms in terms of sums of $g_i$ and $h_i$, so candidate splits can be scored cheaply.

## Engineering Optimisations

- **Histogram-based split finding.** Bin features into a fixed number of bins; find splits over bins, not raw values. Much faster on large data.
- **Sparsity-aware splits.** Missing values are routed in a default direction at each split, and that direction is learned during training.
- **Cache-friendly data layout.**
- **Parallel split-finding** within each tree.
- **GPU support** for very large datasets.

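
The closed forms mentioned under Second-Order Approximation can be made concrete with a small sketch. Writing $G$ and $H$ for the sums of $g_i$ and $h_i$ over the samples in a leaf, minimising the approximated objective gives the leaf weight $w^* = -G / (H + \lambda)$ and the split gain $\frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$. The helper names below (`leaf_weight`, `leaf_score`, `split_gain`) and the toy data are illustrative assumptions, not part of the XGBoost API, and the example assumes squared-error loss so that $g_i = \hat y_i - y_i$ and $h_i = 1$.

```python
def leaf_weight(g, h, reg_lambda=1.0):
    """Optimal weight of a leaf holding gradients g and hessians h."""
    return -sum(g) / (sum(h) + reg_lambda)


def leaf_score(g, h, reg_lambda=1.0):
    """Structure score G^2 / (H + lambda) of a single leaf."""
    return sum(g) ** 2 / (sum(h) + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from splitting a parent leaf into left and right children."""
    parent = leaf_score(g_left + g_right, h_left + h_right, reg_lambda)
    children = (leaf_score(g_left, h_left, reg_lambda)
                + leaf_score(g_right, h_right, reg_lambda))
    return 0.5 * (children - parent) - gamma


# Toy data (hypothetical): squared-error loss L = 1/2 (y - y_hat)^2,
# so g = y_hat - y and h = 1 for every sample.
y = [1.0, 1.0, 3.0, 3.0]
y_hat = [0.0, 0.0, 0.0, 0.0]
g = [p - t for p, t in zip(y_hat, y)]
h = [1.0] * len(y)

# Candidate split: first two samples go left, last two go right.
w_left = leaf_weight(g[:2], h[:2])
w_right = leaf_weight(g[2:], h[2:])
gain = split_gain(g[:2], h[:2], g[2:], h[2:])
print(w_left, w_right, gain)
```

A split is worth keeping only when its gain is positive, which is how $\gamma$ acts as a minimum loss reduction, while $\lambda$ shrinks each leaf weight towards zero.
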
## Key Hyperparameters

| Hyperparameter | Typical range | Notes |
|---|---|---|
| `learning_rate` (`eta`) | 0.01-0.3 | Smaller values with more rounds usually generalise better |
| `n_estimators` | 100-2000 | Use early stopping |
| `max_depth` | 3-10 | Higher = more complex trees |
| `min_child_weight` | 1-10 | Higher = more conservative |
| `subsample` | 0.5-1.0 | Row sampling |
| `colsample_bytree` | 0.5-1.0 | Feature sampling per tree |
| `reg_alpha` | 0-1 | L1 penalty on leaf weights |
| `reg_lambda` | 0-1 | L2 penalty on leaf weights ($\lambda$) |
| `gamma` | 0-1 | Minimum loss reduction required for a split ($\gamma$) |

## Workflow

```python
import xgboost as xgb

# X_train, y_train, X_val, y_val are assumed to be pre-split features and labels.
model = xgb.XGBClassifier(
    n_estimators=1000,         # upper bound; early stopping decides the actual count
    learning_rate=0.05,
    max_depth=6,
    early_stopping_rounds=50,  # stop after 50 rounds without validation improvement
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
```

Combining `early_stopping_rounds` with `eval_set` is standard practice: train many rounds and stop when the validation metric stops improving.

## Strengths

- **Production-grade.** Years of bug fixes, distributed training, GPU support, ONNX/PMML export.
- **Strong defaults.** Competitive results even with light tuning.
- **Documentation and ecosystem.** Most tutorials and Kaggle kernels use it.

## When to Use

- As a first choice for any tabular ML problem in 2026.
- When you need to squeeze out every percentage point of accuracy.
- When ensembling with Random Forest, LightGBM, and CatBoost for the strongest possible tabular baseline.

## Related

- [[Gradient Boosting Machines]]
- [[Random Forest]]
- [[Boosting]]
- [[Decision Trees]]