## Definition
**Cross-validation (CV)** is a family of techniques that estimate a model's generalisation performance by repeatedly splitting the data into training and validation portions and averaging the results. It is the standard approach when a single hold-out split would waste too much data.
## Why Over a Single Split
A single 80/20 split estimates performance from only 20% of the data, so the estimate has high variance. Cross-validation uses every example for validation *at some point*, while each model is still evaluated only on data it was not trained on. The result is a lower-variance estimate from the same data budget.
## Variants
### Hold-Out
Single split. Simple, low cost, high variance. Reasonable for very large datasets where 20% is still huge.
### [[K-Fold Cross-Validation]]
Partition the data into $k$ folds. Train on $k-1$, validate on the remaining one. Repeat $k$ times so that each fold serves as the validation set once, then average the $k$ scores. Standard choice: $k = 5$ or $k = 10$.
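
The procedure above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: `k_fold_indices`, `cv_score`, `fit_mean`, and `neg_mse` are hypothetical names introduced here, and the "model" is a toy that simply predicts the training mean.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold CV over n examples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

def cv_score(fit, score, X, y, k=5, seed=0):
    """Average validation score of a modelling procedure over k folds."""
    scores = []
    for train, val in k_fold_indices(len(X), k, seed):
        model = fit([X[j] for j in train], [y[j] for j in train])
        scores.append(score(model, [X[j] for j in val], [y[j] for j in val]))
    return mean(scores), scores

# Toy procedure (assumption for illustration): predict the training mean.
def fit_mean(X_train, y_train):
    return mean(y_train)

def neg_mse(model, X_val, y_val):
    return -mean([(t - model) ** 2 for t in y_val])

X = list(range(20))
y = [2.0 * v for v in X]
avg, scores = cv_score(fit_mean, neg_mse, X, y, k=5)
```

Calling `cv_score(..., k=len(X))` turns the same loop into leave-one-out CV.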
### Leave-One-Out (LOOCV)
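Special case with $k = n$. Each example is its own validation fold. Almost unbiased, but expensive ($n$ training runs), and the estimate can have surprisingly high variance: the $n$ training sets overlap almost entirely, so the per-fold errors are strongly correlated and averaging them reduces variance less than one would expect.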
### Stratified k-Fold
For classification, ensure each fold preserves the class proportions of the full dataset. Always prefer this for imbalanced classes.
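
One simple way to get this property is to deal each class's examples round-robin across the folds. The sketch below assumes that scheme; `stratified_k_fold` is a hypothetical helper, and library implementations may balance folds differently.

```python
import random
from collections import Counter, defaultdict

def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, val_idx) with class proportions roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal this class's examples round-robin across the folds.
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val

labels = [0] * 90 + [1] * 10  # imbalanced: 10% positives
fold_counts = [Counter(labels[i] for i in val)
               for _, val in stratified_k_fold(labels, k=5)]
# Here every validation fold holds 18 negatives and 2 positives.
```

With plain random folds on the same data, a fold could easily end up with zero or one positive example, which makes per-fold metrics such as recall unstable.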
### Time-Series CV
Each fold uses past data for training and future data for validation. Critical for sequential / temporal data where random splitting leaks the future.
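
A common scheme that satisfies this is an expanding window: the training set grows over time and the validation block always follows it. The sketch below assumes the rows are already sorted chronologically; `expanding_window_splits` and its parameters are hypothetical names, and other schemes (for example a fixed-length sliding window) are equally valid.

```python
def expanding_window_splits(n, n_splits, val_size):
    """Yield (train_idx, val_idx) where validation always follows training in time."""
    if n_splits * val_size >= n:
        raise ValueError("not enough data for the requested splits")
    first_val_start = n - n_splits * val_size
    for s in range(n_splits):
        start = first_val_start + s * val_size
        # Train on everything before `start`, validate on the next block.
        yield list(range(start)), list(range(start, start + val_size))

ts_splits = list(expanding_window_splits(n=10, n_splits=3, val_size=2))
# ts_splits[0] == ([0, 1, 2, 3], [4, 5])
# ts_splits[1] == ([0, 1, 2, 3, 4, 5], [6, 7])
# ts_splits[2] == ([0, 1, 2, 3, 4, 5, 6, 7], [8, 9])
```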
### Group k-Fold
Each *group* (user, patient, session) appears in exactly one fold. Prevents within-group leakage.
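
A minimal sketch of the idea, assuming one group label per example: whole groups are assigned to folds, largest first, each to the currently smallest fold. `group_k_fold` is a hypothetical helper and the greedy balancing is just one reasonable choice.

```python
from collections import defaultdict

def group_k_fold(groups, k):
    """Yield (train_idx, val_idx) such that no group spans both sides."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    if len(members) < k:
        raise ValueError("need at least k distinct groups")
    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest examples.
    folds = [[] for _ in range(k)]
    for g in sorted(members, key=lambda name: -len(members[name])):
        min(folds, key=len).extend(members[g])
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for h in range(k) if h != f for i in folds[h])
        yield train, val

groups = ["u1", "u1", "u2", "u3", "u3", "u3", "u4", "u5"]
group_splits = list(group_k_fold(groups, k=3))
```

Fold sizes can be uneven when group sizes are, which is the price of keeping groups intact.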
## What CV Estimates
CV gives an estimate of the **expected performance** of the modelling procedure (training algorithm + hyperparameters), not of a particular trained model. The model you ultimately deploy is typically retrained on all the data, so its true performance can differ from the CV estimate. Because each CV model sees only a $(k-1)/k$ fraction of the data, the estimate tends to be slightly pessimistic.
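
Written out for $k$-fold CV, using notation assumed here for illustration: $D$ is the dataset, $D_1, \dots, D_k$ are the folds, $\mathcal{A}$ is the modelling procedure (it maps a training set to a fitted model), and $L(f, S)$ is the average loss of model $f$ on the set $S$.

$$
\widehat{\mathrm{Err}}_{\mathrm{CV}} = \frac{1}{k} \sum_{i=1}^{k} L\big(\mathcal{A}(D \setminus D_i),\; D_i\big)
$$

The deployed model $\mathcal{A}(D)$ never appears in the sum. Every term evaluates $\mathcal{A}$ on a training set of roughly $(k-1)n/k$ examples, which is why the quantity describes the procedure rather than one fitted model.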
## Cost
CV multiplies training time by the number of folds. For expensive models (deep networks), full $k$-fold CV may be impractical — use a fixed validation split instead.
## Common Pitfalls
- **Leaky preprocessing.** Fit the scaler/encoder on the *training fold only*, not the entire dataset.
- **Hyperparameter selection bias.** Picking the best CV score over many hyperparameter settings gives an optimistic estimate. Use nested CV or a separate test set.
- **Random splitting on grouped data.** Examples from the same group end up on both sides of the split and leak information; see Group k-Fold.
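
The leaky-preprocessing pitfall is easy to see on a single fold. The sketch below uses a hand-rolled standardiser on one feature; `fit_scaler` and `transform` are hypothetical helpers, and the data and split are made up for illustration.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Learn standardisation parameters from the given values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against constant features
    return mu, sigma

def transform(values, scaler):
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]

x = [1.0, 2.0, 3.0, 4.0, 100.0, 6.0]
train_idx, val_idx = [0, 1, 2, 3], [4, 5]  # one fold of a larger CV loop

# Correct: statistics come from the training fold alone, then are
# applied unchanged to the validation fold.
scaler = fit_scaler([x[i] for i in train_idx])
x_train = transform([x[i] for i in train_idx], scaler)
x_val = transform([x[i] for i in val_idx], scaler)

# Leaky: statistics computed on all rows, so the validation outlier
# (100.0) influences the mean and scale applied to the training fold.
leaky_scaler = fit_scaler(x)
```

In a full CV loop the `fit_scaler` call sits inside the loop and is repeated for every fold, alongside the model fit.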
## Related
- [[K-Fold Cross-Validation]]
- [[Train-Validation-Test Split]]
- [[Overfitting and Underfitting]]