## Definition
The **curse of dimensionality** refers to the various phenomena that arise when analysing data in high-dimensional spaces — phenomena that don't occur in the low-dimensional settings where intuition was formed. Bellman coined the term in 1957 (dynamic programming context).
## The Core Geometric Intuition
In high dimensions:
- **Volume concentrates near the boundary.** In a unit hypercube in $d$ dimensions, the fraction of volume within distance $\epsilon$ from the boundary approaches 1 as $d \to \infty$.
- **Distances concentrate.** Pairwise distances between random points become approximately equal — the relative gap between nearest and farthest neighbour shrinks. "Nearest neighbour" loses meaning.
- **Inner products concentrate around zero.** Random unit vectors are nearly orthogonal.
## Statistical Consequences
- **Sample density drops exponentially.** To cover the space at resolution $\epsilon$ requires $(1/\epsilon)^d$ samples. With 100 dimensions and modest resolution, the required sample count exceeds atoms in the universe.
- **Estimators become unreliable.** Density estimation, k-NN, kernel methods all degrade — they rely on local neighbourhoods that no longer exist.
- **Overfitting becomes easier.** More features = more ways to spuriously fit noise.
## Where the Curse Bites
- [[kNN]] degrades as dimensionality grows.
- Density estimation requires exponential data.
- Distance-based clustering becomes unreliable.
- Naive grid search over a high-dimensional parameter space is infeasible.
## Why Deep Learning Survives It
Deep networks operate in high-dimensional input spaces (images, text) successfully. They survive because real high-dimensional data is **not uniformly distributed** — it lies on lower-dimensional manifolds. A 256×256 RGB image has nominally $256^3 \cdot 256^2 \approx 10^{12}$ pixel values per image, but the manifold of *natural* images is vastly smaller. Deep networks discover and exploit this structure.
## Mitigations in Classical ML
- **[[Dimensionality Reduction]]** — explicitly project to a lower-dimensional space.
- **[[Feature Selection]]** — keep only relevant features.
- **Regularisation** — bias toward simpler models that don't use all dimensions equally.
- **Domain knowledge** — engineer features that capture the true low-dimensional structure.
## The Blessing of Dimensionality?
Recent theoretical work argues that *some* high-dimensional regimes are easier than low-dimensional ones — random projections preserve distances (Johnson-Lindenstrauss), concentration of measure gives strong guarantees. The curse isn't universal; it's the dominant phenomenon when data lacks structure.
## Related
- [[Dimensionality Reduction]]
- [[Feature Engineering]]
- [[kNN]]
- [[Principal Component Analysis]]