## Definition The **curse of dimensionality** refers to the various phenomena that arise when analysing data in high-dimensional spaces — phenomena that don't occur in the low-dimensional settings where intuition was formed. Bellman coined the term in 1957 (dynamic programming context). ## The Core Geometric Intuition In high dimensions: - **Volume concentrates near the boundary.** In a unit hypercube in $d$ dimensions, the fraction of volume within distance $\epsilon$ from the boundary approaches 1 as $d \to \infty$. - **Distances concentrate.** Pairwise distances between random points become approximately equal — the relative gap between nearest and farthest neighbour shrinks. "Nearest neighbour" loses meaning. - **Inner products concentrate around zero.** Random unit vectors are nearly orthogonal. ## Statistical Consequences - **Sample density drops exponentially.** To cover the space at resolution $\epsilon$ requires $(1/\epsilon)^d$ samples. With 100 dimensions and modest resolution, the required sample count exceeds atoms in the universe. - **Estimators become unreliable.** Density estimation, k-NN, kernel methods all degrade — they rely on local neighbourhoods that no longer exist. - **Overfitting becomes easier.** More features = more ways to spuriously fit noise. ## Where the Curse Bites - [[kNN]] degrades as dimensionality grows. - Density estimation requires exponential data. - Distance-based clustering becomes unreliable. - Naive grid search over a high-dimensional parameter space is infeasible. ## Why Deep Learning Survives It Deep networks operate in high-dimensional input spaces (images, text) successfully. They survive because real high-dimensional data is **not uniformly distributed** — it lies on lower-dimensional manifolds. A 256×256 RGB image has nominally $256^3 \cdot 256^2 \approx 10^{12}$ pixel values per image, but the manifold of *natural* images is vastly smaller. Deep networks discover and exploit this structure. ## Mitigations in Classical ML - **[[Dimensionality Reduction]]** — explicitly project to a lower-dimensional space. - **[[Feature Selection]]** — keep only relevant features. - **Regularisation** — bias toward simpler models that don't use all dimensions equally. - **Domain knowledge** — engineer features that capture the true low-dimensional structure. ## The Blessing of Dimensionality? Recent theoretical work argues that *some* high-dimensional regimes are easier than low-dimensional ones — random projections preserve distances (Johnson-Lindenstrauss), concentration of measure gives strong guarantees. The curse isn't universal; it's the dominant phenomenon when data lacks structure. ## Related - [[Dimensionality Reduction]] - [[Feature Engineering]] - [[kNN]] - [[Principal Component Analysis]]