## Definition
**Dimensionality reduction** maps high-dimensional data $x \in \mathbb{R}^d$ to a lower-dimensional representation $z \in \mathbb{R}^k$ (with $k < d$, often $k \ll d$) while preserving meaningful structure. It counters the [[Curse of Dimensionality]] and supports visualisation, compression, and preprocessing for downstream models.
## Two Axes
- **Linear vs Non-linear.** Linear methods (PCA) project onto a linear subspace; non-linear methods (t-SNE, UMAP, autoencoders) can capture a curved manifold.
- **Global vs Local structure.** PCA preserves global variance; t-SNE preserves local neighbourhoods; UMAP aims to balance both, though how much global structure it keeps depends on its settings.
## Methods
### Linear
- **[[Principal Component Analysis]] (PCA).** Projects onto the orthogonal directions of maximum variance.
- **Linear Discriminant Analysis (LDA).** Supervised: finds directions that maximise separation between classes relative to the scatter within each class.
- **Factor Analysis.** Models data as linear combinations of latent factors plus noise.
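
As an illustration of the linear case, the following is a minimal PCA sketch built on NumPy's singular value decomposition. The function name `pca`, the toy data, and the choice of `k=2` are assumptions made for this example, not part of any particular library.

```python
import numpy as np

def pca(X, k):
    """Project rows of X (n samples x d features) onto the top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                          # centre each feature
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                    # shape (k, d)
    Z = Xc @ components.T                  # low-dimensional codes, shape (n, k)
    explained_variance = S[:k] ** 2 / (X.shape[0] - 1)
    return Z, components, mean, explained_variance

rng = np.random.default_rng(0)
# Toy data (assumed for illustration): 200 points in 5-D that vary mostly along 2 directions.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

Z, components, mean, var = pca(X, k=2)
X_hat = Z @ components + mean              # map the codes back to the original space
print(Z.shape, var)
```

Because the toy data is nearly rank 2, `X_hat` should be close to `X`; on real data the reconstruction error grows as `k` shrinks.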
### Non-linear (manifold learning)
- **[[t-SNE]].** Preserves local neighbourhoods; mainly used for 2D or 3D visualisation.
- **[[UMAP]].** Preserves local structure and often more global structure than t-SNE; typically faster and more scalable.
- **Isomap.** Preserves geodesic distances, estimated as shortest paths on a neighbourhood graph.
- **Locally Linear Embedding (LLE).** Reconstructs each point as a weighted combination of its neighbours, then finds an embedding that preserves those weights.
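
To make the Isomap idea concrete, here is a small NumPy sketch: build a nearest-neighbour graph, approximate geodesic distances with shortest paths, then embed with classical MDS. The function name `isomap`, the spiral data, and `n_neighbors=4` are assumptions for illustration; the dense Floyd-Warshall loop is only practical for small datasets.

```python
import numpy as np

def isomap(X, n_neighbors, k):
    """Embed rows of X into k dimensions from graph shortest-path distances."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # Neighbourhood graph: keep each point's closest neighbours, infinity elsewhere.
    G = np.full((n, n), np.inf)
    # Column 0 of the sort is the point itself (assuming no duplicate points).
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nearest.ravel()
    G[rows, cols] = D[rows, cols]
    G = np.minimum(G, G.T)                 # make the graph undirected
    np.fill_diagonal(G, 0.0)

    # Floyd-Warshall: all-pairs shortest paths approximate geodesic distances.
    for m in range(n):
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    if not np.isfinite(G).all():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")

    # Classical MDS: double-centre the squared distances, keep the top-k eigenvectors.
    J = np.eye(n) - 1.0 / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Toy data (assumed): a 2-D spiral, i.e. a 1-D manifold curled up in the plane.
t = np.linspace(np.pi, 4 * np.pi, 150)
X = np.column_stack([t * np.cos(t), t * np.sin(t)])
z = isomap(X, n_neighbors=4, k=1)
print(z.shape)
```

On this spiral the single output coordinate should track position along the curve, which a straight-line projection of the plane cannot do.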
### Neural
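- **Autoencoders.** An encoder network compresses the input to a low-dimensional code; a decoder reconstructs the input from that code.
- **Variational Autoencoders.** A probabilistic variant whose latent space is regularised towards a prior distribution.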
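
The encoder/decoder idea can be sketched with the simplest possible case: a linear autoencoder trained by plain gradient descent in NumPy. Everything here (the toy data, the weight names `W_enc` and `W_dec`, the learning rate, and the step count) is an assumption for illustration; practical autoencoders use non-linear layers and a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data (assumed): 256 centred samples in 8-D that lie on a 2-D subspace.
X = rng.normal(size=(256, 2)) @ rng.normal(size=(2, 8))
X = X - X.mean(axis=0)

d, k = X.shape[1], 2
W_enc = 0.1 * rng.normal(size=(d, k))      # encoder: 8-D input -> 2-D code
W_dec = 0.1 * rng.normal(size=(k, d))      # decoder: 2-D code -> 8-D reconstruction

def mse(X, W_enc, W_dec):
    """Mean squared reconstruction error."""
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

initial_loss = mse(X, W_enc, W_dec)
lr = 0.01
for step in range(2000):
    Z = X @ W_enc                          # encode
    X_hat = Z @ W_dec                      # decode
    # Gradients of 0.5 * squared error averaged over samples; this differs from
    # the reported MSE only by a constant factor.
    err = (X_hat - X) / X.shape[0]
    grad_dec = Z.T @ err
    grad_enc = X.T @ (err @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = mse(X, W_enc, W_dec)
print(initial_loss, final_loss)
```

A linear autoencoder like this can at best recover the same subspace as PCA; the non-linear layers of a real autoencoder are what let it model curved manifolds.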
## Use Cases
- **Visualisation.** Project to 2D or 3D for human inspection.
- **Compression.** Store the low-dimensional code instead of the original.
- **Preprocessing.** Reduce input dimensionality before training a downstream model.
- **Noise reduction.** Keeping only the dominant dimensions often retains the signal while discarding much of the noise.
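
The compression and noise-reduction use cases can be illustrated together with a short NumPy sketch. It assumes synthetic data whose clean signal lies on a 2-D subspace of a 20-D space, with a noise level of 0.5; both choices are arbitrary and made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed setup: a clean rank-2 signal in 20-D, observed with additive Gaussian noise.
clean = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 20))
noisy = clean + 0.5 * rng.normal(size=clean.shape)

# Fit principal directions on the noisy data only.
mean = noisy.mean(axis=0)
_, _, Vt = np.linalg.svd(noisy - mean, full_matrices=False)

k = 2
code = (noisy - mean) @ Vt[:k].T           # compressed form: 500 x 2 instead of 500 x 20
denoised = code @ Vt[:k] + mean            # reconstruction from the code

err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
print(err_noisy, err_denoised)
```

Storing `code`, `Vt[:k]`, and `mean` is the compressed representation; the reconstruction sits closer to the clean signal here because most of the noise falls in the 18 discarded directions. Real data rarely has such a clean low-rank structure, so the gain is usually smaller.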
## Pitfalls
- **Loss of interpretability.** The reduced dimensions rarely have meaningful semantics.
- **Distortion.** No method preserves all structure; each trades some properties (such as global distances) for others (such as local neighbourhoods).
- **Hyperparameter sensitivity.** t-SNE's perplexity and UMAP's `n_neighbors` can change the resulting embedding dramatically.
## Related
- [[Principal Component Analysis]]
- [[t-SNE]]
- [[UMAP]]
- [[Curse of Dimensionality]]
- [[Feature Engineering]]