Dimensionality Reduction - Albert Masoliver's learning site

## Definition

**Dimensionality reduction** maps high-dimensional data $x \in \mathbb{R}^d$ to a lower-dimensional representation $z \in \mathbb{R}^k$ (with $k \ll d$) while preserving meaningful structure. It counters the [[Curse of Dimensionality]] and powers visualisation, compression, and preprocessing for downstream models.

## Two Goals

- **Linear vs Non-linear.** Linear methods (PCA) find a flat subspace; non-linear methods (t-SNE, UMAP, autoencoders) can follow a curved manifold.
- **Global vs Local structure.** PCA preserves global variance; t-SNE preserves local neighbourhoods; UMAP aims to balance both.

## Methods

### Linear

- **[[Principal Component Analysis]] (PCA).** Projects onto the directions of maximum variance.
- **Linear Discriminant Analysis (LDA).** Supervised: finds directions that maximise class separation.
- **Factor Analysis.** Models data as linear combinations of latent factors plus noise.

### Non-linear (manifold learning)

- **[[t-SNE]].** Preserves local structure; best suited to visualisation.
- **[[UMAP]].** Preserves local structure and usually more global structure than t-SNE; typically faster and more scalable than t-SNE.
- **Isomap.** Preserves geodesic distances measured along a neighbourhood graph.
- **Locally Linear Embedding (LLE).** Reconstructs each point as a weighted combination of its neighbours and keeps those weights in the embedding.

### Neural

- **Autoencoders.** The encoder compresses; the decoder reconstructs.
- **Variational Autoencoders.** Probabilistic version with a regularised latent space.

## Use Cases

- **Visualisation.** Project to 2D or 3D for human inspection.
- **Compression.** Store the low-dimensional code instead of the original.
- **Preprocessing.** Reduce input dimensionality before training a downstream model.
- **Noise reduction.** The low-dimensional representation often captures signal more than noise.

## Pitfalls

- **Loss of interpretability.** The reduced dimensions rarely have meaningful semantics.
- **Distortion.** No method preserves all structure; each one trades some kinds of structure for others.
- **Hyperparameter sensitivity.** t-SNE's perplexity and UMAP's `n_neighbors` shape the result dramatically.

## Related

- [[Principal Component Analysis]]
- [[t-SNE]]
- [[UMAP]]
- [[Curse of Dimensionality]]
- [[Feature Engineering]]
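
## Example

As a rough illustration of the mapping $x \in \mathbb{R}^d \to z \in \mathbb{R}^k$ from the Definition, the sketch below implements PCA in plain Python using power iteration with deflation. The helper names (`center`, `covariance`, `matvec`, `top_eigenvector`, `pca`) and the synthetic data are assumptions made for this example and do not come from any library. It is meant to show the idea, not to replace an established library implementation.

```python
import math
import random


def center(rows):
    """Subtract the per-feature mean so every column has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of rows that are already centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(mat, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in mat]


def top_eigenvector(mat, iters=500, seed=0):
    """Power iteration: approximate the dominant eigenvector of a symmetric
    positive semi-definite matrix, plus its eigenvalue (Rayleigh quotient)."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in mat]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = matvec(mat, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: any unit vector is an eigenvector
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(mat, v)))
    return v, eigenvalue


def pca(rows, k):
    """Return (codes, components, variances) for the top-k principal directions.

    codes[i] is the k-dimensional representation z of rows[i].
    """
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(k):
        v, lam = top_eigenvector(cov)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the direction just found so the next round
        # of power iteration converges to the next-largest one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    codes = [[sum(a * b for a, b in zip(r, c)) for c in components]
             for r in centered]
    return codes, components, variances


# Synthetic data (d = 3): points close to a line through the origin,
# with a small amount of isotropic Gaussian noise added.
rng = random.Random(42)
direction = [1 / 3, 2 / 3, 2 / 3]  # unit vector
data = []
for _ in range(200):
    t = rng.gauss(0.0, 3.0)
    data.append([t * u + rng.gauss(0.0, 0.1) for u in direction])

codes, components, variances = pca(data, k=2)
print("first component:", [round(c, 3) for c in components[0]])
print("variance per component:", [round(v, 4) for v in variances])
```

Because the data lies almost on a line, the first component should line up with `direction` (up to sign) and carry nearly all of the variance, which is the noise-reduction use case in miniature. Power iteration converges slowly when two eigenvalues are close, so the later components in this sketch are only approximate; library implementations typically use an SVD instead.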