t-SNE - Albert Masoliver's learning site

## Definition

**t-SNE** (t-distributed Stochastic Neighbour Embedding) is a non-linear dimensionality-reduction method optimised for visualisation. It maps high-dimensional points to 2D or 3D while preserving local neighbourhood structure: points that are close in the high-dimensional space stay close in the low-dimensional map.

## How It Works (Intuitively)

1. In high-dimensional space, compute pairwise *similarity* between points using a Gaussian kernel. The bandwidth is adjusted per point to give a fixed *perplexity* (effective number of neighbours).
2. In low-dimensional space, compute similarity using a heavy-tailed *Student-t* distribution with 1 degree of freedom.
3. Minimise the KL divergence between the high-dim and low-dim similarity distributions via gradient descent.

The heavy-tailed low-dim distribution gives t-SNE its "spreading" property: clusters become well separated.

## Why Two Distributions

- High-dim Gaussian: emphasises local neighbours. Its light tails push the similarity of every distant pair close to zero, so large distances all look "far".
- Low-dim Student-t: heavy tails let moderately distant points sit much further apart in the map, so clusters can spread out without crowding.

The mismatch is *deliberate*. It produces visually interpretable plots where similar points cluster and different points separate.

## Key Hyperparameters

- **Perplexity** (typical: 5-50). Loosely, the effective number of nearest neighbours considered. Low values emphasise tight local structure; high values blur into global structure. **Hyperparameter sensitivity is real**: different perplexities can give very different plots.
- **Learning rate** (typical: 200, sometimes scaled with dataset size). Too low and the optimisation gets stuck in local optima; too high and it becomes unstable.
- **Iterations** (typical: 1000+). t-SNE typically needs many iterations to converge.

## Practical Cautions

- **Cluster sizes are not informative.** The apparent size of a cluster in the plot does not reflect its spread in the original space: t-SNE tends to expand dense groups and contract sparse ones. Tight groups may appear far apart and loose groups close together, so the *separation* of clusters can be overstated.
- **Distances between clusters are not meaningful.** t-SNE optimises local structure; global geometry is largely an artefact.
- **Different random seeds give different plots.** Run it several times and check that the structure you see is stable across runs.
- **Don't use t-SNE coordinates as features for downstream models.** Standard t-SNE learns no mapping that can be applied to new points, and distances in the map are unreliable; use PCA or UMAP-derived features instead.

## When to Use

- **Visualisation only**: 2D or 3D plots of high-dimensional data.
- **Exploratory analysis**: finding clusters, outliers, or substructure visually.
- **Sanity-checking embeddings**: does your learnt representation cluster meaningfully?

## When NOT to Use

- For downstream ML: use [[Principal Component Analysis|PCA]] or [[UMAP]].
- For very large datasets (>100k points): speed becomes an issue. UMAP scales much better.

## Successor: UMAP

[[UMAP]] (2018) addresses t-SNE's two main weaknesses: speed and preservation of global structure. UMAP has largely displaced t-SNE in modern practice, but t-SNE remains pedagogically important and well supported.

## Related

- [[Dimensionality Reduction]]
- [[UMAP]]
- [[Principal Component Analysis]]
- [[Embedding]]
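
## Worked Sketch

The three steps under "How It Works" can be made concrete with a small standard-library Python sketch. It is an illustration, not a reference implementation: it builds the two similarity distributions and evaluates the KL divergence for two hand-picked 2D layouts rather than running gradient descent. The toy data, the function names, and the bisection bounds are all assumptions made for this example, and real libraries add refinements (early exaggeration, momentum, approximate neighbour search) that are left out here.

```python
import math

def squared_distances(points):
    """Pairwise squared Euclidean distances."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]

def conditional_row(d_row, i, beta):
    """Gaussian similarities p_{j|i} with precision beta = 1 / (2 * sigma_i**2)."""
    d_min = min(d for j, d in enumerate(d_row) if j != i)  # shift for numerical stability
    w = [0.0 if j == i else math.exp(-beta * (d - d_min)) for j, d in enumerate(d_row)]
    total = sum(w)
    return [x / total for x in w]

def row_perplexity(probs):
    """Perplexity 2**H, where H is the Shannon entropy in bits."""
    return 2.0 ** -sum(p * math.log2(p) for p in probs if p > 0.0)

def calibrate_row(d_row, i, target, steps=60):
    """Step 1: bisect on beta (in log space) until row i has the target perplexity."""
    lo, hi = 1e-12, 1e12  # assumed search bounds, wide enough for this toy data
    for _ in range(steps):
        beta = math.sqrt(lo * hi)
        if row_perplexity(conditional_row(d_row, i, beta)) > target:
            lo = beta  # too many effective neighbours: narrow the kernel
        else:
            hi = beta  # too few: widen it
    return conditional_row(d_row, i, math.sqrt(lo * hi))

def joint_p(points, perplexity):
    """Symmetrised high-dimensional similarities P_ij; all entries sum to 1."""
    n = len(points)
    d = squared_distances(points)
    cond = [calibrate_row(d[i], i, perplexity) for i in range(n)]
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

def joint_q(embedding):
    """Step 2: Student-t (1 degree of freedom) similarities Q_ij in the low-dim map."""
    n = len(embedding)
    d = squared_distances(embedding)
    w = [[0.0 if i == j else 1.0 / (1.0 + d[i][j]) for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[x / total for x in row] for row in w]

def kl_divergence(p, q):
    """Step 3's objective: KL(P || Q) summed over all pairs with P_ij > 0."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row) if pij > 0.0)

# Toy data (made up for illustration): two well-separated groups in 3D.
cluster_a = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (0.1, 1.3, 0.0), (0.8, 0.9, 0.5)]
cluster_b = [(10.0, 10.0, 10.0), (11.2, 10.1, 10.0), (10.0, 11.0, 10.3), (10.9, 11.1, 9.6)]
data = cluster_a + cluster_b

P = joint_p(data, perplexity=3.0)

# Two candidate 2D layouts: one keeps the groups together, one interleaves them.
good_map = [(0, 0), (1, 0), (0, 1), (1, 1), (6, 6), (7, 6), (6, 7), (7, 7)]
mixed_map = [good_map[k] for k in (0, 4, 1, 5, 2, 6, 3, 7)]

kl_good = kl_divergence(P, joint_q(good_map))
kl_mixed = kl_divergence(P, joint_q(mixed_map))
print(f"KL, groups kept together: {kl_good:.3f}")
print(f"KL, groups interleaved:   {kl_mixed:.3f}")
```

The layout that keeps neighbours together scores a lower KL divergence than the interleaved one. A full t-SNE run starts from a random layout and follows the gradient of this same objective.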