## Definition
**t-SNE** (t-distributed Stochastic Neighbour Embedding) is a non-linear dimensionality-reduction method optimised for visualisation. It maps high-dimensional points to 2D or 3D while preserving local neighbourhood structure: points that are close in the high-dimensional space stay close in the low-dimensional embedding.
## How It Works (Intuitively)
1. In high-dimensional space, compute pairwise *similarity* between points using a Gaussian kernel. The bandwidth is adjusted per-point to give a fixed *perplexity* (effective number of neighbours).
2. In low-dimensional space, similarity is computed using a *Student-t* distribution with 1 degree of freedom — heavy-tailed.
3. Minimise the KL divergence between the high-dim and low-dim similarity distributions via gradient descent.
The heavy-tailed low-dim distribution gives t-SNE its "spreading" property — clusters become well-separated.
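For reference, the three steps are usually written as follows. The symbols are not defined elsewhere in this note and are assumed here: $x_i$ are the high-dimensional points, $y_i$ their low-dimensional images, $\sigma_i$ the per-point Gaussian bandwidth, and $N$ the number of points.

$$
p_{j|i} = \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}
$$

$$
q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}
$$

$$
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
\mathrm{Perp}(P_i) = 2^{H(P_i)}, \quad H(P_i) = -\sum_{j} p_{j|i} \log_2 p_{j|i}
$$

Each $\sigma_i$ is chosen so that $\mathrm{Perp}(P_i)$ equals the user-specified perplexity; gradient descent then moves the $y_i$ to reduce $C$.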
## Why Two Distributions
- High-dim Gaussian: emphasises local neighbours; large distances all look "far".
- Low-dim Student-t: heavy tails let moderately distant points sit much further apart in the embedding, which relieves crowding in the centre of the plot and lets clusters separate.
The mismatch is *deliberate* — it produces visually interpretable plots where similar points cluster and different points separate.
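The difference in tail behaviour can be seen numerically. The following is a minimal sketch that compares the two unnormalised kernels at a few distances; the function names and the unit bandwidth (`sigma=1.0`) are illustrative assumptions, not part of any library.

```python
import math

def gaussian_kernel(d, sigma=1.0):
    """Unnormalised Gaussian similarity at distance d (sigma is an assumed bandwidth)."""
    return math.exp(-d**2 / (2 * sigma**2))

def student_t_kernel(d):
    """Unnormalised Student-t similarity with 1 degree of freedom at distance d."""
    return 1.0 / (1.0 + d**2)

# Print both similarities side by side for increasing distances.
for d in (0.5, 1.0, 2.0, 4.0, 8.0):
    g = gaussian_kernel(d)
    t = student_t_kernel(d)
    print(f"d={d:>4}: gaussian={g:.2e}  student_t={t:.2e}  ratio t/g={t / g:.1e}")
```

At short range the two kernels are comparable, but the Gaussian decays far faster, so a given moderate similarity in high-dim space can only be matched in the embedding by a much larger distance.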
## Key Hyperparameters
- **Perplexity** (typical: 5-50). Loosely, the effective number of nearest neighbours each point considers. Low values emphasise tight local structure; high values pull in more distant points, trading local detail for broader structure. **Hyperparameter sensitivity is real**: different perplexities can give very different plots, so compare several values before reading anything into a layout.
- **Learning rate** (typical: 200, sometimes scaled with dataset size). Too low → stuck in local optima; too high → unstable.
- **Iterations** (typical: 1000+). t-SNE typically needs many iterations to converge.
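To make "effective number of neighbours" concrete, here is a minimal sketch of how a per-point bandwidth can be tuned to hit a target perplexity, using perplexity = 2 ** (Shannon entropy in bits). The helper names, the bisection bounds, and the toy distances are all hypothetical; real implementations differ in detail.

```python
import math

def conditional_probs(sq_dists, sigma):
    """Gaussian conditional probabilities for one point, given squared
    distances to every other point."""
    m = min(sq_dists)  # shift by the minimum for numerical stability
    weights = [math.exp(-(d - m) / (2 * sigma**2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    """Perplexity = 2 ** Shannon entropy (in bits)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy

def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, steps=60):
    """Bisection on a log scale for the bandwidth that gives the target perplexity."""
    for _ in range(steps):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_probs(sq_dists, mid)) > target:
            hi = mid  # too many effective neighbours: shrink the bandwidth
        else:
            lo = mid  # too few: widen it
    return math.sqrt(lo * hi)

# Hypothetical squared distances from one point to 8 others:
# three close neighbours and five distant points.
sq_dists = [0.1, 0.2, 0.3, 9.0, 10.0, 11.0, 12.0, 13.0]
for target in (2.0, 5.0):
    sigma = sigma_for_perplexity(sq_dists, target)
    probs = conditional_probs(sq_dists, sigma)
    print(f"target={target}: sigma={sigma:.3f}, perplexity={perplexity(probs):.3f}")
```

A higher target perplexity forces a wider bandwidth, so the distant points start to receive non-negligible probability; this is the mechanism behind the local-versus-global trade-off described above.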
## Practical Cautions
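- **Cluster sizes are not informative.** t-SNE adapts its bandwidth to local density, so dense groups are expanded and sparse groups contracted; a tight cluster and a diffuse one can end up looking the same size. The apparent *separation* of clusters can likewise be over-stated.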
- **Distances between clusters are not meaningful.** t-SNE optimises local structure; global geometry is largely an artefact.
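- **Different random seeds give different plots.** The objective is non-convex, so each run can settle in a different local optimum. Run several seeds and trust only the structure that persists across them.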
- **Don't use t-SNE coordinates as features for downstream models.** That's an abuse of the method; use PCA or UMAP-derived features instead.
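The seed sensitivity noted above is easy to check. The sketch below assumes scikit-learn and NumPy are installed and uses scikit-learn's bundled digits dataset purely as a stand-in for your own data; the subset size, perplexity, and seeds are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Small subset so the sketch runs in seconds (300 points, 64 features).
X = load_digits().data[:300]

embeddings = {}
for seed in (0, 1):
    # Random initialisation makes the dependence on the seed explicit.
    tsne = TSNE(n_components=2, perplexity=30.0, init="random", random_state=seed)
    embeddings[seed] = tsne.fit_transform(X)

# Same data and hyperparameters; only the seed differs.
print(embeddings[0].shape, embeddings[1].shape)
print("identical layouts:", np.allclose(embeddings[0], embeddings[1]))
```

Plotting both embeddings side by side, coloured by a known label if one exists, is a quick way to see which groupings are stable and which are artefacts of a single run.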
## When to Use
- **Visualisation only** — 2D or 3D plots of high-dimensional data.
- **Exploratory analysis** — finding clusters, outliers, or substructure visually.
- **Sanity-checking embeddings** — does your learnt representation cluster meaningfully?
## When NOT to Use
- For downstream ML — use [[Principal Component Analysis|PCA]] or [[UMAP]].
- For very large datasets (>100k points) — speed becomes an issue. UMAP scales much better.
## Successor: UMAP
[[UMAP]] (2018) addresses t-SNE's two main weaknesses: speed and preservation of global structure. UMAP has largely displaced t-SNE in modern practice, but t-SNE remains pedagogically important and well-supported.
## Related
- [[Dimensionality Reduction]]
- [[UMAP]]
- [[Principal Component Analysis]]
- [[Embedding]]