## Definition
**Hierarchical clustering** builds a tree of nested clusters (a **dendrogram**) by either merging or splitting clusters iteratively. Unlike [[K-Means Clustering]], it doesn't require choosing $k$ upfront.
## Two Approaches
### Agglomerative (Bottom-Up) — the standard
```
start: each data point is its own cluster
repeat:
merge the two closest clusters
until: one cluster remains
```
### Divisive (Top-Down)
```
start: all data in one cluster
repeat:
split the cluster with highest dissimilarity
until: each point is its own cluster
```
Agglomerative is much more common; divisive is computationally expensive.
## Linkage Criteria — How to Measure "Closest"
Different linkages give very different results:
- **Single linkage** (min): closest pair of points. Sensitive to noise (chains).
- **Complete linkage** (max): farthest pair. Compact, tight clusters.
- **Average linkage**: average pairwise distance.
- **Ward linkage**: minimises within-cluster variance increase. Most common default.
## The Dendrogram
A tree visualisation where the height of each merge represents the dissimilarity at which clusters joined. Cutting horizontally gives a clustering at a chosen granularity.
The dendrogram is the algorithm's defining strength — it reveals the *hierarchy* of clusters at every scale.
## Choosing $k$
After building the full dendrogram, choose $k$ by:
- **Cutting at a chosen height.**
- **Largest vertical jump** in the dendrogram.
- **Domain knowledge.**
## Strengths
- **No $k$ required upfront.**
- **Dendrogram reveals structure** at multiple scales.
- **Deterministic** (no random initialisation issues).
- **Works with any pairwise distance** — not limited to Euclidean.
## Weaknesses
- **Slow** — $O(n^2 \log n)$ with efficient implementations; $O(n^3)$ naively. Doesn't scale past ~10k samples.
- **Hierarchical decisions are permanent.** A bad merge early cannot be undone.
- **Memory-heavy.** Distance matrix is $O(n^2)$.
## When to Use
- Small datasets (< 10,000 points).
- Multi-scale analysis (you want to see clusters at multiple granularities).
- Pairwise distances available but features aren't.
- Phylogenetic-style analysis (biology, linguistics).
## Variants
- **HDBSCAN** — hierarchical density-based clustering; combines DBSCAN with hierarchy. Often the practical "best of both".
## Related
- [[K-Means Clustering]]
- [[DBSCAN]]
- [[Unsupervised Learning]]