Hierarchical Clustering - Albert Masoliver's learning site

## Definition **Hierarchical clustering** builds a tree of nested clusters (a **dendrogram**) by either merging or splitting clusters iteratively. Unlike [[K-Means Clustering]], it doesn't require choosing $k$ upfront. ## Two Approaches ### Agglomerative (Bottom-Up) — the standard ``` start: each data point is its own cluster repeat: merge the two closest clusters until: one cluster remains ``` ### Divisive (Top-Down) ``` start: all data in one cluster repeat: split the cluster with highest dissimilarity until: each point is its own cluster ``` Agglomerative is much more common; divisive is computationally expensive. ## Linkage Criteria — How to Measure "Closest" Different linkages give very different results: - **Single linkage** (min): closest pair of points. Sensitive to noise (chains). - **Complete linkage** (max): farthest pair. Compact, tight clusters. - **Average linkage**: average pairwise distance. - **Ward linkage**: minimises within-cluster variance increase. Most common default. ## The Dendrogram A tree visualisation where the height of each merge represents the dissimilarity at which clusters joined. Cutting horizontally gives a clustering at a chosen granularity. The dendrogram is the algorithm's defining strength — it reveals the *hierarchy* of clusters at every scale. ## Choosing $k$ After building the full dendrogram, choose $k$ by: - **Cutting at a chosen height.** - **Largest vertical jump** in the dendrogram. - **Domain knowledge.** ## Strengths - **No $k$ required upfront.** - **Dendrogram reveals structure** at multiple scales. - **Deterministic** (no random initialisation issues). - **Works with any pairwise distance** — not limited to Euclidean. ## Weaknesses - **Slow** — $O(n^2 \log n)$ with efficient implementations; $O(n^3)$ naively. Doesn't scale past ~10k samples. - **Hierarchical decisions are permanent.** A bad merge early cannot be undone. - **Memory-heavy.** Distance matrix is $O(n^2)$. ## When to Use - Small datasets (< 10,000 points). - Multi-scale analysis (you want to see clusters at multiple granularities). - Pairwise distances available but features aren't. - Phylogenetic-style analysis (biology, linguistics). ## Variants - **HDBSCAN** — hierarchical density-based clustering; combines DBSCAN with hierarchy. Often the practical "best of both". ## Related - [[K-Means Clustering]] - [[DBSCAN]] - [[Unsupervised Learning]]