## Definition
**K-Means** partitions data into $k$ disjoint clusters by iteratively assigning each point to its nearest centroid and updating centroids as cluster means. The canonical clustering algorithm.
## Algorithm (Lloyd's Algorithm)
```
initialise k centroids (typically by random selection from data)
repeat until convergence:
Assignment step: each point → nearest centroid
Update step: each centroid → mean of its assigned points
```
Converges in finite iterations to a local optimum of the sum-of-squared-distances objective:
$
J = \sum_{c=1}^k \sum_{x_i \in C_c} \|x_i - \mu_c\|^2
$
## Choosing $k$
- **Elbow method.** Plot $J$ vs $k$; pick the "elbow" where diminishing returns set in.
- **Silhouette score.** Measures how well points fit their cluster vs neighbouring clusters.
- **Gap statistic.** Compares $J$ to a null reference distribution.
- **Domain knowledge.** Sometimes $k$ is given.
## Initialisation Matters
Random initialisation can lead to bad local optima. **K-Means++** (Arthur & Vassilvitskii, 2007) seeds smartly: pick first centroid randomly; pick subsequent centroids with probability proportional to squared distance from existing centroids. Now the default in scikit-learn.
Run from multiple seeds and keep the best — `n_init` in scikit-learn (default 10).
## Strengths
- **Fast** — $O(n \cdot k \cdot d \cdot \text{iterations})$.
- **Simple and interpretable.**
- **Scales** to large data with mini-batch variants.
## Weaknesses
- **$k$ must be specified.** Often the hardest decision.
- **Assumes spherical, equal-size clusters.** Elongated or unequal-density clusters break it.
- **Sensitive to scale.** Always standardise features.
- **Hard assignments.** No notion of "this point is between two clusters" — use [[Gaussian Mixture Model]] for soft assignments.
- **Local optima.** Multiple runs help.
## Variants
- **Mini-batch K-Means.** Use small batches for speed; for very large data.
- **K-Medoids (PAM).** Use actual data points as cluster centres; more robust to outliers.
- **Bisecting K-Means.** Hierarchical via repeated 2-means.
- **Kernel K-Means.** K-Means in a kernel feature space.
## When K-Means Fails
For non-convex clusters (concentric rings, half-moons), K-Means produces nonsense. Use [[DBSCAN]] or [[Hierarchical Clustering]] instead.
## Related
- [[Hierarchical Clustering]]
- [[DBSCAN]]
- [[Gaussian Mixture Model]]
- [[Expectation-Maximization]]
- [[Unsupervised Learning]]