## Definition
**Principal Component Analysis (PCA)** is the canonical linear [[Dimensionality Reduction]] technique. It finds orthogonal directions ("principal components") that capture maximum variance in the data, then projects the data onto the top $k$.
## Mathematical Formulation
Given centred data matrix $X \in \mathbb{R}^{n \times d}$:
1. Compute the covariance matrix $C = \frac{1}{n-1} X^\top X$.
2. Eigendecompose $C = V \Lambda V^\top$.
3. Sort eigenvalues $\lambda_i$ in decreasing order.
4. Take the top $k$ eigenvectors as columns of $W \in \mathbb{R}^{d \times k}$.
5. Project: $Z = X W$, with $Z \in \mathbb{R}^{n \times k}$.
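Equivalently, use the SVD of $X$: $X = U \Sigma V^\top$. The columns of $V$ are the principal components, and the eigenvalues of $C$ follow from the singular values as $\lambda_i = \sigma_i^2 / (n-1)$. The SVD route avoids forming $X^\top X$ explicitly, which is usually better behaved numerically.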
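
The two routes above can be sketched side by side. This is a minimal NumPy illustration, not a reference implementation: the function names `pca_eig` and `pca_svd` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca_eig(X, k):
    """PCA via eigendecomposition of the covariance matrix (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                  # centre the data
    C = Xc.T @ Xc / (Xc.shape[0] - 1)        # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # symmetric input, eigenvalues ascending
    order = np.argsort(eigvals)[::-1]        # re-sort in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :k]                       # top-k eigenvectors as columns
    return Xc @ W, W, eigvals

def pca_svd(X, k):
    """PCA via the SVD of the centred data (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values descending
    W = Vt[:k].T                             # rows of Vt are the principal directions
    eigvals = S**2 / (Xc.shape[0] - 1)       # lambda_i = sigma_i^2 / (n - 1)
    return Xc @ W, W, eigvals

# Synthetic correlated data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

Z_eig, W_eig, lam_eig = pca_eig(X, 2)
Z_svd, W_svd, lam_svd = pca_svd(X, 2)
```

Each principal direction is only defined up to sign, so `Z_eig` and `Z_svd` may differ by a sign flip per column while the eigenvalues agree.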
## What It Optimises
PCA finds the $k$-dimensional subspace minimising squared reconstruction error:
$$
\min_W \| X - X W W^\top \|_F^2 \quad \text{subject to} \quad W^\top W = I
$$
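Equivalently, it maximises the variance of the projected data $Z = XW$: for orthonormal $W$, $\| X - X W W^\top \|_F^2 = \| X \|_F^2 - \| X W \|_F^2$, so minimising the first term is the same as maximising the last.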
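
The equivalence can be checked numerically. The sketch below uses synthetic data and NumPy (both assumptions for illustration) to confirm that the reconstruction error of the top-$k$ subspace equals $(n-1)$ times the sum of the discarded eigenvalues, and that a random orthonormal $W$ of the same size does no better.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))
X = X - X.mean(axis=0)                       # centre, as the formulation assumes
n, d = X.shape
k = 2

C = X.T @ X / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)         # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
W = eigvecs[:, :k]                           # top-k principal directions

# Squared Frobenius reconstruction error of the PCA subspace.
recon_error = np.linalg.norm(X - X @ W @ W.T, "fro") ** 2

# Variance left in the discarded directions, rescaled to a sum of squares.
discarded = (n - 1) * eigvals[k:].sum()

# A random orthonormal basis with k columns, for comparison.
Q, _ = np.linalg.qr(rng.normal(size=(d, k)))
random_error = np.linalg.norm(X - X @ Q @ Q.T, "fro") ** 2
```

Here `recon_error` and `discarded` agree up to floating-point error, and `random_error` is at least as large as `recon_error`.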
## Explained Variance
The $i$-th eigenvalue $\lambda_i$ equals the variance captured by the $i$-th component. The **explained variance ratio** of that component is $\lambda_i / \sum_j \lambda_j$. Plotting cumulative explained variance against $k$ is the standard way to choose how many components to keep; a common rule of thumb is to retain enough components to reach 90-99% of the total variance.
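
As a rough sketch of this selection rule, the helper below returns the smallest $k$ whose cumulative explained variance reaches a threshold. The name `choose_k`, the 0.95 default, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def choose_k(X, threshold=0.95):
    """Smallest k whose cumulative explained variance reaches `threshold` (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    S = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    eigvals = S**2 / (Xc.shape[0] - 1)           # component variances
    ratio = eigvals / eigvals.sum()              # explained variance ratio
    cumulative = np.cumsum(ratio)
    # First index where the cumulative ratio reaches the threshold, as a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    k = min(k, len(cumulative))                  # guard against round-off at threshold = 1.0
    return k, ratio, cumulative

# Synthetic data: 3 latent directions embedded in 10 dimensions plus small noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(500, 10))

k, ratio, cumulative = choose_k(X, threshold=0.95)
```

Because the data has only three strong directions, `k` comes out at 3 or fewer here; on real data the curve is usually less clear-cut and the threshold is a judgement call.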
## Properties
- **Linear.** Cannot capture non-linear manifolds.
- **Variance-based.** PCA treats high variance as signal, so a feature that is merely measured on a large scale, or is simply noisy, can dominate the leading components. Standardise features first when they are in different units ([[Feature Scaling]]).
- **Unsupervised.** Doesn't use labels. PCA can drop directions that *separate classes* if those directions have low variance.
- **Reversible (with information loss).** Reconstruction $\hat X = Z W^\top$ approximates $X$.
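
The effect of feature scale can be illustrated with two synthetic features that carry the same signal but differ in units by a factor of 1000. The helper name `first_component` and the data are assumptions for the sketch.

```python
import numpy as np

def first_component(X):
    """Leading principal direction of X (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(3)
n = 1000
signal = rng.normal(size=n)

# Two equally informative features; the second is expressed in units 1000x larger.
a = signal + 0.3 * rng.normal(size=n)
b = 1000.0 * (signal + 0.3 * rng.normal(size=n))
X = np.column_stack([a, b])

pc_raw = first_component(X)                          # PCA on raw features

# Standardise: zero mean and unit sample variance per feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
pc_std = first_component(X_std)                      # PCA on standardised features
```

On the raw data the first component points almost entirely along the large-scale feature; after standardisation both features receive equal weight (in absolute value).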
## Variants
- **Kernel PCA.** Apply PCA in a feature space induced by a kernel — extends to non-linear data.
- **Sparse PCA.** Encourages sparse components for interpretability.
- **Incremental PCA.** Process data in mini-batches; useful when full data doesn't fit in memory.
- **Robust PCA.** Decomposes into low-rank + sparse components; useful with outliers.
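
To make the Kernel PCA variant concrete, here is a minimal NumPy sketch using an RBF kernel. The function name `kernel_pca_rbf`, the choice of kernel, the `gamma` value, and the concentric-circle data are assumptions for illustration; it only projects the training points and is not a substitute for a library implementation.

```python
import numpy as np

def kernel_pca_rbf(X, k, gamma=1.0):
    """Project training points onto the top-k kernel principal components (hypothetical helper)."""
    # Pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    K = np.exp(-gamma * D2)                      # RBF kernel matrix

    # Centre the kernel matrix, i.e. centre the data in feature space.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H

    eigvals, eigvecs = np.linalg.eigh(Kc)        # ascending order
    eigvals = eigvals[::-1][:k]
    eigvecs = eigvecs[:, ::-1][:, :k]

    # Scores of the training points: unit eigenvectors scaled by sqrt(eigenvalue).
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

# Two noisy concentric circles, a standard example of non-linear structure.
rng = np.random.default_rng(4)
theta = rng.uniform(0, 2 * np.pi, size=200)
radius = np.where(np.arange(200) < 100, 1.0, 3.0)
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
X += 0.05 * rng.normal(size=X.shape)

Z = kernel_pca_rbf(X, k=2, gamma=0.5)
```

How well the resulting components separate the two circles depends on `gamma`, which has to be tuned for the data at hand.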
## Common Uses
- **Visualisation** (project to 2-3 dimensions).
- **Compression** (store $k$-dimensional codes).
- **Preprocessing** for downstream models.
- **Noise removal** (keep top components only).
- **Eigenfaces** (principal components of a set of face images, used as a basis for face recognition in early computer vision).
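
The noise-removal use can be sketched as follows: project noisy data onto the top $k$ components, reconstruct, and compare against the clean signal. The low-rank synthetic signal, the noise level, and the variable names are assumptions; in practice the clean signal is unknown and $k$ has to be chosen, for example from the explained variance.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, k = 400, 20, 2

# Rank-k signal in d dimensions, corrupted by isotropic Gaussian noise.
clean = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))
noisy = clean + 0.3 * rng.normal(size=(n, d))

# Fit PCA on the noisy data and keep only the top-k directions.
mean = noisy.mean(axis=0)
Xc = noisy - mean
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:k].T

# Project, reconstruct, and add the mean back.
denoised = Xc @ W @ W.T + mean

# Mean squared error against the clean signal, before and after.
err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
```

The reconstruction discards the noise that lies outside the retained subspace, so `err_denoised` is smaller than `err_noisy` in this setting; noise inside the subspace remains.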
## When PCA Fails
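- **Strongly non-linear data.** A linear projection can fold a curved manifold onto itself, so distinct regions overlap; for visualisation use t-SNE / UMAP, or Kernel PCA for a non-linear projection.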
- **Labels matter.** PCA is unsupervised; Linear Discriminant Analysis (LDA) or other supervised methods may serve better.
- **Unscaled features.** Without standardisation, the feature with the largest scale receives undue weight in the components.
## Related
- [[Dimensionality Reduction]]
- [[t-SNE]]
- [[UMAP]]
- [[Feature Scaling]]
- [[Embedding]]