## Definition
An **autoencoder** is a neural network trained to **reconstruct its own input**: an *encoder* compresses the input into a lower-dimensional **latent** code, and a *decoder* reconstructs the input from that code. Because the input itself is the training target, it learns useful representations without labels.
## Architecture
$$
z = \text{Encoder}(x), \quad \hat x = \text{Decoder}(z)
$$
Loss: reconstruction error, typically mean squared error (the per-example term is shown here) or binary cross-entropy for inputs in $[0, 1]$:
$$
\mathcal{L} = \|x - \hat x\|^2
$$
The latent code $z$ is forced to capture the input's essential structure because the decoder must reconstruct from it alone.
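A minimal NumPy sketch of the encode/decode/loss pipeline above. The one-layer encoder and decoder, the `tanh` activation, the layer sizes, and the helper names (`init_params`, `encode`, `decode`, `reconstruction_loss`) are illustrative assumptions, not a prescribed design.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(n_in, n_latent):
    """Small random weights for a one-layer encoder and a one-layer decoder."""
    return {
        "W_enc": rng.normal(0.0, 0.1, size=(n_in, n_latent)),
        "b_enc": np.zeros(n_latent),
        "W_dec": rng.normal(0.0, 0.1, size=(n_latent, n_in)),
        "b_dec": np.zeros(n_in),
    }

def encode(params, x):
    # z = Encoder(x): project to the latent size, then squash with tanh.
    return np.tanh(x @ params["W_enc"] + params["b_enc"])

def decode(params, z):
    # x_hat = Decoder(z): linear map back to the input size.
    return z @ params["W_dec"] + params["b_dec"]

def reconstruction_loss(x, x_hat):
    # Squared L2 error per example, averaged over the batch.
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))

params = init_params(n_in=8, n_latent=3)
x = rng.normal(size=(5, 8))      # batch of 5 inputs with 8 features
z = encode(params, x)            # latent codes, shape (5, 3)
x_hat = decode(params, z)        # reconstructions, shape (5, 8)
loss = reconstruction_loss(x, x_hat)
```

Note that the decoder only ever sees `z`, which is what makes the bottleneck meaningful.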
## Why It Works
If the latent space is *smaller* than the input space, the autoencoder must compress — keeping the information most useful for reconstruction. This makes the code a learned representation that often transfers well to downstream tasks.
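As a rough illustration of the compression argument, the sketch below trains a linear autoencoder with a 2-unit bottleneck on 6-dimensional data that actually lies on a 2-D subspace. The synthetic data, the sizes, the learning rate, and the step count are all assumptions picked for the example; gradients are written out by hand for plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 6-dimensional points that really live on a 2-D subspace.
n, d, k = 200, 6, 2
factors = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, d))
X = factors @ mixing
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise each feature

# Linear autoencoder with a k-dimensional bottleneck (no biases, no activation).
W_enc = rng.normal(0.0, 0.1, size=(d, k))
W_dec = rng.normal(0.0, 0.1, size=(k, d))

def loss_fn(X, W_enc, W_dec):
    X_hat = (X @ W_enc) @ W_dec
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))

initial_loss = loss_fn(X, W_enc, W_dec)

lr = 0.01
for step in range(2000):
    Z = X @ W_enc                       # latent codes
    R = Z @ W_dec - X                   # residual x_hat - x
    grad_out = 2.0 * R / n              # d(loss) / d(x_hat)
    grad_W_dec = Z.T @ grad_out
    grad_W_enc = X.T @ (grad_out @ W_dec.T)
    W_enc -= lr * grad_W_enc
    W_dec -= lr * grad_W_dec

final_loss = loss_fn(X, W_enc, W_dec)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Because the data has only two underlying degrees of freedom, a 2-unit code can in principle reconstruct it almost exactly; with real data the bottleneck would instead keep the directions that matter most for reconstruction.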
## Variants
### Vanilla Autoencoder
Encoder and decoder, typically mirror images of each other, with the bottleneck in the middle.
### Denoising Autoencoder
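Train on corrupted inputs (add Gaussian noise, mask out parts); reconstruct the *clean* original. Because input and target differ, copying the input no longer minimises the loss, so the model has to learn the data's structure rather than memorise. Introduced by Vincent et al., 2008.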
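The two corruption schemes mentioned above can be sketched as follows. The function names, the noise level `sigma`, and the drop probability `drop_prob` are hypothetical choices for illustration; `reconstruct` stands for any encoder-decoder pair.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(x, sigma=0.1):
    """Corrupt by adding zero-mean Gaussian noise to every entry."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def mask_entries(x, drop_prob=0.3):
    """Corrupt by zeroing each entry independently with probability drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

def denoising_loss(reconstruct, x_clean, corrupt):
    """Feed the corrupted input to the model, but score against the clean one."""
    x_noisy = corrupt(x_clean)
    x_hat = reconstruct(x_noisy)
    return np.mean(np.sum((x_clean - x_hat) ** 2, axis=1))

x_clean = rng.random((4, 10))

# The identity map is no longer a zero-loss solution once inputs are corrupted.
loss_identity = denoising_loss(lambda x: x, x_clean, add_gaussian_noise)
x_masked = mask_entries(x_clean)
```

The key detail is the asymmetry in `denoising_loss`: the model sees `x_noisy` but is penalised against `x_clean`.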
### Sparse Autoencoder
Add a sparsity penalty on the latent activations (for example an L1 term), so that each input activates only a few latent dimensions. Often yields more interpretable features.
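One way to write such a loss, assuming an L1 penalty on the code (a KL-based penalty on mean activations is another common choice). The name `sparse_autoencoder_loss` and the weight `l1_weight` are illustrative assumptions.

```python
import numpy as np

def sparse_autoencoder_loss(x, z, x_hat, l1_weight=1e-3):
    """Reconstruction error plus an L1 penalty on the latent activations."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(z), axis=1))
    return recon + l1_weight * sparsity

x = np.array([[1.0, 2.0], [3.0, 4.0]])
x_hat = x.copy()                          # pretend reconstruction is perfect

# Two candidate codes for the same inputs: one spread out, one concentrated.
z_dense = np.full((2, 4), 0.5)
z_sparse = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])

loss_dense = sparse_autoencoder_loss(x, z_dense, x_hat)
loss_sparse = sparse_autoencoder_loss(x, z_sparse, x_hat)
```

With identical reconstructions, the code that uses fewer and smaller activations gets the lower loss, which is the pressure that pushes training toward sparse codes.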
### Contractive Autoencoder
Penalise the Frobenius norm of the encoder's Jacobian with respect to the input. Encourages codes that change little under small input perturbations.
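For a single sigmoid encoder layer $h = \sigma(Wx + b)$ the penalty has a closed form, $\|J\|_F^2 = \sum_j \big(h_j(1-h_j)\big)^2 \sum_i W_{ji}^2$. The sketch below assumes that one-layer sigmoid encoder (deeper encoders would need automatic differentiation) and checks the closed form against a finite-difference Jacobian; all names are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encode(x, W, b):
    # One-layer sigmoid encoder: x has shape (d,), W has shape (k, d).
    return sigmoid(W @ x + b)

def contractive_penalty(x, W, b):
    """Squared Frobenius norm of dh/dx for the sigmoid encoder above."""
    h = encode(x, W, b)
    dh = h * (1.0 - h)                          # sigmoid derivative, shape (k,)
    return np.sum(dh ** 2 * np.sum(W ** 2, axis=1))

def numerical_jacobian(x, W, b, eps=1e-6):
    """Central finite differences, used here only as a sanity check."""
    J = np.zeros((W.shape[0], x.shape[0]))
    for i in range(x.shape[0]):
        step = np.zeros_like(x)
        step[i] = eps
        J[:, i] = (encode(x + step, W, b) - encode(x - step, W, b)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)
x = rng.normal(size=5)

penalty = contractive_penalty(x, W, b)
penalty_numeric = np.sum(numerical_jacobian(x, W, b) ** 2)
```

In training, this term would be added to the reconstruction loss with a weighting coefficient.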
### Stacked Autoencoder
Multiple autoencoders trained greedily, one layer at a time, each on the codes produced by the previous one. A historical pre-training technique for deep networks, common before 2012.
### [[Variational Autoencoder]]
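Probabilistic variant: the encoder outputs a distribution over latent codes, a code is sampled from that distribution, and the decoder reconstructs from the sample. Enables generation by decoding codes drawn from the prior.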
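A sketch of the two pieces that distinguish a VAE from a plain autoencoder, assuming the usual setup of a diagonal Gaussian encoder output and a standard normal prior (neither is stated above). `sample_latent` uses the reparameterisation trick; `kl_to_standard_normal` is the regulariser added to the reconstruction loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var):
    """Draw z ~ N(mu, diag(exp(log_var))) as mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per example."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)

# Pretend encoder outputs for a batch of 2 inputs and a 3-D latent space.
mu = np.zeros((2, 3))
log_var = np.zeros((2, 3))

z = sample_latent(mu, log_var)            # what the decoder receives
kl = kl_to_standard_normal(mu, log_var)   # zero when the posterior equals the prior
```

Writing the sample as a deterministic function of `mu`, `log_var`, and external noise is what lets gradients flow back into the encoder.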
## Common Uses
- **Dimensionality reduction.** Non-linear alternative to PCA.
- **Denoising.** Restore corrupted images.
- **Anomaly detection.** High reconstruction error → unusual input.
- **Pre-training.** Initialise a network's weights by training it on reconstruction, then fine-tune for the downstream task. Largely superseded by self-supervised pretraining (masked language modelling, contrastive learning).
- **Generative modelling.** With VAE specifically.
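The anomaly-detection use can be sketched as thresholding the reconstruction error. To keep the example short, a linear projection onto the top principal directions stands in for a trained autoencoder; the synthetic data, the 99th-percentile threshold, and all names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" training data: 5-D points close to a 2-D subspace.
basis = rng.normal(size=(2, 5))
X_train = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 5))
mean = X_train.mean(axis=0)

# Stand-in for a trained autoencoder: encode/decode with the top-2 principal directions.
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = Vt[:2]                              # shape (2, 5)

def reconstruction_error(x):
    z = (x - mean) @ components.T                # encode
    x_hat = z @ components + mean                # decode
    return np.sum((x - x_hat) ** 2, axis=-1)

# Flag anything that reconstructs worse than 99% of the training data.
threshold = np.quantile(reconstruction_error(X_train), 0.99)

typical = rng.normal(size=2) @ basis             # lies on the training subspace
unusual = typical + 3.0 * rng.normal(size=5)     # pushed off the subspace

is_anomaly = reconstruction_error(np.stack([typical, unusual])) > threshold
```

The threshold quantile is a tuning knob: a lower quantile catches more anomalies at the cost of more false alarms on normal data.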
## Limitations
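- **No probabilistic structure.** Vanilla autoencoders can't meaningfully generate new samples: nothing in training shapes the latent space for sampling, so arbitrary codes need not decode to realistic outputs.
- **Reconstruction ≠ understanding.** Low reconstruction error doesn't imply useful features; with enough capacity, or without a real bottleneck, the network can learn a near-identity map or simply memorise the training set.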
- **Lossy compression** trade-off — smaller bottlenecks lose more detail.
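The bottleneck trade-off can be made concrete in the linear case, where the best autoencoder with $k$ latent units projects onto the top-$k$ principal subspace. The sketch below computes that optimum directly via SVD rather than by training; the synthetic data and its per-feature scales are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# 8 features with decreasing variance, so later directions carry less information.
scales = np.array([4.0, 3.0, 2.0, 1.5, 1.0, 0.5, 0.25, 0.1])
X = rng.normal(size=(300, 8)) * scales
X = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(X, full_matrices=False)

def linear_bottleneck_error(X, Vt, k):
    """Mean squared reconstruction error of the optimal linear k-unit bottleneck."""
    components = Vt[:k]
    X_hat = (X @ components.T) @ components
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))

errors = [linear_bottleneck_error(X, Vt, k) for k in range(1, 9)]
for k, err in enumerate(errors, start=1):
    print(f"latent size {k}: error {err:.4f}")
```

The error shrinks as the bottleneck widens and vanishes once the code is as large as the input, at which point nothing is being compressed.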
## Connection to Modern Models
- **Masked autoencoders (MAE)** for vision (He et al., 2022) — mask random patches and reconstruct. Modern pretraining at scale.
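- **BERT** masks tokens and predicts them from context, so it is often described as a denoising autoencoder for text, although it predicts only the masked tokens rather than reconstructing the whole input.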
- **Diffusion models** can be viewed as denoising autoencoders at multiple noise levels.
The autoencoder principle — reconstruct your input to learn — runs through much of modern self-supervised learning.
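The masking step behind masked autoencoders for vision can be sketched as below. Only the data handling is shown, not the encoder or decoder; the helper names are hypothetical, the image is random, and the `mask_ratio=0.75` default is chosen to reflect the high masking ratio used in MAE.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(image, patch):
    """Split an (H, W) image into non-overlapping, flattened patch x patch tiles."""
    h, w = image.shape
    grid = image.reshape(h // patch, patch, w // patch, patch)
    return grid.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def random_patch_mask(num_patches, mask_ratio=0.75):
    """Boolean mask where True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def masked_reconstruction_loss(patches, predicted, mask):
    """Mean squared error computed on the masked patches only."""
    return np.mean((patches[mask] - predicted[mask]) ** 2)

image = rng.random((8, 8))
patches = patchify(image, patch=2)       # 16 patches of 4 pixels each
mask = random_patch_mask(len(patches))   # 12 of 16 patches hidden
visible = patches[~mask]                 # the only patches an encoder would see
```

Scoring only the hidden patches means the model gets no credit for copying what it was shown.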
## Related
- [[Variational Autoencoder]]
- [[Generative Adversarial Network]]
- [[Principal Component Analysis]]
- [[Self-Supervised Learning]]