Autoencoder - Albert Masoliver's learning site

## Definition

An **autoencoder** is a neural network trained to **reconstruct its own input**: an *encoder* compresses the input into a lower-dimensional **latent** code, and a *decoder* reconstructs the input from that code. It learns useful representations without labels.

## Architecture

$$
z = \text{Encoder}(x), \quad \hat x = \text{Decoder}(z)
$$

The loss is the reconstruction error, typically MSE or binary cross-entropy. For squared error:

$$
\mathcal{L} = \|x - \hat x\|^2
$$

The latent code $z$ is forced to capture the input's essential structure because the decoder must reconstruct from it alone.

## Why It Works

If the latent space is *smaller* than the input space, the autoencoder must compress, keeping the information most useful for reconstruction. This makes the code a learned representation that often transfers well to downstream tasks.

## Variants

### Vanilla Autoencoder

Symmetric encoder-decoder with a bottleneck in the middle.

### Denoising Autoencoder

Train on corrupted inputs (add Gaussian noise, mask out parts) and reconstruct the *clean* original. This forces the model to learn structure beyond memorisation. Vincent et al., 2008.

### Sparse Autoencoder

Add a sparsity penalty on the latent activations, so that each input activates only a few latent dimensions. Tends to learn more interpretable features.

### Contractive Autoencoder

Penalise the Jacobian of the encoder with respect to the input. Encourages robustness to input perturbations.

### Stacked Autoencoder

Multiple autoencoders trained layer by layer. A historical pre-training technique, common before 2012.

### [[Variational Autoencoder]]

Probabilistic variant: the encoder outputs a distribution over latent codes, and the decoder reconstructs from a code sampled from that distribution. Enables generation.

## Common Uses

- **Dimensionality reduction.** Non-linear alternative to PCA.
- **Denoising.** Restore corrupted images.
- **Anomaly detection.** High reconstruction error → unusual input.
- **Pre-training.** Initialise networks with reconstruction, then fine-tune for the downstream task. Largely superseded by self-supervised pretraining (masked language modelling, contrastive learning).
- **Generative modelling.** With the VAE specifically.

## Limitations

- **No probabilistic structure.** Vanilla autoencoders can't generate new samples meaningfully, because the latent space isn't structured for sampling.
- **Reconstruction ≠ understanding.** A perfect autoencoder doesn't necessarily understand the data; it might just memorise.
- **Lossy compression.** There is a trade-off: smaller bottlenecks lose more detail.

## Connection to Modern Models

- **Masked autoencoders (MAE)** for vision (He et al., 2022): mask random patches and reconstruct them. Modern pretraining at scale.
- **BERT**: loosely, a masked autoencoder for text.
- **Diffusion models** can be viewed as denoising autoencoders at multiple noise levels.

The autoencoder principle (reconstruct your input to learn) runs through much of modern self-supervised learning.

## Related

- [[Variational Autoencoder]]
- [[Generative Adversarial Network]]
- [[Principal Component Analysis]]
- [[Self-Supervised Learning]]
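
## Minimal Sketch

The encoder/decoder mapping and squared-error loss from the Architecture section, together with the denoising variant, can be sketched in plain NumPy. This is an illustrative sketch under stated assumptions, not a reference implementation: the one-layer `tanh` encoder, the linear decoder, the layer sizes, the learning rate, the noise level, the synthetic data, and every function name are choices made here for the example.

```python
import numpy as np

def init_params(n_in, n_latent, rng):
    """Small random weights for a one-layer encoder and a one-layer decoder."""
    return {
        "W_enc": rng.normal(0.0, 0.1, size=(n_in, n_latent)),
        "b_enc": np.zeros(n_latent),
        "W_dec": rng.normal(0.0, 0.1, size=(n_latent, n_in)),
        "b_dec": np.zeros(n_in),
    }

def encode(params, x):
    # z = Encoder(x): tanh layer into the lower-dimensional latent space
    return np.tanh(x @ params["W_enc"] + params["b_enc"])

def decode(params, z):
    # x_hat = Decoder(z): linear map back to the input space
    return z @ params["W_dec"] + params["b_dec"]

def reconstruction_error(params, x):
    # Per-sample squared error ||x - x_hat||^2 (the anomaly-detection score)
    x_hat = decode(params, encode(params, x))
    return np.sum((x - x_hat) ** 2, axis=1)

def train_step(params, x_in, x_target, lr):
    """One full-batch gradient step; returns the loss before the update."""
    n = x_in.shape[0]
    z = encode(params, x_in)
    x_hat = decode(params, z)
    loss = np.mean(np.sum((x_target - x_hat) ** 2, axis=1))

    # Manual backpropagation through the decoder, then the encoder
    d_xhat = 2.0 * (x_hat - x_target) / n
    d_pre = (d_xhat @ params["W_dec"].T) * (1.0 - z ** 2)  # tanh' = 1 - tanh^2
    grads = {
        "W_dec": z.T @ d_xhat,
        "b_dec": d_xhat.sum(axis=0),
        "W_enc": x_in.T @ d_pre,
        "b_enc": d_pre.sum(axis=0),
    }
    for name, grad in grads.items():
        params[name] -= lr * grad
    return loss

rng = np.random.default_rng(0)

# Synthetic data (an assumption): 8-D points lying near a 2-D subspace
X = rng.normal(size=(200, 2)) @ rng.normal(0.0, 0.3, size=(2, 8))
X += 0.01 * rng.normal(size=X.shape)

params = init_params(n_in=8, n_latent=2, rng=rng)
losses = []
for _ in range(500):
    # Denoising variant: corrupt the input, reconstruct the clean original.
    # Passing X as both arguments would give the vanilla autoencoder.
    X_noisy = X + 0.05 * rng.normal(size=X.shape)
    losses.append(train_step(params, X_noisy, X, lr=0.05))

print(f"loss: first step {losses[0]:.4f}, last step {losses[-1]:.4f}")
```

The bottleneck here is the 2-dimensional latent code for 8-dimensional inputs. `reconstruction_error` returns one score per sample, which is the quantity the anomaly-detection use case thresholds. In practice a framework with automatic differentiation would replace the hand-written gradients.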