## Definition
A **Generative Adversarial Network (GAN)** (Goodfellow et al., 2014) pits two neural networks against each other: a **generator** that creates fake samples and a **discriminator** that tries to distinguish real from fake. Through this adversarial game, the generator learns to produce data indistinguishable from the training distribution.
## The Minimax Game
$$
\min_G \max_D \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
$$
- $D$, the discriminator, outputs the probability $D(x)$ that $x$ is real and tries to maximise this objective (correctly classify real and fake).
- $G$, the generator, maps noise $z$ to a sample $G(z)$ and tries to minimise it (fool the discriminator).
At equilibrium, $G$ samples from the true data distribution, so $D$ can do no better than chance: $D(x) = 1/2$ everywhere (50% accuracy).
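As a small numerical check of the equilibrium claim, the sketch below evaluates the objective on a finite sample space, where the expectations become plain sums. It assumes the standard result from Goodfellow et al. (2014) that, for a fixed generator with sample distribution $p_g$, the best discriminator is $D^*(x) = p_{\text{data}}(x) / (p_{\text{data}}(x) + p_g(x))$. The function names and the three-outcome distributions are made up for illustration.

```python
import math

def optimal_discriminator(p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for every outcome x."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

def value(p_data, p_g, d):
    """E_data[log D(x)] + E_g[log(1 - D(x))] on a finite support."""
    return sum(
        pd * math.log(dx) + pg * math.log(1.0 - dx)
        for pd, pg, dx in zip(p_data, p_g, d)
    )

# Illustrative distributions over three outcomes (assumed, not from any dataset).
p_data = [0.5, 0.3, 0.2]
p_g_bad = [0.2, 0.3, 0.5]  # a generator that has not matched the data yet

# Generator equal to the data: the discriminator is reduced to guessing.
d_eq = optimal_discriminator(p_data, p_data)
v_eq = value(p_data, p_data, d_eq)

# Mismatched generator: the optimal discriminator achieves a higher value.
v_bad = value(p_data, p_g_bad, optimal_discriminator(p_data, p_g_bad))

print(d_eq, v_eq, v_bad)
```

When the generator matches the data, every entry of `d_eq` is 0.5 and `v_eq` equals $-\log 4$. Any mismatch lets the optimal discriminator push the value above that, which is what the generator is trying to prevent.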
## Training Loop
```
for each iteration:
    sample real x from data
    sample noise z ~ N(0, I)
    generate fake samples G(z)
    update D by maximising the objective (gradient ascent)
    update G by minimising it (gradient descent)
```
In practice, D is updated for $k$ steps and then G for one step (the original paper used $k = 1$). Many variants of this schedule exist.
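The loop can be made concrete on a one-dimensional toy problem. The sketch below is an illustration under loud assumptions, not a practical GAN: the data are drawn from N(4, 1), the generator is the affine map `G(z) = a*z + b`, the discriminator is a single logistic unit `D(x) = sigmoid(w*x + c)`, and the gradients are derived by hand so that only the Python standard library is needed. All names (`train_toy_gan`, `a`, `b`, `w`, `c`) and hyperparameters are invented for this example.

```python
import math
import random

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(steps=200, batch=64, k=1, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator G(z) = a*z + b (assumed toy model)
    w, c = 0.0, 0.0  # discriminator D(x) = sigmoid(w*x + c) (assumed toy model)
    for _step in range(steps):
        # k discriminator updates: gradient ASCENT on
        # mean log D(real) + mean log(1 - D(fake)).
        for _d_step in range(k):
            real = [rng.gauss(4.0, 1.0) for _ in range(batch)]
            fake = [a * rng.gauss(0.0, 1.0) + b for _ in range(batch)]
            gw = gc = 0.0
            for x in real:
                r = 1.0 - sigmoid(w * x + c)  # d/ds log sigmoid(s)
                gw += r * x
                gc += r
            for x in fake:
                d = sigmoid(w * x + c)        # -d/ds log(1 - sigmoid(s))
                gw -= d * x
                gc -= d
            w += lr * gw / batch
            c += lr * gc / batch
        # One generator update: gradient DESCENT on mean log(1 - D(G(z))).
        ga = gb = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            d = sigmoid(w * (a * z + b) + c)
            ga += -d * w * z  # chain rule through x = a*z + b
            gb += -d * w
        a -= lr * ga / batch
        b -= lr * gb / batch
    return a, b, w, c

print(train_toy_gan())
```

Starting from `b = 0`, the first discriminator step makes `w` positive (real samples sit to the right of fake ones), and the generator step then moves `b` towards the data mean. Because this discriminator is linear in `x`, it can only detect a shift in the mean, and longer runs may oscillate rather than settle.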
## Why GANs Mattered
- **Sharpness.** Where VAE samples tended to be blurry, GANs produced strikingly sharp images.
- **No likelihood model required.** A GAN never needs to evaluate the density $p(x)$; training only requires drawing samples from the generator.
- **General framework.** The adversarial training idea generalised to many domains.
## Famous Variants
### DCGAN (Radford et al., 2015)
Convolutional GAN for image generation. Specific architectural recommendations (transposed convs, BatchNorm) made GAN training reproducible.
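To see how stacked transposed convolutions grow a small feature map into an image, the sketch below applies the usual output-size formula for a transposed convolution (no dilation, no output padding). The kernel size 4, stride 2, padding 1 and the 4×4 starting map are assumptions commonly seen in DCGAN-style implementations, not values stated in this note.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output side length of a transposed convolution
    (assuming dilation 1 and no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

# Assumed DCGAN-style generator: start at 4x4 and double four times.
size = 4
sizes = [size]
for _ in range(4):
    size = conv_transpose_out(size, kernel=4, stride=2, padding=1)
    sizes.append(size)

print(sizes)  # [4, 8, 16, 32, 64]
```

With these settings each layer exactly doubles the spatial resolution, so four layers take a 4×4 map to a 64×64 image.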
### Conditional GAN (CGAN; Mirza & Osindero, 2014)
Condition both generator and discriminator on side information (class label, text). Enables class-conditional generation.
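One simple way to inject the side information is to append a one-hot encoding of the label to the input of each network. The sketch below shows only that input construction; the helper names are hypothetical, and real implementations often use learned label embeddings or other conditioning mechanisms instead.

```python
def one_hot(label, n_classes):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(n_classes)]

def generator_input(z, label, n_classes):
    """Noise vector with the class label appended (hypothetical helper)."""
    return list(z) + one_hot(label, n_classes)

def discriminator_input(x, label, n_classes):
    """Sample with the same label appended, so D judges the (sample, label) pair."""
    return list(x) + one_hot(label, n_classes)

z = [0.1, -0.2]
print(generator_input(z, label=2, n_classes=3))  # [0.1, -0.2, 0.0, 0.0, 1.0]
```

Because the discriminator sees the label too, a realistic sample paired with the wrong label can still be rejected, which is what pushes the generator to respect the condition.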
### CycleGAN (Zhu et al., 2017)
Unpaired image-to-image translation (horses ↔ zebras, summer ↔ winter). Cycle-consistency loss ensures translations are reversible.
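The cycle-consistency idea can be sketched as follows: with a translator `G` from domain X to Y and a translator `F` from Y back to X, the loss penalises the L1 distance between each sample and its round trip. The toy "translators" below are plain functions on short vectors and purely hypothetical; in CycleGAN they are neural networks, and this term is added to the usual adversarial losses.

```python
def l1(u, v):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)

def cycle_consistency_loss(G, F, xs, ys):
    """mean |F(G(x)) - x| over xs plus mean |G(F(y)) - y| over ys."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward

# Hypothetical toy translators on 2-D vectors.
def G(x):
    return [2.0 * v for v in x]

def F(y):
    return [0.5 * v for v in y]

def F_bad(y):
    # Not an inverse of G: the round trip is shifted.
    return [0.5 * v + 1.0 for v in y]

xs = [[1.0, 2.0], [3.0, 4.0]]
ys = [[2.0, 4.0]]

print(cycle_consistency_loss(G, F, xs, ys))      # 0.0
print(cycle_consistency_loss(G, F_bad, xs, ys))  # 3.0
```

The loss is zero only when the two maps undo each other on the samples, which is the sense in which translations are kept reversible.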
### StyleGAN (Karras et al., 2018+)
State-of-the-art face generation at the time, combining progressive growing with a style-based generator. Produced uncannily realistic human faces.
### Wasserstein GAN (Arjovsky et al., 2017)
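Replaces the original GAN objective with an estimate of the Wasserstein-1 (earth mover's) distance, computed by a Lipschitz-constrained critic in place of the discriminator. More stable training; less mode collapse.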
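A minimal sketch of the resulting losses, assuming the critic outputs unbounded real-valued scores (no sigmoid) and that the Lipschitz constraint is enforced by weight clipping as in the original WGAN paper. The function names are illustrative, and the scores are passed in as plain lists rather than produced by a network.

```python
def critic_loss(real_scores, fake_scores):
    """Loss the critic minimises: mean score on fakes minus mean score on reals.
    Its negative is the critic's current estimate of the distance."""
    return sum(fake_scores) / len(fake_scores) - sum(real_scores) / len(real_scores)

def generator_loss(fake_scores):
    """The generator tries to raise the critic's score on its samples."""
    return -sum(fake_scores) / len(fake_scores)

def clip_weights(weights, c=0.01):
    """Weight clipping to [-c, c], a crude way to bound the critic's slope."""
    return [max(-c, min(c, w)) for w in weights]

print(critic_loss([2.0, 4.0], [1.0, 1.0]))  # -2.0
print(clip_weights([0.5, -0.003, -2.0]))    # [0.01, -0.003, -0.01]
```

Gradient penalty and spectral normalisation are later alternatives to clipping for enforcing the same constraint.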
## The Training Difficulties
GANs are notoriously hard to train:
- **Mode collapse.** Generator produces only a few types of samples; ignores other modes of the data distribution.
- **Non-convergence.** Alternating gradient updates can oscillate around the equilibrium instead of converging to it.
- **Discriminator dominance.** If D becomes too good too fast, G's gradients vanish.
- **Hyperparameter sensitivity.** Small changes can break training.
Many fixes were proposed (WGAN, spectral normalisation, gradient penalty, two-time-scale updates), but training remained difficult.
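The vanishing-gradient problem under discriminator dominance can be seen directly in the derivative of the generator's loss with respect to the discriminator's logit `s` on a fake sample, where $D = \sigma(s)$. The sketch below compares the minimax term $\log(1 - D)$ with the non-saturating alternative $-\log D$ suggested in the original GAN paper. The function names are illustrative, and the logit values are chosen by hand rather than produced by a trained network.

```python
import math

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def saturating_grad(s):
    """d/ds of log(1 - sigmoid(s)): the generator term of the minimax objective."""
    return -sigmoid(s)

def non_saturating_grad(s):
    """d/ds of -log(sigmoid(s)): the non-saturating generator loss."""
    return -(1.0 - sigmoid(s))

# A very negative logit means D is confident the sample is fake.
for s in (0.0, -5.0, -10.0):
    print(s, saturating_grad(s), non_saturating_grad(s))
```

As `s` becomes very negative, the minimax gradient shrinks towards zero, so the generator learns least exactly when it is doing worst, while the non-saturating gradient stays close to -1.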
## Modern Status
GANs were dominant in image generation ~2014-2020. Diffusion models (~2020+) largely displaced them because:
- **Easier training.** Diffusion has no adversarial dynamics to balance.
- **More diverse samples.** Diffusion models are far less prone to mode collapse.
- **Comparable or better quality.** Diffusion matches or exceeds GAN quality at scale.
GANs remain useful for:
- **Real-time generation.** A GAN samples in a single forward pass, whereas diffusion needs many denoising steps.
- **Specific niches** where established GAN methods work well (style transfer, super-resolution).
## Related
- [[Variational Autoencoder]]
- [[Diffusion Model]]
- [[Autoencoder]]
- [[Generative AI]]