Generative Adversarial Network - Albert Masoliver's learning site

## Definition

A **Generative Adversarial Network (GAN)** (Goodfellow et al., 2014) pits two neural networks against each other: a **generator** that creates fake samples and a **discriminator** that tries to distinguish real from fake. Through this adversarial game, the generator learns to produce data indistinguishable from the training distribution.

## The Minimax Game

$$
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
$$

- $D$: the discriminator, which tries to maximise this objective (correctly classify real and fake).
- $G$: the generator, which tries to minimise it (fool the discriminator).

At equilibrium, $G$ samples from the true data distribution and $D$ can do no better than chance: it outputs $D(x) = 1/2$ everywhere (50% accuracy).

## Training Loop

```
for each iteration:
    sample real x from data
    sample noise z ~ N(0, I)
    generate fake samples G(z)
    update D by maximising the objective (gradient ascent)
    update G by minimising it (gradient descent)
```

In practice, D is often updated for $k$ steps for every single G step, and many variants of this schedule exist.

## Why GANs Mattered

- **Sharpness.** Unlike VAEs, whose samples tended to be blurry, GANs produced strikingly sharp images.
- **No likelihood model required.** A GAN never needs to evaluate a density $p(x)$; it only needs samples from the data and from the generator.
- **General framework.** The adversarial training idea generalised to many domains.

## Famous Variants

### DCGAN (Radford et al., 2015)

Convolutional GAN for image generation. Specific architectural recommendations (transposed convolutions, BatchNorm) made GAN training far more reproducible.

### Conditional GAN (CGAN)

Conditions both the generator and the discriminator on side information (class label, text). Enables class-conditional generation.

### CycleGAN (Zhu et al., 2017)

Unpaired image-to-image translation (horses ↔ zebras, summer ↔ winter). A cycle-consistency loss encourages translations to be reversible: translating an image to the other domain and back should recover the original.

### StyleGAN (Karras et al., 2018+)

Face generation that was state of the art at the time, built on a style-based generator and, in the first version, progressive growing. Produced uncannily realistic human faces.

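
Most of the variants in this section keep or modify the minimax objective from earlier, so it can help to see that objective as a number. The following is a minimal sketch, not taken from any of the papers' code: the helper `gan_value` and the hand-picked discriminator outputs are hypothetical, chosen only to illustrate the two extremes (a discriminator at chance, and a confident, correct one).

```python
import math


def gan_value(d_real, d_fake):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    (Hypothetical helper for illustration only.)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Discriminator at chance: it outputs 0.5 for every sample, real or fake.
v_chance = gan_value([0.5] * 4, [0.5] * 4)

# Confident, correct discriminator: near 1 on real, near 0 on fake.
v_confident = gan_value([0.99] * 4, [0.01] * 4)

print(f"at chance:  {v_chance:.4f}")     # -log(4), about -1.3863
print(f"confident:  {v_confident:.4f}")  # about -0.0201
```

A discriminator at chance gives $-\log 4$, the value of the objective at the equilibrium described in the minimax section. A confident, correct discriminator pushes the value up toward its maximum of 0, which is the direction D's update moves in and the direction G's update pushes against.
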
### Wasserstein GAN (Arjovsky et al., 2017)

Replaces the original GAN objective with an approximation of the Wasserstein (earth mover's) distance between the data and generator distributions. Training is more stable and less prone to mode collapse.

## The Training Difficulties

GANs are notoriously hard to train:

- **Mode collapse.** The generator produces only a few types of samples and ignores other modes of the data distribution.
- **Non-convergence.** The two players can oscillate around the minimax objective without ever converging.
- **Discriminator dominance.** If D becomes too good too fast, it confidently rejects every fake, and G's gradients vanish.
- **Hyperparameter sensitivity.** Small changes can break training.

Many fixes were proposed (WGAN, spectral normalisation, gradient penalty, two-time-scale updates), but training remained difficult.

## Modern Status

GANs were dominant in image generation from roughly 2014 to 2020. Diffusion models (~2020+) largely displaced them because:

- **Diffusion is easier to train.** There are no adversarial dynamics to balance.
- **Diffusion provides more diverse samples.** It is far less prone to mode collapse.
- **Diffusion matches or exceeds GAN quality** at scale.

GANs remain useful for:

- **Real-time generation.** A GAN needs a single forward pass, versus diffusion's many denoising steps.
- **Specific niches** where established GAN methods work well (style transfer, super-resolution).

## Related

- [[Variational Autoencoder]]
- [[Diffusion Model]]
- [[Autoencoder]]
- [[Generative AI]]