Generative AI - Albert Masoliver's learning site

## Definition

**Generative AI** is the family of machine-learning systems that *produce* new content (text, images, audio, video, code, 3D) rather than classifying, ranking, or extracting from existing inputs. The defining property is that the output is as structured as the input, or richer: a prompt yields a document, an image, or a program, not a label or a score.

## Major Modalities (2026)

- **Text.** LLMs; see [[Large Language Model]].
- **Image.** Diffusion models; see [[Diffusion Model]]. Stable Diffusion 4, Imagen 3, FLUX, Midjourney.
- **Audio.** Speech synthesis (ElevenLabs, OpenAI TTS), music (Suno, Udio), sound effects.
- **Video.** Sora 2, Veo 3, Runway Gen-4: diffusion + transformer hybrids.
- **Code.** A specialisation of text; same architecture, different training mix.
- **Multimodal.** Models that mix several modalities natively; see [[Multimodal Model]].
- **3D and scene.** NeRFs, Gaussian splats, 3D-aware diffusion.

## Common Architectural Families

- **Transformers**: dominate text, code, and increasingly other modalities.
- **Diffusion models**: dominate image and audio; encroaching into video.
- **Autoregressive image models**: older approach, mostly displaced by diffusion.
- **GANs**: historically central for images; now mostly niche.
- **Latent diffusion**: diffusion in a compressed latent space; the practical workhorse for high-res image generation.

## What Makes It "Generative"

Two related properties:

1. **Sampling from a learned distribution**: see [[Sampling]].
2. **Compositional output**: the model produces structure piece by piece (tokens, denoising steps), so each output is in principle novel.

## Why It Reshaped Software (2022+)

The cost of producing a *first draft* of almost any creative or technical artifact dropped dramatically. The bottleneck shifted from generation to:

- **Specification**: knowing what to ask for (see [[Spec-Driven Development]]).
- **Verification**: knowing whether what was produced is correct.
- **Curation**: choosing among many candidate outputs.

These three are now the work; see [[Orchestrator Role]].

## Related

- [[Large Language Model]]
- [[Diffusion Model]]
- [[Multimodal Model]]
- [[Foundation Model]]
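
## Sketch: Sampling Token by Token

The two properties listed under What Makes It "Generative" (sampling from a learned distribution, and building the output piece by piece) can be illustrated with a deliberately tiny model. This is a minimal sketch, not how any production system works: a bigram count table stands in for a trained network, and the names `fit_bigram`, `sample_next`, and `generate`, the three-sentence corpus, and the `temperature` parameter are all assumptions introduced here for illustration.

```python
import random
from collections import Counter, defaultdict


def fit_bigram(corpus):
    """'Learn' a distribution: count which token follows which."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_next(counts, prev, rng, temperature=1.0):
    """Draw one next token, weighting each option by count ** (1 / temperature)."""
    options = counts[prev]
    tokens = list(options)
    weights = [options[t] ** (1.0 / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(counts, rng, max_tokens=20, temperature=1.0):
    """Compose an output one token at a time until the end marker is drawn."""
    out, prev = [], "<s>"
    for _ in range(max_tokens):
        nxt = sample_next(counts, prev, rng, temperature)
        if nxt == "</s>":
            break
        out.append(nxt)
        prev = nxt
    return " ".join(out)


# Hypothetical toy corpus; any list of sentences would do.
corpus = [
    "the model writes code",
    "the model draws images",
    "the artist draws code",
]
model = fit_bigram(corpus)
rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [generate(model, rng) for _ in range(5)]
for s in samples:
    print(s)
```

Because each token is drawn independently given the previous one, the sampler can emit sentences such as "the artist draws images" that never appear in the corpus, which is the sense in which each output is "in principle novel". Raising `temperature` above 1 flattens the weights toward uniform; lowering it concentrates them on the most frequent continuation.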