Foundation Model - Albert Masoliver's learning site

## Definition

A **foundation model** is a large model trained on broad data at scale, capable of being adapted to many downstream tasks. The term was coined by the Stanford CRFM in 2021.

LLMs are the most visible class of foundation model; image and multimodal foundation models exist too.

## Defining Properties

1. **Trained at scale.** Both in parameters and in training-data breadth.
2. **General-purpose.** Not specialised for one task at training time.
3. **Adaptable.** Downstream tasks reached via prompting, fine-tuning, or retrieval, not retraining from scratch.

## Vs. Task-Specific Models

The classical NLP recipe was *task-specific*: one model per task, trained on labelled data. Foundation models invert this: one base model serves dozens of tasks via prompting. The economic and engineering implications are still being absorbed.

## Examples (2026)

- **Text:** Claude 4.x, GPT-5, Gemini 3, Llama 4, Qwen 3, Mistral families.
- **Image:** Stable Diffusion 4, Imagen 3, FLUX.
- **Multimodal:** Claude (vision), GPT-5 (vision + audio), Gemini 3 (native multimodal).
- **Code-specific** (where they still exist): Codestral and CodeQwen, though general-purpose models dominate.

## Adaptation Pathways

1. **Prompting.** Zero-shot or few-shot; see [[In-Context Learning]].
2. **Retrieval augmentation.** See [[Retrieval-Augmented Generation]].
3. **Fine-tuning.** See [[Fine-Tuning]].
4. **Tool use.** See [[Tool Use]].

## Centralisation Concern

A handful of foundation models underpin a vast share of AI applications. A single training-data or alignment choice in one model propagates to thousands of downstream products. The orchestrator's job includes choosing, and not over-trusting, which foundation model to build on.

## Related

- [[Large Language Model]]
- [[Pretraining]]
- [[Multimodal Model]]
- [[Scaling Laws]]
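
## Sketch: Prompting as Adaptation

The prompting pathway listed under Adaptation Pathways can be made concrete with a small sketch. It assumes a plain-text prompt layout (`Input:` / `Output:` pairs) and a hypothetical helper named `build_few_shot_prompt`; neither comes from any particular vendor's API, and real models may prefer a different format. The point is only that the same base model can be steered to unrelated tasks by changing the prompt, with no retraining.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a plain-text prompt: instruction, worked examples, then the query.

    Hypothetical helper for illustration only. With an empty `examples`
    list the result is a zero-shot prompt.
    """
    parts = [instruction.strip(), ""]
    for source, target in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")  # blank line between examples
    parts.append(f"Input: {query}")
    parts.append("Output:")  # left open for the model to complete
    return "\n".join(parts)


# Two unrelated tasks aimed at one (assumed) base model: only the prompt changes.
sentiment_prompt = build_few_shot_prompt(
    "Classify the sentiment of each input as positive or negative.",
    [("I loved this film.", "positive"), ("The plot was a mess.", "negative")],
    "The soundtrack was wonderful.",
)

translation_prompt = build_few_shot_prompt(
    "Translate each input from English to French.",
    [],  # zero-shot: no worked examples
    "Good morning.",
)

if __name__ == "__main__":
    print(sentiment_prompt)
    print("---")
    print(translation_prompt)
```

Either string would then be sent to whichever foundation model is in use. Under the task-specific recipe, by contrast, each of these tasks would have required its own labelled dataset and its own trained model.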