# AI Engineering: Building Applications with Foundation Models by [[Chip Huyen]]

## Summary
<!-- a couple of paragraphs -->

Chip Huyen's *AI Engineering* is a practitioner's guide to building production applications on top of foundation models rather than training them from scratch. It frames AI engineering as a distinct discipline from traditional ML engineering: instead of curating datasets and training models, the engineer composes prompts, retrieval, tools, and evaluation around pre-trained models accessed via APIs or open weights. The book moves systematically from how foundation models are trained and adapted, through prompt engineering, retrieval-augmented generation, fine-tuning, and inference optimization, to the architecture of full AI applications.

The book is notable for its emphasis on evaluation and the economics of model selection: when to use a larger model, when a smaller one suffices, and how to measure quality on open-ended tasks. Later chapters treat agents as a first-class topic, covering tool use, planning, and the failure modes that emerge when models act in loops. It is grounded throughout in real deployment concerns: latency, cost, reliability, and the build-versus-buy decisions that shape the modern AI stack.

## Table of Contents

- Ch. 1: Introduction to Building AI Applications with Foundation Models
- Ch. 2: Understanding Foundation Models (training, scaling, post-training)
- Ch. 3: Evaluation Methodology
- Ch. 4: Evaluating AI Systems
- Ch. 5: Prompt Engineering
- Ch. 6: RAG and Agents (tool use, planning, failure modes)
- Ch. 7: Finetuning
- Ch. 8: Dataset Engineering
- Ch. 9: Inference Optimization
- Ch. 10: AI Engineering Architecture and User Feedback

## Notes
<!-- main takeaways; LINK to the permanent notes this book grounds -->

- Grounds the concept of the [[Foundation Model]] and how pre-training plus post-training ([[RLHF]]) produce general-purpose models.
- Supports [[Scaling Laws]]: the relationship between compute, data, parameters, and capability.
- Backs [[Sampling]] and [[Temperature]] as the levers controlling generation behavior.
- Grounds [[Test-Time Compute]] and [[Mixture-of-Experts]] as efficiency and capability mechanisms.
- Underpins the [[Three-Layer AI Stack]] and [[Model Selection Strategy]] (when to reach for a larger vs. smaller model).
- Ch. 6 grounds the [[Agentic Loop]], [[Tool Use]], [[Agent Planning]], and [[Agent Failure Modes]].
- Ch. 4 grounds evaluation methodology for AI systems.

## Quotes

- <!-- placeholder: add a verified short quote here -->

## Relevance to the course

- Primary grounding for Module 1 (foundation model internals and selection) and Module 2 (agents, tool use, planning). Also supports Module 8 (evaluation and production architecture).

---

## References

-