# AI Engineering: Building Applications with Foundation Models
by [[Chip Huyen]]
## Summary
Chip Huyen's *AI Engineering* is a practitioner's guide to building production applications on top of foundation models rather than training them from scratch. It frames AI engineering as a distinct discipline from traditional ML engineering: instead of curating datasets and training models, the engineer composes prompts, retrieval, tools, and evaluation around pre-trained models accessed via APIs or open weights. The book moves systematically from how foundation models are trained and adapted, through prompt engineering, retrieval-augmented generation, fine-tuning, and inference optimization, to the architecture of full AI applications.
The book is notable for its emphasis on evaluation and the economics of model selection: when to use a larger model, when a smaller one suffices, and how to measure quality on open-ended tasks. Chapter 6 treats agents as a first-class topic, covering tool use, planning, and the failure modes that emerge when models act in loops. The book is grounded throughout in real deployment concerns: latency, cost, reliability, and the build-versus-buy decisions that shape the modern AI stack.
## Table of Contents
- Ch. 1 — Introduction to Building AI Applications with Foundation Models
- Ch. 2 — Understanding Foundation Models (training, scaling, post-training)
- Ch. 3 — Evaluation Methodology
- Ch. 4 — Evaluating AI Systems
- Ch. 5 — Prompt Engineering
- Ch. 6 — RAG and Agents (tool use, planning, failure modes)
- Ch. 7 — Finetuning
- Ch. 8 — Dataset Engineering
- Ch. 9 — Inference Optimization
- Ch. 10 — AI Engineering Architecture and User Feedback
## Notes
- Grounds the concept of the [[Foundation Model]] and how pre-training plus post-training (including [[RLHF]]) produces general-purpose models.
- Supports [[Scaling Laws]] — the relationship between compute, data, parameters, and capability.
- Backs [[Sampling]] and [[Temperature]] as the levers controlling generation behavior.
- Grounds [[Test-Time Compute]] and [[Mixture-of-Experts]] as efficiency and capability mechanisms.
- Underpins the [[Three-Layer AI Stack]] and [[Model Selection Strategy]] (when to reach for a larger vs. smaller model).
- Ch. 6 grounds the [[Agentic Loop]], [[Tool Use]], [[Agent Planning]], and [[Agent Failure Modes]].
- Ch. 3 and Ch. 4 ground evaluation: Ch. 3 covers evaluation methodology, Ch. 4 the evaluation of AI systems.
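
To make the [[Sampling]] and [[Temperature]] note above concrete, here is a minimal sketch of temperature-scaled sampling. It is not code from the book: the function names and the example logits are hypothetical, and it assumes the common formulation in which logits are divided by the temperature before a softmax.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for a three-token vocabulary (illustrative values only).
example_logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Under this formulation, a lower temperature concentrates probability on the highest-scoring token and a higher temperature flattens the distribution, which is why temperature acts as a lever between predictable and varied output.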
## Quotes
- <!-- placeholder: add a verified short quote here -->
## Relevance to the course
- Primary grounding for Module 1 (foundation model internals and selection) and Module 2 (agents, tool use, planning). Also supports Module 8 (evaluation and production architecture).