Three-Layer AI Stack - Albert Masoliver's learning site

## Definition The **three-layer AI stack** is Chip Huyen's framing of a foundation-model application as three concerns stacked top to bottom: *application development*, *model development*, and *infrastructure*. It appears in [[AI Engineering - Chip Huyen]] as a map for deciding where your effort belongs. ## The three layers | Layer | Concerns | Typical work | |---|---|---| | Application development | Prompts, context, evaluation | Prompt design, retrieval, eval harnesses | | Model development | Training, finetuning, inference optimization | Datasets, finetuning, quantization, serving | | Infrastructure | Compute, serving, storage | GPUs, schedulers, pipelines | The top layer is where you shape behaviour without touching weights: how you prompt, what context you assemble, how you measure quality. The middle layer changes the model itself. The bottom layer keeps it all running. ## Start at the top, move down as needed Huyen's operating advice is to "start at the top and move down as needed." Most application problems are solvable with better prompts, better context, and better evaluation — long before anyone needs to finetune, and longer still before anyone needs custom infrastructure. Reaching for the middle layer first is the expensive mistake: you finetune away a problem that a clearer spec or a different model would have dissolved. ## Where orchestrators live The [[Orchestrator Role]] lives almost entirely in the top layer. An orchestrator composes prompts, curates context, wires tools, and evaluates outputs — application development, not model development. They rarely train anything; their leverage is in *arrangement*, not in weights. Within that top layer, the first decision is which model to use at all. [[Model Selection Strategy]] precedes prompt tuning: the right model with a mediocre prompt often beats the wrong model with a perfect one. Pick the layer, then pick the model, then refine the prompt. ## Why the layering matters The stack is a triage tool. When quality is low, it tells you where to look *first*: 1. Is the prompt/context/eval right? (top — cheapest to fix) 2. Is the model itself wrong for the task? (selection, still top) 3. Does the model need finetuning? (middle — expensive) 4. Is serving the bottleneck? (bottom — different team) Most teams should exhaust the top layer before descending. The discipline of [[Spec-Driven Development]] is itself a top-layer practice: it improves outcomes by improving the specification you feed the model, not by changing the model. ## Related - [[Model Selection Strategy]] - [[Orchestrator Role]] - [[Spec-Driven Development]] - [[AI Engineering - Chip Huyen]]