AI Engineering - Albert Masoliver's learning site

## Definition

**AI engineering** is the discipline of building applications on top of readily available [[Foundation Model]]s, as opposed to training models from scratch. Chip Huyen (2024) characterises it as one of the fastest-growing engineering disciplines, enabled by three converging factors: (1) the general-purpose capabilities of foundation models, (2) a sharp increase in AI investment post-ChatGPT, and (3) a low barrier to entry via model-as-a-service APIs.

The defining shift: where traditional ML engineering *produces* models, AI engineering *consumes* them and differentiates through adaptation and evaluation.

## How It Differs from ML Engineering

Huyen identifies three structural differences:

| Dimension | Traditional ML Engineering | AI Engineering |
|---|---|---|
| Model origin | Built from scratch | Pre-trained, sourced via API |
| Core focus | Modeling and training | Model adaptation and evaluation |
| Output type | Closed-ended (fixed classes) | Open-ended (free text) |
| Evaluation difficulty | Ground-truth comparison | Requires richer rubrics |
| Compute concern | Inference optimization | Even more so, at larger scale |

Because outputs are open-ended, evaluation is *harder*, not easier: a chatbot response has no single ground truth. This makes evaluation a first-class engineering problem, not an afterthought. See [[Inference Latency]] for latency implications.

## Model Adaptation Techniques

AI engineers adapt foundation models without training them from scratch. Huyen groups techniques by whether they update model weights:

**Prompt-based (no weight updates)**

- [[Prompt Engineering]]: giving the model instructions and context.
- [[Retrieval-Augmented Generation]]: connecting the model to external knowledge.
- Few-shot examples in the prompt (a form of [[In-Context Learning]]).

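
The three prompt-based techniques listed above can be combined in a single prompt. The following is a minimal sketch, not a reference implementation: the `Example` class, the `build_prompt` function, the `Q:`/`A:` layout and the sample refund text are all hypothetical, and the retrieved passages are assumed to come from some external search step that is not shown.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One few-shot demonstration (hypothetical structure for illustration)."""
    question: str
    answer: str


def build_prompt(instruction, examples, retrieved, query):
    """Assemble a prompt from instructions, retrieved context and examples.

    No model weights change: all adaptation lives in the text sent to the model.
    """
    # Prompt engineering: the instruction that frames the task.
    parts = [instruction.strip()]
    if retrieved:
        # Retrieval-augmented generation: external knowledge added as context.
        context = "\n".join(f"- {doc}" for doc in retrieved)
        parts.append("Context:\n" + context)
    for ex in examples:
        # In-context learning: worked examples the model can imitate.
        parts.append(f"Q: {ex.question}\nA: {ex.answer}")
    # The actual request, left open for the model to complete.
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)


prompt = build_prompt(
    instruction="Answer using only the context provided.",
    examples=[Example("What is the refund window?", "30 days.")],
    retrieved=["Refunds are accepted within 30 days of purchase."],
    query="Can I return an item after six weeks?",
)
print(prompt)
```

The resulting string would be sent to whichever model API is in use. Because everything is plain text, each ingredient can be changed and re-evaluated independently without retraining.
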

**Weight-updating**

- [[Fine-Tuning]]: further training on domain-specific or task-specific data; required when the task wasn't seen during pretraining or when strict output formats must be guaranteed.

Huyen's heuristic: try prompt-based techniques first; fine-tune only when they plateau.

## The AI Engineering Stack

The [[Three-Layer AI Stack]] breaks the process into three levels: application development (prompts, context, evaluation), model development (training, fine-tuning, inference optimisation), and infrastructure (serving, compute, monitoring). Most AI engineers operate primarily in the top layer.

## Application Landscape

Huyen (2024) identifies common application patterns across 205 open-source repositories and 100+ enterprise case studies:

- coding assistants
- image and video production
- writing aids
- education tools
- conversational bots
- information aggregation
- data organisation
- workflow automation

Internal-facing applications tend to be deployed earlier than customer-facing ones because they carry lower compliance and risk exposure.

## AI Product Defensibility

Because foundation models are commodities and APIs lower barriers for all competitors, moats are narrow. Huyen identifies three sources of competitive advantage:

- **Technology**: largely similar across teams using the same base models.
- **Data**: proprietary usage data and domain-specific datasets. The *data flywheel* (usage → data → better model → more usage) is the durable moat.
- **Distribution**: reach and established user bases; skews to large incumbents.

## Planning Considerations

Huyen frames AI product planning around a "last mile challenge": a demo can be built in days, whereas a production-quality product can take months to years. LinkedIn's experience illustrates this: it took one month to reach 80% of the desired quality and four more months to reach 95%. Each subsequent 1% gain is progressively more expensive, a pattern loosely analogous to the diminishing returns described by [[Scaling Laws]] for model training.

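
To make the last-mile cost concrete, the LinkedIn figures above can be turned into a rough cost per quality point. This is a back-of-the-envelope sketch under two simplifying assumptions that are not from the source: quality is counted from 0%, and effort is spread evenly within each phase.

```python
# Figures quoted above: 1 month to reach 80% of desired quality,
# then 4 more months to reach 95%.
# Assumption (not from the source): quality starts at 0% and effort
# is spread evenly across the points gained within each phase.
phases = [
    ("0% -> 80%", 80, 1.0),   # (label, quality points gained, months spent)
    ("80% -> 95%", 15, 4.0),
]


def months_per_point(points, months):
    """Average time spent per percentage point of quality gained."""
    return months / points


costs = {label: months_per_point(points, months) for label, points, months in phases}
ratio = costs["80% -> 95%"] / costs["0% -> 80%"]

for label, cost in costs.items():
    print(f"{label}: {cost:.4f} months per quality point")
print(f"Second phase is about {ratio:.1f}x more expensive per point")
```

Under these assumptions the second phase costs roughly 21 times more per point than the first, which is the sense in which each further gain becomes progressively more expensive.
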

Application design dimensions (from Apple's framework, as cited by Huyen):

- **Critical vs complementary**: how dependent is the app on AI?
- **Reactive vs proactive**: does AI respond to requests or surface predictions opportunistically?
- **Dynamic vs static**: is the model continuously updated per user, or periodically updated?

## Related

- [[Foundation Model]]
- [[Three-Layer AI Stack]]
- [[Prompt Engineering]]
- [[Fine-Tuning]]
- [[Retrieval-Augmented Generation]]
- [[In-Context Learning]]
- [[Inference Latency]]
- [[Hallucination]]
- [[Scaling Laws]]

## Sources

- [[AI Engineering - Chip Huyen]]