In-Context Learning - Albert Masoliver's learning site

## Definition

**In-context learning (ICL)** is the ability of large language models (LLMs) to perform a new task after being shown a handful of examples *in the prompt*, with no weight updates and no fine-tuning. It was named and characterised in the GPT-3 paper (Brown et al., 2020) and is often described as a defining emergent capability of frontier models.

## The Shapes

### Zero-shot

The model is given only the task description and the input, with no examples.

```
Classify this review as positive or negative.
Review: "The interface is clean and fast."
Classification:
```

### Few-shot (n-shot)

The prompt includes $n$ example input-output pairs followed by the new input.

```
Classify each review as positive, negative, or neutral.
Review: "Loved the colour." → positive
Review: "Crashed on startup." → negative
Review: "Service was fine." → neutral
Review: "The interface is clean and fast." →
```

Few-shot prompting can substantially improve performance on tasks whose output format is not obvious from the instruction alone.

## Why It Works (Roughly)

The model treats the examples as evidence about the latent task and pattern-matches the new input against them. The mechanism is not "learning" in the gradient-update sense; it's *conditioning*. Mechanistic interpretability research has found internal structures, such as induction heads, that approximate small algorithms induced from the examples.

## Practical Guidance

- **3–5 examples** is usually enough. Gains tend to plateau there; more examples rarely help and may trigger the [[Lost in the Middle Effect]].
- **Diverse examples** cover edge cases better than near-duplicates.
- **Example order matters.** Examples closer to the end of the prompt tend to carry more influence.
- **Include negative examples** if the task has a tricky failure mode, so the model sees what to avoid.

## ICL vs Fine-Tuning

| Property           | In-context learning | Fine-tuning                    |
| ------------------ | ------------------- | ------------------------------ |
| Cost to set up     | Minutes             | Hours to days                  |
| Token cost         | Pay per request     | Pay once, save per use         |
| Updates the model? | No                  | Yes (full weights or adapters) |
| Best for           | One-off, prototypes | Repeated production            |

Decision rule: prototype with ICL, and fine-tune only when ICL plateaus and the volume justifies the engineering investment.

## Failure Modes

- **Format leakage.** The model copies a label from the examples instead of producing one for the new input.
- **Anchoring.** The first example unduly influences subsequent classifications.
- **Insufficient signal.** The examples are too similar, so the model can't induce the underlying rule.

## Related

- [[Prompt Engineering]]
- [[Chain-of-Thought]]
- [[Fine-Tuning]]
- [[Large Language Model]]
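
## Sketch: Assembling a Few-Shot Prompt

The few-shot shape described above is just string assembly, so it may help to see it as code. The following is a minimal sketch, not a reference implementation: the function name `build_few_shot_prompt`, its parameters, and the default cap of five examples (taken from the 3–5 guideline) are all assumptions made for illustration. It only builds the prompt text and does not call any model API.

```python
from typing import Sequence


def build_few_shot_prompt(
    instruction: str,
    examples: Sequence[tuple[str, str]],
    query: str,
    max_examples: int = 5,  # hypothetical default, following the 3-5 guideline
) -> str:
    """Assemble an n-shot prompt in the 'Review: "..." → label' format."""
    lines = [instruction]
    # Each example becomes one input-output pair the model can condition on.
    for text, label in examples[:max_examples]:
        lines.append(f'Review: "{text}" → {label}')
    # The last line repeats the pattern but leaves the label for the model.
    lines.append(f'Review: "{query}" →')
    return "\n".join(lines)


examples = [
    ("Loved the colour.", "positive"),
    ("Crashed on startup.", "negative"),
    ("Service was fine.", "neutral"),
]
prompt = build_few_shot_prompt(
    "Classify each review as positive, negative, or neutral.",
    examples,
    "The interface is clean and fast.",
)
print(prompt)
```

Passing `max_examples=0` drops every example, leaving only the instruction and the query, which makes it easy to compare zero-shot and few-shot prompts for the same input.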