## Definition
**In-context learning (ICL)** is the ability of large language models (LLMs) to perform a new task from a handful of examples supplied *in the prompt*, with no weight updates and no fine-tuning. It was named and characterised in the GPT-3 paper (Brown et al., 2020) and is a defining emergent capability of frontier models.
## The Shapes
### Zero-shot
The model is given only the task description and the new input, with no worked examples.
```
Classify this review as positive or negative.
Review: "The interface is clean and fast."
Classification:
```
### Few-shot (n-shot)
The prompt includes $n$ example input-output pairs followed by the new input.
```
Classify each review as positive, negative, or neutral.
Review: "Loved the colour." → positive
Review: "Crashed on startup." → negative
Review: "Service was fine." → neutral
Review: "The interface is clean and fast." →
```
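Few-shot prompting often improves performance markedly on tasks with non-obvious format requirements, because the examples show the expected output format directly instead of describing it.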
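The prompt layout above can be assembled programmatically. The following is a minimal sketch, not a reference implementation: the function name `build_prompt` and its parameters are hypothetical, and the `Review: "..." → label` layout simply mirrors the example in this note. Passing an empty example list yields the zero-shot shape.

```python
from typing import Sequence, Tuple


def build_prompt(
    instruction: str,
    examples: Sequence[Tuple[str, str]],
    new_input: str,
) -> str:
    """Assemble an n-shot classification prompt, where n = len(examples)."""
    lines = [instruction]
    for review, label in examples:
        # Each demonstration uses the same layout as the final, unanswered line.
        lines.append(f'Review: "{review}" → {label}')
    # The new input ends at the arrow so the model completes the label.
    lines.append(f'Review: "{new_input}" →')
    return "\n".join(lines)


examples = [
    ("Loved the colour.", "positive"),
    ("Crashed on startup.", "negative"),
]
prompt = build_prompt(
    "Classify each review as positive or negative.",
    examples,
    "The interface is clean and fast.",
)
print(prompt)
```

Keeping the demonstrations and the final line in one identical layout is the design choice that matters here: the model is asked to continue a pattern, so any inconsistency in the pattern becomes part of what it imitates.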
## Why It Works (Roughly)
The model treats the examples as evidence about the latent task and pattern-matches the new input against them. The mechanism is not "learning" in the gradient-update sense; it's *conditioning*. Mechanistic interpretability research has found internal structures that approximate small algorithms induced from the examples.
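The conditioning view can be written down as a rough sketch. The notation is introduced here for illustration and is not from the cited paper: $\theta$ denotes the frozen weights, $D = \{(x_i, y_i)\}_{i=1}^{n}$ the in-prompt examples, and $\tau$ a hypothetical latent task variable. The prediction is a conditional of a fixed model:

$$
\hat{y} = \arg\max_{y} \; p_\theta\bigl(y \mid x_1, y_1, \dots, x_n, y_n, x_{\text{new}}\bigr)
$$

One proposed reading of "evidence about the latent task" is that this conditional behaves approximately like a mixture over tasks, weighted by how well each task explains the examples:

$$
p_\theta\bigl(y \mid D, x_{\text{new}}\bigr) \approx \sum_{\tau} p\bigl(y \mid x_{\text{new}}, \tau\bigr)\, p\bigl(\tau \mid D\bigr)
$$

This is an interpretive assumption, not a claim that the model computes the sum explicitly. In both expressions $\theta$ is unchanged; only the conditioning context differs between tasks.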
## Practical Guidance
- **3–5 examples** is usually where performance plateaus; adding more rarely helps and may trigger the [[Lost in the Middle Effect]].
- **Diverse examples** cover edge cases better than near-duplicates.
- **Example order matters.** Examples closer to the end of the prompt tend to influence the output more.
- **Include negative examples** if the task has a tricky failure mode.
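The first two points can be sketched as a small selection step: cap the number of examples and skip candidates that are near-duplicates of ones already kept. This is one possible heuristic under stated assumptions, not an established method: the names `jaccard` and `select_examples`, the word-overlap similarity measure, and the `max_sim=0.5` threshold are all illustrative choices.

```python
import re
from typing import List, Tuple

Example = Tuple[str, str]  # (input text, label)


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two strings, in [0, 1]."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def select_examples(
    pool: List[Example], k: int = 5, max_sim: float = 0.5
) -> List[Example]:
    """Greedily keep up to k examples, skipping near-duplicates of kept ones."""
    kept: List[Example] = []
    for text, label in pool:
        if len(kept) == k:
            break
        # Keep the candidate only if it is dissimilar to every kept example.
        if all(jaccard(text, other) <= max_sim for other, _ in kept):
            kept.append((text, label))
    return kept


pool = [
    ("Crashed on startup.", "negative"),
    ("Crashed on startup again.", "negative"),  # near-duplicate of the first
    ("Loved the colour.", "positive"),
    ("Not bad at all.", "positive"),  # tricky case: negation, positive sentiment
]
selected = select_examples(pool, k=3)
for text, label in selected:
    print(f"{label}: {text}")
```

The function preserves the pool's order, so ordering decisions (for example, placing the most representative example last) stay with the caller.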
## ICL vs Fine-Tuning
| Property           | In-context learning                    | Fine-tuning                                     |
| ------------------ | -------------------------------------- | ----------------------------------------------- |
| Cost to set up     | Minutes                                | Hours to days                                   |
| Token cost         | Examples are paid for on every request | Training paid once; shorter prompts per request |
| Updates the model? | No                                     | Yes (weights or adapters)                       |
| Best for           | One-off tasks, prototypes              | Repeated production workloads                   |
The decision rule: prototype with ICL, fine-tune only when ICL plateaus and the volume justifies the engineering investment.
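The "volume justifies the investment" part of the rule can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes a deliberately simplified cost model: the only saving from fine-tuning is dropping the example tokens from each prompt, and both options are billed at the same per-token price. The function name and all numbers are hypothetical.

```python
import math


def break_even_requests(
    example_tokens: int,
    price_per_1k_tokens: float,
    fine_tune_cost: float,
) -> int:
    """Requests needed before shorter prompts repay the one-off fine-tuning cost."""
    # Extra cost ICL pays on every request for carrying the examples.
    overhead_per_request = example_tokens / 1000 * price_per_1k_tokens
    if overhead_per_request <= 0:
        raise ValueError("no per-request overhead, so fine-tuning never breaks even")
    return math.ceil(fine_tune_cost / overhead_per_request)


# Hypothetical numbers, for illustration only.
n = break_even_requests(
    example_tokens=400,         # tokens spent on examples in each prompt
    price_per_1k_tokens=0.002,  # assumed input price per 1,000 tokens
    fine_tune_cost=200.0,       # assumed one-off training and engineering cost
)
print(f"Fine-tuning pays off after roughly {n:,} requests")
```

A real comparison would also account for differences in per-token pricing between base and fine-tuned models, retraining when the task changes, and quality differences between the two approaches.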
## Failure Modes
- **Format leakage.** The model copies a label or phrasing from the examples instead of producing a fresh answer for the new input.
- **Anchoring.** First example unduly influences subsequent classifications.
- **Insufficient signal.** Examples too similar; model can't induce the underlying rule.
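Some of these failures can be caught with cheap checks around the model call. The sketch below is illustrative only: `parse_label` rejects completions that are not one of the allowed labels, and `missing_labels` flags label values that no example demonstrates. Both helper names are hypothetical, and neither check detects anchoring on its own.

```python
from typing import Iterable, Optional, Set, Tuple


def parse_label(raw_output: str, allowed: Set[str]) -> Optional[str]:
    """Return the normalised label from the first output line, or None if invalid."""
    lines = raw_output.strip().splitlines()
    first_line = lines[0] if lines else ""
    # Drop surrounding whitespace, quotes, and a trailing full stop.
    candidate = first_line.strip().strip(".\"'").lower()
    return candidate if candidate in allowed else None


def missing_labels(examples: Iterable[Tuple[str, str]], allowed: Set[str]) -> Set[str]:
    """Return the allowed labels that never appear in the examples."""
    seen = {label for _, label in examples}
    return allowed - seen


allowed = {"positive", "negative"}
examples = [
    ("Loved the colour.", "positive"),
    ("Great battery life.", "positive"),
]

# A well-formed completion, followed by text the model kept generating.
print(parse_label(' Positive.\nReview: "Next one" →', allowed))
# A label outside the allowed set is rejected.
print(parse_label("neutral", allowed))
# The examples above never show a negative review.
print(missing_labels(examples, allowed))
```

Reading only the first output line is a deliberate choice: models often continue the pattern and invent further reviews, and that continuation should not be mistaken for the answer.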
## Related
- [[Prompt Engineering]]
- [[Chain-of-Thought]]
- [[Fine-Tuning]]
- [[Large Language Model]]