## Definition
A **decoding strategy** is the method that turns the model's next-token probability distribution into a single chosen [[Token]]. The model gives you a distribution; the decoding strategy decides what to do with it, which makes it your main lever for trading determinism against creativity.
## The main strategies
| Strategy | How it picks | Character |
| --- | --- | --- |
| **Greedy** (temp 0) | always the single highest-probability token | deterministic, repetitive |
| **Pure sampling** | draw from the full distribution | diverse, can wander |
| **Top-k** | sample only from the k most likely tokens | bounded randomness |
| **Top-p / nucleus** | sample from the smallest set of top tokens whose cumulative probability ≥ p | adapts to how peaked the distribution is |
| **Beam search** | keep the b highest-scoring partial sequences at each step | good for short, "correct-answer" outputs |
[[Temperature]] is the companion dial: it sharpens (low) or flattens (high) the distribution *before* the strategy samples from it. See [[Sampling]] and [[Logprobs]] for what is being sampled.
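
The single-token strategies in the table can be sketched in a few lines of plain Python. This is a minimal illustration rather than any library's implementation: the toy `logits` and the helper names (`softmax`, `greedy`, `top_k_filter`, `top_p_filter`, `sample`) are assumptions made up for this note, and beam search is left out because it tracks whole sequences rather than one distribution.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(probs):
    """Index of the single highest-probability token."""
    return max(range(len(probs)), key=probs.__getitem__)

def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero the rest, renormalize."""
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    keep = set(order[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass is >= p."""
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]

def sample(probs, rng):
    """Draw one token index according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy 4-token vocabulary (hypothetical values).
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)
rng = random.Random(0)

print("greedy:", greedy(probs))
print("top-k (k=2):", top_k_filter(probs, 2))
print("top-p (p=0.9):", top_p_filter(probs, 0.9))
print("pure sample:", sample(probs, rng))
print("nucleus sample:", sample(top_p_filter(probs, 0.9), rng))
```

Both filters only reshape the distribution; the random draw itself is the same in every sampling strategy. With these toy logits, top-k with k=2 keeps two tokens while top-p with p=0.9 keeps three, because the top two alone carry less than 0.9 of the mass.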
## The determinism-creativity lever
This is the practitioner's daily decision:
- **Low temperature / greedy** for code you'll diff, structured extraction, or anything where you want the same answer every time.
- **Higher temperature / nucleus** when you want the model to surface options — brainstorming, drafting, generating alternatives.
Match the strategy to the task, not to a default.
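
To make the lever concrete, the sketch below applies three temperatures to the same toy logits and reports how much probability the top token holds and how spread out the distribution is. The logits, the temperature values, and the helper names are illustrative assumptions, not recommended settings.

```python
import math

def softmax(logits, temperature):
    """Turn logits into probabilities; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; higher means a flatter distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical 4-token vocabulary
for t in (0.2, 1.0, 2.0):
    probs = softmax(logits, t)
    print(f"T={t}: top prob={max(probs):.3f}, entropy={entropy_bits(probs):.3f} bits")
```

As the temperature drops, the top token's share approaches 1 and sampling behaves more and more like greedy decoding; as it rises, the entropy grows and lower-ranked tokens get a real chance of being picked.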
## "Deterministic" has limits
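Even greedy decoding with a fixed seed is **not guaranteed to be byte-for-byte reproducible** across runs. Floating-point addition is non-associative, so GPU kernel scheduling and batching can change the order of operations and yield slightly different logits for the same input. When two tokens are nearly tied, that small difference is enough to flip the greedy choice, and every later token can then diverge. Plan evals and diffs around *approximate* stability, not exact reproduction.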
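
The floating-point effect can be reproduced on a CPU with three numbers. The sketch below is a toy, not a model of any real kernel: it assumes a logit is built by adding the same three terms in two different orders, then compares it against a competing logit of exactly `0.6`. The helper names are made up for this note.

```python
def accumulate(terms):
    """Left-to-right float summation, written as a loop so the order is explicit."""
    total = 0.0
    for t in terms:
        total += t
    return total

def greedy(logits):
    """Index of the largest logit; ties go to the lowest index."""
    return max(range(len(logits)), key=logits.__getitem__)

terms = [0.1, 0.2, 0.3]
forward = accumulate(terms)             # (0.1 + 0.2) + 0.3
backward = accumulate(reversed(terms))  # (0.3 + 0.2) + 0.1

print(forward == backward)  # False: same terms, different order
print(greedy([0.6, forward]), greedy([0.6, backward]))  # 1 0
```

The two sums differ only in the last bit, yet the greedy pick changes from token 1 to token 0. In a real decode loop that one flipped token feeds back into the context, so the rest of the output can differ as well.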
## Related
- [[Sampling]]
- [[Temperature]]
- [[Logprobs]]
- [[Large Language Model]]
- [[Token]]
- [[Test-Time Compute]]
- [[Hands-On Large Language Models - Alammar, Grootendorst]]