## Definition
**Temperature** is a positive scalar $T$ that divides an LLM's logits before the softmax turns them into a next-token distribution. It controls the *sharpness* of that distribution and therefore the *diversity* of generated text: low $T$ concentrates probability on the top tokens, high $T$ spreads it out.
## Mathematical Form
Given a logit vector $z$ with one entry $z_v$ per vocabulary token $v$, the next-token probability is:
$$
P(x = v) = \frac{\exp(z_v / T)}{\sum_{v'} \exp(z_{v'} / T)}
$$
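- **$T = 1$**: the unscaled softmax, exactly as the model produced it. Reference behaviour.
- **$T \to 0^+$**: the distribution collapses onto the argmax token. Effectively greedy decoding; deterministic in principle given the prompt. The formula is undefined at exactly $T = 0$, so implementations that accept that value treat it as greedy decoding.
- **$T \to \infty$**: the distribution flattens toward uniform over the vocabulary, so every token becomes equally likely.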
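
The three regimes can be checked numerically. The following is a minimal sketch in plain Python, not any library's implementation; the function name `softmax_with_temperature` and the example logits are made up for illustration, and $T = 0$ is handled as an explicit argmax special case because the formula divides by $T$.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return next-token probabilities for `logits` rescaled by `temperature`."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Limit T -> 0: all mass on the argmax (ties go to the first index).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical logits for a three-token vocabulary
for t in (0.0, 0.5, 1.0, 2.0, 100.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

The top token's probability shrinks as $T$ grows, and at $T = 100$ the three probabilities are all close to $1/3$.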
## Practical Ranges
| $T$ | Behaviour | When to use |
| ----------- | -------------------------------------------- | ------------------------------------ |
| 0.0 | Greedy; deterministic | Tests, batch jobs needing reproducibility |
| 0.1–0.3 | Conservative; sticks to high-confidence tokens | Code generation; structured outputs |
| 0.5–0.7 | Balanced | Default for many chat assistants |
| 0.8–1.0 | Creative; explores rare tokens | Brainstorming, creative writing |
| > 1.2 | Often incoherent | Research, deliberate exploration |
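
To get a feel for how the rows differ, one can count how often sampling leaves the top-scoring token at each temperature. This is a rough sketch under stated assumptions: the logits are invented, and `sample_token` and `non_argmax_rate` are hypothetical helper names. Real models have far larger vocabularies, so the absolute numbers will not transfer.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled softmax of `logits`."""
    if temperature == 0:
        # Greedy special case: the softmax formula is undefined at T = 0.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalised probabilities
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def non_argmax_rate(logits, temperature, draws=2000, seed=0):
    """Fraction of draws that pick something other than the top-scoring token."""
    rng = random.Random(seed)
    top = max(range(len(logits)), key=lambda i: logits[i])
    misses = sum(sample_token(logits, temperature, rng) != top for _ in range(draws))
    return misses / draws

logits = [3.0, 1.5, 0.5, 0.0]  # hypothetical scores for four candidate tokens
for t in (0.0, 0.2, 0.6, 1.0, 1.5):
    print(f"T={t}: {non_argmax_rate(logits, t):.3f} of draws leave the argmax")
```

With these logits the rate is zero at $T = 0$, stays tiny at $T = 0.2$, and climbs steadily beyond that, which matches the conservative-to-creative progression in the table.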
## Common Pitfalls
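- **Temperature 0 ≠ perfect determinism.** Floating-point non-associativity, batch effects, and load-balancing can perturb the logits slightly between runs. When the top two logits are nearly tied, that is enough to flip the argmax and send the rest of the generation down a different path.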
- **Lowering temperature ≠ reducing hallucinations.** A confident model can be confidently wrong; see [[Hallucination]]. Temperature shapes *which* errors you get, not their existence.
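- **Temperature interacts with top-p and top-k.** Temperature reshapes the distribution while top-p and top-k truncate it, so changing one setting changes what the others do. Tune one main knob and hold the rest fixed.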
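
The interaction with top-p can be made concrete by counting how many tokens survive nucleus truncation at different temperatures. The sketch below assumes temperature is applied before top-p, which is one common order but not a universal one; `nucleus_size` and the logits are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (temperature must be positive here)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_size(probs, top_p):
    """Smallest number of highest-probability tokens whose mass reaches top_p."""
    cumulative = 0.0
    for count, p in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += p
        if cumulative >= top_p:
            return count
    return len(probs)

logits = [4.0, 3.0, 2.0, 1.0, 0.0, -1.0]  # hypothetical six-token vocabulary
for t in (0.3, 0.7, 1.0, 1.5):
    kept = nucleus_size(softmax(logits, t), top_p=0.9)
    print(f"T={t}: top-p=0.9 keeps {kept} token(s)")
```

With `top_p` fixed at 0.9, the nucleus grows from one token at $T = 0.3$ to four at $T = 1.5$, so the same top-p value means something different at each temperature.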
## When NOT to Touch It
If you're iterating on a prompt and the model is producing inconsistent results, *first* check the prompt, and only *then* lower temperature. An ambiguous prompt stays ambiguous at any temperature. Temperature is a downstream symptom-shaper, not a prompt fix.
## Related
- [[Sampling]]
- [[Large Language Model]]
- [[Hallucination]]