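## Definition

**Temperature** is the scalar parameter $T$ that rescales a large language model's logits before the softmax that turns them into a next-token distribution. It controls the *sharpness* of that distribution and therefore the *diversity* of the generated text.

## Mathematical Form

Given logits $z$ over the vocabulary, the probability that the next token is $v$ is:

$$
P(x = v) = \frac{\exp(z_v / T)}{\sum_{v'} \exp(z_{v'} / T)}
$$

- **$T = 1$**: the softmax exactly as the model produced it. This is the reference behaviour.
- **$T \to 0$**: values below 1 sharpen the distribution, and in the limit it collapses onto the argmax token. This is effectively greedy decoding and, in principle, deterministic given the prompt. Because the formula divides by $T$, implementations typically treat $T = 0$ as a special case that simply picks the argmax.
- **$T \to \infty$**: values above 1 flatten the distribution, and in the limit it approaches a uniform distribution over the vocabulary, so tokens are chosen essentially at random.

## Practical Ranges

These are rough rules of thumb; the useful range varies by model and task.

| $T$     | Behaviour                                              | When to use                               |
| ------- | ------------------------------------------------------ | ----------------------------------------- |
| 0.0     | Greedy; deterministic in principle                     | Tests, batch jobs needing reproducibility |
| 0.1–0.3 | Conservative; sticks to high-confidence tokens         | Code generation, structured outputs       |
| 0.5–0.7 | Balanced                                               | Default for many chat assistants          |
| 0.8–1.0 | Creative; samples lower-probability tokens more often  | Brainstorming, creative writing           |
| > 1.2   | Often incoherent                                       | Research, deliberate exploration          |

## Common Pitfalls

- **Temperature 0 does not guarantee perfect determinism.** Floating-point non-associativity (the same reduction computed in a different order gives slightly different logits), batching with other requests, and load balancing across hardware can change outputs slightly between runs. A tiny logit change is enough to flip the argmax when two tokens are nearly tied.
- **Lowering temperature does not eliminate hallucinations.** A confident model can be confidently wrong; see [[Hallucination]]. Lower temperature cuts errors that come from sampling unlikely tokens, but it cannot fix errors the model itself assigns high probability to. Temperature shapes *which* errors you get, not whether they exist.
- **Temperature interacts with top-p and top-k.** In common implementations, temperature reshapes the distribution that top-k and top-p then truncate, so the same top-p value keeps more candidate tokens at high temperature than at low temperature. Combining all three can compound surprises; pick one main knob.

## When NOT to Touch It

If you're iterating on a prompt and the model produces inconsistent results, check the prompt *first* and lower temperature only *after* that. Temperature shapes downstream symptoms; it does not fix an ambiguous prompt.

## Related

- [[Sampling]]
- [[Large Language Model]]
- [[Hallucination]]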
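
The temperature-scaled softmax from the Mathematical Form section, and its limiting cases, can be checked with a minimal pure-Python sketch. The names `softmax_with_temperature` and `sample_token` are hypothetical helpers for illustration, and treating $T = 0$ as a one-hot argmax is an assumption mirroring the $T \to 0$ limit (the formula itself divides by zero there); real inference servers implement this in their own way.

```python
import math
import random
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Return P(x = v) = exp(z_v / T) / sum_v' exp(z_v' / T) for each index v.

    Hypothetical helper. T == 0 is treated as the limit T -> 0: a one-hot
    distribution on the argmax (ties go to the lowest index), because the
    formula itself would divide by zero.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    # Subtracting the max leaves the softmax unchanged but avoids overflow in exp().
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index from the temperature-scaled distribution (hypothetical helper)."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # toy logits for a 3-token vocabulary
    for t in (0.0, 0.5, 1.0, 2.0, 100.0):
        probs = softmax_with_temperature(logits, t)
        print(f"T={t:>5}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Running it prints one distribution per temperature: $T = 0$ gives a one-hot vector on index 0, lower $T$ pushes more mass onto index 0, and $T = 100$ is close to uniform (about 1/3 each).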