Hallucination - Albert Masoliver's learning site

## Definition

A **hallucination** is an LLM output that is plausibly phrased but factually wrong, fabricated, or unsupported by the input or any verifiable source. The term is contested (the model isn't perceiving anything) but has become the field's standard label for this failure mode.

## Common Shapes

- **Fabricated facts.** Confidently states something untrue.
- **Made-up citations.** Generates plausible-looking paper titles, URLs, and authors that don't exist.
- **Phantom code.** Calls library functions that aren't in the actual API.
- **Confused attribution.** Mixes up which paper said what, which version introduced a feature, or which company owns a product.

## Why LLMs Hallucinate

1. **Objective mismatch.** Training optimises for next-token plausibility, not truth. A well-phrased false answer scores well.
2. **Pretraining noise.** The corpus contains errors, contradictions, and outdated content.
3. **Sampling stochasticity.** [[Sampling]] draws from a probability distribution, so plausible-but-wrong tokens are sometimes chosen.
4. **Long-context confusion.** Information at an unfavourable position in a long context can be missed or mis-grounded (see [[Lost in the Middle Effect]]).

## Mitigations

- **Ground in retrieved sources.** [[Retrieval-Augmented Generation]] makes the model condition on real text and cite it.
- **Tool use for facts.** Look up numbers, dates, and APIs via [[Tool Use]] rather than recalling them.
- **Lower temperature.** Reduces, but does **not** eliminate, fabrication; see the caveat in [[Temperature]].
- **Demand citations.** Prompt the model to cite its sources, then verify that the citations exist.
- **Verifier agents.** A separate agent reviews claims against retrieved evidence (see [[Verifier Independence]]).

## Why It Matters Operationally

In agentic software engineering, hallucinated code is particularly insidious: it *parses*, sometimes *compiles*, and may pass shallow tests. Catching it requires running the tests against the real APIs, not the model's belief about them.
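
One cheap first check for phantom code is to confirm that every library function the model references actually resolves in the installed environment. The sketch below is an illustration only: `resolve_dotted_name`, `find_phantom_calls`, and the `suggested` list are hypothetical names, not part of any existing tool, and the approach assumes the candidate names have already been extracted from the generated code. It also imports the modules it checks, which executes their top-level code, so it is only appropriate for trusted packages.

```python
import importlib


def resolve_dotted_name(dotted: str) -> bool:
    """Return True if a dotted name such as "json.loads" resolves to a real object.

    Hypothetical helper for illustration; not part of any library.
    """
    parts = dotted.split(".")
    # Try the longest importable module prefix first, then walk the
    # remaining parts as attributes of that module.
    for cut in range(len(parts), 0, -1):
        module_name = ".".join(parts[:cut])
        try:
            obj = importlib.import_module(module_name)
        except (ImportError, ValueError):
            continue  # not an importable module; try a shorter prefix
        for attr in parts[cut:]:
            if not hasattr(obj, attr):
                return False  # module exists, attribute does not
            obj = getattr(obj, attr)
        return True
    return False  # no prefix of the name could be imported


def find_phantom_calls(candidates: list[str]) -> list[str]:
    """Return the candidate names that do not exist in the real API."""
    return [name for name in candidates if not resolve_dotted_name(name)]


# Assumed example input: names a model might reference in generated code.
suggested = ["json.loads", "json.parse", "os.path.join", "os.path.joinpath"]
print(find_phantom_calls(suggested))  # ['json.parse', 'os.path.joinpath']
```

An existence check like this only catches names that are missing outright. It says nothing about wrong argument order, changed semantics, or version differences, so it complements running the real test suite rather than replacing it.
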
## What Hallucination Is Not

- *Output you disagree with*: that's disagreement, not hallucination.
- *Sycophancy*: agreeing with users when they are wrong; a related but distinct alignment failure.
- *Confabulation about future events*: that's just operating outside the model's training cutoff.

## Related

- [[Retrieval-Augmented Generation]]
- [[Tool Use]]
- [[Temperature]]
- [[Lost in the Middle Effect]]
- [[Alignment]]