## Definition
A **Large Language Model (LLM)** is a deep neural network, almost always a [[Transformer Architecture]], trained on massive text corpora to predict the next token in a sequence. "Large" typically denotes models with billions to trillions of parameters. Frontier models in 2026 are widely reported to reach the trillion-parameter range, in both dense and Mixture-of-Experts (MoE) variants, although most labs do not disclose exact parameter counts.
## What an LLM Fundamentally Does
> It estimates a probability distribution over the next token given the preceding context.
Everything else — chat, code completion, reasoning, tool use — is a structured exploitation of that single conditional distribution.
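A minimal sketch of that single conditional distribution. `toy_model` is a hypothetical stand-in for a trained network (a real LLM computes its logits with a Transformer). Softmax turns the logits into p(x_t | x_<t), and the chain rule, log p(x_1..x_T) = Σ_t log p(x_t | x_<t), scores a whole sequence from those per-step distributions.

```python
import math
from typing import Callable


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores (logits) into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(context: list[int], vocab_size: int = 4) -> list[float]:
    """Hypothetical stand-in for an LLM: returns one logit per vocabulary token.

    Purely illustrative rule: favour the token id that cyclically follows
    the last token in the context.
    """
    last = context[-1] if context else 0
    return [2.0 if tok == (last + 1) % vocab_size else 0.0 for tok in range(vocab_size)]


def sequence_log_prob(model: Callable[[list[int]], list[float]], tokens: list[int]) -> float:
    """Chain rule: sum of log p(x_t | x_<t), conditioned on the first token."""
    total = 0.0
    for t in range(1, len(tokens)):
        probs = softmax(model(tokens[:t]))  # next-token distribution given the prefix
        total += math.log(probs[tokens[t]])
    return total


if __name__ == "__main__":
    print(softmax(toy_model([0])))  # token 1 gets the highest probability
    print(sequence_log_prob(toy_model, [0, 1, 2, 3]))  # the "expected" continuation
    print(sequence_log_prob(toy_model, [0, 2, 0, 2]))  # a less likely one scores lower
```

Generating text, scoring a candidate answer, and ranking options all reduce to repeated calls like `model(prefix)`; only what is done with the resulting distribution changes.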
## The Three Phases
1. **[[Pretraining]]** — self-supervised next-token prediction on a vast text corpus.
2. **[[Fine-Tuning]]** — supervised or preference-based adjustment for instruction following, helpfulness, safety.
3. **Inference** — sampling tokens at runtime; see [[Sampling]] and [[Temperature]].
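The pretraining objective in phase 1 is usually the average cross-entropy (negative log-likelihood) of each actual next token. The sketch below assumes the model has already produced one logit vector per position, where `logits_per_position[t]` is scored against `tokens[t + 1]`; the function names are illustrative, not any framework's API. Supervised fine-tuning commonly reuses the same loss on curated instruction/response data, whereas preference-based methods use different objectives.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log of softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def next_token_loss(logits_per_position: list[list[float]], tokens: list[int]) -> float:
    """Mean cross-entropy for next-token prediction.

    logits_per_position[t] is the model's output after reading tokens[: t + 1],
    so its target is tokens[t + 1].
    """
    if len(tokens) < 2 or len(logits_per_position) != len(tokens) - 1:
        raise ValueError("need at least two tokens and one logit vector per predicted token")
    nll = 0.0
    for t, logits in enumerate(logits_per_position):
        nll -= log_softmax(logits)[tokens[t + 1]]  # penalise low probability on the true token
    return nll / len(logits_per_position)


if __name__ == "__main__":
    tokens = [0, 1, 2, 3]
    uniform = [[0.0] * 4 for _ in range(3)]  # a model with no preference at all
    confident = [[5.0 if i == tokens[t + 1] else 0.0 for i in range(4)] for t in range(3)]
    print(next_token_loss(uniform, tokens))    # equals log(4) for a 4-token vocabulary
    print(next_token_loss(confident, tokens))  # much lower: targets get most of the mass
```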
## Why "Large" Matters
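LLM capabilities appear to emerge non-linearly with scale (see [[Scaling Laws]]). Below certain parameter and training-data thresholds, behaviours such as few-shot learning, instruction following, and [[Chain-of-Thought]] reasoning are weak or absent. How abrupt this emergence really is remains debated, since steady underlying improvement can look like a sudden jump under all-or-nothing evaluation metrics.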
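Scaling laws are typically stated for pretraining loss rather than for individual capabilities. One common parametric form models loss as a function of parameter count N and training tokens D as L(N, D) = E + A / N^α + B / D^β: an irreducible term plus two power-law terms that shrink with scale. The sketch below uses that form with illustrative placeholder constants and an assumed 20-tokens-per-parameter ratio (assumptions, not a fitted result). The smooth decline it prints is a reminder that the non-linear capability gains described above are measured on downstream tasks, not on the loss itself.

```python
def parametric_loss(
    n_params: float,
    n_tokens: float,
    E: float = 1.7,       # irreducible loss (illustrative placeholder)
    A: float = 400.0,     # parameter-term coefficient (placeholder)
    B: float = 400.0,     # data-term coefficient (placeholder)
    alpha: float = 0.34,  # parameter exponent (placeholder)
    beta: float = 0.28,   # data exponent (placeholder)
) -> float:
    """L(N, D) = E + A / N**alpha + B / D**beta."""
    return E + A / n_params**alpha + B / n_tokens**beta


if __name__ == "__main__":
    # Scale parameters (and, proportionally, tokens) by 10x at each step.
    for n in (1e8, 1e9, 1e10, 1e11):
        d = 20 * n  # assumed tokens-per-parameter ratio, for illustration only
        print(f"N={n:.0e}  D={d:.0e}  loss={parametric_loss(n, d):.3f}")
```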
## What an LLM Is Not
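- Not a search engine: it generates plausible text rather than retrieving verified sources, so its output is not necessarily true (see [[Hallucination]]).
- Not a system with persistent memory: it retains nothing between calls beyond what is placed in its context window (see [[Context vs Memory]]).
- Not deterministic by default: outputs vary with [[Sampling]] choices such as [[Temperature]].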
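Sampling is where the non-determinism comes from. A minimal sketch, assuming the model has already produced logits for the next token: temperature 0 is treated here as greedy decoding (always the argmax, hence repeatable), a convention many inference APIs follow, while a positive temperature samples from softmax(logits / temperature), so lower values concentrate probability on the top tokens and higher values flatten the distribution. `sample_next_token` is a hypothetical helper, not a library function.

```python
import math
import random
from typing import Optional


def sample_next_token(logits: list[float], temperature: float = 1.0,
                      rng: Optional[random.Random] = None) -> int:
    """Choose the next token id from a vector of logits."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy decoding: deterministic argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    if rng is None:
        rng = random.Random()
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating, for stability
    weights = [math.exp(x - m) for x in scaled]  # unnormalised softmax weights
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [1.0, 3.0, 2.0]
    print([sample_next_token(logits, 0.0) for _ in range(5)])  # always [1, 1, 1, 1, 1]
    rng = random.Random(0)
    print([sample_next_token(logits, 1.0, rng) for _ in range(5)])  # varies with the seed
```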
## Modern Frontier Examples (2026)
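Examples include Claude 4.x, GPT-5, Gemini 3, and the Llama 4 family. As far as public information indicates, all are Transformer-based and most are decoder-only, although architectural details of closed models are largely undisclosed.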
## Related
- [[Foundation Model]]
- [[Transformer Architecture]]
- [[Pretraining]]
- [[Hallucination]]
- [[Attention Is All You Need (Vaswani et al.)]]