## Definition

A **Large Language Model (LLM)** is a deep neural network, almost always a [[Transformer Architecture]], trained on massive text corpora to predict the next token in a sequence. "Large" typically denotes models with billions to trillions of parameters; frontier models in 2026 are generally reported to be in the trillion-parameter range, in both dense and Mixture-of-Experts variants.

## What an LLM Fundamentally Does

> It estimates a probability distribution over the next token given the preceding context. Everything else (chat, code completion, reasoning, tool use) is a structured exploitation of that single conditional distribution.

## The Three Phases

1. **[[Pretraining]]**: self-supervised next-token prediction on a vast text corpus.
2. **[[Fine-Tuning]]**: supervised or preference-based adjustment for instruction following, helpfulness, and safety.
3. **Inference**: sampling tokens at runtime; see [[Sampling]] and [[Temperature]].

## Why "Large" Matters

Many LLM capabilities appear to emerge non-linearly with scale (see [[Scaling Laws]]). Below certain thresholds of parameter count and training data, behaviours such as few-shot learning, instruction following, and [[Chain-of-Thought]] reasoning are weak or absent. Whether this is genuine emergence or partly an artefact of how capabilities are measured is still debated.

## What an LLM Is Not

- Not a search engine: it generates plausible text, not necessarily true text (see [[Hallucination]]).
- Not a reasoner with persistent memory: see [[Context vs Memory]].
- Not deterministic by default: outputs vary with [[Sampling]] choices.

## Modern Frontier Examples (2026)

Claude 4.x, GPT-5, Gemini 3, and the Llama 4 family. All are Transformer-based; most are decoder-only.

## Related

- [[Foundation Model]]
- [[Transformer Architecture]]
- [[Pretraining]]
- [[Hallucination]]
- [[Attention Is All You Need (Vaswani et al.)]]
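
## Sketch: Next-Token Sampling

The single conditional distribution described under "What an LLM Fundamentally Does", and the temperature-controlled sampling used at inference time, can be illustrated with a toy example. This is a minimal sketch, assuming a made-up four-token vocabulary and hand-picked logits standing in for a real model's output; `softmax` and `next_token` are hypothetical helper names, not any library's API. Real inference stacks typically add further controls such as top-k or top-p truncation (see [[Sampling]]).

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores (logits) into a probability distribution.

    Dividing by the temperature before exponentiating sharpens the
    distribution when temperature < 1 and flattens it when > 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding for 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(vocab, logits, temperature=1.0, rng=random):
    """Pick the next token from a toy vocabulary.

    temperature == 0 is treated as greedy decoding (argmax), which is
    deterministic; any positive temperature samples from the softmax.
    """
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return vocab[best]
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical scores a model might assign after the context "The cat sat on the"
vocab = ["mat", "sofa", "moon", "the"]
logits = [3.2, 2.1, 0.3, -1.0]

print(next_token(vocab, logits, temperature=0))  # greedy: always "mat"
print([round(p, 3) for p in softmax(logits, 1.0)])
rng = random.Random(0)
print(next_token(vocab, logits, temperature=1.0, rng=rng))  # sampled; depends on the seed
```

Greedy decoding (temperature 0 in this sketch) always returns the highest-scoring token, while any positive temperature can return different tokens on different runs, which is why outputs are not deterministic by default.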