## Definition
A **parameter** is a single learned number inside a neural network, typically a weight or a bias, adjusted during [[Pretraining]] to minimize prediction error. The total parameter count is the headline number on a model's spec sheet and a rough proxy for its raw capacity.
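To make "the count" concrete, here is a minimal sketch that tallies parameters for a plain stack of fully connected layers. It assumes the common convention that each layer has one weight per input-output pair plus one bias per output; the helper names and the 784 → 256 → 10 toy network are hypothetical, chosen only for illustration.

```python
def dense_layer_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Parameters in one fully connected layer: n_in * n_out weights, plus n_out biases."""
    return n_in * n_out + (n_out if bias else 0)


def mlp_params(layer_sizes: list[int]) -> int:
    """Total parameters in a stack of dense layers, e.g. [784, 256, 10]."""
    # Pair each layer width with the next one: (784, 256), (256, 10), ...
    return sum(dense_layer_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical toy network: 784 inputs -> 256 hidden units -> 10 outputs.
print(mlp_params([784, 256, 10]))  # 203530
```

In practice you rarely count by hand; in PyTorch, for instance, `sum(p.numel() for p in model.parameters())` gives the same kind of total for a loaded model.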
## The scale-up
Parameter counts exploded across the GPT line, and each jump was at least an order of magnitude:
| Model | Parameters |
| --- | --- |
| GPT-1 | 117M |
| GPT-2 | 1.5B |
| GPT-3 | 175B |
Each generation multiplied the parameter count by more than 10x, and the leap in capability that came with it is what kicked off the modern [[Large Language Model]] era.
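The generation-over-generation growth can be read straight off the table. This small sketch (the `growth_ratios` helper is hypothetical) divides each model's parameter count by its predecessor's, using the table's figures.

```python
# Parameter counts from the table (GPT-1, GPT-2, GPT-3), in raw parameters.
gpt_params = {"GPT-1": 117e6, "GPT-2": 1.5e9, "GPT-3": 175e9}


def growth_ratios(counts: dict[str, float]) -> dict[str, float]:
    """Ratio of each entry to the one before it, in insertion order."""
    names = list(counts)
    return {f"{a} -> {b}": counts[b] / counts[a] for a, b in zip(names, names[1:])}


for step, ratio in growth_ratios(gpt_params).items():
    print(f"{step}: {ratio:.1f}x")
# GPT-1 -> GPT-2: 12.8x
# GPT-2 -> GPT-3: 116.7x
```

The first jump is about one order of magnitude and the second about two, which is why "order-of-magnitude" undersells the GPT-2 to GPT-3 step.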
## More is not free
A larger parameter count only pays off when it is *matched by enough training data*. Hoffmann et al. (DeepMind, 2022), the Chinchilla paper, showed that the GPT-3-era giants were badly *undertrained*: for compute-optimal training you want roughly **20 tokens per parameter**. GPT-3, by contrast, saw about 300B tokens for its 175B parameters, fewer than 2 tokens per parameter. Add parameters without the data to feed them and you spend compute on capacity the model never learns to use. This trade-off is the heart of [[Scaling Laws]].
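The 20-tokens-per-parameter rule turns into simple arithmetic. The sketch below is a back-of-the-envelope calculator, not the paper's fitted scaling law: it uses the flat 20:1 ratio and additionally assumes the widely used approximation that training costs about `6 * N * D` FLOPs for `N` parameters and `D` tokens (an assumption not stated in this note). The function names are hypothetical.

```python
import math

# Rule of thumb from Hoffmann et al. (2022): ~20 training tokens per parameter.
TOKENS_PER_PARAM = 20


def optimal_tokens(n_params: float) -> float:
    """Rough compute-optimal token budget for a model with n_params parameters."""
    return TOKENS_PER_PARAM * n_params


def compute_optimal_split(train_flops: float) -> tuple[float, float]:
    """Split a training FLOP budget into (params, tokens).

    Assumes cost C ~= 6 * N * D (a common approximation, not from this note).
    Substituting D = 20 * N gives C = 120 * N**2, so N = sqrt(C / 120).
    """
    n_params = math.sqrt(train_flops / (6 * TOKENS_PER_PARAM))
    return n_params, optimal_tokens(n_params)


# Data a GPT-3-sized (175B) model "should" see under the rule of thumb.
print(f"{optimal_tokens(175e9):.2e}")  # 3.50e+12

# Work backwards from a FLOP budget to a balanced model size and dataset.
n, d = compute_optimal_split(6 * 70e9 * 1.4e12)
print(f"{n:.2e} params, {d:.2e} tokens")  # 7.00e+10 params, 1.40e+12 tokens
```

The second call lands on 70B parameters and 1.4T tokens, the same shape as the Chinchilla model itself; treat both functions as sizing estimates, since real training runs deviate from the flat 20:1 ratio.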
## Bigger is not better
Because data quality, training recipes, and architectures improve over time, a newer small model routinely beats an older large one. A well-trained **Llama 3-8B** outperforms the older **Llama 2-70B** on most benchmarks despite having under an eighth of the parameters. Treat the parameter count as one input, never the verdict; see the [[Model Card]] for what actually matters.
## You pick, you don't tune
As a practitioner you almost never touch individual parameters. You *select* a pre-trained model whose parameters have already been learned, then steer it with prompts or retrieval (which leave the weights untouched) or with [[Fine-Tuning]] (which updates some or all of them, starting from the vendor's checkpoint). The weights are the vendor's artifact; your job is orchestration.
## Active vs total
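The single number is also getting slippery. In a [[Mixture-of-Experts]] model, only a fraction of parameters fire per [[Token]], so "total parameters" and "active parameters" diverge. The active count drives per-token compute and speed, but the total count still sets how much memory you need, because every expert has to be loaded whether or not a given token uses it.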
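A toy accounting makes the split visible. This is a simplified sketch under stated assumptions: the `MoEConfig` class and every number in it are made up, experts from all MoE layers are lumped into one per-expert total (valid for counting when every layer uses the same top-k routing), and memory covers weights only at 2 bytes per parameter (bf16/fp16), ignoring KV cache and activations.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    """Simplified MoE parameter budget (hypothetical, for illustration only)."""
    shared_params: float      # attention, embeddings, norms: used by every token
    params_per_expert: float  # one expert slot's FFN params, summed over all MoE layers
    num_experts: int
    experts_per_token: int    # top-k routing

    def total_params(self) -> float:
        # Every expert must be stored, whether or not a token routes to it.
        return self.shared_params + self.num_experts * self.params_per_expert

    def active_params(self) -> float:
        # Only the top-k routed experts run for any given token.
        return self.shared_params + self.experts_per_token * self.params_per_expert

    def weight_memory_gb(self, bytes_per_param: int = 2) -> float:
        # Weights only; 2 bytes/param assumes bf16 or fp16 storage.
        return self.total_params() * bytes_per_param / 1e9


# Made-up config: 2B shared params, 8 experts of 5B each, top-2 routing.
cfg = MoEConfig(shared_params=2e9, params_per_expert=5e9, num_experts=8, experts_per_token=2)
print(cfg.total_params() / 1e9)   # 42.0  (what the spec sheet headline would say)
print(cfg.active_params() / 1e9)  # 12.0  (what each token actually computes with)
print(cfg.weight_memory_gb())     # 84.0  (memory follows the total, not the active count)
```

Per-token compute here looks like a 12B dense model, while the memory bill looks like a 42B one.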
## Related
- [[Scaling Laws]]
- [[Mixture-of-Experts]]
- [[Model Card]]
- [[Pretraining]]
- [[Large Language Model]]
- [[Foundation Model]]
- [[AI Engineering - Chip Huyen]]