## Definition
A **token** is the atomic unit an LLM operates on — a sub-word fragment produced by a tokenizer that breaks text into a sequence of integers the model can attend to. Tokens, not characters or words, are the unit in which pricing, context limits, and latency are measured.
## Practical Sizing
- English code typically tokenizes at **~4 characters per token**.
- A 500-line TypeScript file ≈ **3,000–5,000 tokens**.
- "Read this repo and propose a refactor" prompts regularly reach **50k–200k tokens**.
- The chars-per-token ratio shifts with language, identifier style, and content type (SQL and YAML behave very differently from prose).
## Why Order Matters
Because LLMs predict the next token conditional on all preceding ones, **token order shapes output**. Two consequences for the orchestrator:
1. *Recency wins.* Tokens at the end of the window influence the next output more than tokens at the start.
2. *Intent goes near the end.* "Here is the spec. Here is the code. Now do X." outperforms "Now do X. Here is the spec."
## Measuring Tokens
```python
import anthropic
client = anthropic.Anthropic()
r = client.messages.count_tokens(
model="claude-sonnet-4-6",
messages=[{"role": "user", "content": "<text>"}],
)
print(r.input_tokens)
```
## Related
- [[Context Window]]
- [[Reasoning Budget]]
- [[Lost in the Middle Effect]]
- [[Prompt Caching]]