## Definition A **token** is the atomic unit an LLM operates on — a sub-word fragment produced by a tokenizer that breaks text into a sequence of integers the model can attend to. Tokens, not characters or words, are the unit in which pricing, context limits, and latency are measured. ## Practical Sizing - English code typically tokenizes at **~4 characters per token**. - A 500-line TypeScript file ≈ **3,000–5,000 tokens**. - "Read this repo and propose a refactor" prompts regularly reach **50k–200k tokens**. - The chars-per-token ratio shifts with language, identifier style, and content type (SQL and YAML behave very differently from prose). ## Why Order Matters Because LLMs predict the next token conditional on all preceding ones, **token order shapes output**. Two consequences for the orchestrator: 1. *Recency wins.* Tokens at the end of the window influence the next output more than tokens at the start. 2. *Intent goes near the end.* "Here is the spec. Here is the code. Now do X." outperforms "Now do X. Here is the spec." ## Measuring Tokens ```python import anthropic client = anthropic.Anthropic() r = client.messages.count_tokens( model="claude-sonnet-4-6", messages=[{"role": "user", "content": "<text>"}], ) print(r.input_tokens) ``` ## Related - [[Context Window]] - [[Reasoning Budget]] - [[Lost in the Middle Effect]] - [[Prompt Caching]]