## Definition

The **context window** is the maximum number of tokens an LLM can attend to in a single turn. It is the model's *active attention surface*, not its memory. In 2026 the frontier baseline is ~200k tokens, with 1M-token tiers available for select workloads.

## What Degrades as the Window Fills

1. **Recall accuracy.** Information placed in the middle of a long prompt is less likely to be used (the [[Lost in the Middle Effect]]).
2. **Instruction adherence.** Early instructions ("always use TypeScript strict mode") get drowned out by more recent intermediate output.

## Common Misconceptions

- *Context window is memory.* It is not. It is the attention surface for **one turn**. Anything that must persist between sessions belongs in a file or a memory store; see [[Context vs Memory]].
- *Bigger is always better.* A model given 30k tokens of relevant context typically outperforms the same model given those 30k tokens plus 100k of "just in case" material. Every irrelevant token is a vote against the right answer.

## Operational Rules

- Curate the working set actively per task.
- When approaching the limit, externalise state to disk rather than waiting for [[Context Compaction]].
- Use sub-agents to keep verbose tool output out of the main thread.

## Related

- [[Token]]
- [[Lost in the Middle Effect]]
- [[Context Compaction]]
- [[Context vs Memory]]
- [[Hierarchical Retrieval]]
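
## Example Sketch

The first two operational rules (curate the working set, externalise before the limit) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Snippet` type, the relevance scores, the 80% threshold, and the ~4 characters per token estimate are hypothetical, and a real setup would use the model's own tokenizer and its actual window size.

```python
from dataclasses import dataclass

CONTEXT_LIMIT = 200_000  # assumed window size, matching the ~200k baseline
EXTERNALISE_AT = 0.8     # hypothetical threshold: act at 80% of the window


def estimate_tokens(text: str) -> int:
    """Rough estimate assuming ~4 characters per token (heuristic, not a tokenizer)."""
    return max(1, len(text) // 4)


@dataclass
class Snippet:
    name: str
    text: str
    relevance: float  # hypothetical score in [0, 1], e.g. from a retriever


def curate(snippets: list[Snippet], budget: int, min_relevance: float = 0.5) -> list[Snippet]:
    """Greedily keep the most relevant snippets that fit within the token budget."""
    chosen: list[Snippet] = []
    used = 0
    for s in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        if s.relevance < min_relevance:
            break  # everything after this point is "just in case" material
        cost = estimate_tokens(s.text)
        if used + cost <= budget:
            chosen.append(s)
            used += cost
    return chosen


def should_externalise(used_tokens: int, limit: int = CONTEXT_LIMIT) -> bool:
    """True once usage crosses the threshold, i.e. time to write state to disk."""
    return used_tokens >= EXTERNALISE_AT * limit
```

Calling `curate(candidates, budget=30_000)` before each task, and checking `should_externalise(...)` after each large tool result, keeps the decision explicit instead of leaving it to compaction. The greedy selection is only one simple policy; it skips snippets that do not fit rather than truncating them.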