Optimise Turns Before Tier - Albert Masoliver's learning site

## Definition **Optimise turns before tier** is the cost-optimisation sequencing principle that the cheapest agent is the one with the fewest, tightest turns — not the one running on the cheapest model. Switching from Opus to Haiku saves a fixed per-token multiplier; cutting unnecessary turns eliminates entire round-trips of input accumulation, and those savings compound across the whole run. ## Why Turns Dominate In the [[Agentic Loop]], every turn re-sends the full running context as input. A 10-turn run that accretes a 40k-token context pays roughly 40k input tokens on the last turn alone. The per-turn cost scales with window size, so: - Each extra turn adds its own output *and* increases the input cost of every subsequent turn. - Tool-result bloat (large observations re-entering context) multiplies both effects. A cheaper model reduces the per-token price by a fixed ratio (e.g., Haiku is ~5× cheaper than Opus per million tokens). Eliminating one unnecessary turn with a 30k-token context saves ~30k input tokens regardless of tier. At realistic context sizes, the turn saving dominates. ## The Efficiency Triplet The three metrics to track on every agent run are **steps** (turns), **cost**, and **latency**. The canonical optimisation order is: 1. Reduce unnecessary turns (tighter prompts, sharper stop conditions, smaller tool outputs). 2. Compress tool outputs (truncate, summarise, or paginate large observations before they re-enter context). 3. Apply [[Prompt Caching as Pricing Lever]] to the stable prefix (system prompt + tool inventory). 4. Only after the above: consider switching tiers. ## Practical Levers - **Tighter prompts**: a well-specified task completes in fewer iterations than an underspecified one. - **Capped iteration counts**: an explicit loop ceiling forces the agent to consolidate rather than speculate. - **Output-length discipline on tool results**: prefer returning a summary or a schema-constrained response over a full dump; the difference recurs on every subsequent turn's input. - **Tool result clearing** ([[Context Editing]]): drop spent tool outputs from context once their information is incorporated. ## Relationship to Model Tier This principle does not argue against ever upgrading the model tier — it argues against doing so *first*. If reducing turns and compressing context does not clear the performance bar, then moving to a higher-capability tier (Sonnet → Opus) may be justified. But the sequence matters: tier upgrades are irreversible in the cost sense (you stay on the higher rate), while turn-count improvements compound downward across the whole run. ## Related - [[Agentic Workload Cost Explosion]] - [[Model Selection Strategy]] - [[Prompt Caching as Pricing Lever]] - [[Context Editing]] - [[Agentic Loop]] ## Sources - [[Modern AI Software Engineering - The Orchestrators Playbook]]