## Definition
**Optimise turns before tier** is the cost-optimisation sequencing principle that the cheapest agent is the one with the fewest, tightest turns — not the one running on the cheapest model. Switching from Opus to Haiku saves a fixed per-token multiplier; cutting unnecessary turns eliminates entire round-trips of input accumulation, and those savings compound across the whole run.
## Why Turns Dominate
In the [[Agentic Loop]], every turn re-sends the full running context as input. A 10-turn run that accretes a 40k-token context pays roughly 40k input tokens on the last turn alone. The per-turn cost scales with window size, so:
- Each extra turn adds its own output *and* increases the input cost of every subsequent turn.
- Tool-result bloat (large observations re-entering context) multiplies both effects.
A cheaper model reduces the per-token price by a fixed ratio (e.g., Haiku is ~5× cheaper than Opus per million tokens). Eliminating one unnecessary turn with a 30k-token context saves ~30k input tokens regardless of tier. At realistic context sizes, the turn saving dominates.
## The Efficiency Triplet
The three metrics to track on every agent run are **steps** (turns), **cost**, and **latency**. The canonical optimisation order is:
1. Reduce unnecessary turns (tighter prompts, sharper stop conditions, smaller tool outputs).
2. Compress tool outputs (truncate, summarise, or paginate large observations before they re-enter context).
3. Apply [[Prompt Caching as Pricing Lever]] to the stable prefix (system prompt + tool inventory).
4. Only after the above: consider switching tiers.
## Practical Levers
- **Tighter prompts**: a well-specified task completes in fewer iterations than an underspecified one.
- **Capped iteration counts**: an explicit loop ceiling forces the agent to consolidate rather than speculate.
- **Output-length discipline on tool results**: prefer returning a summary or a schema-constrained response over a full dump; the difference recurs on every subsequent turn's input.
- **Tool result clearing** ([[Context Editing]]): drop spent tool outputs from context once their information is incorporated.
## Relationship to Model Tier
This principle does not argue against ever upgrading the model tier — it argues against doing so *first*. If reducing turns and compressing context does not clear the performance bar, then moving to a higher-capability tier (Sonnet → Opus) may be justified. But the sequence matters: tier upgrades are irreversible in the cost sense (you stay on the higher rate), while turn-count improvements compound downward across the whole run.
## Related
- [[Agentic Workload Cost Explosion]]
- [[Model Selection Strategy]]
- [[Prompt Caching as Pricing Lever]]
- [[Context Editing]]
- [[Agentic Loop]]
## Sources
- [[Modern AI Software Engineering - The Orchestrators Playbook]]