## Definition
**Prompt caching** is an Anthropic API feature that lets you mark a prefix of a prompt as cacheable. Subsequent requests sharing that prefix hit the cache and pay ~10% of the prefix cost.
## Why It Dominates Agentic Cost
The prefix in agentic work is almost always **stable**:
- `AGENTS.md` rarely changes between turns.
- `CLAUDE.md` rarely changes.
- Skill definitions don't change mid-session.
- The current spec doesn't change.
The per-turn delta is small. A 60%+ cache hit rate on a CI reviewer cuts per-PR cost dramatically.
## Two Modes
1. **Automatic caching.** Single `cache_control` at the top level; the system applies the breakpoint to the last cacheable block and moves it forward as conversations grow.
2. **Explicit breakpoints.** Place `cache_control` directly on individual content blocks for fine-grained control.
## Configuration
```python
response = client.messages.create(
model="claude-sonnet-4-6",
system=[{
"type": "text",
"text": system_prompt,
"cache_control": {"type": "ephemeral"}
}],
messages=[{"role": "user", "content": "..."}],
)
# response.usage.cache_read_input_tokens tells you the hit size.
```
## Technical Details
- **Minimum cacheable length.** 1,024 tokens for Sonnet; 4,096 for Opus / Haiku 4.5. Below the floor, caching costs more than it saves.
- **TTL.** 5 minutes by default, refreshed on each hit. `"ttl": "1h"` extends to one hour.
- **Breakpoints.** Up to 4 explicit per request; automatic caching uses one slot.
## Pricing (relative to base input)
| Action | Multiplier |
| -------------------- | ---------- |
| 5-min cache write | 1.25× |
| 1-hour cache write | 2.0× |
| Cache read | 0.1× |
## What to Cache, What Not To
- **Cache:** large, stable content — `AGENTS.md`, full specs, long skill definitions.
- **Don't cache:** small or volatile content — the current diff, user input, today's date.
## Related
- [[Token]]
- [[Spec-as-Prompt Pattern]]
- [[Headless Agent in CI]]
- [[Model Selection Strategy]]