## Definition
**Consumption-based pricing** is a commercial model in which a customer is billed in proportion to the amount of a metered resource it actually uses (compute time, storage, tokens, requests) rather than a fixed per-period subscription. In AI platforms it most often appears as [[Per-Token Pricing]] for model APIs and as per-second compute for hosted inference endpoints.
## How It Differs from Subscriptions
Subscription pricing (SaaS, seat licenses) decouples revenue from cost: the provider absorbs unit-cost variance and prices for the average customer. Consumption pricing re-couples them — the heaviest users pay proportionally more, and the provider's revenue tracks its compute spend almost linearly. This trades predictability (for the buyer) and customer-acquisition friction (for the seller) for sustainable economics under heavy load.
## Why It Took Over AI in 2026
Three converging pressures pushed the industry from flat rates to consumption:
1. [[Agentic Workload Cost Explosion]] — autonomous agents can consume orders of magnitude more tokens than chat sessions, breaking subscription economics.
2. [[AI Compute Crunch]] — supply-constrained GPU/accelerator capacity makes unmetered access untenable.
3. Convergence with cloud — frontier labs increasingly resemble cloud providers in cost structure (capex on compute, opex on power and cooling), and the cloud pricing pattern is consumption-based.
## Variants in Practice
- **Pure usage**: per-token API access (Claude, GPT, Gemini APIs).
- **Hybrid seat + usage**: a fixed platform/seat fee plus metered consumption on top — Anthropic's mid-2026 Enterprise tier is a canonical example ($20/seat plus all token usage at API rates, no included allowance).
- **Tiered consumption**: subscriptions that bundle a token allowance, with overage billed at consumption rates (Claude Pro and Max use this pattern for individual users).
- **Capacity-priced premium**: paying a multiplier to access prioritised compute (Anthropic's Fast Mode runs at 6× standard rates).
## Trade-offs
For customers, consumption pricing rewards efficiency engineering (caching, batching, model selection) but makes budgeting harder and creates a tail-risk in cost spikes. For providers, it aligns incentives with serving heavy users sustainably but raises the friction of casual adoption.
## Related
- [[Per-Token Pricing]]
- [[Agentic Workload Cost Explosion]]
- [[AI Compute Crunch]]
- [[Prompt Caching as Pricing Lever]]
## Sources
- [[Anthropic 2026 Pricing Shift (Kingy AI)]]
- [[Anthropic Enterprise Pricing Shift (Medium)]]
- [[AI is Getting Expensive (The Register)]]