Consumption-Based Pricing - Albert Masoliver's learning site

## Definition

**Consumption-based pricing** is a commercial model in which a customer is billed in proportion to the amount of a metered resource it actually uses (compute time, storage, tokens, requests) rather than paying a fixed per-period subscription. In AI platforms it most often appears as [[Per-Token Pricing]] for model APIs and as per-second compute billing for hosted inference endpoints.

## How It Differs from Subscriptions

Subscription pricing (SaaS, seat licenses) decouples revenue from cost: the provider absorbs unit-cost variance and prices for the average customer. Consumption pricing re-couples them: the heaviest users pay proportionally more, and the provider's revenue tracks its compute spend almost linearly. The buyer gives up predictability and the seller accepts more customer-acquisition friction, in exchange for economics that stay sustainable under heavy load.

## Why It Took Over AI in 2026

Three converging pressures pushed the industry from flat rates to consumption:

1. [[Agentic Workload Cost Explosion]]: autonomous agents can consume orders of magnitude more tokens than chat sessions, breaking subscription economics.
2. [[AI Compute Crunch]]: supply-constrained GPU/accelerator capacity makes unmetered access untenable.
3. Convergence with cloud: frontier labs increasingly resemble cloud providers in cost structure (capex on compute, opex on power and cooling), and the cloud pricing pattern is consumption-based.

## Variants in Practice

- **Pure usage**: per-token API access (Claude, GPT, Gemini APIs).
- **Hybrid seat + usage**: a fixed platform/seat fee plus metered consumption on top. Anthropic's mid-2026 Enterprise tier is a canonical example ($20/seat plus all token usage at API rates, no included allowance).
- **Tiered consumption**: subscriptions that bundle a token allowance, with overage billed at consumption rates (Claude Pro and Max use this pattern for individual users).
- **Capacity-priced premium**: paying a multiplier to access prioritised compute (Anthropic's Fast Mode runs at 6× standard rates).

## Trade-offs

For customers, consumption pricing rewards efficiency engineering (caching, batching, model selection) but makes budgeting harder and creates tail risk from cost spikes. For providers, it aligns incentives with serving heavy users sustainably but raises the friction of casual adoption.

## Related

- [[Per-Token Pricing]]
- [[Agentic Workload Cost Explosion]]
- [[AI Compute Crunch]]
- [[Prompt Caching as Pricing Lever]]

## Sources

- [[Anthropic 2026 Pricing Shift (Kingy AI)]]
- [[Anthropic Enterprise Pricing Shift (Medium)]]
- [[AI is Getting Expensive (The Register)]]
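
## Worked Example

The four variants listed under Variants in Practice can be compared with a rough billing sketch. The $20 seat fee and the 6× multiplier are taken from this note; the per-million-token rate, the allowance, the base fee, and every function name are hypothetical values chosen only to make the arithmetic concrete, not published prices.

```python
# Hypothetical blended API rate in USD per million tokens (an assumption,
# not a published price).
RATE_PER_MTOK = 10.0


def pure_usage(tokens: int, rate: float = RATE_PER_MTOK) -> float:
    """Pure usage: every token is billed at the metered rate."""
    return tokens / 1_000_000 * rate


def hybrid(seats: int, tokens: int, seat_fee: float = 20.0,
           rate: float = RATE_PER_MTOK) -> float:
    """Hybrid seat + usage: fixed seat fee plus all usage, no allowance."""
    return seats * seat_fee + pure_usage(tokens, rate)


def tiered(tokens: int, base_fee: float, allowance: int,
           rate: float = RATE_PER_MTOK) -> float:
    """Tiered: the flat fee covers `allowance` tokens; overage is metered."""
    overage = max(0, tokens - allowance)
    return base_fee + pure_usage(overage, rate)


def capacity_premium(tokens: int, multiplier: float = 6.0,
                     rate: float = RATE_PER_MTOK) -> float:
    """Capacity-priced premium: standard rate scaled by a multiplier."""
    return pure_usage(tokens, rate * multiplier)


if __name__ == "__main__":
    monthly_tokens = 5_000_000  # hypothetical monthly consumption
    print("pure usage:      ", pure_usage(monthly_tokens))
    print("hybrid, 10 seats:", hybrid(10, monthly_tokens))
    print("tiered:          ", tiered(monthly_tokens, base_fee=20.0,
                                      allowance=2_000_000))
    print("capacity premium:", capacity_premium(monthly_tokens))
```

Under these assumed numbers the pure-usage and tiered bills are both linear in tokens beyond any allowance, which is the re-coupling of revenue to consumption described above; only the fixed component differs between variants.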