AI Compute Crunch - Albert Masoliver's learning site

## Definition The **AI compute crunch** is the structural shortfall in GPU and AI-accelerator supply relative to inference demand that defined the AI economy from late 2024 onward. It is the supply-side cause of the 2026 pricing rebase: with capacity scarce, providers can no longer subsidise unmetered access, and prices rise to reflect the true marginal cost of serving each token. ## What Constitutes the Crunch Three components compound: 1. **Hardware lead times** — frontier accelerators (NVIDIA HGX, H200, Blackwell-class parts) carry 6–18 month order-to-delivery windows; capacity additions trail demand by a year or more. 2. **Power and cooling** — even when chips ship, the datacentre power envelope to host them is a separate constraint; new builds are gated on grid interconnects. 3. **Inference demand mix** — by 2026 inference workloads dwarf training in aggregate compute, and inference is bursty, latency-sensitive, and harder to schedule than training. ## How It Manifests in Pricing - Across-the-board per-token price increases on flagship models (OpenAI roughly doubled GPT-5.5 pricing relative to GPT-4-class predecessors). - The rise of capacity-priced premium tiers — paying a multiplier to skip the queue (e.g. Anthropic Fast Mode at 6× standard rates). - End of unmetered subscriptions for heavy use; subscription plans now serve as a price-anchored entry point with strict caps, with [[Consumption-Based Pricing]] beyond them. - Rate-limit tightening on consumer tiers and the partitioning of capacity to enterprise contracts. ## The Supply-Side Response The hardware ecosystem is responding on three fronts: custom inference silicon (TPU v6, Anthropic's own accelerators via in-house deals, Trainium 2), lower-precision arithmetic (FP4, FP6 in production), and architectural efficiency (MoE, speculative decoding). The Register's framing — "those selling the shovels of the AI boom are now racing to bring new hardware better suited to serving these models" — captures the dynamic. The relief will arrive, but unevenly. Enterprise customers with volume contracts negotiate ahead of the spot prices; consumers absorb the increases first. ## Related - [[Consumption-Based Pricing]] - [[Per-Token Pricing]] - [[Agentic Workload Cost Explosion]] ## Sources - [[AI is Getting Expensive (The Register)]] - [[Anthropic 2026 Pricing Shift (Kingy AI)]]