Model Gateway - Albert Masoliver's learning site

## Definition

A **model gateway** is an intermediate service layer that provides a single, unified interface through which an organisation's applications access multiple AI models, whether self-hosted or behind commercial APIs. It abstracts away provider-specific SDKs and adds cross-cutting concerns such as access control, cost management, fallback routing, logging, and optionally caching and guardrails.

## Core Functions

**Unified interface.** Applications call one endpoint regardless of whether the underlying model is GPT-4, Gemini, a self-hosted Llama, or a fine-tuned internal model. API provider changes or model swaps require updating only the gateway, not every downstream application.

**Access control and cost governance.** Instead of distributing raw API keys (which can be leaked), teams receive scoped gateway credentials. The gateway enforces which user or application can call which model, and can cap usage to prevent runaway spend.

**Fallback and resilience.** When a primary API is unavailable or rate-limited, the gateway can automatically retry, switch to a backup provider, or return a graceful degradation response. This is important because API outages are common among model providers.

**Observability.** All requests and responses flow through the gateway, making it a natural point for logging, latency tracking, and cost attribution, which are foundational inputs for the monitoring practices described in [[AI Application Architecture]].

**Optional extras.** Load balancing, semantic caching, and guardrail enforcement can also be layered into the gateway, since the data is already passing through it.

## Relationship to Routers

A **router** (intent classifier or next-action predictor) chooses *which model or pipeline* should handle a given query; the gateway is the infrastructure through which that chosen model is accessed.
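
The split between router and gateway can be illustrated with a minimal in-memory sketch. Everything in it is hypothetical and chosen only for illustration: the `ModelGateway` class, the `route` function, the model names, the gateway keys, and the stub providers are assumptions, not the API of any real gateway product. The router only picks a model name; the gateway checks the caller's scoped credential, calls the backend, falls back on failure, and logs each attempt.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Assumption: a provider is any callable that maps a prompt to a reply.
Provider = Callable[[str], str]


class ProviderUnavailable(Exception):
    """Raised by a provider stub to simulate an outage or rate limit."""


@dataclass
class ModelGateway:
    providers: Dict[str, Provider]        # model name -> backend callable
    permissions: Dict[str, Set[str]]      # gateway key -> models it may call
    fallbacks: Dict[str, List[str]] = field(default_factory=dict)
    log: List[dict] = field(default_factory=list)

    def complete(self, api_key: str, model: str, prompt: str) -> str:
        # Access control: callers hold scoped gateway keys, not provider keys.
        if model not in self.permissions.get(api_key, set()):
            raise PermissionError(f"key is not allowed to call {model}")
        # Try the requested model first, then its configured fallbacks.
        for candidate in [model, *self.fallbacks.get(model, [])]:
            try:
                reply = self.providers[candidate](prompt)
            except ProviderUnavailable:
                self.log.append({"key": api_key, "model": candidate, "ok": False})
                continue
            self.log.append({"key": api_key, "model": candidate, "ok": True})
            return reply
        raise ProviderUnavailable(f"no backend available for {model}")


def route(prompt: str) -> str:
    """Toy router: a keyword check standing in for an intent classifier."""
    return "code-model" if "bug" in prompt.lower() else "chat-model"


# Stub backends standing in for real provider SDK calls.
def chat_primary(prompt: str) -> str:
    raise ProviderUnavailable("simulated outage")


def chat_backup(prompt: str) -> str:
    return f"[backup-chat] {prompt}"


def code_model(prompt: str) -> str:
    return f"[code] {prompt}"


gateway = ModelGateway(
    providers={
        "chat-model": chat_primary,
        "chat-backup": chat_backup,
        "code-model": code_model,
    },
    permissions={
        "team-a-key": {"chat-model", "code-model"},
        "team-b-key": {"chat-model"},
    },
    fallbacks={"chat-model": ["chat-backup"]},
)

prompt = "Summarise this memo"
answer = gateway.complete("team-a-key", route(prompt), prompt)
print(answer)
```

In this sketch the permission check applies to the requested model only, so a permitted caller can be served by a fallback it was never explicitly granted. Whether that is acceptable is a policy decision a real deployment would need to make explicitly.
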
Routers are typically implemented on top of foundation models and live conceptually inside the Model API box; the gateway *is* that box in production.

## Off-the-Shelf Options

Examples as of 2024 include Portkey AI Gateway, MLflow AI Gateway, LLM Gateway (Wealthsimple), TrueFoundry, Kong, and Cloudflare's AI Gateway. Many orchestration frameworks (LangChain, LlamaIndex) also provide gateway-like abstractions, blurring the boundary.

## When to Add One

A gateway adds operational complexity. It is worth introducing when an organisation:

- Uses two or more model providers simultaneously.
- Needs auditable per-team cost attribution.
- Requires fallback guarantees for production SLAs.
- Wants to enforce guardrails or caching centrally without changing application code.

## Related

- [[AI Application Architecture]]
- [[Prompt Caching]]
- [[Model Selection Strategy]]
- [[Three-Layer AI Stack]]

## Sources

- [[AI Engineering - Chip Huyen]]