## Definition
**Value iteration** is a dynamic programming algorithm for solving a known [[Markov Decision Process]]. It iteratively applies the **Bellman optimality operator** to the value function until convergence, then extracts the optimal policy.
## Algorithm
```
initialise V(s) = 0 for all s
repeat until ||V_new - V|| < ε:
for each state s:
V_new(s) ← max_a [R(s, a) + γ · Σ_s' P(s' | s, a) · V(s')]
V ← V_new
extract policy: π(s) ← argmax_a [R(s, a) + γ · Σ_s' P(s' | s, a) · V(s')]
```
## Why It Works
The Bellman optimality operator is a **contraction mapping** in $\gamma$-weighted max norm:
$
\|T V - T V'\|_\infty \leq \gamma \|V - V'\|_\infty
$
By Banach's fixed-point theorem, repeated application converges geometrically to the unique fixed point — $V^*$.
## Convergence Rate
After $k$ iterations:
$
\|V_k - V^*\|_\infty \leq \frac{\gamma^k}{1 - \gamma} \|V_0 - V^*\|_\infty
$
So convergence is **geometric in $\gamma^k$**: fewer iterations needed when $\gamma$ is smaller. For $\gamma = 0.99$, expect ~hundreds of iterations to converge to high precision.
## Strengths
- **Conceptually simple.**
- **Convergence guaranteed** for $\gamma < 1$.
- **Optimal solution** (not just locally optimal).
## Limitations
- **Requires known $P$ and $R$.** In real RL settings these are unknown — use [[Q-Learning]] or model-based methods.
- **Curse of dimensionality.** Per iteration, scans all $|S| \cdot |A|$ state-action pairs and sums over all $|S|$ next states. Doesn't scale beyond a few million states.
- **Tabular representation.** For continuous or very large state spaces, function approximation is needed (and convergence guarantees weaken).
## Comparison to Policy Iteration
| Property | Value Iteration | Policy Iteration |
|---|---|---|
| Inner loop | None (one Bellman update) | Policy evaluation (multiple Bellman updates) |
| Outer loop | Many iterations | Few iterations |
| Each iteration | Cheap | Expensive (policy evaluation is itself iterative) |
| Total to converge | Often more iterations, faster per step | Fewer iterations, more per step |
In practice, **modified policy iteration** (a small number of policy-evaluation steps between improvements) often wins.
## Asynchronous Variants
Updates don't have to be in parallel. Asynchronous value iteration updates states one at a time — converges as long as every state is updated infinitely often. Enables incremental and sample-based variants.
## Related
- [[Markov Decision Process]]
- [[Policy Iteration]]
- [[Q-Learning]]
- [[Dynamic Programming]]