## Definition
**Monte Carlo Tree Search (MCTS)** is a heuristic search algorithm for decision processes — particularly game playing. It builds a search tree asymmetrically, biased toward promising moves, by combining tree search with **random rollouts**. Underlies AlphaGo, AlphaZero, and many modern game-playing systems.
## The Four-Phase Loop
Each iteration:
1. **Selection.** From the root, traverse the tree using a tree policy (typically UCT — see below) until reaching a leaf node.
2. **Expansion.** Add one or more new child nodes to the tree from the leaf.
3. **Simulation (rollout).** Play a random (or weakly heuristic) game from the new node to a terminal state. Record the outcome.
4. **Backpropagation.** Update statistics — visits, wins — along the path from the new node back to the root.
After many iterations, choose the move with the most visits (or best win rate) at the root.
## UCT — Upper Confidence Bounds for Trees
The tree policy uses UCT to balance exploration and exploitation:
$
\text{UCT}(s) = \frac{w_s}{n_s} + c \sqrt{\frac{\ln N}{n_s}}
$
- $w_s$ — wins through this node.
- $n_s$ — visits to this node.
- $N$ — visits to the parent.
- $c$ — exploration constant (typically $\sqrt{2}$).
The first term favours nodes with high empirical win rate; the second term favours under-explored nodes.
## Properties
- **Anytime.** Stop after any number of iterations; the best move so far is always available.
- **No evaluation function needed.** Random rollouts substitute. (Modern variants — AlphaZero — replace rollouts with a learned value network.)
- **Asymmetric tree.** Promising branches grow deeper; weak branches stay shallow.
- **Converges to minimax** in the limit of infinite simulations.
## Where MCTS Won
- **Go.** Game tree too large for alpha-beta ($b \approx 250$). MCTS (Coulom 2006, MoGo, Crazy Stone) became state-of-the-art.
- **General Game Playing.** No game-specific evaluation function known.
- **Real-time strategy games.** Anytime property is critical.
- **AlphaGo** (2016) and **AlphaZero** (2017) combine MCTS with deep neural networks: policy network biases selection, value network replaces rollouts. The combination defeated world champions and exceeded human-tuned engines in chess and shogi.
## Limits
- **Slow on tactical positions.** A single critical move buried in random rollouts may be missed. Hybrid systems combine MCTS with alpha-beta tactical search.
- **High variance** of rollouts adds noise.
- **Doesn't naturally handle imperfect-information games.** Variants exist (Information Set MCTS) but are subtler.
## Beyond Games
MCTS has been applied to planning under uncertainty, scheduling, automated theorem proving, and as a search component in LLM-driven reasoning systems.
## Related
- [[Minimax Algorithm]]
- [[Alpha-Beta Pruning]]
- [[Evaluation Function]]
- [[Multi-Armed Bandit]]