## Definition
**Simulated Annealing (SA)** is a metaheuristic for global optimisation that occasionally accepts worse solutions to escape local optima. Inspired by the physical annealing process: heated metals cool slowly to reach low-energy crystalline states. Introduced by Kirkpatrick, Gelatt & Vecchi (1983).
## The Algorithm
```
x ← initial solution
T ← initial temperature
while T > T_min:
for i = 1 to N(T):
x' ← neighbour of x (random perturbation)
Δ ← f(x') - f(x)
if Δ < 0: x ← x' # better → accept
else: x ← x' with probability exp(-Δ/T) # worse → maybe accept
T ← α · T # cool (typically α = 0.95-0.99)
return best solution found
```
## The Temperature Schedule
- **High temperature:** worse moves are frequently accepted; explores broadly. Behaves like random search.
- **Low temperature:** only improving moves are accepted; converges. Behaves like [[Hill Climbing]].
Cooling slowly enough provably converges to the global optimum (Hajek 1988). The practical cooling rates are heuristic compromises.
## Why It Escapes Local Optima
Hill climbing gets stuck at the first local optimum it finds. SA's randomness allows occasional uphill moves — climbing *out* of basins, especially early when temperature is high. As cooling progresses, the algorithm becomes greedier and refines.
## Key Hyperparameters
- **Initial temperature $T_0$.** Should be high enough to accept most moves initially. Estimate from worst-case $\Delta$ in random sampling.
- **Cooling rate $\alpha$.** Typical: 0.95-0.99. Slower cooling = better solutions, more compute.
- **Iterations per temperature.** Typical: tens to hundreds.
- **Stopping criterion.** Minimum temperature, no improvement for $k$ iterations, total iteration budget.
## Common Use Cases
- **Travelling Salesman Problem.** Classic SA application.
- **Job-shop scheduling.**
- **Graph colouring.**
- **Layout problems** (VLSI placement was the original 1983 application).
- **Hyperparameter optimisation.**
## Strengths
- **Simple to implement.**
- **General-purpose.** Works with any neighbourhood structure.
- **Provably converges to global optimum** under slow enough cooling.
- **No gradients needed.**
## Weaknesses
- **Slow.** Many iterations to converge well.
- **Hyperparameter-sensitive.** Cooling schedule matters; bad choices fail.
- **Beaten by specialised algorithms** on most well-studied combinatorial problems.
## Variants
- **Parallel tempering / replica exchange.** Multiple chains at different temperatures; swap configurations.
- **Adaptive cooling.** Schedule based on acceptance rate.
- **Quantum annealing.** Physical realisation via quantum tunnelling (D-Wave systems). Practical advantages still debated.
## When to Reach For SA
- Combinatorial problem with no known specialised algorithm.
- Neighbourhood structure is clear (small perturbations make sense).
- Computational budget is generous.
- Hard problem where you need *any* good solution.
For well-studied problems (TSP, knapsack, SAT), specialised solvers usually win. SA shines as a general-purpose fallback.
## Related
- [[Hill Climbing]]
- [[Genetic Algorithms]]
- [[Tabu Search]]
- [[Local vs Global Optimum]]