Simulated Annealing - Albert Masoliver's learning site

## Definition **Simulated Annealing (SA)** is a metaheuristic for global optimisation that occasionally accepts worse solutions to escape local optima. Inspired by the physical annealing process: heated metals cool slowly to reach low-energy crystalline states. Introduced by Kirkpatrick, Gelatt & Vecchi (1983). ## The Algorithm ``` x ← initial solution T ← initial temperature while T > T_min: for i = 1 to N(T): x' ← neighbour of x (random perturbation) Δ ← f(x') - f(x) if Δ < 0: x ← x' # better → accept else: x ← x' with probability exp(-Δ/T) # worse → maybe accept T ← α · T # cool (typically α = 0.95-0.99) return best solution found ``` ## The Temperature Schedule - **High temperature:** worse moves are frequently accepted; explores broadly. Behaves like random search. - **Low temperature:** only improving moves are accepted; converges. Behaves like [[Hill Climbing]]. Cooling slowly enough provably converges to the global optimum (Hajek 1988). The practical cooling rates are heuristic compromises. ## Why It Escapes Local Optima Hill climbing gets stuck at the first local optimum it finds. SA's randomness allows occasional uphill moves — climbing *out* of basins, especially early when temperature is high. As cooling progresses, the algorithm becomes greedier and refines. ## Key Hyperparameters - **Initial temperature $T_0$.** Should be high enough to accept most moves initially. Estimate from worst-case $\Delta$ in random sampling. - **Cooling rate $\alpha$.** Typical: 0.95-0.99. Slower cooling = better solutions, more compute. - **Iterations per temperature.** Typical: tens to hundreds. - **Stopping criterion.** Minimum temperature, no improvement for $k$ iterations, total iteration budget. ## Common Use Cases - **Travelling Salesman Problem.** Classic SA application. - **Job-shop scheduling.** - **Graph colouring.** - **Layout problems** (VLSI placement was the original 1983 application). - **Hyperparameter optimisation.** ## Strengths - **Simple to implement.** - **General-purpose.** Works with any neighbourhood structure. - **Provably converges to global optimum** under slow enough cooling. - **No gradients needed.** ## Weaknesses - **Slow.** Many iterations to converge well. - **Hyperparameter-sensitive.** Cooling schedule matters; bad choices fail. - **Beaten by specialised algorithms** on most well-studied combinatorial problems. ## Variants - **Parallel tempering / replica exchange.** Multiple chains at different temperatures; swap configurations. - **Adaptive cooling.** Schedule based on acceptance rate. - **Quantum annealing.** Physical realisation via quantum tunnelling (D-Wave systems). Practical advantages still debated. ## When to Reach For SA - Combinatorial problem with no known specialised algorithm. - Neighbourhood structure is clear (small perturbations make sense). - Computational budget is generous. - Hard problem where you need *any* good solution. For well-studied problems (TSP, knapsack, SAT), specialised solvers usually win. SA shines as a general-purpose fallback. ## Related - [[Hill Climbing]] - [[Genetic Algorithms]] - [[Tabu Search]] - [[Local vs Global Optimum]]