Parallel Tempering - Albert Masoliver's learning site

## Definition

**Parallel tempering** (also: replica exchange) is a metaheuristic and MCMC technique that runs multiple copies (replicas) of a system simultaneously at different temperatures and periodically proposes swaps of entire configurations between adjacent temperature levels. It was introduced independently in statistics (Geyer 1991) and statistical physics (Hukushima & Nemoto 1996) as a remedy for the slow mixing of single-temperature chains at low temperatures, the same problem that affects standard [[Simulated Annealing]].

## Mechanism

Let replicas $1, \ldots, M$ run at temperatures $T_1 < T_2 < \cdots < T_M$, where $x_i$ is the current configuration of replica $i$ and $f$ is the cost (energy) being minimised:

- Each replica evolves independently via the [[Metropolis Acceptance Criterion]] at its own temperature: high-$T$ replicas explore broadly, low-$T$ replicas refine locally.
- Periodically, a swap between replicas $i$ and $j = i+1$ is proposed. The swap is accepted with probability $P(\text{swap}) = \min\!\left(1,\ \exp\!\left[\left(\frac{1}{T_i} - \frac{1}{T_j}\right)\!\left(f(x_i) - f(x_j)\right)\right]\right)$. Because $T_i < T_j$, a swap that moves the lower-cost configuration to the colder replica is always accepted; the reverse move is accepted with exponentially small probability.
- This acceptance rule satisfies detailed balance with respect to the joint distribution over all replicas, so each replica keeps sampling the Boltzmann distribution $\propto \exp(-f(x)/T_i)$ at its own temperature.

## Why It Helps

A low-temperature SA chain can become trapped in a deep basin for exponentially long times. In parallel tempering, the same configuration can "ride" upward to a high-temperature replica, be transported across an energy barrier by the high-$T$ random walk, and then descend back to a refined low-$T$ solution. The swaps provide a fast pathway across barriers that would be extremely slow to cross at low temperature alone.

## Uses in Two Communities

- **Optimisation**: used as a variant of SA when a single cooling schedule is insufficient; the low-$T$ replica tracks the best solution found.
- **Bayesian MCMC**: used to sample from multi-modal posteriors where a single Markov chain mixes poorly; each replica samples a tempered (flattened) posterior at its own temperature, and only the coldest replica (conventionally $T = 1$) targets the true posterior.
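The mechanism described above (independent Metropolis updates plus periodic swaps between adjacent temperatures) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the one-dimensional state, the Gaussian proposal, the `double_well` test function, and all parameter names and defaults are assumptions made for the example. The swap test is written as $\min(1, \exp[(1/T_i - 1/T_j)(f(x_i) - f(x_j))])$, the standard form for minimisation, in which a swap that moves the lower-cost configuration to the colder replica is always accepted.

```python
import math
import random


def swap_probability(t_i, t_j, f_i, f_j):
    """Probability of exchanging the configurations held at temperatures t_i and t_j."""
    exponent = (1.0 / t_i - 1.0 / t_j) * (f_i - f_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)


def parallel_tempering(f, x0, temps, n_sweeps=2000, step=0.5, swap_every=10, seed=0):
    """Minimise f over the reals; temps must be sorted in ascending order."""
    rng = random.Random(seed)
    xs = [x0 for _ in temps]      # one configuration per replica
    fs = [f(x) for x in xs]       # cached costs
    best_x, best_f = xs[0], fs[0]

    for sweep in range(1, n_sweeps + 1):
        # Within-replica Metropolis update at each replica's own temperature.
        for k, t in enumerate(temps):
            cand = xs[k] + rng.gauss(0.0, step)   # hypothetical proposal move
            f_cand = f(cand)
            delta = f_cand - fs[k]
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                xs[k], fs[k] = cand, f_cand
                if f_cand < best_f:
                    best_x, best_f = cand, f_cand

        # Every swap_every sweeps, propose swaps between adjacent temperatures.
        if sweep % swap_every == 0:
            for i in range(len(temps) - 1):
                p = swap_probability(temps[i], temps[i + 1], fs[i], fs[i + 1])
                if rng.random() < p:
                    xs[i], xs[i + 1] = xs[i + 1], xs[i]
                    fs[i], fs[i + 1] = fs[i + 1], fs[i]

    return best_x, best_f


def double_well(x):
    """Illustrative cost: local minimum near x = +1, global minimum near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


if __name__ == "__main__":
    # Start in the shallower basin; the hot replicas can carry the search across the barrier.
    print(parallel_tempering(double_well, x0=1.0, temps=[0.05, 0.2, 0.8, 2.0]))
```

The sketch tracks the best configuration seen by any replica, which suits the optimisation use; for sampling, one would instead record the coldest replica's states over time. Real implementations often alternate between even and odd adjacent pairs rather than sweeping all pairs in order.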
## Practical Considerations

- Temperature spacing must be chosen so swap acceptance rates are reasonable (typically 20–40 %).
- The number of replicas is typically 4–32 (one per CPU core is natural for parallelisation).
- Communication overhead between replicas grows with swap frequency.

## Related

- [[Simulated Annealing]]: the single-chain method that parallel tempering generalises
- [[Metropolis Acceptance Criterion]]: governs both within-replica updates and the swap acceptance step
- [[Cooling Schedule]]: replaced by a fixed ladder of temperatures; there is no cooling schedule to tune, although the ladder spacing itself must be chosen
- [[Local vs Global Optimum]]: the swapping mechanism is designed specifically to escape local optima
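
As a note on the temperature spacing mentioned under Practical Considerations: one common starting point, assumed here rather than prescribed by this note, is a geometric ladder in which adjacent temperatures share a constant ratio. The sketch below builds such a ladder and flags adjacent pairs whose measured swap acceptance falls outside a target band. The function names and the 20–40 % default band are illustrative choices.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Return n_replicas temperatures from t_min to t_max with a constant ratio."""
    if n_replicas < 2 or not (0 < t_min < t_max):
        raise ValueError("need n_replicas >= 2 and 0 < t_min < t_max")
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


def pairs_outside_band(accepted, proposed, low=0.2, high=0.4):
    """Indices i of adjacent pairs (i, i+1) whose acceptance rate is outside [low, high]."""
    flagged = []
    for i, (a, p) in enumerate(zip(accepted, proposed)):
        rate = a / p if p else 0.0   # treat pairs with no proposals as rate 0
        if rate < low or rate > high:
            flagged.append(i)
    return flagged
```

A pair flagged for a low rate suggests its two temperatures are too far apart; a pair flagged for a high rate suggests they could be spaced more widely, or a replica removed.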