## Definition
**Parallel tempering** (also: replica exchange) is an MCMC technique, also used as an optimisation metaheuristic, that runs multiple copies (replicas) of a system simultaneously at different temperatures and periodically proposes swaps of entire configurations between adjacent temperature levels. It was introduced independently in statistics (Geyer 1991) and statistical physics (Hukushima & Nemoto 1996) as a remedy for the slow mixing of single-temperature Metropolis sampling at low temperatures, the same weakness that affects [[Simulated Annealing]] late in its cooling schedule.
## Mechanism
Let replicas $1, \ldots, M$ run at temperatures $T_1 < T_2 < \cdots < T_M$:
- Each replica evolves independently via the [[Metropolis Acceptance Criterion]] at its own temperature: high-$T$ replicas explore broadly, low-$T$ replicas refine locally.
- Periodically, a swap between replicas $i$ and $j = i+1$ is proposed. The swap is accepted with probability
$P(\text{swap}) = \min\!\left(1,\ \exp\!\left[\left(\frac{1}{T_i} - \frac{1}{T_j}\right)\!\left(f(x_i) - f(x_j)\right)\right]\right)$
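- This acceptance rule satisfies detailed balance with respect to the joint distribution $\prod_k \exp(-f(x_k)/T_k)$, so each replica continues to sample the Boltzmann distribution at its own temperature. A swap that moves the lower-energy configuration to the colder replica is always accepted.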
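The steps above can be sketched in a few lines of Python. This is a minimal illustration for a one-dimensional objective, not a reference implementation: the function names (`swap_probability`, `parallel_tempering`), the Gaussian proposal, and the parameters `step` and `swap_every` are assumptions introduced here. The swap exponent is $(1/T_i - 1/T_j)(f(x_i) - f(x_j))$, which follows from the ratio of the joint Boltzmann weights after and before the exchange.

```python
import math
import random


def swap_probability(f_i, f_j, t_i, t_j):
    """Acceptance probability for exchanging the states of replicas i and j.

    f_i, f_j are the current objective values; t_i, t_j are the temperatures.
    """
    exponent = (1.0 / t_i - 1.0 / t_j) * (f_i - f_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)


def parallel_tempering(f, x0, temps, n_sweeps=2000, step=0.5, swap_every=10, seed=0):
    """Minimise a 1-D function f; temps must be sorted in ascending order."""
    rng = random.Random(seed)
    xs = [x0] * len(temps)
    fs = [f(x0)] * len(temps)
    best_x, best_f = x0, fs[0]

    for sweep in range(1, n_sweeps + 1):
        # Within-replica Metropolis update at each replica's own temperature.
        for k, t in enumerate(temps):
            cand = xs[k] + rng.gauss(0.0, step)  # assumed Gaussian proposal
            f_cand = f(cand)
            delta = f_cand - fs[k]
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                xs[k], fs[k] = cand, f_cand
                if f_cand < best_f:
                    best_x, best_f = cand, f_cand

        # Periodically propose swaps between adjacent temperature levels.
        if sweep % swap_every == 0:
            for i in range(len(temps) - 1):
                p = swap_probability(fs[i], fs[i + 1], temps[i], temps[i + 1])
                if rng.random() < p:
                    xs[i], xs[i + 1] = xs[i + 1], xs[i]
                    fs[i], fs[i + 1] = fs[i + 1], fs[i]

    return best_x, best_f
```

For example, with a hypothetical double-well objective `g = lambda x: (x * x - 1.0) ** 2 + 0.3 * x`, calling `parallel_tempering(g, 1.0, [0.05, 0.2, 0.8, 3.0])` starts every replica in the shallower well and returns the best point visited by any replica.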
## Why It Helps
A low-temperature SA chain can become trapped in a deep basin for exponentially long times. In parallel tempering, the same configuration can "ride" upward to a high-temperature replica, be transported across an energy barrier by the high-$T$ random walk, and then descend back to a refined low-$T$ solution. The swaps provide a fast pathway across barriers that would be extremely slow to cross thermally.
## Uses in Two Communities
- **Optimisation** — used as a variant of SA when a single cooling schedule is insufficient; the lowest-temperature replica refines the most promising configurations, and the best solution visited by any replica is recorded.
- **Bayesian MCMC** — used to sample from multi-modal posteriors where a single Markov chain mixes poorly; only the replica at $T = 1$ samples the true posterior, while the hotter replicas sample flattened (tempered) versions of it and serve mainly to help the $T = 1$ chain move between modes.
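To make the Bayesian case concrete, assume the energy is the negative log posterior, $f(x) = -\log p(x \mid D)$, where $D$ denotes the data (notation introduced here). The Boltzmann distribution of replica $k$ is then
$\pi_k(x) \propto \exp\!\left(-\frac{f(x)}{T_k}\right) = p(x \mid D)^{1/T_k}$
so $T_k = 1$ recovers the posterior itself and $T_k > 1$ flattens it. Some implementations temper only the likelihood and leave the prior untouched; that variant is a different convention from the one written here.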
## Practical Considerations
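- Temperature spacing must be chosen so that swap acceptance rates between adjacent replicas are reasonable (typically 20–40 %): if neighbours are too far apart, swaps are rarely accepted; if they are too close, replicas are wasted.
- The number of replicas is typically 4–32 (one per CPU core is natural for parallelisation).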
- Communication overhead between replicas grows with swap frequency.
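One common starting point for the temperature ladder is geometric spacing, where each temperature is a constant multiple of the one below it. The sketch below is an illustration under that assumption (this note does not prescribe a spacing rule), and the helper names `geometric_ladder` and `swap_acceptance_rates` are hypothetical. The second helper turns per-pair swap counters collected during a run into acceptance rates that can be compared with the target range.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Ascending temperatures with a constant ratio between neighbours."""
    if n_replicas < 2:
        raise ValueError("need at least two replicas")
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio**k for k in range(n_replicas)]


def swap_acceptance_rates(accepted, proposed):
    """Per-pair acceptance rate from counters kept during a run.

    accepted[i] and proposed[i] count swaps between levels i and i + 1.
    Pairs with no proposals are reported as 0.0.
    """
    return [a / p if p else 0.0 for a, p in zip(accepted, proposed)]
```

If the measured rate for a pair falls well below the target range, adding a replica between those two levels or narrowing the overall temperature range are the usual remedies.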
## Related
- [[Simulated Annealing]] — the single-chain method that parallel tempering generalises
- [[Metropolis Acceptance Criterion]] — governs both within-replica updates and the swap acceptance step
- [[Cooling Schedule]] — replaced by a fixed ladder of temperatures; no cooling rate needs tuning, although the ladder spacing does
- [[Local vs Global Optimum]] — the swapping mechanism is designed specifically to escape local optima