## Definition
**Variable elimination (VE)** is an exact inference algorithm for [[Bayesian Network]]s. To compute a posterior $P(X_q \mid \text{evidence})$, VE eliminates non-query, non-evidence variables one at a time by summing them out of the joint factorisation.
## The Core Operation
Each conditional probability table is a *factor* — a function over its variables. Inference combines factors by:
- **Multiplication.** $\phi_1(A, B) \cdot \phi_2(B, C) \to \phi(A, B, C)$.
- **Summing out.** $\sum_B \phi(A, B, C) \to \phi'(A, C)$.
VE repeatedly does these two until only the query variable remains.
## Algorithm
```
factors ← {CPT[X] for each variable X in network}
incorporate evidence by restricting relevant factors
for each variable X in elimination order:
if X is not query and not evidence:
F ← all factors containing X
product ← multiply factors in F
new_factor ← sum_X(product)
factors ← (factors \ F) ∪ {new_factor}
result ← product of remaining factors, normalised
return result
```
## Elimination Ordering
The order matters dramatically. The same network can have inference cost ranging from polynomial to exponential depending on order:
- **Min-fill heuristic** — eliminate the variable that adds the fewest new edges to the remaining graph.
- **Min-degree heuristic** — eliminate the variable with the fewest neighbours.
- **Min-weight heuristic** — weighted variant accounting for domain sizes.
Finding the optimal order is **NP-hard** but heuristics work well in practice.
## Complexity
Time and space: $O(n \cdot d^w)$ where $n$ is the number of variables, $d$ the maximum domain size, and $w$ the **treewidth** of the elimination order. For sparse networks (chains, trees, small treewidth), inference is polynomial.
## Strengths
- **Exact** — no approximation.
- **Anytime in spirit** — works incrementally; partial elimination is reusable.
- **Conceptually clean** — easy to derive variants.
## Weaknesses
- **Treewidth explosion.** Dense networks have high treewidth → exponential cost.
- **No reuse across queries.** Each query restarts the work. (See junction tree for cached version.)
## Junction Tree
The **junction tree (clique tree)** algorithm is a generalised, cached version of VE. It compiles the Bayesian network into a tree of cliques and propagates messages on it. Same exponential bound in treewidth, but supports many queries efficiently after one compilation.
## Approximate Inference
When VE is too expensive, fall back to:
- **Gibbs sampling / Metropolis-Hastings** — MCMC methods.
- **Loopy belief propagation** — exact on trees, approximate on cycles.
- **Variational inference** — minimise KL divergence to a tractable family.
## Related
- [[Bayesian Network]]
- [[Bayes Theorem]]
- [[D-Separation]]