Variable Elimination - Albert Masoliver's learning site

## Definition **Variable elimination (VE)** is an exact inference algorithm for [[Bayesian Network]]s. To compute a posterior $P(X_q \mid \text{evidence})$, VE eliminates non-query, non-evidence variables one at a time by summing them out of the joint factorisation. ## The Core Operation Each conditional probability table is a *factor* — a function over its variables. Inference combines factors by: - **Multiplication.** $\phi_1(A, B) \cdot \phi_2(B, C) \to \phi(A, B, C)$. - **Summing out.** $\sum_B \phi(A, B, C) \to \phi'(A, C)$. VE repeatedly does these two until only the query variable remains. ## Algorithm ``` factors ← {CPT[X] for each variable X in network} incorporate evidence by restricting relevant factors for each variable X in elimination order: if X is not query and not evidence: F ← all factors containing X product ← multiply factors in F new_factor ← sum_X(product) factors ← (factors \ F) ∪ {new_factor} result ← product of remaining factors, normalised return result ``` ## Elimination Ordering The order matters dramatically. The same network can have inference cost ranging from polynomial to exponential depending on order: - **Min-fill heuristic** — eliminate the variable that adds the fewest new edges to the remaining graph. - **Min-degree heuristic** — eliminate the variable with the fewest neighbours. - **Min-weight heuristic** — weighted variant accounting for domain sizes. Finding the optimal order is **NP-hard** but heuristics work well in practice. ## Complexity Time and space: $O(n \cdot d^w)$ where $n$ is the number of variables, $d$ the maximum domain size, and $w$ the **treewidth** of the elimination order. For sparse networks (chains, trees, small treewidth), inference is polynomial. ## Strengths - **Exact** — no approximation. - **Anytime in spirit** — works incrementally; partial elimination is reusable. - **Conceptually clean** — easy to derive variants. ## Weaknesses - **Treewidth explosion.** Dense networks have high treewidth → exponential cost. - **No reuse across queries.** Each query restarts the work. (See junction tree for cached version.) ## Junction Tree The **junction tree (clique tree)** algorithm is a generalised, cached version of VE. It compiles the Bayesian network into a tree of cliques and propagates messages on it. Same exponential bound in treewidth, but supports many queries efficiently after one compilation. ## Approximate Inference When VE is too expensive, fall back to: - **Gibbs sampling / Metropolis-Hastings** — MCMC methods. - **Loopy belief propagation** — exact on trees, approximate on cycles. - **Variational inference** — minimise KL divergence to a tractable family. ## Related - [[Bayesian Network]] - [[Bayes Theorem]] - [[D-Separation]]