Markov Network - Albert Masoliver's learning site

## Definition

A **Markov network** (or Markov Random Field, MRF) is an *undirected* probabilistic graphical model. Its joint distribution factorises into non-negative functions over the cliques of an undirected graph. It is the undirected sibling of the [[Bayesian Network]].

## Why Undirected

Many domains have symmetric, non-causal dependencies:

- **Pixels in an image.** Neighbouring pixels correlate; there is no direction.
- **Spins in a magnet** (Ising model).
- **Word co-occurrence in text.**

Directed graphs force a parent → child orientation that does not fit such domains naturally.

## Factorisation

The joint distribution is:

$$
P(X_1, \dots, X_n) = \frac{1}{Z} \prod_{c \in \mathcal{C}} \phi_c(X_c)
$$

- $\mathcal{C}$: the set of cliques in the graph.
- $\phi_c$: a non-negative **potential function** over the variables in clique $c$.
- $Z$: the **partition function**, the normalising constant $Z = \sum_{x} \prod_{c \in \mathcal{C}} \phi_c(x_c)$.

Potentials need not be probabilities; only the normalised product is.

## Local Markov Property

A variable is conditionally independent of all other variables given its immediate neighbours in the graph. More generally, conditional independence is read off by plain graph separation, the undirected analogue of [[D-Separation]] for directed models.

## Difference from Bayesian Networks

| Property             | Bayesian Network                     | Markov Network                    |
| -------------------- | ------------------------------------ | --------------------------------- |
| Edge direction       | Directed (DAG)                       | Undirected                        |
| Local factors        | Conditional probabilities (sum to 1) | Arbitrary potentials              |
| Normalisation        | Implicit (product is a joint)        | Explicit (partition function $Z$) |
| Independence reading | D-separation                         | Graph separation                  |
| Natural for          | Causal / generative models           | Symmetric correlations            |

Some distributions are easier to represent in the directed form, others in the undirected form.

## Conditional Random Fields (CRFs)

A **CRF** is a Markov network *conditioned on* observed inputs $X$.
It is used for structured prediction:

$$
P(Y \mid X) = \frac{1}{Z(X)} \exp\left(\sum_k w_k \, f_k(X, Y)\right)
$$

CRFs were the standard model for named-entity recognition, part-of-speech tagging, and image segmentation before about 2015. They are often combined with neural networks (for example, BiLSTM-CRF for sequence labelling).

## Inference

Markov networks use the same algorithm families as Bayesian networks: [[Variable Elimination]], junction tree, belief propagation, MCMC, and variational methods. Computing the partition function $Z$ is typically the hard part, because it sums over every joint assignment.

## The Ising Model

The original MRF. Each variable $X_i \in \{-1, +1\}$; with $J_{ij} > 0$ the potentials encode "neighbours prefer to align":

$$
P(X) \propto \exp\left(\sum_{(i,j) \in E} J_{ij} X_i X_j + \sum_i h_i X_i\right)
$$

Here $J_{ij}$ is the coupling on edge $(i,j)$ and $h_i$ is the external field on variable $i$. It is a physics-derived model that anticipated graphical models by decades.

## Modern Relevance

- Image denoising, segmentation, and super-resolution (pre-CNN era).
- Markov Logic Networks combine first-order logic (FOL) with MRF weights.
- Energy-based models in deep learning can be viewed as MRFs with neural-network potentials.

## Related

- [[Bayesian Network]]
- [[Hidden Markov Model]]
- [[D-Separation]]
- [[Variable Elimination]]
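## Worked Example

The factorisation, the partition function, and the local Markov property described above can be checked numerically on a tiny Ising model. The sketch below is only an illustration under stated assumptions: the 4-node cycle, the coupling `J = 0.5`, the zero field `H`, and all function names are hypothetical choices, not part of the original note. It enumerates all $2^4$ assignments, which is feasible only for very small graphs.

```python
import itertools
import math

# Assumed toy graph: a 4-node cycle 0-1-2-3-0 (hypothetical example).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]
J = 0.5  # uniform coupling strength (assumed value)
H = 0.0  # uniform external field (assumed value)
N = 4


def unnormalised(x):
    """Product of potentials, written as exp of the summed log-potentials."""
    pair = sum(J * x[i] * x[j] for i, j in EDGES)
    field = sum(H * xi for xi in x)
    return math.exp(pair + field)


def all_states():
    """Every assignment of -1/+1 to the N variables."""
    return itertools.product((-1, 1), repeat=N)


def partition_function():
    """Z: the unnormalised score summed over all 2^N assignments."""
    return sum(unnormalised(x) for x in all_states())


def conditional(i, value, given, z):
    """P(X_i = value | given), by brute-force marginalisation.

    `given` maps variable index -> fixed value and must not contain i.
    """
    num = den = 0.0
    for x in all_states():
        if any(x[k] != v for k, v in given.items()):
            continue
        p = unnormalised(x) / z
        den += p
        if x[i] == value:
            num += p
    return num / den


Z = partition_function()
total = sum(unnormalised(x) / Z for x in all_states())

# Node 0 has neighbours 1 and 3. Once they are fixed, the value of
# node 2 should not change the conditional distribution of node 0.
p_a = conditional(0, 1, {1: 1, 3: 1, 2: 1}, Z)
p_b = conditional(0, 1, {1: 1, 3: 1, 2: -1}, Z)

print(f"Z = {Z:.4f}, total probability = {total:.4f}")
print(f"P(X0=+1 | X1=+1, X3=+1, X2=+1) = {p_a:.4f}")
print(f"P(X0=+1 | X1=+1, X3=+1, X2=-1) = {p_b:.4f}")
```

The two conditionals should agree up to floating-point error, which is the local Markov property in action. The cost of `partition_function` doubles with every added variable, which is why larger models rely on the inference methods listed above instead of enumeration.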