D-Separation - Albert Masoliver's learning site

## Definition

**D-separation** ("directional separation") is a graphical criterion for reading **conditional independence** statements off a [[Bayesian Network]]. Two sets of variables $X$ and $Y$ are d-separated given a set $Z$ iff no trail between them is *active* given $Z$. D-separation implies conditional independence in the distribution the network encodes.

## Trail Types

A *trail* is any path between two variables in the graph, ignoring edge directions. Each consecutive triple of nodes along the trail forms one of three patterns, and whether the trail is *active* given $Z$ depends on the pattern of each triple.

### Chain: $X \to W \to Y$

- Active if $W \notin Z$.
- Blocked if $W \in Z$.

Information flows from $X$ to $Y$ via $W$ unless $W$ is observed, which blocks it. The same holds for the reversed chain $X \leftarrow W \leftarrow Y$.

### Fork (Common Cause): $X \leftarrow W \to Y$

- Active if $W \notin Z$.
- Blocked if $W \in Z$.

This behaves like the chain: observing the common cause shields $X$ and $Y$ from each other.

### Collider (Common Effect): $X \to W \leftarrow Y$

- **Blocked** if $W \notin Z$ AND no descendant of $W$ is in $Z$.
- **Active** if $W \in Z$ OR any descendant of $W$ is in $Z$.

This is the counterintuitive case: observing the common effect (or one of its descendants) *creates* dependence between otherwise independent causes ("explaining away").

## D-Separation Definition

$X$ and $Y$ are d-separated by $Z$ iff **every** trail between any node in $X$ and any node in $Y$ is blocked, i.e. contains at least one triple that is blocked given $Z$.

## Theorem

If $X$ and $Y$ are d-separated by $Z$ in graph $G$, then $X \perp Y \mid Z$ in every distribution that factorises according to $G$, that is, every distribution of the form $P(X_1, \dots, X_n) = \prod_i P(X_i \mid \mathrm{Pa}_G(X_i))$.

The converse (conditional independence implies d-separation) does not hold for every such distribution, but it does hold for *almost all* parameterisations; distributions for which it holds are called *faithful* to $G$. The exceptions arise when parameters are tuned so that, for example, the effects along two active trails cancel exactly, producing an independence the graph does not show. Faithfulness is therefore a reasonable working assumption in practice, but it can fail in such pathological cases.

## Why It Matters

D-separation is the workhorse for:

- **Reading independencies off the graph** without any numerical computation on the distribution.
- **Choosing the right variables to condition on** for causal inference.
- **Diagnosing which evidence will change posteriors** in inference algorithms.
- **Causal identification**: Pearl's do-calculus relies on d-separation.
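Reading independencies off the graph is mechanical enough to automate. What follows is a minimal sketch under stated assumptions: the DAG is given as a list of `(parent, child)` edges, the function name `d_separated` is hypothetical, and the check is a breadth-first reachability search over `(node, direction)` states in the spirit of Bayes-ball, not any particular library's API. It applies the three trail rules directly: chains and forks pass through unobserved nodes, and a collider passes only if it is in $Z$ or has a descendant in $Z$ (equivalently, it belongs to $Z$ or to the ancestors of $Z$). The `JohnCalls` node in the usage lines is a hypothetical extra child of `Alarm`, added only to exercise the "observed descendant" case.

```python
from collections import defaultdict, deque


def d_separated(edges, xs, ys, zs):
    """Return True if every trail between xs and ys is blocked given zs.

    edges: iterable of (parent, child) pairs forming a DAG.
    xs, ys, zs: disjoint collections of node names.
    (Function name and input format are illustrative assumptions.)
    """
    parents, children = defaultdict(set), defaultdict(set)
    for p, c in edges:
        parents[c].add(p)
        children[p].add(c)
    xs, ys, zs = set(xs), set(ys), set(zs)

    # Phase 1: Z plus all of its ancestors. A collider W is active exactly
    # when W is in this set (W is observed or has an observed descendant).
    z_and_ancestors = set()
    stack = list(zs)
    while stack:
        node = stack.pop()
        if node not in z_and_ancestors:
            z_and_ancestors.add(node)
            stack.extend(parents[node])

    # Phase 2: BFS over (node, direction) states.
    #   "up":   node was reached from one of its children (against an edge)
    #   "down": node was reached from one of its parents (along an edge)
    # Start nodes are treated as "up" so the search may leave in any direction.
    queue = deque((x, "up") for x in xs)
    visited = set()
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node in ys and node not in zs:
            return False  # an active trail reaches ys
        if direction == "up" and node not in zs:
            # Reverse chain (to a parent) or fork (to a child) through an
            # unobserved node: continue both ways.
            queue.extend((p, "up") for p in parents[node])
            queue.extend((c, "down") for c in children[node])
        elif direction == "down":
            if node not in zs:
                # Chain through an unobserved node: keep following edges.
                queue.extend((c, "down") for c in children[node])
            if node in z_and_ancestors:
                # Active collider: the trail turns back up to other parents.
                queue.extend((p, "up") for p in parents[node])
    return True


# Burglary -> Alarm <- Earthquake; JohnCalls is a hypothetical child of Alarm.
edges = [("Burglary", "Alarm"), ("Earthquake", "Alarm"), ("Alarm", "JohnCalls")]
print(d_separated(edges, {"Burglary"}, {"Earthquake"}, set()))          # True
print(d_separated(edges, {"Burglary"}, {"Earthquake"}, {"Alarm"}))      # False
print(d_separated(edges, {"Burglary"}, {"Earthquake"}, {"JohnCalls"}))  # False
print(d_separated(edges, {"Burglary"}, {"JohnCalls"}, {"Alarm"}))       # True
```

Tracking the direction of arrival is what lets one traversal distinguish a collider (reached "down" and left "up") from a chain or fork through the same node; a plain undirected reachability search could not tell these cases apart.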
## "Explaining Away": Classic Example

Suppose `Burglary` and `Earthquake` are independent causes of `Alarm`.

- Without observing `Alarm`: $P(\text{Earthquake} \mid \text{Burglary}) = P(\text{Earthquake})$. The two causes are independent.
- After observing `Alarm`, with typical parameters: $P(\text{Earthquake} \mid \text{Burglary}, \text{Alarm}) < P(\text{Earthquake} \mid \text{Alarm})$. Learning of a burglary "explains away" some of the alarm evidence, reducing the probability of an earthquake.

This is exactly the collider pattern in action: `Alarm` is a collider on the trail $\text{Burglary} \to \text{Alarm} \leftarrow \text{Earthquake}$, and observing it activates the trail.

## Algorithmic Check

Several polynomial-time algorithms decide d-separation. The classic *Bayes-ball* algorithm runs a graph traversal whose rules specify which directions a "ball" may pass through each node, depending on whether that node is observed or unobserved.

## Related

- [[Bayesian Network]]
- [[Bayes Theorem]]
- [[Variable Elimination]]