## Definition
**D-separation** ("directional separation") is a graphical criterion for reading **conditional independence** off a [[Bayesian Network]]. Two disjoint sets of variables $X$ and $Y$ are d-separated given a set $Z$ iff no *active* trail connects any node in $X$ to any node in $Y$. D-separation implies $X \perp Y \mid Z$ in every distribution that factorises over the graph.
## Trail Types
A *trail* is any path between two variables that ignores edge direction. Each consecutive triple of nodes along the trail forms one of three patterns. The trail is *active* given $Z$ only if every triple along it is active; a single blocked triple blocks the whole trail.
### Chain: $X \to W \to Y$
- Active if $W \notin Z$.
- Blocked if $W \in Z$.
Information flows from $X$ to $Y$ via $W$ — unless $W$ is observed, which blocks it.
### Fork (Common Cause): $X \leftarrow W \to Y$
- Active if $W \notin Z$.
- Blocked if $W \in Z$.
The rule is the same as for the chain: once the common cause is observed, $X$ carries no further information about $Y$.
### Collider (Common Effect): $X \to W \leftarrow Y$
- **Blocked** if $W \notin Z$ AND no descendant of $W$ is in $Z$.
- **Active** if $W \in Z$ OR any descendant of $W$ is in $Z$.
Counterintuitive: observing the common effect *creates* dependence between otherwise independent causes ("explaining away").
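The three rules above can be sketched as a small classifier. This is only an illustration: the edge-set representation (a set of `(parent, child)` pairs) and the helper names `descendants`, `classify_triple` and `triple_is_active` are assumptions made for this note, not part of any library.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following edges forwards."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def classify_triple(edges, x, w, y):
    """Name the pattern formed at w by its trail neighbours x and y."""
    if (x, w) in edges and (y, w) in edges:
        return "collider"
    if (w, x) in edges and (w, y) in edges:
        return "fork"
    return "chain"


def triple_is_active(edges, x, w, y, observed):
    """Apply the rules for one triple, where `observed` is the set Z."""
    if classify_triple(edges, x, w, y) == "collider":
        # Open only if w or one of its descendants is observed.
        return bool(({w} | descendants(edges, w)) & observed)
    # Chains and forks are open only while w is unobserved.
    return w not in observed
```

For example, with `edges = {("X", "W"), ("Y", "W"), ("W", "D")}`, the triple `X, W, Y` is blocked for `observed = set()` and becomes active for `observed = {"D"}`, because `D` is a descendant of the collider.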
## D-Separation Definition
$X$ and $Y$ are d-separated by $Z$ iff **every** trail between any node in $X$ and any node in $Y$ contains at least one triple that is blocked given $Z$.
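The definition can be checked literally by enumerating trails. The sketch below does exactly that, assuming the graph is given as a set of `(parent, child)` pairs and that $X$, $Y$ and $Z$ are disjoint; all function names are hypothetical. It is exponential in the worst case and only meant for small graphs.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following edges forwards."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def simple_trails(edges, start, goal):
    """Yield every path from start to goal that ignores edge direction
    and visits no node twice."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def extend(path):
        if path[-1] == goal:
            yield path
            return
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                yield from extend(path + [nxt])

    yield from extend([start])


def trail_is_active(edges, trail, observed):
    """A trail is active only if none of its triples is blocked."""
    for x, w, y in zip(trail, trail[1:], trail[2:]):
        if (x, w) in edges and (y, w) in edges:  # collider at w
            if not (({w} | descendants(edges, w)) & observed):
                return False
        elif w in observed:  # chain or fork at w
            return False
    return True


def d_separated_brute_force(edges, xs, ys, observed):
    """True iff every trail between xs and ys is blocked given `observed`."""
    return not any(
        trail_is_active(edges, trail, observed)
        for x in xs
        for y in ys
        for trail in simple_trails(edges, x, y)
    )
```

A direct edge between a node in $X$ and a node in $Y$ gives a trail with no triples at all, so nothing can block it and the function returns `False`, as the definition requires.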
## Theorem
If $X$ and $Y$ are d-separated by $Z$ in graph $G$, then $X \perp Y \mid Z$ in every distribution that factorises according to $G$.
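The converse does not hold for every distribution: particular parameter choices can produce independencies that the graph does not imply. A distribution with no such extra independencies is called *faithful* to $G$. Almost all parameterisations (all but a measure-zero set) are faithful, so in practice d-connection usually signals dependence, with caveats in pathological cases.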
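As a worked instance of the soundness direction, take the fork $X \leftarrow W \to Y$ with $Z = \{W\}$. The graph factorises the joint as shown on the left, and dividing by $P(w)$ (assumed positive) gives the conditional independence that d-separation predicts:

$$
P(x, y, w) = P(w)\,P(x \mid w)\,P(y \mid w)
\quad\Longrightarrow\quad
P(x, y \mid w) = \frac{P(x, y, w)}{P(w)} = P(x \mid w)\,P(y \mid w).
$$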
## Why It Matters
D-separation is the workhorse for:
- **Reading independencies off the graph** without computation.
- **Choosing the right variables to condition on** for causal inference.
- **Diagnosing what evidence will change posteriors** in inference algorithms.
- **Causal identification** — Pearl's do-calculus relies on d-separation.
## "Explaining Away" — Classic Example
Suppose `Burglary` and `Earthquake` are independent causes of `Alarm`.
- Without observing `Alarm`: $P(\text{Earthquake} \mid \text{Burglary}) = P(\text{Earthquake})$. Independent.
- After observing `Alarm`: for a typical alarm model, in which either cause on its own makes the alarm likely, $P(\text{Earthquake} \mid \text{Burglary}, \text{Alarm}) < P(\text{Earthquake} \mid \text{Alarm})$. Burglary "explains away" some of the alarm evidence, reducing the probability of an earthquake.
This is exactly the collider pattern in action.
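The effect can be checked numerically by brute-force enumeration. The probabilities below are made-up illustrative values chosen for this note, not taken from any dataset, and the helper names are hypothetical.

```python
from itertools import product

# Assumed, illustrative parameters (not from the source).
P_BURGLARY = 0.01
P_EARTHQUAKE = 0.02
# P(Alarm = 1 | Burglary, Earthquake), keyed by (burglary, earthquake).
P_ALARM = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}


def joint(b, e, a):
    """P(B=b, E=e, A=a) under the factorisation P(B) P(E) P(A | B, E)."""
    pb = P_BURGLARY if b else 1 - P_BURGLARY
    pe = P_EARTHQUAKE if e else 1 - P_EARTHQUAKE
    pa = P_ALARM[(b, e)] if a else 1 - P_ALARM[(b, e)]
    return pb * pe * pa


def prob_earthquake(**evidence):
    """P(Earthquake = 1 | evidence), with evidence keys 'b' and/or 'a'."""
    numerator = denominator = 0.0
    for b, e, a in product((0, 1), repeat=3):
        world = {"b": b, "e": e, "a": a}
        if any(world[name] != value for name, value in evidence.items()):
            continue  # world contradicts the evidence
        p = joint(b, e, a)
        denominator += p
        numerator += p * e
    return numerator / denominator


print(prob_earthquake())            # prior
print(prob_earthquake(b=1))         # unchanged: collider is unobserved
print(prob_earthquake(a=1))         # alarm raises belief in an earthquake
print(prob_earthquake(a=1, b=1))    # burglary explains the alarm away
```

With these assumed numbers the third value is roughly 0.37 and the fourth roughly 0.02, while the first two coincide at the prior of 0.02.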
## Algorithmic Check
D-separation can be decided in time linear in the size of the graph. The classic *Bayes-ball* algorithm does this with a graph traversal whose rules for passing through a node depend on whether the node is observed and on the direction from which it was entered.
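A minimal sketch of such a traversal follows. It is a reachability search in the same spirit, not necessarily the Bayes-ball rules exactly as originally published; the `(parent, child)` edge-set input and the names `reachable` and `d_separated_by_reachability` are assumptions made for this note.

```python
from collections import deque


def reachable(edges, source, observed):
    """Nodes connected to `source` by an active trail given `observed`."""
    parents, children = {}, {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
        children.setdefault(parent, set()).add(child)

    # Phase 1: observed nodes and their ancestors. A collider is open
    # exactly when it belongs to this set.
    opens_collider, stack = set(), list(observed)
    while stack:
        node = stack.pop()
        if node not in opens_collider:
            opens_collider.add(node)
            stack.extend(parents.get(node, ()))

    # Phase 2: breadth-first search over (node, direction) pairs.
    # "up" means the node was entered from a child, "down" from a parent.
    found, visited = set(), set()
    queue = deque([(source, "up")])
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in observed:
            found.add(node)
        if direction == "up" and node not in observed:
            # Chain or fork at an unobserved node: pass in both directions.
            queue.extend((p, "up") for p in parents.get(node, ()))
            queue.extend((c, "down") for c in children.get(node, ()))
        elif direction == "down":
            if node not in observed:  # chain continues downwards
                queue.extend((c, "down") for c in children.get(node, ()))
            if node in opens_collider:  # collider opened by evidence
                queue.extend((p, "up") for p in parents.get(node, ()))
    return found


def d_separated_by_reachability(edges, xs, ys, observed):
    """True iff no node in ys is reachable from any node in xs."""
    targets = set(ys)
    return all(not (reachable(edges, x, observed) & targets) for x in xs)
```

Each `(node, direction)` pair is expanded at most once, so a single call to `reachable` does work proportional to the number of nodes plus edges.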
## Related
- [[Bayesian Network]]
- [[Bayes Theorem]]
- [[Variable Elimination]]