## Definition
**Bayes' theorem** relates the conditional and marginal probabilities of two random events. It is the mathematical foundation of probabilistic reasoning in AI and ML.
$
P(H \mid E) = \frac{P(E \mid H) \, P(H)}{P(E)}
$
- $P(H \mid E)$ — **posterior** probability of hypothesis $H$ given evidence $E$.
- $P(E \mid H)$ — **likelihood** of evidence given the hypothesis.
- $P(H)$ — **prior** probability of the hypothesis.
- $P(E)$ — marginal probability of the evidence (the normalising constant).
## Derivation
From the definition of conditional probability:
$
P(H \mid E) \, P(E) = P(H \cap E) = P(E \mid H) \, P(H)
$
Divide by $P(E)$.
## Why It Matters
Bayes' rule **inverts conditional probabilities.** We often have:
- $P(\text{symptom} \mid \text{disease})$ — easy to obtain from medical studies.
We want:
- $P(\text{disease} \mid \text{symptom})$ — needed for diagnosis.
The two are connected by the prior $P(\text{disease})$ and the marginal $P(\text{symptom})$ — both observable.
## Worked Example: Medical Test
A test for a rare disease (1 in 10,000) has 99% sensitivity and 99% specificity. A patient tests positive — what's the probability they have it?
$
P(\text{disease} \mid +) = \frac{P(+ \mid \text{disease}) \, P(\text{disease})}{P(+)} = \frac{0.99 \times 0.0001}{0.99 \times 0.0001 + 0.01 \times 0.9999} \approx 0.0098
$
About 1%. Counterintuitive — the base rate dominates over the test accuracy when the disease is rare.
## Applied Widely
- **[[Naive Bayes]]** classifier — assumes feature independence.
- **[[Bayesian Network]]** — graphical models of conditional dependencies.
- **Bayesian inference** in statistics — update beliefs as data arrives.
- **Probabilistic programming** — Stan, PyMC, NumPyro, Pyro.
- **Spam filtering, medical diagnosis, fault diagnosis, anomaly detection.**
## Bayesian vs Frequentist
- **Frequentist:** probability is the long-run frequency of an event.
- **Bayesian:** probability is a degree of belief, updated via Bayes' rule as evidence arrives.
The split is philosophical but consequential: it shapes confidence intervals, hypothesis tests, and the meaning of parameters.
## Related
- [[Bayesian Network]]
- [[Naive Bayes]]
- [[Hidden Markov Model]]
- [[D-Separation]]