## Definition **Bayes' theorem** relates the conditional and marginal probabilities of two random events. It is the mathematical foundation of probabilistic reasoning in AI and ML. $ P(H \mid E) = \frac{P(E \mid H) \, P(H)}{P(E)} $ - $P(H \mid E)$ — **posterior** probability of hypothesis $H$ given evidence $E$. - $P(E \mid H)$ — **likelihood** of evidence given the hypothesis. - $P(H)$ — **prior** probability of the hypothesis. - $P(E)$ — marginal probability of the evidence (the normalising constant). ## Derivation From the definition of conditional probability: $ P(H \mid E) \, P(E) = P(H \cap E) = P(E \mid H) \, P(H) $ Divide by $P(E)$. ## Why It Matters Bayes' rule **inverts conditional probabilities.** We often have: - $P(\text{symptom} \mid \text{disease})$ — easy to obtain from medical studies. We want: - $P(\text{disease} \mid \text{symptom})$ — needed for diagnosis. The two are connected by the prior $P(\text{disease})$ and the marginal $P(\text{symptom})$ — both observable. ## Worked Example: Medical Test A test for a rare disease (1 in 10,000) has 99% sensitivity and 99% specificity. A patient tests positive — what's the probability they have it? $ P(\text{disease} \mid +) = \frac{P(+ \mid \text{disease}) \, P(\text{disease})}{P(+)} = \frac{0.99 \times 0.0001}{0.99 \times 0.0001 + 0.01 \times 0.9999} \approx 0.0098 $ About 1%. Counterintuitive — the base rate dominates over the test accuracy when the disease is rare. ## Applied Widely - **[[Naive Bayes]]** classifier — assumes feature independence. - **[[Bayesian Network]]** — graphical models of conditional dependencies. - **Bayesian inference** in statistics — update beliefs as data arrives. - **Probabilistic programming** — Stan, PyMC, NumPyro, Pyro. - **Spam filtering, medical diagnosis, fault diagnosis, anomaly detection.** ## Bayesian vs Frequentist - **Frequentist:** probability is the long-run frequency of an event. - **Bayesian:** probability is a degree of belief, updated via Bayes' rule as evidence arrives. The split is philosophical but consequential: it shapes confidence intervals, hypothesis tests, and the meaning of parameters. ## Related - [[Bayesian Network]] - [[Naive Bayes]] - [[Hidden Markov Model]] - [[D-Separation]]