## Definition
**Naive Bayes** is a probabilistic classifier based on applying [[Bayes Theorem]] with the *naive* assumption that all features are conditionally independent given the class:
$$
P(c \mid x_1, \dots, x_d) \propto P(c) \prod_{j=1}^d P(x_j \mid c)
$$
Predict the class with the highest posterior.
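As a rough illustration of the prediction rule, the sketch below scores each class by summing log-probabilities, which ranks classes the same way as the product above but avoids numerical underflow when $d$ is large. The function name `predict`, the dictionary layout, and the toy probabilities are all assumptions made for this example.

```python
import math

def predict(priors, likelihoods, x):
    """Return the best class and the unnormalised log-posterior of every class.

    priors:      dict class -> P(c)
    likelihoods: dict class -> list (one entry per feature) of dicts value -> P(x_j = v | c)
    x:           tuple of observed feature values
    """
    scores = {}
    for c, prior in priors.items():
        # log P(c) + sum_j log P(x_j | c) has the same argmax as the product form.
        score = math.log(prior)
        for j, v in enumerate(x):
            score += math.log(likelihoods[c][j][v])
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy numbers, purely for illustration.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [{"yes": 0.8, "no": 0.2}, {"yes": 0.3, "no": 0.7}],
    "ham":  [{"yes": 0.1, "no": 0.9}, {"yes": 0.5, "no": 0.5}],
}
label, scores = predict(priors, likelihoods, ("yes", "no"))
```

Here the unnormalised posteriors are $0.4 \cdot 0.8 \cdot 0.7 = 0.224$ for `spam` and $0.6 \cdot 0.1 \cdot 0.5 = 0.03$ for `ham`, so `label` is `"spam"`.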
## The Naive Assumption
Independence given the class is almost always wrong in practice, but the classifier often works *surprisingly well* despite the violation. Why? Classification only needs the correct class to receive the highest score, not accurate probabilities, and Naive Bayes is a high-bias, low-variance model: the wrong independence assumption is often less damaging than the variance of more complex models trained on the same data.
## Variants
- **Gaussian Naive Bayes**: each $P(x_j \mid c)$ is modelled as a 1D Gaussian. For continuous features.
- **Multinomial Naive Bayes**: for count data (e.g., word counts in documents).
- **Bernoulli Naive Bayes**: for binary features (word presence/absence).
- **Complement Naive Bayes**: a multinomial variant that estimates each class's parameters from the data of all *other* classes, designed for imbalanced text classification.
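To make the Gaussian variant concrete, here is a minimal sketch using only the standard library. It assumes a per-class mean and variance for every feature; the function names, the `var_floor` parameter, and the toy data are hypothetical choices for this example, not part of any particular library.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log density of a 1D Gaussian with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate the class prior and per-feature mean/variance for each class."""
    model = {}
    n = len(y)
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        d = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
        # var_floor (an assumption of this sketch) keeps variances strictly positive.
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + var_floor
            for j in range(d)
        ]
        model[c] = (len(rows) / n, means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Pick the class maximising log P(c) + sum_j log N(x_j; mean, var)."""
    def score(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            gaussian_log_pdf(v, m, s) for v, m, s in zip(x, means, variances)
        )
    return max(model, key=score)

# Hypothetical toy data: two well-separated classes with two continuous features.
X = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0), (3.0, 0.5), (3.2, 0.4), (2.9, 0.6)]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
```

Calling `predict_gaussian_nb(model, (1.1, 2.0))` picks class `"a"`, since that point sits near the class-`a` means on both features.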
## Training
Training is fast. The prior and conditional probabilities are estimated directly from frequency counts:
$$
P(x_j = v \mid c) = \frac{\text{count}(x_j = v, c)}{\text{count}(c)}
$$
Apply Laplace (additive) smoothing so that feature values unseen in a class do not receive zero probability.
Training takes a single pass over the data, with no iterative optimisation.
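The counting step can be sketched as follows for categorical features. With additive smoothing the estimate becomes $(\text{count}(x_j = v, c) + \alpha) / (\text{count}(c) + \alpha K_j)$, where $K_j$ is the number of distinct values of feature $j$ and $\alpha = 1$ gives Laplace smoothing. The symbols $\alpha$ and $K_j$, the function name `fit_categorical_nb`, and the toy weather data are assumptions introduced for this example.

```python
from collections import Counter, defaultdict

def fit_categorical_nb(X, y, alpha=1.0):
    """Single-pass frequency counting with additive (Laplace) smoothing."""
    class_counts = Counter(y)
    # feature_counts[c][j][v] = number of class-c rows whose feature j equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values observed for each feature
    for x, c in zip(X, y):     # the single pass over the data
        for j, v in enumerate(x):
            feature_counts[c][j][v] += 1
            values[j].add(v)

    n = len(y)
    priors = {c: k / n for c, k in class_counts.items()}

    def likelihood(j, v, c):
        # (count(x_j = v, c) + alpha) / (count(c) + alpha * K_j)
        k_j = len(values[j])
        return (feature_counts[c][j][v] + alpha) / (class_counts[c] + alpha * k_j)

    return priors, likelihood

# Hypothetical toy data: (outlook, temperature) -> play?
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
y = ["no", "no", "yes", "yes"]
priors, likelihood = fit_categorical_nb(X, y)
```

In this toy data `"rainy"` never occurs with class `"no"`, yet `likelihood(0, "rainy", "no")` is $(0 + 1) / (2 + 2) = 0.25$ rather than zero, which is the point of smoothing.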
## Strengths
- **Very fast**: training is linear in the number of examples and features, and prediction is linear in the number of features and classes.
- **Works with tiny datasets**: few parameters, low variance.
- **Handles many features**: each feature is modelled separately, so it avoids the curse of dimensionality that hurts distance-based models.
- **Probabilistic output**: useful for ranking and threshold tuning.
## Weaknesses
- **Independence assumption**: many real datasets violate it badly, especially those with correlated features.
- **Probability calibration is often poor**: the predicted probabilities are overconfident, pushed too close to 0 or 1. Use Platt scaling or isotonic regression to recalibrate.
- **Sensitive to zero counts**: a single unseen feature value zeroes out the whole product, so smoothing is required.
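A small numeric sketch shows how correlated features lead to overconfident probabilities. It takes the extreme case where one feature is simply duplicated, so the model counts the same evidence several times. The helper `posterior` and all numbers are hypothetical.

```python
def posterior(prior_pos, lik_pos, lik_neg, copies):
    """Positive-class posterior when one feature's evidence is counted `copies` times."""
    pos = prior_pos * lik_pos ** copies
    neg = (1 - prior_pos) * lik_neg ** copies
    return pos / (pos + neg)

# Hypothetical numbers: the observed value is twice as likely under the positive class.
once = posterior(0.5, 0.6, 0.3, copies=1)    # evidence counted once
thrice = posterior(0.5, 0.6, 0.3, copies=3)  # identical evidence counted three times
```

The posterior moves from $2/3$ to $8/9$ although no new information was added. The predicted class is unchanged, which is why accuracy can stay good while calibration degrades.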
## Where It Wins
- **Text classification**, historically: spam detection, sentiment, language ID. Word counts with multinomial Naive Bayes were a standard, hard-to-beat approach for years.
- **Real-time systems** where prediction speed matters.
- **Baseline** for any classification task.
## Beyond Spam
Naive Bayes is the canonical "simple model that performs surprisingly well". When more complex models offer only small gains and interpretability matters, Naive Bayes is a strong choice.
## Related
- [[Bayes Theorem]]
- [[Logistic Regression]]
- [[Bayesian Network]]