## Definition

**Naive Bayes** is a probabilistic classifier that applies [[Bayes Theorem]] under the *naive* assumption that all features are conditionally independent given the class:

$$
P(c \mid x_1, \dots, x_d) \propto P(c) \prod_{j=1}^d P(x_j \mid c)
$$

Predict the class with the highest posterior.

## The Naive Assumption

Independence given the class is almost always wrong in practice, but the classifier often works *surprisingly well* despite the violation.

Why? Naive Bayes has high bias but low variance; the wrong independence assumption is often less damaging than the variance of more complex models trained on the same data. Classification also needs only the correct class to score highest, not accurate probability estimates.

## Variants

- **Gaussian Naive Bayes**: each $P(x_j \mid c)$ is modelled as a 1D Gaussian. For continuous features.
- **Multinomial Naive Bayes**: for count data (e.g., word counts in documents).
- **Bernoulli Naive Bayes**: for binary features (word presence/absence).
- **Complement Naive Bayes**: variant for imbalanced text classification.

## Training

Fast: compute conditional and prior probabilities directly from frequencies:

$$
P(x_j = v \mid c) = \frac{\text{count}(x_j = v, c)}{\text{count}(c)}
$$

Apply Laplace (additive) smoothing so that feature values unseen for a class do not receive zero probability.

Single pass over the data. No iterative optimisation.

## Strengths

- **Very fast**: training and inference are both linear in data and feature count.
- **Works with tiny datasets**: few parameters, low variance.
- **Handles many features**: does not suffer from the curse of dimensionality the way distance-based models do.
- **Probabilistic output**: useful for ranking and threshold tuning.

## Weaknesses

- **Independence assumption** is badly violated by many real datasets, especially those with correlated features.
- **Probability calibration is often poor**: the predicted probabilities are overconfident (pushed towards 0 or 1). Use Platt scaling or isotonic regression to recalibrate.
- **Sensitive to zero counts**: a single unseen feature value zeroes the whole product, so smoothing is required.

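
To make the zero-count problem and the smoothing fix concrete, here is a minimal from-scratch sketch of multinomial Naive Bayes on a made-up four-document corpus. The function names (`train_multinomial_nb`, `predict`), the token-list input format, and the toy data are assumptions for illustration, not a reference implementation. Scores are computed in log space, a common choice to avoid underflow when many small probabilities are multiplied.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit log-priors and smoothed log-likelihoods in one pass over the data.

    docs   : list of token lists (hypothetical toy input format)
    labels : one class label per document
    alpha  : additive (Laplace) smoothing constant; alpha=0 disables smoothing
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # word_counts[c][w] = count of w in class c
    vocab = set()
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}

    log_lik = {}
    for c in class_counts:
        # Denominator: all word occurrences in class c, plus alpha per vocabulary word.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {}
        for w in vocab:
            p = (word_counts[c][w] + alpha) / total
            # Without smoothing an unseen word has p == 0, i.e. log-probability -inf.
            log_lik[c][w] = math.log(p) if p > 0 else float("-inf")
    return log_prior, log_lik, vocab


def predict(tokens, log_prior, log_lik, vocab):
    """Return (best class, per-class log scores); out-of-vocabulary tokens are skipped."""
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in tokens if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get), scores


# Made-up toy corpus, for illustration only.
docs = [
    ["free", "prize", "click"],
    ["free", "offer", "click"],
    ["meeting", "agenda", "notes"],
    ["project", "meeting", "free"],
]
labels = ["spam", "spam", "ham", "ham"]
query = ["free", "click", "meeting"]

# With smoothing: every class gets a finite score.
model = train_multinomial_nb(docs, labels, alpha=1.0)
label, scores = predict(query, *model)

# Without smoothing: "meeting" is unseen in spam and "click" is unseen in ham,
# so both class scores collapse to -inf and the ranking is meaningless.
model_raw = train_multinomial_nb(docs, labels, alpha=0.0)
_, scores_raw = predict(query, *model_raw)
```

With `alpha=0` the query contains a word that is unseen in each class, so both posteriors collapse to zero probability. With `alpha=1` the scores stay finite and the classes can be ranked.
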
## Where It Wins

- **Text classification** historically: spam detection, sentiment, language ID. Word counts + multinomial Naive Bayes was state-of-the-art for years.
- **Real-time systems** where prediction speed matters.
- **Baseline** for any classification task.

## Beyond Spam

Naive Bayes is the canonical "simple model that performs surprisingly well". When more complex models offer only small gains and interpretability matters, NB is a strong choice.

## Related

- [[Bayes Theorem]]
- [[Logistic Regression]]
- [[Bayesian Network]]