Confusion Matrix - Albert Masoliver's learning site

## Definition

A **confusion matrix** is a table that lays out a classifier's predictions against the true labels. For binary classification, the four cells (TP, FP, FN, TN) are the foundation for nearly every classification metric.

## Binary Layout

|                     | Predicted Positive  | Predicted Negative  |
|---------------------|---------------------|---------------------|
| **Actual Positive** | TP (True Positive)  | FN (False Negative) |
| **Actual Negative** | FP (False Positive) | TN (True Negative)  |

## Derived Metrics

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$

$$ \text{Precision} = \frac{TP}{TP + FP} $$

$$ \text{Recall (Sensitivity)} = \frac{TP}{TP + FN} $$

$$ \text{Specificity} = \frac{TN}{TN + FP} $$

$$ \text{F1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} $$

See [[Precision and Recall]] and [[F1 Score]] for deeper coverage.

## Why Accuracy Alone Is Misleading

Imagine a fraud detection model where 1% of transactions are fraud. A model that always predicts "not fraud" achieves **99% accuracy** while catching zero fraud. The confusion matrix exposes this: TP = 0, so recall is 0 and the model is useless despite its high accuracy. For imbalanced classes, look at the cells directly, not the aggregate.

## Multi-Class Extension

For $k$ classes, the confusion matrix is $k \times k$. Diagonal entries are correct predictions; off-diagonal entries are confusions. The full matrix is often more informative than any single scalar metric because it answers questions such as:

- *Which classes are confused with which?*
- *Are errors concentrated in one class or spread evenly?*

## Cost-Sensitive Analysis

Different errors carry different costs. The confusion matrix lets you compute:

$$ \text{Total cost} = \sum_{i, j} C(i, j) \cdot \text{count}(i, j) $$

where $C(i, j)$ is the cost of predicting $j$ when the truth is $i$. Dividing by the total number of examples gives the expected cost per prediction. The cost-based picture is often dramatically different from what accuracy suggests when errors aren't symmetric (medical diagnosis, security alerts, refund decisions).

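
The cost sum above can be checked by hand on the fraud scenario. The following is a minimal sketch, not a reference implementation: the 10,000-transaction sample, the cost values, and the helper names `accuracy` and `total_cost` are all assumptions made for illustration.

```python
# Hypothetical data: 10,000 transactions, 1% fraud, and a model that
# always predicts "not fraud".
# Rows are the true class, columns the predicted class.
# Index 0 = fraud (positive), index 1 = not fraud (negative).
counts = [
    [0, 100],   # actual fraud:     TP = 0, FN = 100
    [0, 9900],  # actual not fraud: FP = 0, TN = 9900
]

# Assumed costs C(i, j) of predicting j when the truth is i.
# These numbers are illustrative only.
costs = [
    [0.0, 500.0],  # a missed fraud (FN) costs 500
    [5.0, 0.0],    # a false alarm (FP) costs 5
]


def accuracy(counts):
    """Fraction of examples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in counts)
    correct = sum(counts[i][i] for i in range(len(counts)))
    return correct / total


def total_cost(counts, costs):
    """Sum of C(i, j) * count(i, j) over every cell."""
    return sum(
        costs[i][j] * counts[i][j]
        for i in range(len(counts))
        for j in range(len(counts[i]))
    )


n = sum(sum(row) for row in counts)
acc = accuracy(counts)
cost = total_cost(counts, costs)
cost_per_example = cost / n  # average (expected) cost per transaction

print(f"accuracy={acc:.2%}  total cost={cost}  cost per example={cost_per_example}")
```

Under these assumed costs, the always-negative model keeps its 99% accuracy while every missed fraud adds to the bill, which is the asymmetry the cost matrix is meant to surface.
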
## Threshold Sensitivity

For probabilistic classifiers, the confusion matrix depends on the decision threshold. Sweep thresholds to produce a **ROC curve** (see [[ROC Curve and AUC]]) or a **precision-recall curve**, which capture the whole trade-off rather than a single operating point.

## Related

- [[Precision and Recall]]
- [[F1 Score]]
- [[ROC Curve and AUC]]
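
To make the threshold dependence described under Threshold Sensitivity concrete, here is a small sketch that rebuilds the binary confusion matrix at several thresholds. The scores, labels, and the helper names `confusion_at` and `sweep` are hypothetical, and predicting positive when the score is at or above the threshold is an assumed convention.

```python
# Hypothetical predicted probabilities of the positive class and true labels
# (1 = positive, 0 = negative).
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]


def confusion_at(threshold, scores, labels):
    """Return (TP, FP, FN, TN), predicting positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sweep(scores, labels, thresholds):
    """One row per threshold: the points of a ROC curve (FPR, TPR) and a PR curve."""
    rows = []
    for t in thresholds:
        tp, fp, fn, tn = confusion_at(t, scores, labels)
        rows.append({
            "threshold": t,
            "tpr": tp / (tp + fn) if tp + fn else None,  # recall
            "fpr": fp / (fp + tn) if fp + tn else None,
            # Precision is undefined when nothing is predicted positive.
            "precision": tp / (tp + fp) if tp + fp else None,
        })
    return rows


for row in sweep(scores, labels, [0.0, 0.25, 0.5, 0.75, 1.0]):
    print(row)
```

Each row is one operating point; plotting `fpr` against `tpr` traces a ROC curve, and `tpr` against `precision` traces a precision-recall curve.
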