## Definition
A **confusion matrix** is a table that lays out a classifier's predictions against the true labels. For binary classification, the four cells (TP, FP, FN, TN) are the foundation for nearly every classification metric.
## Binary Layout
| | Predicted Positive | Predicted Negative |
|-----------------------|--------------------|--------------------|
| **Actual Positive** | TP (True Positive) | FN (False Negative) |
| **Actual Negative** | FP (False Positive)| TN (True Negative) |
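
The four cells can be tallied directly from paired labels. The sketch below is a minimal illustration, not a reference implementation; the function name `binary_confusion_counts`, the `positive` parameter, and the sample labels are assumptions introduced here.

```python
def binary_confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for paired true and predicted labels.

    Any label not equal to `positive` is treated as the negative class.
    """
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Illustrative labels only (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = binary_confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)  # prints: 3 1 1 3
```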
## Derived Metrics
$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$
$$
\text{Precision} = \frac{TP}{TP + FP}
$$
$$
\text{Recall (Sensitivity)} = \frac{TP}{TP + FN}
$$
$$
\text{Specificity} = \frac{TN}{TN + FP}
$$
$$
\text{F1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$
See [[Precision and Recall]] and [[F1 Score]] for deeper coverage.
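
As a rough sketch, the formulas above translate into a few lines of Python. The function name `derived_metrics` and the example counts are made up for illustration, and returning NaN when a denominator is zero is an assumption of this sketch (other conventions, such as returning 0, are also used in practice).

```python
import math


def derived_metrics(tp, fp, fn, tn):
    """Compute the metrics above from the four cell counts.

    Assumption: a ratio with a zero denominator is reported as NaN.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else math.nan

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Illustrative counts only.
metrics = derived_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, precision is 40/50 = 0.8 and recall is 40/60 ≈ 0.667, so F1 ≈ 0.727, which sits below the arithmetic mean of the two.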
## Why Accuracy Alone Is Misleading
Consider a fraud detection dataset where 1% of transactions are fraudulent. A model that always predicts "not fraud" achieves **99% accuracy** while catching zero fraud. The confusion matrix exposes this: TP = 0 and every fraudulent transaction lands in FN, so recall is 0 and the model is useless despite its high accuracy.
For imbalanced classes, look at the individual cells, not just the aggregate.
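
The always-negative example can be checked numerically. This is a minimal sketch assuming a hypothetical dataset of 10,000 transactions; only the 1% fraud rate comes from the text, and the dataset size is chosen for round numbers.

```python
# Hypothetical dataset: 10,000 transactions, 1% of them fraud (label 1).
n_total = 10_000
n_fraud = 100

y_true = [1] * n_fraud + [0] * (n_total - n_fraud)
y_pred = [0] * n_total  # the model always predicts "not fraud"

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)

accuracy = (tp + tn) / n_total
recall = tp / (tp + fn)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")      # TP=0 FP=0 FN=100 TN=9900
print(f"accuracy={accuracy:.2f} recall={recall:.2f}")  # accuracy=0.99 recall=0.00
```

Precision is left out on purpose: with TP + FP = 0 it is undefined for this model.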
## Multi-Class Extension
For $k$ classes, the confusion matrix is $k \times k$. Diagonal entries are correct predictions; off-diagonal entries are confusions between classes. The full matrix is often more informative than any single scalar metric because it answers questions such as:
- *Which classes are confused with which?*
- *Are errors concentrated in one class or spread evenly?*
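
Both questions can be read off a $k \times k$ matrix built from label pairs. The sketch below assumes rows index the true class and columns the predicted class, matching the binary layout; the helper names `confusion_matrix` and `most_confused_pair` and the animal labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a k x k count matrix: rows = true class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)
    matrix = [[0] * k for _ in range(k)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def most_confused_pair(matrix, labels):
    """Return (count, true_label, predicted_label) for the largest off-diagonal cell."""
    best = None
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i != j and (best is None or count > best[0]):
                best = (count, labels[i], labels[j])
    return best


# Illustrative labels only.
labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog", "dog", "bird", "bird", "cat"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird", "dog", "dog"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>5}: {row}")

print(most_confused_pair(matrix, labels))  # (2, 'cat', 'dog')
```

Summing the off-diagonal entries of each row gives per-class error counts, which shows whether mistakes cluster in one class or spread across several.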
## Cost-Sensitive Analysis
Different errors carry different costs. The confusion matrix lets you compute:
$$
\text{Expected cost} = \sum_{i, j} C(i, j) \cdot \text{count}(i, j)
$$
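where $C(i, j)$ is the cost of predicting $j$ when the truth is $i$. With raw counts, this sum is the total cost over the evaluation set; dividing by the number of examples gives the average cost per prediction. A ranking of models by cost can differ sharply from a ranking by accuracy when errors aren't symmetric (medical diagnosis, security alerts, refund decisions).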
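
To make the contrast with accuracy concrete, the sketch below applies the cost sum to two hypothetical fraud models. All counts and cost values are invented for illustration, and the function names `total_cost` and `accuracy` are assumptions of this sketch.

```python
def total_cost(counts, costs):
    """Sum C(i, j) * count(i, j) over all cells (row i = truth, column j = prediction)."""
    return sum(
        costs[i][j] * counts[i][j]
        for i in range(len(counts))
        for j in range(len(counts[i]))
    )


def accuracy(counts):
    """Fraction of examples on the diagonal of the count matrix."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    total = sum(sum(row) for row in counts)
    return correct / total


# Index 0 = fraud (positive), index 1 = not fraud (negative).
# Hypothetical costs: a missed fraud costs 500, a false alarm costs 5.
costs = [[0, 500],
         [5, 0]]

# Hypothetical count matrices, laid out as [[TP, FN], [FP, TN]].
models = {
    "always_negative": [[0, 100], [0, 9900]],
    "detector": [[80, 20], [50, 9850]],
}

for name, counts in models.items():
    n = sum(sum(row) for row in counts)
    cost = total_cost(counts, costs)
    print(f"{name}: accuracy={accuracy(counts):.3f} "
          f"total_cost={cost} cost_per_example={cost / n:.3f}")
```

The two models differ by only 0.3 percentage points of accuracy (0.990 versus 0.993), yet under these assumed costs the total is 50000 versus 10250.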
## Threshold Sensitivity
For probabilistic classifiers, the confusion matrix depends on the decision threshold. Sweep thresholds to produce a **ROC curve** (see [[ROC Curve and AUC]]) or a **precision-recall curve**, which capture the whole trade-off rather than a single operating point.
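
A small sketch of the threshold dependence, assuming the common (but not universal) convention that a score at or above the threshold counts as a positive prediction. The scores, labels, and the function name `counts_at_threshold` are illustrative.

```python
def counts_at_threshold(y_true, scores, threshold):
    """Return (tp, fp, fn, tn), predicting positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Illustrative data: 1 = positive, scores are predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

for threshold in (0.25, 0.5, 0.75):
    tp, fp, fn, tn = counts_at_threshold(y_true, scores, threshold)
    tpr = tp / (tp + fn)  # recall, the y-axis of a ROC curve
    fpr = fp / (fp + tn)  # 1 - specificity, the x-axis of a ROC curve
    print(f"threshold={threshold}: TP={tp} FP={fp} FN={fn} TN={tn} "
          f"TPR={tpr:.2f} FPR={fpr:.2f}")
```

Each threshold yields one confusion matrix and therefore one (FPR, TPR) point; plotting those points across many thresholds traces out the ROC curve.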
## Related
- [[Precision and Recall]]
- [[F1 Score]]
- [[ROC Curve and AUC]]