## Definition

**Precision** and **recall** measure two complementary aspects of a classifier's positive predictions. They are the dominant metric pair when classes are imbalanced or when error costs are asymmetric, which covers most production ML scenarios.

## Definitions

$\text{Precision} = \frac{TP}{TP + FP}$

$\text{Recall} = \frac{TP}{TP + FN}$

Here $TP$, $FP$, and $FN$ are the counts of true positives, false positives, and false negatives.

- **Precision**: of the items I labelled positive, what fraction were actually positive?
- **Recall**: of the actually positive items, what fraction did I catch?

## The Trade-off

Adjusting the decision threshold of a probabilistic classifier generally moves precision and recall in opposite directions:

- **Lower threshold** → more positive predictions → recall rises (it can never fall), precision usually falls.
- **Higher threshold** → fewer positive predictions → precision usually rises, recall falls (it can never rise).

Where you sit depends on the cost structure.

## When To Optimise For Which

- **High recall matters when missing positives is costly.** Cancer screening, fraud detection, security threats. Better to raise a false alarm than to miss a case.
- **High precision matters when false positives are costly.** Spam filtering (don't lose legitimate mail), legal document review (don't waste lawyer time on irrelevant docs), automated approvals.

Many real systems set a *precision floor* and maximise recall subject to that floor, or vice versa.

## Precision-Recall Curve

Plot precision against recall across all decision thresholds. **AUC-PR** (area under the precision-recall curve) summarises the trade-off in one number.

AUC-PR is often more informative than ROC-AUC for imbalanced classes, because neither precision nor recall uses true negatives, so a large negative class cannot inflate the score.

## F1 Score

The harmonic mean of precision and recall; see [[F1 Score]].

## Multi-Class Precision and Recall

Compute precision and recall per class, then aggregate:

- **Macro-average.** Mean across classes, treating each class equally. Sensitive to small classes.
- **Micro-average.** Sum TP, FP, FN across classes, then compute. Dominated by large classes.
- **Weighted average.** Mean across classes, weighted by class support (the number of true instances of each class).

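
As a rough illustration of the three averaging schemes, here is a minimal sketch in plain Python. The function names (`per_class_counts`, `averaged_precision_recall`), the toy labels, and the convention of returning 0.0 when a denominator is zero are assumptions made for this example, not something this page prescribes.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted, but the item was not p
            fn[t] += 1  # an actual t was missed
    labels = sorted(set(y_true) | set(y_pred))
    return {c: (tp[c], fp[c], fn[c]) for c in labels}


def safe_div(num, den):
    # Assumed convention: an undefined ratio counts as 0.0.
    return num / den if den else 0.0


def averaged_precision_recall(y_true, y_pred):
    """Return {"macro"|"micro"|"weighted": (precision, recall)}."""
    counts = per_class_counts(y_true, y_pred)
    prec = {c: safe_div(tp, tp + fp) for c, (tp, fp, fn) in counts.items()}
    rec = {c: safe_div(tp, tp + fn) for c, (tp, fp, fn) in counts.items()}
    support = {c: tp + fn for c, (tp, fp, fn) in counts.items()}

    n_classes = len(counts)
    n_items = sum(support.values())
    sum_tp = sum(tp for tp, _, _ in counts.values())
    sum_fp = sum(fp for _, fp, _ in counts.values())
    sum_fn = sum(fn for _, _, fn in counts.values())

    return {
        # Unweighted mean of the per-class scores.
        "macro": (
            safe_div(sum(prec.values()), n_classes),
            safe_div(sum(rec.values()), n_classes),
        ),
        # Pool the counts first, then apply the definitions once.
        "micro": (
            safe_div(sum_tp, sum_tp + sum_fp),
            safe_div(sum_tp, sum_tp + sum_fn),
        ),
        # Mean of the per-class scores, weighted by class support.
        "weighted": (
            safe_div(sum(prec[c] * support[c] for c in counts), n_items),
            safe_div(sum(rec[c] * support[c] for c in counts), n_items),
        ),
    }


if __name__ == "__main__":
    # Toy data: 8 items of class "a", 2 of class "b"; one "b" is mislabelled "a".
    y_true = ["a"] * 8 + ["b"] * 2
    y_pred = ["a"] * 8 + ["a", "b"]
    for name, (p, r) in averaged_precision_recall(y_true, y_pred).items():
        print(f"{name:8s} precision={p:.3f} recall={r:.3f}")
```

On this toy data the single missed "b" pulls macro recall down to 0.75, because the small class counts as much as the large one, while micro and weighted recall both stay at 0.9.
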

The right choice depends on whether minority-class performance should "count" equally.

## Related

- [[Confusion Matrix]]
- [[F1 Score]]
- [[ROC Curve and AUC]]