F1 Score - Albert Masoliver's learning site

## Definition

The **F1 score** is the harmonic mean of precision and recall:

$$
F_1 = 2 \cdot \frac{P \cdot R}{P + R}
$$

It collapses [[Precision and Recall]] into a single number, useful when you need to compare classifiers without picking one or the other.

## Why Harmonic Mean

The harmonic mean punishes imbalance. A classifier with precision 0.99 and recall 0.01 has:

- Arithmetic mean: 0.50 (looks fine).
- Harmonic mean (F1): 0.02 (correctly diagnoses the disaster).

F1 is only high when *both* precision and recall are reasonably high. That's usually what you want.

## Generalised: $F_\beta$

$$
F_\beta = (1 + \beta^2) \cdot \frac{P \cdot R}{\beta^2 \cdot P + R}
$$

- $\beta = 1$: balances precision and recall (F1).
- $\beta = 2$: weights recall twice as much as precision.
- $\beta = 0.5$: weights precision twice as much as recall.

Choose $\beta$ based on which error type is costlier in your domain.

## When F1 Is Right

- **Single-number comparison** across classifiers.
- **Imbalanced classes.** Accuracy is misleading; F1 reflects performance on the rare class.
- **Hyperparameter tuning** when you can't articulate the precision-recall trade-off precisely.

## When F1 Is Wrong

- **Ranking tasks.** F1 scores a single decision threshold; use AUC instead.
- **Asymmetric costs that aren't well-captured by a single $\beta$.** Compute expected cost directly from the [[Confusion Matrix]].
- **Multi-class with unequal class importance.** Macro-F1 treats all classes equally; weighted-F1 weights by support. Choose based on intent.

## Multi-Class Variants

- **Macro-F1.** Compute F1 per class, then take the unweighted average. Treats classes equally.
- **Micro-F1.** Pool TP, FP, FN across classes, then compute. Equivalent to accuracy for single-label multi-class.
- **Weighted-F1.** Average of per-class F1, weighted by each class's support.

## Practical Note

In Kaggle and ML competition contexts, F1 (especially macro-F1) is a common default metric for imbalanced classification.
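
The formulas and multi-class variants described in this note can be checked with a few lines of plain Python. This is a minimal sketch, not a reference implementation: the helper names `f_beta` and `multiclass_f1` and the toy labels are made up for illustration, and zero denominators are assumed to yield a score of 0.0.

```python
from collections import Counter


def f_beta(precision, recall, beta=1.0):
    """F-beta from precision and recall (beta=1 gives F1).

    Assumption: a zero denominator is scored as 0.0.
    """
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


def _safe_div(num, den):
    return num / den if den else 0.0


def multiclass_f1(y_true, y_pred):
    """Macro-, micro- and weighted-F1 for single-label multi-class data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    # Per-class F1 from per-class precision and recall.
    per_class = {
        c: f_beta(_safe_div(tp[c], tp[c] + fp[c]),
                  _safe_div(tp[c], tp[c] + fn[c]))
        for c in labels
    }

    support = Counter(y_true)
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)

    # Micro: pool the counts across classes first, then compute once.
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = f_beta(_safe_div(TP, TP + FP), _safe_div(TP, TP + FN))

    return {"macro": macro, "micro": micro, "weighted": weighted}


# The lopsided classifier: arithmetic mean 0.50, F1 about 0.02.
lopsided_f1 = f_beta(0.99, 0.01)

# Toy labels (hypothetical) with an imbalanced class distribution.
y_true = ["a", "a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "a", "b", "b", "c", "c"]
scores = multiclass_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

On the toy labels, `scores["micro"]` matches `accuracy`, while `scores["macro"]` comes out lower than `scores["weighted"]` because the two small classes score worse than the large one and macro averaging gives them equal weight.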
In production, the choice between F1 and a cost-sensitive metric should follow business reality, not convention.

## Related

- [[Precision and Recall]]
- [[Confusion Matrix]]
- [[ROC Curve and AUC]]