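## Definition

**MSE**, **MAE**, and **RMSE** are the three workhorse loss/error metrics for regression: each measures how far predictions are from the true values.

## Formulas

For predictions $\hat y_i$ and true values $y_i$, $i = 1, \dots, n$:

$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat y_i)^2
$$

$$
\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat y_i)^2}
$$

$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^n |y_i - \hat y_i|
$$

## Properties

| Metric | Units | Robust to outliers? | Differentiable everywhere? |
|---|---|---|---|
| MSE | $y^2$ | No (squaring amplifies large errors) | Yes |
| RMSE | $y$ | No | Yes, except where every residual is 0 (MSE = 0) |
| MAE | $y$ | Yes (errors count linearly) | No (kink at zero error) |

## When to Use Which

- **MSE / RMSE**: when large errors should be punished more than small ones. Mathematically convenient: the gradient is smooth, and expected MSE decomposes into bias² + variance + irreducible noise. The default for most regression.
- **MAE**: when errors should be weighted linearly. More robust to outliers, so a good choice when the data has heavy-tailed noise.
- **RMSE over MSE**: RMSE has the same units as the target, so it is easier to interpret. The two share the same minimiser, so a common pattern is to train on MSE (smooth loss) and report RMSE.

## The Median vs Mean Connection

Surprisingly elegant:

- **MSE** is minimised by predicting the *mean* of the target distribution at each input, i.e. the conditional mean $\mathbb{E}[y \mid x]$.
- **MAE** is minimised by predicting the *median* of that distribution.

So your choice of loss implicitly chooses which measure of central tendency you're estimating.

## Other Useful Variants

### MAPE (Mean Absolute Percentage Error)

$$
\text{MAPE} = \frac{100}{n} \sum_{i=1}^n \left| \frac{y_i - \hat y_i}{y_i} \right|
$$

Unitless and interpretable as a percentage. Undefined when any $y_i = 0$, and it blows up when $y_i \approx 0$.

### sMAPE (Symmetric MAPE)

A variant meant to handle small or zero true values more gracefully. One common definition divides by the average magnitude of the true and predicted values:

$$
\text{sMAPE} = \frac{100}{n} \sum_{i=1}^n \frac{|y_i - \hat y_i|}{(|y_i| + |\hat y_i|)/2}
$$

This keeps sMAPE within $[0, 200]$ percent, but it is still undefined when $y_i = \hat y_i = 0$.

### Huber Loss

Quadratic for small errors, linear for large ones, so it combines MSE's smoothness with MAE's robustness. For a residual $r_i = y_i - \hat y_i$ and threshold $\delta > 0$:

$$
L_\delta(r_i) =
\begin{cases}
\tfrac{1}{2} r_i^2 & \text{if } |r_i| \le \delta \\
\delta \left(|r_i| - \tfrac{1}{2}\delta\right) & \text{otherwise}
\end{cases}
$$

The reported loss is the mean of $L_\delta(r_i)$ over $i$. Used heavily in robust regression and reinforcement learning: DQN clips the TD-error gradient to $[-1, 1]$, which is equivalent to a Huber loss with $\delta = 1$.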
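
To make the formulas concrete, here is a minimal plain-Python sketch of every metric in this note, plus a brute-force check of the mean vs median connection. The function names (`mse`, `rmse`, `mae`, `mape`, `smape`, `huber`) and the helper `best_constant` are hypothetical, chosen for illustration; they are not a library API. It assumes equal-length, non-empty lists of floats, uses the common $(|y| + |\hat y|)/2$ denominator for sMAPE, and uses the $\tfrac12 r^2$ / $\delta(|r| - \tfrac12\delta)$ convention for Huber (some libraries scale Huber differently).

```python
import math

# Hypothetical helper names for illustration; not a library API.
# Inputs are equal-length, non-empty sequences: y_true (targets), y_pred (predictions).


def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals (units of y squared)."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of MSE (same units as y)."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals (same units as y)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error; raises ZeroDivisionError if any y is 0."""
    return 100 / len(y_true) * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))


def smape(y_true, y_pred):
    """sMAPE with the (|y| + |p|) / 2 denominator; undefined when y = p = 0."""
    return 100 / len(y_true) * sum(
        abs(y - p) / ((abs(y) + abs(p)) / 2) for y, p in zip(y_true, y_pred)
    )


def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: 0.5 * r**2 where |r| <= delta, linear in |r| beyond it."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        r = abs(y - p)
        if r <= delta:
            total += 0.5 * r ** 2
        else:
            total += delta * (r - 0.5 * delta)
    return total / len(y_true)


def best_constant(loss, data, candidates):
    """Return the constant prediction c (from candidates) that minimises loss."""
    return min(candidates, key=lambda c: loss(data, [c] * len(data)))


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
print(mape(y_true, y_pred), smape(y_true, y_pred), huber(y_true, y_pred))

# Mean vs median: one outlier drags the MSE-optimal constant but not the MAE one.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
grid = [i / 10 for i in range(1001)]  # candidate constants 0.0, 0.1, ..., 100.0
best_c_mse = best_constant(mse, data, grid)  # lands on the mean, 22.0
best_c_mae = best_constant(mae, data, grid)  # lands on the median, 3.0
print(best_c_mse, best_c_mae)
```

The grid search is deliberately crude: it only shows that, on a sample with one outlier, the MSE-optimal constant sits at the sample mean (22) while the MAE-optimal constant stays at the sample median (3). Plain Python keeps the sketch dependency-free; in practice a vectorised library such as scikit-learn's `sklearn.metrics` is the usual choice. Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction rather than a percentage.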
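## Reporting

In a research paper or model card, report **both** an error metric (RMSE or MAE) and a goodness-of-fit metric ([[R-Squared Coefficient]]). The error metric gives the typical size of a miss in the target's units; $R^2$ shows how much of the target's variance the model explains relative to always predicting the mean. Together they describe both the scale of the errors and how much the model improves on a trivial baseline.

## Related

- [[R-Squared Coefficient]]
- [[Loss Functions]]
- [[Linear Regression]]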