## Overview
**MSE**, **MAE**, and **RMSE** are the three workhorse loss and error metrics for regression. Each measures how far predictions are from the true values.
## Definitions
For predictions $\hat y_i$ and true values $y_i$:
$
\text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat y_i)^2
$
$
\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat y_i)^2}
$
$
\text{MAE} = \frac{1}{n} \sum_{i=1}^n |y_i - \hat y_i|
$
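The three formulas above translate almost line for line into code. The following is a minimal, dependency-free sketch; the function names `mse`, `rmse`, and `mae` and the sample values are chosen here for illustration, and in practice NumPy or scikit-learn equivalents would normally be used.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute residuals."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative data: residuals are 1, 0, -2, 0
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

print(mse(y_true, y_pred))   # 1.25
print(rmse(y_true, y_pred))  # about 1.118
print(mae(y_true, y_pred))   # 0.75
```

Note that RMSE (about 1.118) exceeds MAE (0.75) here because the single residual of 2 contributes disproportionately once squared.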
## Properties
| Metric | Units | Robust to outliers? | Differentiable everywhere? |
|---|---|---|---|
| MSE | $y^2$ | No (squares amplify outliers) | Yes |
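| RMSE | $y$ | No | Almost (gradient undefined only when every error is exactly 0) |
| MAE | $y$ | More robust (errors weighted linearly) | No (kink wherever an error is exactly 0) |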
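The outlier column can be made concrete with a small experiment. This sketch (the helper `summarize` and the residual values are made up for illustration) compares the metrics on a clean set of residuals and on the same set with one large error.

```python
import math

def summarize(residuals):
    """Return MSE, RMSE and MAE for a list of residuals (y - y_hat)."""
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}

# Four residuals of magnitude 1
clean = summarize([1.0, -1.0, 1.0, -1.0])
# Same, but the last residual is an outlier of magnitude 10
with_outlier = summarize([1.0, -1.0, 1.0, 10.0])

print(clean)         # mse 1.0, rmse 1.0, mae 1.0
print(with_outlier)  # mse 25.75, rmse about 5.07, mae 3.25
```

A single outlier multiplies MSE by 25.75 but MAE by only 3.25, which is what "squares amplify outliers" means in practice.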
## When to Use Which
- **MSE / RMSE**: when large errors should be punished more than small ones. MSE is mathematically convenient: its gradient is smooth, and its expected value decomposes into bias² + variance + irreducible noise. The usual default for regression.
- **MAE**: when you want errors weighted linearly. More robust to outliers than MSE; use it when the data has heavy-tailed noise.
- **RMSE over MSE**: RMSE has the same units as the target, so it is easier to interpret. Both are minimised by the same model, so train on MSE (smooth loss) and report RMSE.
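The decomposition mentioned for MSE can be written out explicitly. This is a sketch under assumptions not stated above: the data follow $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\hat f$ is a model fitted on a randomly drawn training set. The symbols $f$, $\hat f$, $\varepsilon$, and $\sigma$ are introduced here for illustration. Taking the expectation over training sets and noise at a fixed input $x$:
$
\mathbb{E}\big[(y - \hat f(x))^2\big] = \big(\mathbb{E}[\hat f(x)] - f(x)\big)^2 + \operatorname{Var}\big(\hat f(x)\big) + \sigma^2
$
The three terms are the squared bias, the variance of the fitted model, and the irreducible noise, which no model can remove.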
## The Median vs Mean Connection
Each loss has a characteristic minimiser:
- **MSE** is minimised by predicting the conditional *mean* of the target at each input.
- **MAE** is minimised by predicting the conditional *median*.
So your choice of loss implicitly chooses what kind of central tendency you're estimating.
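The mean/median claim can be checked numerically for the simplest case, a constant prediction. The sketch below (sample data and the candidate grid are arbitrary choices for illustration) searches for the constant that minimises each loss and compares it with the sample mean and median.

```python
from statistics import mean, median

# Illustrative sample with one large value to separate mean from median
y = [1.0, 2.0, 3.0, 4.0, 100.0]

def mse_const(c):
    """MSE when every prediction is the constant c."""
    return sum((v - c) ** 2 for v in y) / len(y)

def mae_const(c):
    """MAE when every prediction is the constant c."""
    return sum(abs(v - c) for v in y) / len(y)

# Brute-force search over constants 0.0, 0.1, ..., 100.0
candidates = [c / 10 for c in range(0, 1001)]
best_mse = min(candidates, key=mse_const)
best_mae = min(candidates, key=mae_const)

print(best_mse, mean(y))    # 22.0 22.0
print(best_mae, median(y))  # 3.0 3.0
```

The large value drags the MSE-optimal constant up to 22, while the MAE-optimal constant stays at 3, which is the same robustness difference seen in the metrics themselves.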
## Other Useful Variants
### MAPE (Mean Absolute Percentage Error)
$
\text{MAPE} = \frac{100}{n} \sum_{i=1}^n \left| \frac{y_i - \hat y_i}{y_i} \right|
$
Unitless and interpretable as a percentage. Undefined when any $y_i = 0$, and dominated by the terms where $y_i \approx 0$.
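A short sketch shows both the percentage interpretation and the breakdown near zero. The function name `mape`, the choice to raise an error on a zero true value, and the sample numbers are assumptions made here for illustration.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Raises ValueError if any true value is exactly 0, since the
    corresponding term would divide by zero.
    """
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when a true value is 0")
    total = sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)

# Both predictions are off by 10% of the true value
typical = mape([100.0, 200.0], [110.0, 180.0])

# Second true value is tiny: an absolute error of 0.1 becomes a 10000% term
near_zero = mape([100.0, 0.001], [110.0, 0.101])

print(typical)    # about 10
print(near_zero)  # about 5005
```

In the second call the absolute error on the small target is only 0.1, yet it accounts for almost the entire score.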
### sMAPE (Symmetric MAPE)
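A variant that, in its most common form, divides by the average magnitude of $y_i$ and $\hat y_i$ rather than by $|y_i|$ alone. A zero true value then causes a division by zero only if the prediction is also zero. Several slightly different definitions are in use, so state which one you report.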
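As a concrete illustration, here is a sketch of one common sMAPE definition, $\frac{100}{n} \sum_i \frac{|y_i - \hat y_i|}{(|y_i| + |\hat y_i|)/2}$. Other definitions exist, and the handling of the 0/0 case below (counting it as zero error) is a convention assumed for this example, not a standard.

```python
def smape(y_true, y_pred):
    """Symmetric MAPE in percent, using the mean of |y| and |y_hat| as denominator."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        denom = (abs(y) + abs(p)) / 2
        # Assumed convention: if both values are exactly 0, the term counts as 0 error
        total += 0.0 if denom == 0 else abs(y - p) / denom
    return 100.0 * total / len(y_true)

print(smape([100.0], [110.0]))  # about 9.52
print(smape([0.0], [1.0]))      # 200.0, the maximum possible value
print(smape([0.0], [0.0]))      # 0.0 under the assumed 0/0 convention
```

With this definition each term is bounded by 200%, so a zero true value saturates the score instead of making it undefined.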
### Huber Loss
Quadratic for small errors and linear for large ones, so it combines the smoothness of MSE near zero with the robustness of MAE. Used heavily in robust regression and reinforcement learning (DQN is commonly trained with a Huber loss).
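The piecewise behaviour can be sketched per residual as follows. The threshold parameter `delta` (where the loss switches from quadratic to linear) and its default of 1.0 are choices made for this example.

```python
def huber(residual, delta=1.0):
    """Huber loss for a single residual.

    Quadratic (0.5 * r^2) when |r| <= delta, linear beyond that.
    The linear branch is offset so the two pieces meet at |r| == delta.
    """
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual ** 2
    return delta * (a - 0.5 * delta)

print(huber(0.5))   # 0.125  (quadratic region)
print(huber(1.0))   # 0.5    (boundary: both branches agree)
print(huber(3.0))   # 2.5    (linear region; squared loss would give 4.5)
```

A dataset-level loss is the mean of `huber` over all residuals. A larger `delta` makes it behave more like MSE, and a smaller one more like MAE.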
## Reporting
In a research paper or model card, report **both** an error metric (RMSE or MAE) and a goodness-of-fit metric ([[R-Squared Coefficient]]). Together they give the typical error in the target's units and the fraction of variance the model explains.
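A report of this kind could be assembled as in the sketch below. It assumes the usual definition $R^2 = 1 - SS_{res}/SS_{tot}$; the function name `regression_report` and the sample data are hypothetical.

```python
import math

def regression_report(y_true, y_pred):
    """Return RMSE, MAE and R^2 for paired true values and predictions."""
    n = len(y_true)
    residuals = [y - p for y, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    y_bar = sum(y_true) / n
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1 - ss_res / ss_tot,
    }

report = regression_report([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 7.0])
print(report)  # rmse about 1.118, mae 0.75, r2 about 0.661
```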
## Related
- [[R-Squared Coefficient]]
- [[Loss Functions]]
- [[Linear Regression]]