## Definition
The **coefficient of determination ($R^2$)** measures the proportion of variance in the target variable that the model explains. It is the standard goodness-of-fit metric for regression.
## Formula
$$
R^2 = 1 - \frac{\text{SS}_{\text{res}}}{\text{SS}_{\text{tot}}} = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i (y_i - \bar y)^2}
$$
- $\text{SS}_{\text{res}}$: sum of squared residuals (model errors), where $\hat y_i$ is the model's prediction for $y_i$.
- $\text{SS}_{\text{tot}}$: total sum of squares, i.e. $n$ times the variance of the target around its mean $\bar y$.
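The formula translates almost directly into code. The following is a minimal pure-Python sketch; the function name `r_squared` and the sample values are illustrative assumptions, not part of any library.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    y_mean = sum(y_true) / len(y_true)
    # Sum of squared residuals (model errors).
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    # Total sum of squares around the mean of the evaluated targets.
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


# Made-up targets and predictions: SS_res = 1, SS_tot = 20.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.5, 8.5]
print(r_squared(y_true, y_pred))  # approximately 0.95
```

Note that $R^2$ is undefined when the target is constant, since $\text{SS}_{\text{tot}} = 0$; the sketch raises an error in that case rather than picking a convention.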
## Interpretation
- **$R^2 = 1$**: perfect fit; every prediction equals its target.
- **$R^2 = 0$**: the model predicts no better than always outputting the mean of $y$.
- **$R^2 < 0$**: possible on held-out data; the model is *worse* than predicting the mean. A common sign of overfitting or distribution shift.
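The three regimes above can be reproduced with toy numbers. This is a small sketch with made-up data; `r_squared` is an illustrative helper implementing the formula from this note.

```python
def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, with the mean taken over y_true."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0, 5.0]           # SS_tot = 10

perfect = r_squared(y, y)                # SS_res = 0
mean_only = r_squared(y, [3.0] * 5)      # SS_res = SS_tot
reversed_ = r_squared(y, y[::-1])        # SS_res = 40, worse than the mean

print(perfect, mean_only, reversed_)     # 1.0 0.0 -3.0
```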
## A Common Misconception
$R^2$ is sometimes described as "the correlation squared". That holds only in a narrow case: for simple (single-feature) least-squares linear regression with an intercept, evaluated on the training set, $R^2$ equals the squared Pearson correlation between $x$ and $y$. In other cases (multivariate models, non-linear models, held-out evaluations) $R^2$ measures *explained variance*, not correlation.
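A quick numerical check makes the distinction concrete. The sketch below, using made-up data and illustrative helper names, fits a single-feature least-squares line and confirms that training $R^2$ matches the squared correlation. It then shifts the predictions by a constant: the correlation with the target is unchanged, but $R^2$ collapses because it also penalises bias.

```python
def mean(values):
    return sum(values) / len(values)


def r_squared(y_true, y_pred):
    y_mean = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def pearson_r(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def fit_simple_ols(x, y):
    """Closed-form least-squares slope and intercept for one feature."""
    mx, my = mean(x), mean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return slope, my - slope * mx


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_simple_ols(x, y)
fitted = [intercept + slope * xi for xi in x]

train_r2 = r_squared(y, fitted)
corr_sq = pearson_r(x, y) ** 2          # agrees with train_r2 up to rounding

# Biased predictions: same correlation with y, much lower R^2.
shifted = [p + 3.0 for p in fitted]
shifted_r2 = r_squared(y, shifted)
shifted_corr_sq = pearson_r(y, shifted) ** 2
```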
## Adjusted $R^2$
On the training set, the plain $R^2$ of a least-squares fit never decreases when features are added, even useless ones. **Adjusted $R^2$** penalises model complexity:
$$
R^2_{\text{adj}} = 1 - (1 - R^2) \cdot \frac{n - 1}{n - p - 1}
$$
where $n$ is the number of samples and $p$ the number of features. Use it when comparing models with different numbers of features.
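The penalty is easiest to see with numbers. The sketch below implements the adjusted formula directly; the function name and the two $R^2$ values are hypothetical, chosen only to show a larger model winning on plain $R^2$ and losing after adjustment.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / dof


# Hypothetical comparison on n = 50 samples:
# 3 features with R^2 = 0.80 versus 15 features with R^2 = 0.81.
small_model = adjusted_r_squared(0.80, n_samples=50, n_features=3)
large_model = adjusted_r_squared(0.81, n_samples=50, n_features=15)

print(round(small_model, 3), round(large_model, 3))  # 0.787 0.726
```

Here the extra twelve features buy one point of plain $R^2$ but cost about six points after adjustment, so the smaller model is preferred under this criterion.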
## Out-of-Sample $R^2$ Can Be Negative
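On the training set, a least-squares linear model with an intercept has $R^2 \in [0, 1]$. On held-out data, $R^2$ can be negative: the model performs *worse* than a constant prediction equal to the mean of the held-out targets, which is the baseline $\bar y$ when the formula is evaluated on that data. Treat negative $R^2$ as a red flag: probable overfitting, wrong feature engineering, or distribution shift.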
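As an illustration of the distribution-shift case, the sketch below fits a line on a range where the target rises and evaluates it on a range where the target falls. The data and helper names are made up for the example.

```python
def mean(values):
    return sum(values) / len(values)


def r_squared(y_true, y_pred):
    y_mean = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def fit_simple_ols(x, y):
    """Closed-form least-squares slope and intercept for one feature."""
    mx, my = mean(x), mean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return slope, my - slope * mx


# Training range: y increases with x.
x_train, y_train = [0.0, 1.0, 2.0, 3.0], [0.2, 0.9, 2.1, 2.8]
# Held-out range: the relationship has reversed (distribution shift).
x_test, y_test = [4.0, 5.0, 6.0, 7.0], [3.0, 2.0, 1.0, 0.0]

slope, intercept = fit_simple_ols(x_train, y_train)

train_r2 = r_squared(y_train, [intercept + slope * x for x in x_train])
test_r2 = r_squared(y_test, [intercept + slope * x for x in x_test])

print(train_r2, test_r2)  # close to 1 on training data, strongly negative on held-out data
```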
## When Not to Use $R^2$
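- **Non-linear relationships**: the "proportion of variance explained" reading relies on the variance decomposition of a linear least-squares fit, so $R^2$ can be misleading for non-linear models.
- **Heteroscedastic data**: $R^2$ is a single global summary and does not show where the noise level varies across the input range.
- **Time series with trends**: naive baselines (just predict the trend) can give a high $R^2$ for trivial reasons, because the trend dominates the total variance.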
In those cases, report MSE/RMSE *and* a domain-appropriate metric.
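The time-series caveat can be demonstrated with a synthetic series. In this sketch (the series and the persistence forecast are assumptions made for illustration), each value is forecast with the previous one. $R^2$ comes out very high because the trend dominates $\text{SS}_{\text{tot}}$, while RMSE shows the forecast error is larger than the 0.5 wiggle it fails to predict.

```python
import math


def r_squared(y_true, y_pred):
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    return math.sqrt(
        sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    )


# Synthetic series: unit linear trend plus an alternating +/-0.5 wiggle.
series = [t + 0.5 * (-1) ** t for t in range(50)]

actual = series[1:]
persistence = series[:-1]  # naive forecast: y_t is predicted by y_{t-1}

r2 = r_squared(actual, persistence)
error = rmse(actual, persistence)

print(round(r2, 3), round(error, 2))  # high R^2, yet RMSE of about 1.4
```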
## Related
- [[MSE MAE RMSE]]
- [[Linear Regression]]
- [[Bias-Variance Tradeoff]]