## Definition
**Support Vector Regression (SVR)** is the regression variant of [[Support Vector Machine]]. Instead of minimising squared error over every point, as [[Linear Regression]] does, SVR fits a function that is as flat as possible while keeping most points inside an $\epsilon$-tube around it. Residuals inside the tube incur no loss; only points that fall outside the tube are penalised in the optimisation.
## The $\epsilon$-Insensitive Loss
$$
L_\epsilon(y, \hat y) = \begin{cases}
0 & \text{if } |y - \hat y| \leq \epsilon \\
|y - \hat y| - \epsilon & \text{otherwise}
\end{cases}
$$
Predictions within $\epsilon$ of the truth are treated as "good enough"; the model only spends capacity correcting bigger errors.
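A minimal NumPy sketch of the loss above. The function name `eps_insensitive_loss` is illustrative, not taken from any library.

```python
import numpy as np


def eps_insensitive_loss(y, y_hat, eps):
    """Elementwise epsilon-insensitive loss: zero inside the tube, linear outside it."""
    residual = np.abs(np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float))
    return np.maximum(0.0, residual - eps)


y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.05, 2.5, 2.0, 4.0])
# The first and last predictions are within eps = 0.1 of the truth, so their loss is zero;
# the other two are charged only for the part of the error beyond the tube.
print(eps_insensitive_loss(y, y_hat, eps=0.1))
```

Unlike squared error, a residual of 1.0 costs $1 - \epsilon$ here rather than 1.0, and a residual of 0.05 costs nothing at all.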
## The Optimisation
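$$
\min_{w,\, b,\, \xi,\, \xi^*} \; \frac{1}{2} \|w\|^2 + C \sum_i (\xi_i + \xi_i^*)
$$
subject to $y_i - (w^\top x_i + b) \leq \epsilon + \xi_i$ and $(w^\top x_i + b) - y_i \leq \epsilon + \xi_i^*$, with slack variables $\xi_i, \xi_i^* \geq 0$. At the optimum each slack equals how far its point lies above ($\xi_i$) or below ($\xi_i^*$) the tube, so $\xi_i + \xi_i^* = L_\epsilon(y_i, w^\top x_i + b)$ and the problem is equivalent to minimising $\frac{1}{2}\|w\|^2 + C \sum_i L_\epsilon(y_i, w^\top x_i + b)$.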
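Two hyperparameters:
- $C$: trade-off between flatness ($\|w\|^2$) and tube violations; higher $C$ means weaker regularisation and less tolerance for errors.
- $\epsilon$: tube width; higher $\epsilon$ lets more points sit inside the tube, giving a sparser solution (fewer support vectors).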
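To see how $C$ and $\epsilon$ interact, the sketch below evaluates the primal objective of a linear model with each slack set to its smallest feasible value, $\xi_i = \max(0, y_i - f(x_i) - \epsilon)$ and $\xi_i^* = \max(0, f(x_i) - y_i - \epsilon)$. It is not an optimiser: it only compares two hand-picked candidates on a hypothetical three-point dataset, the exact-fit line $w = 1, b = 0$ and the flat line $w = 0, b = 1$. The helper name `svr_primal_objective` is hypothetical.

```python
import numpy as np


def svr_primal_objective(w, b, X, y, C, eps):
    """0.5 * ||w||^2 + C * sum(xi + xi*) for a linear model, with minimal feasible slacks."""
    w = np.atleast_1d(np.asarray(w, dtype=float))
    f = X @ w + b                           # predictions w^T x_i + b
    xi = np.maximum(0.0, y - f - eps)       # slack for points above the tube
    xi_star = np.maximum(0.0, f - y - eps)  # slack for points below the tube
    return 0.5 * (w @ w) + C * np.sum(xi + xi_star)


# Hypothetical toy data lying exactly on y = x.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])

for C, eps in [(0.1, 0.5), (10.0, 0.5), (10.0, 1.0)]:
    exact = svr_primal_objective(1.0, 0.0, X, y, C, eps)  # w = 1, b = 0
    flat = svr_primal_objective(0.0, 1.0, X, y, C, eps)   # w = 0, b = 1
    print(f"C={C}, eps={eps}: exact fit {exact:.2f}, flat line {flat:.2f}")
```

With $\epsilon = 0.5$ the flat line leaves two points 0.5 outside the tube and costs $C$, while the exact fit pays 0.5 for $\frac{1}{2}\|w\|^2$: at $C = 0.1$ the flat line wins (0.10 vs 0.50), at $C = 10$ the exact fit wins (0.50 vs 10.00). Widening the tube to $\epsilon = 1$ makes the flat line free (0.00), which is why a larger $\epsilon$ favours flatter, sparser solutions.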
## Kernel Trick
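Like SVM classification, SVR uses the [[Kernel Trick]]. The prediction is a weighted sum of kernel evaluations against the training points:
$$
\hat y = \sum_i (\alpha_i - \alpha_i^*) K(x_i, x) + b
$$
where $\alpha_i, \alpha_i^* \in [0, C]$ are the dual variables for the upper and lower tube constraints. This gives non-linear regression without explicitly mapping features into a higher-dimensional space. Common kernels: RBF, polynomial, linear.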
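A from-scratch NumPy sketch of the dual solution, for illustration only (in practice a library solver such as LIBSVM would be used). Writing $\beta_i = \alpha_i - \alpha_i^*$ and using $\alpha_i + \alpha_i^* = |\beta_i|$ at the optimum, it minimises $\frac{1}{2} \beta^\top K \beta - y^\top \beta + \epsilon \|\beta\|_1$ subject to $|\beta_i| \leq C$ by proximal gradient descent. Two simplifications are assumptions, not part of the standard formulation: the bias is absorbed by adding 1 to every kernel entry (so $b = \sum_i \beta_i$ is lightly regularised and the usual constraint $\sum_i \beta_i = 0$ disappears), and the RBF `gamma`, iteration count, and toy sine data are arbitrary. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """RBF kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clamp tiny negative round-off


def fit_kernel_svr(X, y, C=10.0, eps=0.1, gamma=1.0, n_iter=20000):
    """Return beta = alpha - alpha* from a proximal-gradient solve of the SVR dual.

    Assumption: the bias is absorbed by adding 1 to the kernel, so b = sum(beta)
    and the equality constraint sum(beta) = 0 is dropped.
    """
    K = rbf_kernel(X, X, gamma) + 1.0
    step = 1.0 / np.linalg.eigvalsh(K)[-1]  # 1 / largest eigenvalue (Lipschitz constant)
    beta = np.zeros(len(y))
    for _ in range(n_iter):
        v = beta - step * (K @ beta - y)  # gradient step on 0.5 * beta'K beta - y'beta
        v = np.sign(v) * np.maximum(np.abs(v) - step * eps, 0.0)  # prox of eps * ||beta||_1
        beta = np.clip(v, -C, C)  # box constraint |alpha_i - alpha_i*| <= C
    return beta


def predict(X_train, beta, X_new, gamma=1.0):
    """y_hat = sum_i beta_i K(x_i, x) + b, with b = sum(beta) under the absorbed bias."""
    return rbf_kernel(X_new, X_train, gamma) @ beta + beta.sum()


rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 6.0, size=(40, 1)), axis=0)
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=40)

for eps in (0.05, 0.3):
    beta = fit_kernel_svr(X, y, C=10.0, eps=eps)
    n_sv = int(np.sum(np.abs(beta) > 1e-6))  # support vectors: non-zero beta_i
    mae = np.mean(np.abs(predict(X, beta, X) - y))
    print(f"eps={eps}: {n_sv} support vectors, training MAE {mae:.3f}")
```

On this data the wider tube ($\epsilon = 0.3$) should leave noticeably fewer points with non-zero $\beta_i$ than $\epsilon = 0.05$; only those points enter the prediction sum.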
## Strengths
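- **Robust to outliers**: outside the tube the loss grows linearly rather than quadratically, so each point's dual weight is capped at $|\alpha_i - \alpha_i^*| \leq C$ and a single outlier cannot dominate the fit.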
- **Non-linear via kernels** without losing the convex-optimisation guarantees.
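- **Sparse solution**: only points on or outside the $\epsilon$-tube (the support vectors) have non-zero $\alpha_i - \alpha_i^*$; points strictly inside the tube drop out of the prediction.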
## Weaknesses
- **Slow on large datasets.** Training is $O(n^2)$ to $O(n^3)$; doesn't scale beyond ~10k samples comfortably.
- **Hyperparameter sensitive.** $C$, $\epsilon$, and the kernel parameter ($\gamma$ for RBF) all matter and need tuning, typically by cross-validated search.
- **Output not probabilistic.** SVR returns point predictions only; for uncertainty estimates, use Gaussian Process Regression.
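A back-of-the-envelope check on the scaling bullet: a solver that stored the full $n \times n$ kernel matrix in float64 would need $8n^2$ bytes. This assumes dense storage; LIBSVM-style solvers cache kernel rows on demand instead, but their training time still grows at least quadratically in $n$. The helper `gram_matrix_gib` is hypothetical.

```python
def gram_matrix_gib(n_samples, bytes_per_entry=8):
    """Memory in GiB for a dense n x n kernel matrix (8 bytes per float64 entry)."""
    return n_samples**2 * bytes_per_entry / 2**30


for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7,}: {gram_matrix_gib(n):8.2f} GiB")
```

At 10k samples the dense matrix is already about 0.75 GiB, and at 100k it is about 75 GiB, which is why exact kernel SVR is rarely used at that scale.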
## When to Use
- Moderate-size datasets (~1k to 10k samples).
- Non-linear regression where neural networks are overkill.
- Robustness to outliers matters.
## Modern Status
SVR was a workhorse from ~1995 to ~2010. For tabular regression in 2026, gradient-boosted trees ([[XGBoost]]) usually outperform SVR with less tuning. SVR retains a niche where kernel-based non-linearity and sparse solutions are specifically valuable.
## Related
- [[Support Vector Machine]]
- [[Kernel Trick]]
- [[Linear Regression]]