Support Vector Regression - Albert Masoliver's learning site

## Definition

**Support Vector Regression (SVR)** is the regression variant of [[Support Vector Machine]]. Rather than minimising squared error over every point, SVR fits a function surrounded by an $\epsilon$-tube: training points that fall inside the tube incur no loss, and only points outside it contribute to the optimisation.

## The $\epsilon$-Insensitive Loss

$$
L_\epsilon(y, \hat y) = \begin{cases} 0 & \text{if } |y - \hat y| \leq \epsilon \\ |y - \hat y| - \epsilon & \text{otherwise} \end{cases}
$$

Predictions within $\epsilon$ of the truth are treated as "good enough"; the model only spends capacity correcting larger errors.

## The Optimisation

$$
\min_{w, b, \xi, \xi^*} \frac{1}{2} \|w\|^2 + C \sum_i (\xi_i + \xi_i^*)
$$

subject to $y_i - (w^\top x_i + b) \leq \epsilon + \xi_i$ and $(w^\top x_i + b) - y_i \leq \epsilon + \xi_i^*$, with slack variables $\xi_i, \xi_i^* \geq 0$.

Two hyperparameters:

- $C$: regularisation trade-off; higher $C$ means weaker regularisation and less tolerance for points outside the tube.
- $\epsilon$: tube half-width; higher $\epsilon$ gives a sparser solution (fewer support vectors).

## Kernel Trick

Like SVM classification, SVR uses the [[Kernel Trick]]:

$$
\hat y = \sum_i (\alpha_i - \alpha_i^*) K(x_i, x) + b
$$

This gives non-linear regression without explicitly mapping features into a higher-dimensional space. Common kernels: RBF, polynomial, linear.

## Strengths

- **Relatively robust to outliers**: errors beyond the tube are penalised linearly rather than quadratically, and each point's dual coefficient is bounded by $C$, so a single extreme point has limited influence.
- **Non-linear via kernels** without losing the convex-optimisation guarantees.
- **Sparse solution**: only points on or outside the tube boundary (the support vectors) have non-zero coefficients $\alpha_i - \alpha_i^*$.

## Weaknesses

- **Slow on large datasets.** Training is typically $O(n^2)$ to $O(n^3)$ in the number of samples, so it does not scale comfortably beyond ~10k samples.
- **Hyperparameter sensitive.** $C$, $\epsilon$ and the kernel parameter ($\gamma$ for RBF) all matter and need tuning.
- **Output is not probabilistic.** For uncertainty estimates, use Gaussian Process Regression.

## When to Use

- Moderate-size datasets (~1k to 10k samples).
- Non-linear regression where neural networks are overkill.
- When robustness to outliers matters.

## Modern Status

SVR was a workhorse from ~1995 to ~2010.
For tabular regression in 2026, gradient-boosted trees ([[XGBoost]]) usually outperform SVR with less tuning. SVR retains a niche where kernel-based non-linearity and sparse solutions are specifically valuable.

## Related

- [[Support Vector Machine]]
- [[Kernel Trick]]
- [[Linear Regression]]
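
A minimal NumPy sketch of the loss from *The $\epsilon$-Insensitive Loss* section, written in the equivalent single-expression form $\max(0, |y - \hat y| - \epsilon)$. The function name `eps_insensitive_loss` and the sample arrays are illustrative assumptions, not part of any library.

```python
import numpy as np


def eps_insensitive_loss(y, y_hat, eps):
    """Elementwise epsilon-insensitive loss L_eps(y, y_hat) (hypothetical helper).

    max(0, |y - y_hat| - eps) matches the two-case definition:
    residuals inside the tube cost 0, larger ones cost only the excess over eps.
    """
    residual = np.abs(np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float))
    return np.maximum(0.0, residual - eps)


# Illustrative values: the first and last predictions fall inside a tube of width 0.1.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.05, 2.5, 2.0, 4.0])
print(eps_insensitive_loss(y_true, y_pred, eps=0.1))
```

Note that a residual of exactly $\epsilon$ still costs nothing, matching the $\leq$ in the definition.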
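
The constrained problem under *The Optimisation* can be rewritten without constraints: at the optimum each slack equals the amount its constraint is violated, $\xi_i = \max(0, y_i - f(x_i) - \epsilon)$ and $\xi_i^* = \max(0, f(x_i) - y_i - \epsilon)$, so the objective becomes $\frac{1}{2}\|w\|^2 + C \sum_i L_\epsilon(y_i, f(x_i))$ with $f(x) = w^\top x + b$. The sketch below evaluates that objective and minimises it for a linear model by plain subgradient descent. This solver is a swapped-in technique chosen for brevity (LIBSVM-based implementations solve the dual QP instead); the helper names, step-size schedule and toy data are hypothetical.

```python
import numpy as np


def svr_primal_objective(w, b, X, y, C, eps):
    """0.5*||w||^2 + C * sum(xi + xi_star), with slacks set to their optimal values."""
    f = X @ w + b
    xi = np.maximum(0.0, y - f - eps)       # amount a point lies above the tube
    xi_star = np.maximum(0.0, f - y - eps)  # amount a point lies below the tube
    return 0.5 * w @ w + C * np.sum(xi + xi_star)


def fit_linear_svr_subgradient(X, y, C=10.0, eps=0.1, lr=0.01, n_iter=2000):
    """Hypothetical helper: minimise the primal objective by subgradient descent.

    Subgradient steps do not decrease the objective monotonically,
    so the best iterate seen so far is tracked and returned.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    best = (w.copy(), b, svr_primal_objective(w, b, X, y, C, eps))
    for t in range(1, n_iter + 1):
        r = y - (X @ w + b)
        # Subgradient of the loss w.r.t. f: -1 above the tube, +1 below it, 0 inside.
        g = np.where(r > eps, -1.0, np.where(r < -eps, 1.0, 0.0))
        grad_w = w + C * (X.T @ g)
        grad_b = C * np.sum(g)
        step = lr / np.sqrt(t)  # diminishing step size
        w = w - step * grad_w
        b = b - step * grad_b
        obj = svr_primal_objective(w, b, X, y, C, eps)
        if obj < best[2]:
            best = (w.copy(), b, obj)
    return best


# Toy data on the line y = 2x + 1 (illustrative only).
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
w, b, obj = fit_linear_svr_subgradient(X, y)
print(w, b, obj)
```

Raising `C` weights the loss term more heavily relative to $\frac{1}{2}\|w\|^2$, which is the "less tolerance for errors" behaviour described for the hyperparameter.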
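
The prediction formula under *Kernel Trick* needs only the support vectors, their coefficients $\alpha_i - \alpha_i^*$ and the bias $b$. A small sketch with an RBF kernel $K(x_i, x) = \exp(-\gamma \|x_i - x\|^2)$; the coefficients are made-up illustrative values, not the output of a trained model. They are chosen to respect two properties of a trained SVR with $\epsilon > 0$: at most one of $\alpha_i, \alpha_i^*$ is non-zero per point, and $\sum_i (\alpha_i - \alpha_i^*) = 0$.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2) for two single vectors."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.exp(-gamma * (diff @ diff))


def svr_predict(x, support_vectors, alpha, alpha_star, b, gamma):
    """y_hat = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b (hypothetical helper)."""
    coef = np.asarray(alpha, dtype=float) - np.asarray(alpha_star, dtype=float)
    k = np.array([rbf_kernel(sv, x, gamma) for sv in support_vectors])
    return float(coef @ k + b)


# Hypothetical dual solution with two support vectors (illustrative only).
support_vectors = np.array([[0.0], [1.0]])
alpha = np.array([0.4, 0.0])
alpha_star = np.array([0.0, 0.4])
y_hat = svr_predict(np.array([0.0]), support_vectors, alpha, alpha_star, b=1.0, gamma=1.0)
print(y_hat)
```

Points strictly inside the tube would have $\alpha_i = \alpha_i^* = 0$ and can be dropped from `support_vectors` without changing the prediction, which is where the sparsity comes from.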