## Definition
**Supervised learning** is the paradigm of learning a function $f: X \to Y$ from a labelled dataset $\{(x_i, y_i)\}_{i=1}^n$. The "supervision" is the label $y_i$ — a target the model is told to predict from the input $x_i$.
## Two Sub-Tasks
- **Regression** — output is continuous: house prices, temperatures, sales. See [[Linear Regression]].
- **Classification** — output is categorical: spam/not-spam, disease diagnosis, image category. See [[Logistic Regression]], [[kNN]].
## The Learning Objective
Choose a model $f_\theta$ parameterised by $\theta$ and minimise the expected loss:
$$
\theta^* = \arg\min_\theta \mathbb{E}_{(x, y) \sim P} \left[ L(f_\theta(x), y) \right]
$$
In practice the data distribution $P$ is unknown, so we minimise the *regularised empirical risk* on the training set instead:
$$
\hat\theta = \arg\min_\theta \frac{1}{n} \sum_{i=1}^n L(f_\theta(x_i), y_i) + \Omega(\theta)
$$
with $\Omega$ a regularizer ([[L1 Regularization]], [[L2 Regularization]]).
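
As a rough illustration of the regularised objective above, the sketch below fits a one-dimensional linear model $f_\theta(x) = wx + b$ by gradient descent on squared loss with an L2 penalty $\Omega(\theta) = \lambda w^2$. The function name `fit_ridge_1d`, the synthetic data, and the hyperparameters (`lam`, `lr`, `steps`) are assumptions chosen for the example, not part of the note.

```python
import random

def fit_ridge_1d(xs, ys, lam=0.1, lr=0.05, steps=500):
    """Minimise (1/n) * sum((w*x + b - y)^2) + lam * w^2 by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the empirical risk (data term), averaged over the training set.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Gradient of the regulariser Omega(theta) = lam * w^2 (bias left unpenalised).
        grad_w += 2 * lam * w
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic training set (assumed for illustration): y = 3x + 0.5 plus small noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(200)]
ys = [3.0 * x + 0.5 + rng.gauss(0, 0.1) for x in xs]

w_plain, b_plain = fit_ridge_1d(xs, ys, lam=0.0)  # pure empirical risk
w_ridge, b_ridge = fit_ridge_1d(xs, ys, lam=1.0)  # with L2 penalty
print(f"lam=0: w={w_plain:.3f}, b={b_plain:.3f}")
print(f"lam=1: w={w_ridge:.3f}, b={b_ridge:.3f}")
```

With `lam=0.0` the fit should land close to the generating slope and intercept; a larger `lam` shrinks `w` towards zero, trading some training loss for a simpler model.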
## Common Loss Functions
| Task | Loss |
| ------------------- | ----------------------------------- |
| Regression | Squared error $(y - \hat y)^2$, absolute error $\lvert y - \hat y \rvert$ |
| Binary classification | Cross-entropy / log loss |
| Multi-class classification | Categorical cross-entropy |
| Ranking | Pairwise hinge, listwise losses |
| Imbalanced classes | Focal loss, weighted cross-entropy |
See [[Loss Functions]] for a deeper treatment.
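
The per-example losses in the table can be written in a few lines each. The following is a minimal sketch using only the standard library; the function names, the `eps` clipping constant, and the defaults `gamma=2.0` and `margin=1.0` are illustrative assumptions, and the focal loss shown is a simplified binary form without a class-weighting term.

```python
import math

def squared_error(y, y_hat):
    return (y - y_hat) ** 2

def absolute_error(y, y_hat):
    return abs(y - y_hat)

def binary_cross_entropy(y, p, eps=1e-12):
    """y is 0 or 1; p is the predicted probability of class 1."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def categorical_cross_entropy(true_class, probs, eps=1e-12):
    """true_class is a class index; probs is a predicted distribution over classes."""
    return -math.log(max(probs[true_class], eps))

def focal_loss(y, p, gamma=2.0, eps=1e-12):
    """Binary focal loss: cross-entropy scaled by (1 - p_t)^gamma."""
    p = min(max(p, eps), 1 - eps)
    p_t = p if y == 1 else 1 - p  # probability assigned to the true class
    return -((1 - p_t) ** gamma) * math.log(p_t)

def pairwise_hinge(score_pos, score_neg, margin=1.0):
    """Zero once the relevant item outscores the irrelevant one by at least `margin`."""
    return max(0.0, margin - (score_pos - score_neg))

print(binary_cross_entropy(1, 0.9), focal_loss(1, 0.9))
```

For a confidently correct prediction such as `p = 0.9` on a positive example, the focal loss is much smaller than the cross-entropy, which is why it shifts training effort towards hard or minority-class examples.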
## Model Families
- **Linear** — [[Linear Regression]], [[Logistic Regression]].
- **Distance- and margin-based** — [[kNN]], [[Support Vector Machine]].
- **Tree-based** — [[Decision Trees]], [[Random Forest]], [[XGBoost]].
- **Probabilistic** — [[Naive Bayes]].
- **Neural** — see [[9 - Deep Learning Notes Hub]].
## The Generalisation Bargain
The whole game of supervised learning: minimise training loss *while controlling* the gap between training and test performance — the **generalisation gap**. See [[Bias-Variance Tradeoff]], [[Overfitting and Underfitting]].
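
One way to see the gap concretely is to compare a model that memorises the training set with one that barely adapts to it. The sketch below assumes a synthetic task ($y = 2x$ plus Gaussian noise) and two deliberately extreme predictors, a 1-nearest-neighbour regressor and a constant mean predictor; all names and data here are hypothetical.

```python
import random

def one_nn_predict(train, x):
    """Return the target of the training point closest to x (1-nearest neighbour)."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def mean_predict(train, x):
    """Ignore x and predict the mean training target (a deliberately rigid model)."""
    return sum(y for _, y in train) / len(train)

def mse(predict, train, data):
    """Mean squared error of `predict` (fitted on `train`) evaluated on `data`."""
    return sum((predict(train, x) - y) ** 2 for x, y in data) / len(data)

def make_data(rng, n):
    # Assumed ground truth for illustration: y = 2x plus Gaussian noise.
    data = []
    for _ in range(n):
        x = rng.uniform(0, 1)
        data.append((x, 2.0 * x + rng.gauss(0, 0.5)))
    return data

rng = random.Random(0)
train = make_data(rng, 50)
test = make_data(rng, 500)

results = {}
for name, model in [("1-NN", one_nn_predict), ("mean", mean_predict)]:
    train_mse = mse(model, train, train)
    test_mse = mse(model, train, test)
    results[name] = (train_mse, test_mse)
    print(f"{name}: train={train_mse:.3f} test={test_mse:.3f} gap={test_mse - train_mse:.3f}")
```

The 1-NN model reproduces every training label, so its training error is zero while its test error is not: the whole of its test error is generalisation gap. The mean predictor typically shows a much smaller gap but a high error on both sets, which is the underfitting end of the same trade-off.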
## Labels: The Bottleneck
In practice, the cost of labelling data often dominates the cost of training models. Strategies to reduce labelling cost:
- **Active learning** — model selects the most informative examples for labelling.
- **Semi-supervised learning** — combine few labels with many unlabelled examples.
- **[[Self-Supervised Learning]]** — derive labels from the structure of unlabelled data.
- **Synthetic data** — generate labelled examples programmatically.
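
Of the strategies above, active learning is the easiest to sketch in a few lines. A common selection rule is uncertainty sampling: send to the annotator the unlabelled points the current model is least sure about. The example below assumes a hypothetical one-dimensional logistic model with fixed weights `w` and `b`; the pool values and the `budget` parameter are made up for illustration.

```python
import math

def predict_proba(w, b, x):
    """Probability of class 1 under a 1-D logistic model (hypothetical, for illustration)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def select_most_uncertain(pool, w, b, budget):
    """Return the `budget` pool points whose predicted probability is closest to 0.5."""
    return sorted(pool, key=lambda x: abs(predict_proba(w, b, x) - 0.5))[:budget]

# Unlabelled pool (assumed values); the decision boundary of the model below is at x = 0.
pool = [-3.0, -1.0, -0.2, 0.1, 0.9, 2.5]
to_label = select_most_uncertain(pool, w=2.0, b=0.0, budget=2)
print(to_label)
```

The two points nearest the decision boundary are chosen, since a label there is expected to change the model more than a label far from the boundary. In a full loop the model would be retrained after each labelling round and the selection repeated.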
## Related
- [[Machine Learning]]
- [[Unsupervised Learning]]
- [[Reinforcement Learning]]
- [[Loss Functions]]
- [[Bias-Variance Tradeoff]]