## Definition
The **objective function** $f: X \to \mathbb{R}$ is the scalar function that an optimisation problem minimises or maximises. It quantifies what "best" means for a given decision $x$.
## Properties That Shape the Problem
### Convexity
A function is **convex** if for all $x, y$ and $\lambda \in [0, 1]$:
$$
f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y)
$$
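Every local minimum of a convex function is also a global minimum, so a local search cannot get trapped at a suboptimal point. The minimiser need not be unique unless $f$ is strictly convex. See [[Convex vs Non-Convex Optimization]].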
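
The defining inequality can be probed numerically. The sketch below is illustrative only: the helper name `find_convexity_violation`, the grid, and the tolerance are assumptions made for this example. Finding a violation disproves convexity; finding none on a finite grid does not prove it.

```python
import math
from itertools import product


def find_convexity_violation(f, points, lambdas=(0.25, 0.5, 0.75), tol=1e-9):
    """Return the first (x, y, lam) that violates the convexity inequality, or None."""
    for x, y, lam in product(points, points, lambdas):
        lhs = f(lam * x + (1 - lam) * y)
        rhs = lam * f(x) + (1 - lam) * f(y)
        if lhs > rhs + tol:  # small tolerance absorbs floating-point rounding
            return (x, y, lam)
    return None


# Grid of test points on [-5, 5] with spacing 0.5.
grid = [i * 0.5 for i in range(-10, 11)]

square_violation = find_convexity_violation(lambda x: x * x, grid)  # x^2 is convex
sine_violation = find_convexity_violation(math.sin, grid)           # sin is not

print("x^2 violation:", square_violation)
print("sin violation:", sine_violation)
```

A check like this is most useful as a quick falsification test before relying on convex-optimisation guarantees.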
### Smoothness
- **Differentiable** — gradients exist; gradient descent applies.
- **Twice-differentiable** — Hessian exists; Newton's method applies.
- **Lipschitz-continuous gradient** ($\|\nabla f(x) - \nabla f(y)\| \leq L \|x - y\|$) — gives convergence rate guarantees.
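
As a rough illustration of why the Lipschitz constant $L$ matters, the sketch below runs plain gradient descent on a hand-picked quadratic whose gradient is Lipschitz with $L = 10$. The function, starting point, and step counts are assumptions chosen for the example. With step size $1/L$ the objective never increases; with a step larger than $2/L$ the iterates diverge on this problem.

```python
def f(x):
    # Quadratic bowl with curvatures 1 and 10, so the gradient is Lipschitz with L = 10.
    return 0.5 * (1.0 * x[0] ** 2 + 10.0 * x[1] ** 2)


def grad_f(x):
    return [1.0 * x[0], 10.0 * x[1]]


def gradient_descent(x0, step, n_steps):
    """Run fixed-step gradient descent and record f at every iterate."""
    x = list(x0)
    history = [f(x)]
    for _ in range(n_steps):
        g = grad_f(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        history.append(f(x))
    return x, history


L = 10.0  # Lipschitz constant of the gradient (largest curvature)

x_safe, history_safe = gradient_descent([5.0, 5.0], step=1.0 / L, n_steps=200)
x_bad, history_bad = gradient_descent([5.0, 5.0], step=2.5 / L, n_steps=50)

print("step 1/L   final f:", history_safe[-1])
print("step 2.5/L final f:", history_bad[-1])
```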
### Coercivity
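$f$ is **coercive** if $f(x) \to \infty$ as $\|x\| \to \infty$. If $f$ is also continuous, coercivity guarantees that a global minimiser exists. For example, $x^2$ is coercive, whereas $e^x$ is not and has no minimiser.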
### Unimodality
A **unimodal** function has exactly one local minimum, which is also the global minimum. Such functions are easy to optimise but rare in practice.
## Common Objectives in ML
| Problem | Objective | Properties |
|---|---|---|
| Linear regression | $\|Xw - y\|^2$ | Convex, smooth, closed-form |
| Logistic regression | Cross-entropy | Convex, smooth |
| SVM (hinge loss) | $\frac{1}{2}\|w\|^2 + C \sum \max(0, 1 - y_i (w^\top x_i + b))$ | Convex, non-smooth |
| Neural network training | Cross-entropy of network output | Non-convex; smooth only if the activations are smooth (ReLU networks are not) |
| K-means | $\sum_c \sum_{x \in C_c} \|x - \mu_c\|^2$ | Non-convex |
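Most modern ML training objectives are **non-convex** in the model parameters: the loss may be convex in the predictions, but composing it with a non-linear model destroys convexity. Gradient methods are then only guaranteed to approach stationary points, typically local optima, which are often good enough in practice.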
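
A tiny example makes the non-convexity concrete. The sketch below uses a hypothetical one-hidden-unit network $w_2 \tanh(w_1 x)$ with squared error on a single made-up data point. Flipping the sign of both weights leaves the prediction unchanged, so there are two parameter settings with near-zero loss, yet their midpoint (both weights zero) has loss 1. That violates the convexity inequality.

```python
import math


def loss(w1, w2, x=1.0, y=1.0):
    """Squared error of the toy network w2 * tanh(w1 * x) on one example."""
    prediction = w2 * math.tanh(w1 * x)
    return (prediction - y) ** 2


# Two parameter settings that fit the data point (up to rounding).
w_a = (2.0, 1.0 / math.tanh(2.0))
w_b = (-w_a[0], -w_a[1])  # sign-flipped copy, same prediction
w_mid = (0.0, 0.0)        # midpoint of w_a and w_b

loss_a = loss(*w_a)
loss_b = loss(*w_b)
loss_mid = loss(*w_mid)

print("loss at w_a:", loss_a)
print("loss at w_b:", loss_b)
print("loss at midpoint:", loss_mid)
```

The same symmetry argument applies to larger networks, where permuting or sign-flipping hidden units yields many equivalent minima.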
## Multi-Objective Optimisation
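When several conflicting objectives are optimised simultaneously, there is usually no single optimum. Instead, there is a **Pareto frontier** of non-dominated solutions: points where no objective can be improved without worsening another. The trade-off curve is traced by varying the weights of a weighted sum or the bounds of constraints.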
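
A minimal sketch of both ideas, assuming two objectives that are both minimised and a small made-up list of candidate solutions (the names `dominates`, `pareto_frontier`, and `best_weighted` are illustrative, not from any library):

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))


def pareto_frontier(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def best_weighted(points, w):
    """Minimise the weighted sum w * objective_1 + (1 - w) * objective_2."""
    return min(points, key=lambda p: w * p[0] + (1 - w) * p[1])


# Hypothetical (cost, error) pairs; lower is better for both.
candidates = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 5), (6, 2), (7, 3)]

frontier = pareto_frontier(candidates)
weighted_picks = {w: best_weighted(candidates, w) for w in (0.1, 0.3, 0.5, 0.7, 0.9)}

print("frontier:", frontier)
print("weighted-sum picks:", weighted_picks)
```

With strictly positive weights, every weighted-sum minimiser lies on the frontier. The converse does not hold in general: weighted sums only recover points on the convex part of the frontier, which is one reason constraint-based scans are also used.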
## Choosing an Objective
The objective is the *most consequential* design choice in an ML / OR project. A good objective is:
- **Aligned with the business goal.** A model trained on MSE predicts the conditional mean; if you care about the median, use MAE.
- **Differentiable** (at least almost everywhere) if you use gradient-based methods.
- **Robust** to outliers if the data is noisy.
- **Calibrated** if probabilistic outputs matter, for example a proper scoring rule such as log loss.
Changing the objective changes the model — for better or worse. The reward function in [[Reinforcement Learning]] is the most extreme case: subtle misspecification gives bizarre behaviour.
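
The MSE versus MAE point can be checked on a toy dataset. The sketch below fits the simplest possible model, a single constant prediction, by brute-force search over integer candidates. The data and the candidate grid are made up for illustration. The squared-error minimiser lands on the mean and is dragged towards the outlier, while the absolute-error minimiser lands on the median.

```python
from statistics import mean, median

# Made-up sample with one large outlier.
data = [1, 2, 3, 4, 100]


def squared_error(c):
    return sum((x - c) ** 2 for x in data)


def absolute_error(c):
    return sum(abs(x - c) for x in data)


# Brute-force search over integer constants 0..100.
candidate_constants = range(0, 101)
best_mse = min(candidate_constants, key=squared_error)
best_mae = min(candidate_constants, key=absolute_error)

print("MSE-optimal constant:", best_mse, "| mean:", mean(data))
print("MAE-optimal constant:", best_mae, "| median:", median(data))
```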
## Related
- [[Optimization Problem]]
- [[Convex vs Non-Convex Optimization]]
- [[Local vs Global Optimum]]
- [[Loss Functions]]