Objective Function - Albert Masoliver's learning site

## Definition

The **objective function** $f: X \to \mathbb{R}$ is the scalar function being optimised (minimised or maximised) in an optimisation problem. It quantifies what "best" means for a given decision $x$.

## Properties That Shape the Problem

### Convexity

A function is **convex** if for all $x, y$ in its domain and $\lambda \in [0, 1]$:

$$
f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y)
$$

Every local minimum of a convex function is also a global minimum. See [[Convex vs Non-Convex Optimization]].

### Smoothness

- **Differentiable**: gradients exist; gradient descent applies.
- **Twice-differentiable**: Hessian exists; Newton's method applies.
- **Lipschitz-continuous gradient** ($\|\nabla f(x) - \nabla f(y)\| \leq L \|x - y\|$): gives convergence rate guarantees for gradient descent with a suitably small step size (for example $1/L$).

### Coercivity

$f(x) \to \infty$ as $\|x\| \to \infty$. For a continuous $f$ on $\mathbb{R}^n$, this ensures that a minimiser exists.

### Unimodality

A unimodal function has exactly one local minimum, which is therefore the global minimum. Such functions are easy to optimise but rare in practice.

## Common Objectives in ML

| Problem | Objective | Properties |
|---|---|---|
| Linear regression | $\|Xw - y\|^2$ | Convex, smooth, closed-form |
| Logistic regression | Cross-entropy | Convex, smooth |
| SVM (hinge loss) | $\frac{1}{2}\|w\|^2 + C \sum_i \max(0, 1 - y_i (w^\top x_i + b))$ | Convex, non-smooth |
| Neural network training | Cross-entropy of network output | Non-convex, smooth (for smooth activations) |
| K-means | $\sum_c \sum_{x \in C_c} \|x - \mu_c\|^2$ | Non-convex |

Most modern ML loss functions are **non-convex** in the model parameters, a consequence of the model's non-linearity (for example, stacked layers in a neural network). Gradient methods are only guaranteed to find stationary points, typically local optima, and in practice we make do with those.

## Multi-Objective Optimisation

When several conflicting objectives are optimised simultaneously, no single optimum exists. Instead, there is a **Pareto frontier** of non-dominated solutions: points where no objective can be improved without worsening another. Trade-off curves are traced by varying the weights of a weighted sum of the objectives, or by varying the bounds of constraints placed on some of them.

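The weight-varying idea can be sketched in a few lines. The example below is a minimal illustration, not a general multi-objective solver: the two objectives `f1` and `f2`, the candidate grid, and the helper names `weighted_sum_minimiser` and `dominates` are all hypothetical choices made for this sketch. It minimises $w f_1 + (1 - w) f_2$ over a grid for several weights $w$ and collects the resulting trade-off points.

```python
def f1(x):
    """First objective (hypothetical): squared distance from 0."""
    return x ** 2


def f2(x):
    """Second objective (hypothetical): squared distance from 2."""
    return (x - 2.0) ** 2


def weighted_sum_minimiser(w, candidates):
    """Return the candidate that minimises w * f1 + (1 - w) * f2."""
    return min(candidates, key=lambda x: w * f1(x) + (1.0 - w) * f2(x))


def dominates(a, b):
    """True if objective vector a is no worse than b in every
    component and strictly better in at least one."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and better


# Coarse grid of candidate decisions on [-1, 3] (an assumption of this sketch).
candidates = [i / 100.0 for i in range(-100, 301)]

# Sweep the weight from 0 to 1 to trace the trade-off curve.
weights = [i / 10.0 for i in range(11)]
frontier = []
for w in weights:
    x_star = weighted_sum_minimiser(w, candidates)
    frontier.append((w, x_star, f1(x_star), f2(x_star)))

for w, x_star, v1, v2 in frontier:
    print(f"w={w:.1f}  x*={x_star:.2f}  f1={v1:.3f}  f2={v2:.3f}")
```

For these two quadratics the weighted sum is minimised at $x = 2(1 - w)$, so the sweep moves from $x = 2$ (only $f_2$ matters) to $x = 0$ (only $f_1$ matters), and none of the collected points dominates another. Note that a weighted sum recovers only the convex part of a Pareto frontier; for non-convex frontiers, constraint-based sweeps are the usual alternative.
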
## Choosing an Objective

The objective is the *most consequential* design choice in an ML / OR project. A good objective is:

- **Aligned with the business goal.** A model trained on MSE predicts the conditional mean; if you care about the median, use MAE.
- **Differentiable** if using gradient-based methods.
- **Robust** to outliers if the data is noisy.
- **Calibrated** if probabilistic outputs matter.

Changing the objective changes the model, for better or worse. The reward function in [[Reinforcement Learning]] is the most extreme case: subtle misspecification can produce bizarre behaviour.

## Related

- [[Optimization Problem]]
- [[Convex vs Non-Convex Optimization]]
- [[Local vs Global Optimum]]
- [[Loss Functions]]