Convex Optimization - Albert Masoliver's learning site

## Definition **Convex optimisation** is the problem of minimising a convex function over a convex set: $ \min_x f(x) \quad \text{s.t.} \quad x \in C $ where $f$ is convex and $C$ is a convex set. The "good" corner of the optimisation universe — efficiently and reliably solvable. ## Why It's Tractable For a convex problem: 1. **Every local optimum is global.** No exploration of multiple basins needed. 2. **First-order conditions are sufficient.** $\nabla f(x^*) = 0$ in the interior; KKT conditions on the boundary. 3. **Polynomial-time algorithms** exist (interior-point methods). 4. **Strong duality** holds — dual gives tight bounds. These properties make convex optimisation a "solved problem" for moderate sizes. ## Standard Convex Problem Classes In increasing generality: - **Linear Programming (LP)** — linear objective and constraints. Most efficient. - **Quadratic Programming (QP)** — quadratic objective (PSD), linear constraints. - **Quadratically-Constrained QP (QCQP)** — quadratic constraints. - **Second-Order Cone Programming (SOCP)** — generalises QCQP. - **Semidefinite Programming (SDP)** — variable is a PSD matrix. - **General convex** — any convex objective on any convex set. Each class subsumes the previous; SDP is the most general, solvable but most expensive. ## Algorithms ### Interior-Point Methods Stay strictly inside the feasibility region by adding a *barrier* function. Decrease the barrier weight progressively; converge to the boundary at the optimum. Polynomial-time complexity. Standard for most convex problems. ### Newton's Method For unconstrained convex problems with available Hessian. Quadratic convergence near the optimum. ### Gradient Descent For very large problems or where Hessian is expensive. First-order; slower convergence but cheaper per step. ### Subgradient Methods For non-differentiable convex objectives (Lasso, hinge loss). Slower but works. ### Proximal Methods For composite objectives $f + g$ with $g$ non-smooth. Each step alternates a gradient step on $f$ and a proximal step for $g$. Foundation of ISTA, FISTA, ADMM. ## DCP — Disciplined Convex Programming Boyd and colleagues' framework (CVX, CVXPY): write a problem in *disciplined* form (built up by combining known-convex operations following rules), and the framework can automatically certify convexity and dispatch to the right solver. ```python import cvxpy as cp x = cp.Variable(n) objective = cp.Minimize(cp.sum_squares(A @ x - b) + lambda_ * cp.norm(x, 1)) problem = cp.Problem(objective) problem.solve() ``` DCP democratised convex optimisation: practitioners specify the problem, not the algorithm. ## Why Most ML Loss Functions Aren't Convex Neural network loss surfaces are non-convex because of nested non-linear activations. The composition of convex functions is *not* convex in general — even with convex losses, the model parameters' loss is non-convex. This is the most consequential trade-off in modern ML: rich expressivity → non-convex training → no global-optimum guarantees → empirically still works at scale. ## Classical Examples That Are Convex - Linear regression. - Logistic regression. - SVM training (in dual form). - Lasso and Ridge regression. - Maximum likelihood for exponential family distributions. - Matrix completion via nuclear-norm relaxation. ## When to Push Toward Convex Whenever possible, *reformulate* a problem to be convex. Even if the original was non-convex, a convex relaxation often gives: - A useful lower bound. - A good initialisation. - Theoretical insights. ## Related - [[Optimization Problem]] - [[Linear Programming]] - [[Lagrangian Multipliers]] - [[KKT Conditions]] - [[Convex vs Non-Convex Optimization]]