## Definition
**Convex optimisation** is the problem of minimising a convex function over a convex set:
$
\min_x f(x) \quad \text{s.t.} \quad x \in C
$
where $f$ is convex and $C$ is a convex set. The "good" corner of the optimisation universe — efficiently and reliably solvable.
## Why It's Tractable
For a convex problem:
1. **Every local optimum is global.** No exploration of multiple basins needed.
2. **First-order conditions are sufficient.** $\nabla f(x^*) = 0$ in the interior; KKT conditions on the boundary.
3. **Polynomial-time algorithms** exist (interior-point methods).
4. **Strong duality** holds — dual gives tight bounds.
These properties make convex optimisation a "solved problem" for moderate sizes.
## Standard Convex Problem Classes
In increasing generality:
- **Linear Programming (LP)** — linear objective and constraints. Most efficient.
- **Quadratic Programming (QP)** — quadratic objective (PSD), linear constraints.
- **Quadratically-Constrained QP (QCQP)** — quadratic constraints.
- **Second-Order Cone Programming (SOCP)** — generalises QCQP.
- **Semidefinite Programming (SDP)** — variable is a PSD matrix.
- **General convex** — any convex objective on any convex set.
Each class subsumes the previous; SDP is the most general, solvable but most expensive.
## Algorithms
### Interior-Point Methods
Stay strictly inside the feasibility region by adding a *barrier* function. Decrease the barrier weight progressively; converge to the boundary at the optimum.
Polynomial-time complexity. Standard for most convex problems.
### Newton's Method
For unconstrained convex problems with available Hessian. Quadratic convergence near the optimum.
### Gradient Descent
For very large problems or where Hessian is expensive. First-order; slower convergence but cheaper per step.
### Subgradient Methods
For non-differentiable convex objectives (Lasso, hinge loss). Slower but works.
### Proximal Methods
For composite objectives $f + g$ with $g$ non-smooth. Each step alternates a gradient step on $f$ and a proximal step for $g$. Foundation of ISTA, FISTA, ADMM.
## DCP — Disciplined Convex Programming
Boyd and colleagues' framework (CVX, CVXPY): write a problem in *disciplined* form (built up by combining known-convex operations following rules), and the framework can automatically certify convexity and dispatch to the right solver.
```python
import cvxpy as cp
x = cp.Variable(n)
objective = cp.Minimize(cp.sum_squares(A @ x - b) + lambda_ * cp.norm(x, 1))
problem = cp.Problem(objective)
problem.solve()
```
DCP democratised convex optimisation: practitioners specify the problem, not the algorithm.
## Why Most ML Loss Functions Aren't Convex
Neural network loss surfaces are non-convex because of nested non-linear activations. The composition of convex functions is *not* convex in general — even with convex losses, the model parameters' loss is non-convex.
This is the most consequential trade-off in modern ML: rich expressivity → non-convex training → no global-optimum guarantees → empirically still works at scale.
## Classical Examples That Are Convex
- Linear regression.
- Logistic regression.
- SVM training (in dual form).
- Lasso and Ridge regression.
- Maximum likelihood for exponential family distributions.
- Matrix completion via nuclear-norm relaxation.
## When to Push Toward Convex
Whenever possible, *reformulate* a problem to be convex. Even if the original was non-convex, a convex relaxation often gives:
- A useful lower bound.
- A good initialisation.
- Theoretical insights.
## Related
- [[Optimization Problem]]
- [[Linear Programming]]
- [[Lagrangian Multipliers]]
- [[KKT Conditions]]
- [[Convex vs Non-Convex Optimization]]