Convex vs Non-Convex Optimization - Albert Masoliver's learning site

## Definition A **convex optimisation problem** has a convex objective function and a convex feasibility region. **Non-convex** problems lack one or both. The convex/non-convex divide is the most important distinction in optimisation — it determines whether the problem is *tractable* (convex) or *generally NP-hard* (non-convex). ## Convex Function $f$ is convex iff for all $x, y$ and $\lambda \in [0, 1]$: $ f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y) $ Geometric intuition: the line segment between any two points on the graph stays *above* the graph. Equivalently for twice-differentiable functions: Hessian is positive semi-definite everywhere. ## Convex Set A set $C$ is convex iff for all $x, y \in C$ and $\lambda \in [0, 1]$, $\lambda x + (1-\lambda) y \in C$. Line segments between feasible points stay feasible. ## Why Convexity Is Decisive For convex problems: 1. **Every local optimum is global** — gradient descent can't get stuck in a bad local. 2. **Efficient algorithms exist** — interior-point methods solve convex problems in polynomial time. 3. **Duality is tight** — strong duality holds; dual gives exact bounds. 4. **First-order conditions are sufficient** — $\nabla f(x^*) = 0$ in interior; KKT conditions on boundary. Non-convex problems have none of these guarantees. ## Common Convex Problems - **Linear programming** — linear objective, linear constraints. - **Quadratic programming** — convex quadratic objective, linear constraints. - **Second-order cone programming.** - **Semidefinite programming.** - **Logistic regression, SVM training, Lasso, Ridge.** ## Common Non-Convex Problems - **Neural network training.** Almost always non-convex due to nested non-linearities. - **Integer programming.** Non-convex by virtue of discrete decisions. - **K-means clustering.** - **Most combinatorial optimisation problems.** - **Bilinear / multilinear objectives.** ## The Modern ML Paradox Deep learning trains massive non-convex models and works empirically. Why? - **Overparameterisation.** With many more parameters than data, the loss landscape has many global minima, not just one. - **SGD's implicit bias** finds flat minima that generalise well. - **The "loss landscape" is benign** in some structural sense (Choromanska et al., 2015; many follow-ups). A surprising empirical lesson: non-convex isn't always catastrophic if the structure is right. ## Convex Relaxation A common technique: replace the original non-convex problem with a *convex relaxation* — a related convex problem whose solution bounds the original. Examples: - **LP relaxation of ILP.** Drop integer constraints; solve the LP; round. - **Lasso as a convex proxy for $L_0$.** $L_0$ is non-convex, NP-hard; $L_1$ is convex and yields similar solutions. - **Nuclear norm as a convex proxy for matrix rank.** Convex relaxations are workhorses in optimisation and statistical learning theory. ## Related - [[Optimization Problem]] - [[Local vs Global Optimum]] - [[Convex Optimization]] - [[Linear Programming]] - [[Lasso Regression]]