## Definition
A **convex optimisation problem** has a convex objective function and a convex feasibility region. **Non-convex** problems lack one or both. The convex/non-convex divide is the most important distinction in optimisation — it determines whether the problem is *tractable* (convex) or *generally NP-hard* (non-convex).
## Convex Function
$f$ is convex iff for all $x, y$ and $\lambda \in [0, 1]$:
$
f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y)
$
Geometric intuition: the line segment between any two points on the graph stays *above* the graph.
Equivalently for twice-differentiable functions: Hessian is positive semi-definite everywhere.
## Convex Set
A set $C$ is convex iff for all $x, y \in C$ and $\lambda \in [0, 1]$, $\lambda x + (1-\lambda) y \in C$. Line segments between feasible points stay feasible.
## Why Convexity Is Decisive
For convex problems:
1. **Every local optimum is global** — gradient descent can't get stuck in a bad local.
2. **Efficient algorithms exist** — interior-point methods solve convex problems in polynomial time.
3. **Duality is tight** — strong duality holds; dual gives exact bounds.
4. **First-order conditions are sufficient** — $\nabla f(x^*) = 0$ in interior; KKT conditions on boundary.
Non-convex problems have none of these guarantees.
## Common Convex Problems
- **Linear programming** — linear objective, linear constraints.
- **Quadratic programming** — convex quadratic objective, linear constraints.
- **Second-order cone programming.**
- **Semidefinite programming.**
- **Logistic regression, SVM training, Lasso, Ridge.**
## Common Non-Convex Problems
- **Neural network training.** Almost always non-convex due to nested non-linearities.
- **Integer programming.** Non-convex by virtue of discrete decisions.
- **K-means clustering.**
- **Most combinatorial optimisation problems.**
- **Bilinear / multilinear objectives.**
## The Modern ML Paradox
Deep learning trains massive non-convex models and works empirically. Why?
- **Overparameterisation.** With many more parameters than data, the loss landscape has many global minima, not just one.
- **SGD's implicit bias** finds flat minima that generalise well.
- **The "loss landscape" is benign** in some structural sense (Choromanska et al., 2015; many follow-ups).
A surprising empirical lesson: non-convex isn't always catastrophic if the structure is right.
## Convex Relaxation
A common technique: replace the original non-convex problem with a *convex relaxation* — a related convex problem whose solution bounds the original. Examples:
- **LP relaxation of ILP.** Drop integer constraints; solve the LP; round.
- **Lasso as a convex proxy for $L_0$.** $L_0$ is non-convex, NP-hard; $L_1$ is convex and yields similar solutions.
- **Nuclear norm as a convex proxy for matrix rank.**
Convex relaxations are workhorses in optimisation and statistical learning theory.
## Related
- [[Optimization Problem]]
- [[Local vs Global Optimum]]
- [[Convex Optimization]]
- [[Linear Programming]]
- [[Lasso Regression]]