KKT Conditions - Albert Masoliver's learning site

## Definition The **Karush-Kuhn-Tucker (KKT) conditions** are first-order necessary conditions for optimality in *inequality-constrained* optimisation. They generalise [[Lagrangian Multipliers]] from equality to inequality constraints. For convex problems, KKT conditions are also *sufficient*. ## The Problem $ \min_x f(x) \quad \text{subject to} \quad g_i(x) \leq 0, \, h_j(x) = 0 $ ## The Lagrangian $ \mathcal{L}(x, \mu, \lambda) = f(x) + \sum_i \mu_i g_i(x) + \sum_j \lambda_j h_j(x) $ with $\mu_i \geq 0$ and $\lambda_j \in \mathbb{R}$. ## The KKT Conditions A point $x^*$ is a candidate optimum if there exist $\mu^* \geq 0, \lambda^*$ such that: ### 1. Stationarity $ \nabla_x \mathcal{L}(x^*, \mu^*, \lambda^*) = 0 $ ### 2. Primal feasibility $ g_i(x^*) \leq 0, \quad h_j(x^*) = 0 $ ### 3. Dual feasibility $ \mu_i^* \geq 0 $ ### 4. Complementary slackness $ \mu_i^* \cdot g_i(x^*) = 0 \quad \forall i $ **Interpretation:** for each inequality, either the multiplier is zero (constraint is *inactive* — slack) OR the constraint is tight ($g_i(x^*) = 0$, *active*). Not both can be strict at the same time. ## When KKT Is Sufficient For **convex** problems (convex $f$, convex $g_i$, affine $h_j$) with constraint qualifications, the KKT conditions are sufficient. Any KKT point is a global optimum. For non-convex problems, KKT is only necessary — KKT points include saddle points and local maxima. ## Constraint Qualifications KKT requires some "regularity" of the constraints at the optimum. The standard one (Linear Independence Constraint Qualification — LICQ): gradients of *active* constraints are linearly independent. Without constraint qualifications, the KKT conditions may fail to hold even at the true optimum. ## Worked Example: Quadratic Programming $ \min \frac{1}{2} x^\top Q x + c^\top x \quad \text{s.t.} \quad A x \leq b $ KKT conditions: - Stationarity: $Qx + c + A^\top \mu = 0$. - Primal feasibility: $Ax \leq b$. - Dual feasibility: $\mu \geq 0$. - Complementary slackness: $\mu_i (a_i^\top x - b_i) = 0$. Active-set methods alternate between guessing which constraints are active and solving the resulting equality-constrained problem. ## Connection to SVM The hard-margin SVM dual problem is derived by writing the Lagrangian of the primal and applying KKT. The complementary slackness condition $\alpha_i (y_i (w^\top x_i + b) - 1) = 0$ tells us: - Either $\alpha_i = 0$ (non-support vector) OR - $y_i (w^\top x_i + b) = 1$ (point on the margin — a support vector). This is why SVMs are sparse in support vectors. ## Connection to Linear Programming KKT conditions for LP yield the **complementary slackness** of LP duality: for each primal/dual constraint pair, at most one has slack. ## Related - [[Lagrangian Multipliers]] - [[Duality]] - [[Linear Programming]] - [[Convex Optimization]] - [[Support Vector Machine]]