## Definition
The **Conjugate Gradient (CG)** method is an iterative algorithm for solving large symmetric positive-definite linear systems $Ax = b$, and more generally minimising quadratic functions. Reaches the exact solution in at most $n$ iterations (for an $n$-dimensional problem) using only matrix-vector products — no matrix factorisation needed.
## The Setup
For minimising $\frac{1}{2} x^\top A x - b^\top x$ where $A$ is symmetric positive definite:
```
x_0 ← initial guess
r_0 = b - A x_0 (initial residual)
p_0 = r_0 (initial search direction)
for k = 0, 1, ...:
α_k = r_k^T r_k / (p_k^T A p_k)
x_{k+1} = x_k + α_k p_k
r_{k+1} = r_k - α_k A p_k
if ||r_{k+1}|| < tolerance: stop
β_k = (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
p_{k+1} = r_{k+1} + β_k p_k
```
## What "Conjugate" Means
Successive search directions $p_k$ are **A-conjugate**: $p_i^\top A p_j = 0$ for $i \neq j$. This is the key — each step makes progress in a direction orthogonal (under the $A$-inner-product) to all previous steps.
## Convergence
- **Finite termination:** at most $n$ iterations for $n$-dimensional problem.
- **In practice:** convergence depends on the spectrum of $A$. If $A$ has $k$ distinct eigenvalues, CG converges in $k$ iterations.
- **Preconditioning** (replace $A$ with $M^{-1} A$ for some easy-to-invert $M$) dramatically accelerates convergence.
## Why It Matters
Many problems reduce to large symmetric positive-definite linear systems:
- **Finite-element analysis** in engineering.
- **Image deblurring** (Tikhonov regularisation).
- **Least-squares regression** at scale (normal equations).
- **PageRank** and other graph algorithms.
- **Hessian-free optimisation** in deep learning — uses CG to approximately invert the Hessian for Newton-like updates without storing it.
## Non-linear Conjugate Gradient
Extends to non-quadratic objectives:
- Compute gradient $\nabla f(x_k)$.
- Use a Fletcher-Reeves, Polak-Ribière, or Hestenes-Stiefel formula to update $\beta_k$.
- Line search for the step size.
Less popular than [[Quasi-Newton Methods|L-BFGS]] for general non-linear optimisation but useful when memory matters.
## Strengths
- **Memory-efficient.** Only needs vectors, not matrices.
- **Matrix-free.** Only needs to compute $Ax$, not store $A$. Critical when $A$ is implicit (e.g., a Hessian-vector product in a neural network).
- **Theoretical guarantees** for quadratic problems.
## Weaknesses
- **Symmetric positive definite only.** For general systems, use GMRES or other Krylov methods.
- **Preconditioning is crucial** in practice.
- **Numerical stability** issues with floating-point.
## Related
- [[Newton's Method]]
- [[Quasi-Newton Methods]]
- [[Gradient Descent]]