Dynamic Programming - Albert Masoliver's learning site

## Definition **Dynamic Programming (DP)** solves optimisation problems by decomposing them into overlapping subproblems and storing solutions to subproblems for reuse. Coined by Richard Bellman in the 1950s. The foundational technique whenever a problem exhibits **optimal substructure** and **overlapping subproblems**. ## The Two Requirements ### Optimal Substructure The optimal solution to a problem can be constructed from optimal solutions to its subproblems. Examples: - **Shortest path:** the shortest path from A to C through B = (shortest A → B) + (shortest B → C). - **Knapsack:** the optimal selection from $n$ items = either (optimal from $n-1$ items, weight $W$) or (optimal from $n-1$ items, weight $W - w_n$) + value of item $n$. ### Overlapping Subproblems The same subproblems recur many times in the recursive structure. Memoising solutions yields exponential speed-up. ## Two Styles ### Top-Down (Memoised Recursion) ```python @lru_cache def fib(n): if n <= 1: return n return fib(n-1) + fib(n-2) ``` Recursive with caching. Natural to write; clean for problems with sparse subproblem space. ### Bottom-Up (Iterative Tabulation) ```python def fib(n): dp = [0, 1] for i in range(2, n+1): dp.append(dp[i-1] + dp[i-2]) return dp[n] ``` Fill a table from base cases up. More space-efficient; often easier to optimise space (rolling array, e.g., only need the last two values for Fibonacci). ## Canonical DP Problems - **Fibonacci numbers.** Pedagogical workhorse. - **0-1 Knapsack.** $O(nW)$. - **Coin change.** $O(n \cdot \text{amount})$. - **Longest common subsequence.** $O(nm)$. - **Edit distance (Levenshtein).** $O(nm)$. - **Matrix chain multiplication.** $O(n^3)$. - **Shortest paths:** Bellman-Ford ($O(VE)$), Floyd-Warshall ($O(V^3)$). - **Traveling Salesman (Held-Karp).** $O(n^2 2^n)$ — exponential but better than naive $O(n!)$. ## The Bellman Equation For sequential decision problems, the Bellman equation characterises the optimal value function: $ V(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) V(s') \right] $ The foundation of [[Value Iteration]], [[Policy Iteration]], and [[Reinforcement Learning]] more generally. See [[Bellman Equation]]. ## When DP Wins - Problem decomposes into a manageable number of subproblems. - Subproblems repeat — memoisation matters. - State space is bounded. ## When DP Loses - **Curse of dimensionality.** DP table size scales exponentially with state dimensions. - **Continuous state spaces** require function approximation; pure DP doesn't apply. - **Insufficient overlap.** If subproblems rarely repeat, memoisation overhead exceeds the saving. ## Approximate Dynamic Programming For large or continuous state spaces, replace exact DP with function approximation: - **Value function approximation** with linear regression, neural networks. - **Sampled DP** — Monte Carlo estimates of value. - This becomes [[Reinforcement Learning]]. ## Beyond Optimisation DP appears across CS: - **Probabilistic inference.** Forward-backward, Viterbi for HMMs. - **Parsing.** CYK algorithm. - **Bioinformatics.** Sequence alignment (Needleman-Wunsch, Smith-Waterman). - **Statistical mechanics.** Transfer matrix methods. Any problem with the right recurrence structure can use DP. ## Related - [[Bellman Equation]] - [[Value Iteration]] - [[Policy Iteration]] - [[Combinatorial Optimization]] - [[Hidden Markov Model]]