Hidden Markov Model - Albert Masoliver's learning site

## Definition A **Hidden Markov Model (HMM)** is a probabilistic model of sequence data: a hidden Markov chain of *states* emits observable *symbols* per time step. The dominant model for speech recognition, biological sequence analysis, and many time-series problems before deep learning displaced it. ## Components An HMM is the tuple $\langle S, V, A, B, \pi \rangle$: - $S = \{s_1, \dots, s_N\}$ — set of hidden states. - $V = \{v_1, \dots, v_M\}$ — observation alphabet. - $A$ — state transition matrix, $a_{ij} = P(s_j \text{ at } t{+}1 \mid s_i \text{ at } t)$. - $B$ — emission matrix, $b_j(v_k) = P(v_k \mid s_j)$. - $\pi$ — initial state distribution. ## The Markov Assumption The current hidden state depends only on the previous hidden state. Observations are independent given the current state. These two independence assumptions make inference tractable. ## The Three Canonical Problems ### 1. Evaluation: $P(O \mid \lambda)$ Given a model $\lambda = (A, B, \pi)$ and an observation sequence $O$, compute the probability of $O$. Solved by the **forward algorithm** in $O(N^2 T)$ where $T$ is sequence length. ### 2. Decoding: most likely hidden state sequence Given $O$ and $\lambda$, find the state sequence $S^* = \arg\max_S P(S \mid O, \lambda)$. Solved by the **[[Viterbi Algorithm]]** in $O(N^2 T)$. ### 3. Learning: estimate $\lambda$ from data Given observations only, learn the parameters. Solved by **Baum-Welch** (a form of [[Expectation-Maximization]]). ## The Forward Algorithm Dynamic programming over $\alpha_t(i) = P(o_1, \dots, o_t, s_t = i \mid \lambda)$: $ \alpha_{t+1}(j) = \left[ \sum_i \alpha_t(i) \, a_{ij} \right] b_j(o_{t+1}) $ Linear in $T$, quadratic in $N$. ## Where HMMs Dominated (Pre-2015) - **Speech recognition.** Until DNN-HMM hybrids and then end-to-end deep models. - **Part-of-speech tagging** in NLP. - **Gene finding** in bioinformatics. - **Activity recognition** from sensor data. - **Financial regime detection.** ## Connection to Modern Deep Learning - **Conditional Random Fields (CRFs)** generalise HMMs to discriminative models with arbitrary features. - **Recurrent neural networks** ([[Recurrent Neural Network]]) replaced HMMs for most sequence-modelling tasks. - **State-space models** in 2024+ (Mamba, Linear Attention) revisited HMM-style recurrence at frontier scale. ## Limitations - **Strong Markov assumption.** Long-range dependencies poorly captured. - **Discrete states** by default. Continuous extensions exist (Linear-Gaussian: Kalman filter). - **Number of states must be chosen** — infinite-state extensions (HDP-HMM) are heavier. ## Related - [[Bayesian Network]] - [[Markov Network]] - [[Expectation-Maximization]] - [[Recurrent Neural Network]] - [[Markov Decision Process]]