## Definition
A **Hidden Markov Model (HMM)** is a probabilistic model of sequence data: a hidden Markov chain of *states* emits observable *symbols* per time step. The dominant model for speech recognition, biological sequence analysis, and many time-series problems before deep learning displaced it.
## Components
An HMM is the tuple $\langle S, V, A, B, \pi \rangle$:
- $S = \{s_1, \dots, s_N\}$ — set of hidden states.
- $V = \{v_1, \dots, v_M\}$ — observation alphabet.
- $A$ — state transition matrix, $a_{ij} = P(s_j \text{ at } t{+}1 \mid s_i \text{ at } t)$.
- $B$ — emission matrix, $b_j(v_k) = P(v_k \mid s_j)$.
- $\pi$ — initial state distribution.
## The Markov Assumption
The current hidden state depends only on the previous hidden state. Observations are independent given the current state. These two independence assumptions make inference tractable.
## The Three Canonical Problems
### 1. Evaluation: $P(O \mid \lambda)$
Given a model $\lambda = (A, B, \pi)$ and an observation sequence $O$, compute the probability of $O$. Solved by the **forward algorithm** in $O(N^2 T)$ where $T$ is sequence length.
### 2. Decoding: most likely hidden state sequence
Given $O$ and $\lambda$, find the state sequence $S^* = \arg\max_S P(S \mid O, \lambda)$. Solved by the **[[Viterbi Algorithm]]** in $O(N^2 T)$.
### 3. Learning: estimate $\lambda$ from data
Given observations only, learn the parameters. Solved by **Baum-Welch** (a form of [[Expectation-Maximization]]).
## The Forward Algorithm
Dynamic programming over $\alpha_t(i) = P(o_1, \dots, o_t, s_t = i \mid \lambda)$:
$
\alpha_{t+1}(j) = \left[ \sum_i \alpha_t(i) \, a_{ij} \right] b_j(o_{t+1})
$
Linear in $T$, quadratic in $N$.
## Where HMMs Dominated (Pre-2015)
- **Speech recognition.** Until DNN-HMM hybrids and then end-to-end deep models.
- **Part-of-speech tagging** in NLP.
- **Gene finding** in bioinformatics.
- **Activity recognition** from sensor data.
- **Financial regime detection.**
## Connection to Modern Deep Learning
- **Conditional Random Fields (CRFs)** generalise HMMs to discriminative models with arbitrary features.
- **Recurrent neural networks** ([[Recurrent Neural Network]]) replaced HMMs for most sequence-modelling tasks.
- **State-space models** in 2024+ (Mamba, Linear Attention) revisited HMM-style recurrence at frontier scale.
## Limitations
- **Strong Markov assumption.** Long-range dependencies poorly captured.
- **Discrete states** by default. Continuous extensions exist (Linear-Gaussian: Kalman filter).
- **Number of states must be chosen** — infinite-state extensions (HDP-HMM) are heavier.
## Related
- [[Bayesian Network]]
- [[Markov Network]]
- [[Expectation-Maximization]]
- [[Recurrent Neural Network]]
- [[Markov Decision Process]]