Apriori Algorithm - Albert Masoliver's learning site

## Definition The **Apriori algorithm** (Agrawal & Srikant, 1994) is the foundational algorithm for **association rule learning** — discovering frequent itemsets and useful "if X then Y" rules in transactional data. Classic application: market-basket analysis. ## The Key Insight (Apriori Property) > If an itemset is frequent, then all its subsets are frequent. Equivalently: if a subset is not frequent, no superset can be frequent. This *downward closure* enables massive pruning of the candidate space. ## Algorithm ``` L_1 ← frequent 1-itemsets (single items above minimum support) for k = 2, 3, ...: candidates C_k ← join L_{k-1} with itself, prune those containing infrequent (k-1) subsets scan transactions, count support of each candidate L_k ← candidates with support ≥ min_support return ∪_k L_k for each frequent itemset: generate association rules (X → Y) with confidence ≥ min_confidence ``` ## Three Key Metrics For a rule $X \to Y$: - **Support:** $P(X \cup Y)$ — fraction of transactions containing both. - **Confidence:** $P(Y \mid X)$ — fraction of transactions with $X$ that also have $Y$. - **Lift:** $P(Y \mid X) / P(Y)$ — how much more likely $Y$ is given $X$ vs at random. $\text{lift} > 1$ = positive correlation. A useful rule needs high support (occurs often), high confidence (when X, then Y reliably), and lift > 1 (not just because Y is always common). ## Example: Market Basket > {diapers} → {beer}, support 0.05, confidence 0.7, lift 3.5 5% of transactions contain both diapers and beer. 70% of diaper-buying transactions also buy beer. Diaper buyers are 3.5× more likely to buy beer than random shoppers. (The famous data-mining example, though its actual provenance is contested.) ## Limitations - **Multiple database scans.** One scan per level $k$. Slow on large data. - **Combinatorial blow-up** with many items. Even with pruning, intermediate candidate sets can be huge. - **Memory-heavy** for dense data. ## Successors - **FP-Growth** (Han et al., 2000) — uses a compact FP-tree structure; mines without candidate generation. Much faster. - **Eclat** — uses vertical data layout (tid-lists) and intersection. Memory-efficient for sparse data. ## Modern Status Pure association rule mining has largely been displaced by: - **Recommendation systems** (collaborative filtering, matrix factorisation, deep models). - **Sequence pattern mining** for ordered behaviour. But Apriori remains useful for: - Interpretable rule discovery (medical diagnosis, regulatory contexts). - Quick exploratory analysis of categorical transactions. - Foundation for understanding frequent-pattern mining. ## Related - [[Naive Bayes]] - [[Unsupervised Learning]] - [[Bayes Theorem]]