155 Association Rules and Pattern Mining

Association rule mining is the workhorse of unsupervised pattern discovery over transactional data. Given a large collection of records, each a set of items, the goal is to surface regularities of the form “customers who buy diapers also tend to buy beer.” This chapter develops the formal machinery of frequent itemsets, the interestingness measures of support, confidence, and lift, and the two canonical algorithms, Apriori and FP-Growth. We close with the practical realities of market-basket analysis, where these ideas earn their keep.

155.1 1. Problem Setting and Notation

Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of $m$ distinct items. A transaction $t$ is a subset $t \subseteq I$, and a dataset $D = \{t_1, t_2, \ldots, t_n\}$ is a multiset of $n$ transactions. The canonical example is a supermarket, where each $t_k$ is the set of products in one customer’s basket and $I$ is the store catalog. An itemset $X \subseteq I$ is any set of items, and a $k$-itemset is an itemset of cardinality $k$.

The central counting primitive is the number of transactions that contain a given itemset. Define the cover of $X$ as $\text{cover}(X) = \{t \in D : X \subseteq t\}$, and its absolute frequency as $\sigma(X) = |\text{cover}(X)|$.

This combinatorial framing is deceptively simple. With $m$ items there are $2^m - 1$ candidate non-empty itemsets, so brute force enumeration is hopeless even for a modest catalog of a few thousand SKUs. The art of pattern mining lies in pruning this exponential space without scanning every subset.

155.2 2. Support, Confidence, and Lift

155.2.1 2.1 Support

The support of an itemset $X$ is the fraction of transactions that contain it:

\[ \text{supp}(X) = \frac{\sigma(X)}{n} = \frac{|\{t \in D : X \subseteq t\}|}{n}. \]

Support is an estimate of the joint probability $P(X)$ that a random transaction contains all items in $X$. An itemset is called frequent when its support meets or exceeds a user supplied threshold $\text{minsup}$. The frequent itemset mining problem is to enumerate every $X$ with $\text{supp}(X) \ge \text{minsup}$.

155.2.2 2.2 Association Rules and Confidence

An association rule is an implication $X \Rightarrow Y$ where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. The itemset $X$ is the antecedent and $Y$ the consequent. The rule does not assert logical implication. It asserts a statistical tendency, quantified by two numbers.

The support of the rule is the support of the combined itemset, $\text{supp}(X \Rightarrow Y) = \text{supp}(X \cup Y)$. The confidence is the conditional probability of seeing $Y$ given $X$:

\[ \text{conf}(X \Rightarrow Y) = \frac{\text{supp}(X \cup Y)}{\text{supp}(X)} \approx P(Y \mid X). \]

Confidence measures how often the rule is correct among transactions that already satisfy the antecedent. The classic mining task is to find all rules with support at least $\text{minsup}$ and confidence at least $\text{minconf}$.

155.2.3 2.3 Why Confidence Misleads, and Lift

Confidence alone is a treacherous guide because it ignores the baseline frequency of the consequent. Suppose 75 percent of all customers buy bread. A rule $\{\text{butter}\} \Rightarrow \{\text{bread}\}$ with confidence 0.80 looks strong, yet it barely beats the 0.75 you would get by guessing bread for everyone. Confidence rewards popular consequents regardless of any real association.

Lift corrects for this by comparing the observed co-occurrence against what independence would predict:

\[ \text{lift}(X \Rightarrow Y) = \frac{\text{conf}(X \Rightarrow Y)}{\text{supp}(Y)} = \frac{\text{supp}(X \cup Y)}{\text{supp}(X)\,\text{supp}(Y)} = \frac{P(X, Y)}{P(X)\,P(Y)}. \]

Lift is symmetric in $X$ and $Y$. A value of $1$ means the antecedent and consequent are statistically independent, a value greater than $1$ indicates positive correlation (the items appear together more than chance), and a value less than $1$ indicates substitution or avoidance. In the bread example, $\text{lift} = 0.80 / 0.75 \approx 1.07$, revealing the rule to be nearly worthless.

155.2.4 2.4 Beyond Lift

Lift has its own pathologies. It is unbounded and unstable for rare items, and it is sensitive to the null transactions that contain neither $X$ nor $Y$, inflating apparent associations in sparse data. Practitioners therefore supplement it with measures such as leverage, $\text{supp}(X \cup Y) - \text{supp}(X)\text{supp}(Y)$, which reports the absolute excess over independence; conviction, $\frac{1 - \text{supp}(Y)}{1 - \text{conf}(X \Rightarrow Y)}$, which is directional and penalizes incorrect predictions; and the Kulczynski and imbalance ratio pair, which together resist the null transaction problem. No single number captures interestingness, so reporting several is good practice.

155.3 3. The Apriori Algorithm

155.3.1 3.1 The Downward Closure Principle

The decisive structural fact that tames the exponential search is monotonicity of support, also called the Apriori property or downward closure:

\[ X \subseteq Y \;\Longrightarrow\; \text{supp}(X) \ge \text{supp}(Y). \]

Adding items to an itemset can only shrink its cover. The contrapositive is the engine of pruning: if any itemset is infrequent, then every superset of it is also infrequent and need not be examined. Equivalently, every subset of a frequent itemset is frequent.

155.3.2 3.2 Candidate Generation and Counting

Apriori, introduced by Agrawal and Srikant in 1994, exploits this property with a breadth first, level wise sweep. It builds frequent $k$-itemsets ($L_k$) from frequent $(k-1)$-itemsets, scanning the database once per level.

L_1 = frequent 1-itemsets
for (k = 2; L_{k-1} is not empty; k++):
    C_k = apriori_gen(L_{k-1})        # candidate generation
    for each transaction t in D:      # one database pass
        for each candidate c in C_k that is a subset of t:
            count[c] += 1
    L_k = { c in C_k : count[c]/n >= minsup }
return union of all L_k

The apriori_gen step has two parts. The join step merges two frequent $(k-1)$-itemsets that share their first $k-2$ items, producing a candidate $k$-itemset. The prune step immediately discards any candidate having a $(k-1)$-subset that is not in $L_{k-1}$, since downward closure guarantees such a candidate cannot be frequent. This pruning is what keeps the candidate set manageable.

155.3.3 3.3 From Frequent Itemsets to Rules

Frequent itemset mining and rule generation are separate phases. Once all frequent itemsets are known, rules are extracted from each frequent itemset $Z$ by partitioning it into $X$ and $Z \setminus X$:

\[ \text{for each } X \subset Z, \quad \text{emit } X \Rightarrow (Z \setminus X) \text{ if } \frac{\text{supp}(Z)}{\text{supp}(X)} \ge \text{minconf}. \]

Confidence is itself monotone in the antecedent. If $X \Rightarrow (Z \setminus X)$ fails the confidence test, then any rule with a smaller antecedent (and correspondingly larger consequent) also fails, so the rule search within each itemset can prune analogously.

155.3.4 3.4 Cost and Limitations

Apriori’s appeal is its simplicity and its provably complete pruning. Its weaknesses are equally clear. It performs a full database pass for each level, so a dataset whose longest frequent itemset has length $\ell$ requires up to $\ell$ scans, which is costly on disk resident data. It can also generate enormous candidate sets. With $10^4$ frequent 1-itemsets, the join step alone can yield on the order of $10^7$ candidate 2-itemsets, and each must be counted. These two costs, repeated I/O and candidate explosion, motivated the next generation of algorithms.

155.4 4. The FP-Growth Algorithm

155.4.1 4.1 Motivation

FP-Growth, proposed by Han, Pei, and Yin in 2000, mines frequent itemsets without candidate generation and with only two database scans. Its strategy is to compress the dataset into a compact tree that retains all itemset frequency information, then mine that tree recursively by a divide and conquer pattern growth.

155.4.2 4.2 Building the FP-Tree

The first scan counts single item supports and discards infrequent items. The surviving frequent items are sorted by descending support into a global order $F$. The second scan inserts each transaction into the FP-tree after reordering its items according to $F$ and dropping the infrequent ones.

The FP-tree is a prefix tree. Each node stores an item, a count, and a link. Transactions that share a frequent prefix share the same path from the root, and the node count records how many transactions pass through. A header table maintains, for each item, the head of a linked list threading together every node that holds that item, enabling fast traversal during mining.

Scan 1: count items, keep those with supp >= minsup, sort by support -> F
Create root (null)
Scan 2:
    for each transaction t:
        keep frequent items of t, sort them by F order
        insert the sorted list into the tree, incrementing node counts,
        creating nodes as needed and updating the header-table links

When transactions overlap heavily, the tree is far smaller than the raw data, which is the source of FP-Growth’s efficiency. In the worst case of no shared prefixes the tree can be as large as the data, but real transactional datasets are usually compressible.

155.4.3 4.3 Mining by Pattern Growth

Mining proceeds from the least frequent item upward. For a target item $a$, follow its header link to collect every path from $a$ back to the root. These paths, with their counts, form $a$’s conditional pattern base. From that base a smaller conditional FP-tree is built containing only items that remain frequent in the context of $a$. The algorithm then recurses on this conditional tree, growing the pattern $\{a\}$ by appending items found frequent within it.

FP-growth(Tree, alpha):
    if Tree is a single path P:
        for each subset beta of nodes in P:
            output pattern (beta union alpha) with support = min count in beta
    else:
        for each item a in header table (low to high support):
            output pattern (a union alpha) with support = a.support
            build conditional pattern base for a
            construct conditional FP-tree from that base
            if the conditional tree is non-empty:
                FP-growth(conditional_tree, a union alpha)

The single path optimization is important. When a conditional tree degenerates to one path, every combination of its nodes is frequent and can be enumerated directly without further recursion.

155.4.4 4.4 Trade-offs

FP-Growth typically outperforms Apriori by an order of magnitude on dense data, since it avoids candidate generation entirely and reads the database only twice. The price is memory. The FP-tree and the cascade of conditional trees must reside in memory, which can be prohibitive when the tree fails to compress, for example under very low support thresholds on wide, sparse data. Apriori, being level wise and disk friendly, can remain competitive there. Other algorithms, notably Eclat, take a third route by representing each itemset as the vertical list of transaction IDs in its cover (a tidset) and computing supports through set intersection, which suits certain data shapes well.

155.5 5. A Worked Derivation: Why Downward Closure Holds

The pruning that makes Apriori and FP-Growth tractable rests entirely on the monotonicity of support. It is worth proving rather than asserting, because the same argument extends to confidence and to the condensed representations of Section 7.3.

Claim. For itemsets $X \subseteq Y \subseteq I$, $\sigma(X) \ge \sigma(Y)$, hence $\text{supp}(X) \ge \text{supp}(Y)$.

Proof. Recall $\text{cover}(X) = \{t \in D : X \subseteq t\}$. Take any transaction $t \in \text{cover}(Y)$. By definition $Y \subseteq t$, and since $X \subseteq Y$ by hypothesis, transitivity of set inclusion gives $X \subseteq t$, so $t \in \text{cover}(X)$. Therefore $\text{cover}(Y) \subseteq \text{cover}(X)$, and taking cardinalities of nested finite sets, $\sigma(Y) = |\text{cover}(Y)| \le |\text{cover}(X)| = \sigma(X)$. Dividing both sides by the constant $n = |D|$ preserves the inequality, yielding $\text{supp}(Y) \le \text{supp}(X)$. $\blacksquare$

The contrapositive is the operational form. If $\text{supp}(X) < \text{minsup}$ then for every superset $Y \supseteq X$ we have $\text{supp}(Y) \le \text{supp}(X) < \text{minsup}$, so $Y$ is infrequent and need never be counted. This is exactly the prune step: a $k$-candidate is discarded the moment any of its $(k-1)$-subsets is found infrequent.

Confidence is monotone in the antecedent. Fix a frequent itemset $Z$ and consider two splits with antecedents $A \subset A' \subset Z$ (so $A' \supseteq A$ keeps more of $Z$ in the antecedent). Because $A \subseteq A'$, downward closure gives $\text{supp}(A) \ge \text{supp}(A')$, hence

\[ \text{conf}(A \Rightarrow Z \setminus A) = \frac{\text{supp}(Z)}{\text{supp}(A)} \le \frac{\text{supp}(Z)}{\text{supp}(A')} = \text{conf}(A' \Rightarrow Z \setminus A'). \]

So moving items from the consequent into the antecedent can only raise confidence. Equivalently, if a rule with antecedent $A'$ fails the confidence test, every rule whose antecedent is a subset of $A'$ (with the complementary, larger consequent) also fails. Rule generation within each frequent itemset can therefore prune from large antecedents downward, mirroring the itemset-level pruning.

A note on the count threshold. Comparing real-valued supports to minsup invites floating-point boundary errors: an itemset with exactly $\text{minsup} \cdot n$ occurrences can land just below the threshold after rounding. The implementations below sidestep this by converting minsup to an integer count once, $c^\star = \max(1, \lceil \text{minsup} \cdot n - \varepsilon \rceil)$ with a tiny $\varepsilon$, and comparing integer counts $\sigma(X) \ge c^\star$. This makes Apriori, FP-Growth, and the three language implementations agree exactly.

155.6 6. Reference Implementation

The companion library ships Apriori, FP-Growth, and rule extraction in Python, Julia, and Rust, validated against a shared fixture so all three agree to the last bit. The example below is the classic nine-transaction market basket from Han, Kamber, and Pei over items $\{1, 2, 3, 4, 5\}$ with $\text{minsup} = 2/9$. FP-Growth returns exactly the same frequent set as Apriori, which is the strongest correctness check available.

Code

from aiinaction.ch150_association_rules import (
    apriori,
    fpgrowth,
    association_rules,
)

transactions = [
    [1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
    [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3],
]

itemsets = apriori(transactions, min_support=2 / 9)
fg = fpgrowth(transactions, min_support=2 / 9)
print("Apriori and FP-Growth agree:", sorted(itemsets) == sorted(fg))
print(f"{len(itemsets)} frequent itemsets")

for z in sorted(itemsets, key=lambda x: (len(x), x)):
    print(f"  {set(z)}: support {itemsets[z]:.3f}")

print("\nHigh-confidence rules (conf >= 0.7):")
for r in association_rules(itemsets, min_confidence=0.7):
    print(
        f"  {set(r.antecedent)} => {set(r.consequent)} | "
        f"conf={r.confidence:.2f} lift={r.lift:.2f} "
        f"leverage={r.leverage:+.3f}"
    )

Apriori and FP-Growth agree: True
13 frequent itemsets
  {1}: support 0.667
  {2}: support 0.778
  {3}: support 0.667
  {4}: support 0.222
  {5}: support 0.222
  {1, 2}: support 0.444
  {1, 3}: support 0.444
  {1, 5}: support 0.222
  {2, 3}: support 0.444
  {2, 4}: support 0.222
  {2, 5}: support 0.222
  {1, 2, 3}: support 0.222
  {1, 2, 5}: support 0.222

High-confidence rules (conf >= 0.7):
  {1, 5} => {2} | conf=1.00 lift=1.29 leverage=+0.049
  {2, 5} => {1} | conf=1.00 lift=1.50 leverage=+0.074
  {4} => {2} | conf=1.00 lift=1.29 leverage=+0.049
  {5} => {1} | conf=1.00 lift=1.50 leverage=+0.074
  {5} => {1, 2} | conf=1.00 lift=2.25 leverage=+0.123
  {5} => {2} | conf=1.00 lift=1.29 leverage=+0.049

using AIInAction.Ch150AssociationRules

transactions = [
    [1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
    [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3],
]

itemsets = apriori(transactions, 2 / 9)
fg = fpgrowth(transactions, 2 / 9)
println("Apriori and FP-Growth agree: ",
        sort(collect(keys(itemsets))) == sort(collect(keys(fg))))
println(length(itemsets), " frequent itemsets")

for r in association_rules(itemsets, 0.7)
    println("  ", Set(r.antecedent), " => ", Set(r.consequent),
            " | conf=", round(r.confidence; digits = 2),
            " lift=", round(r.lift; digits = 2))
end

use aiinaction::ch150_association_rules::{apriori, fpgrowth, association_rules};

fn main() {
    let transactions = vec![
        vec![1, 2, 5], vec![2, 4], vec![2, 3], vec![1, 2, 4], vec![1, 3],
        vec![2, 3], vec![1, 3], vec![1, 2, 3, 5], vec![1, 2, 3],
    ];

    let itemsets = apriori(&transactions, 2.0 / 9.0).unwrap();
    let fg = fpgrowth(&transactions, 2.0 / 9.0).unwrap();
    let agree: Vec<_> = itemsets.keys().collect();
    let agree_fg: Vec<_> = fg.keys().collect();
    println!("Apriori and FP-Growth agree: {}", agree == agree_fg);
    println!("{} frequent itemsets", itemsets.len());

    for r in association_rules(&itemsets, 0.7).unwrap() {
        println!(
            "  {:?} => {:?} | conf={:.2} lift={:.2}",
            r.antecedent, r.consequent, r.confidence, r.lift
        );
    }
}

155.7 7. Market-Basket Analysis in Practice

155.7.1 7.1 The Workflow

Market-basket analysis is the original and still dominant application. The pipeline is direct. Transform point of sale logs into a transaction list, choose support and confidence thresholds, mine frequent itemsets and rules, then rank and filter by lift and the auxiliary measures of Section 2.4. A minimal scikit style invocation looks like this.

from mlxtend.frequent_patterns import fpgrowth, association_rules

# df is a one-hot transaction matrix: rows = baskets, cols = items
itemsets = fpgrowth(df, min_support=0.01, use_colnames=True)
rules = association_rules(itemsets, metric="lift", min_threshold=1.2)
strong = rules[(rules.confidence > 0.5) & (rules.lift > 1.5)]

155.7.2 7.2 Choosing Thresholds

Threshold selection is the practitioner’s central lever. Set $\text{minsup}$ too high and you miss niche but lucrative associations. Set it too low and the algorithm drowns in spurious patterns and may exhaust memory. A common heuristic is to pick the lowest support that keeps runtime and result counts tractable, then rely on lift and confidence to filter. Because item popularity is heavily skewed, a single global $\text{minsup}$ is often wrong for both head and tail items. The multiple minimum supports framework assigns item specific thresholds so that rare but meaningful items are not pruned away by a threshold tuned for common ones.

155.7.3 7.3 Interpretation and Action

The output of mining is a hypothesis generator, not a conclusion. Three caveats matter. First, correlation is not causation. The famous beer and diapers association, whatever its provenance, suggests a placement experiment, not a causal claim. Second, rules can be redundant. If $\{A\} \Rightarrow \{B\}$ and $\{A, C\} \Rightarrow \{B\}$ have the same confidence, the more specific rule adds nothing, and post processing should suppress such redundancy, often by restricting attention to closed or maximal frequent itemsets. An itemset is closed when no superset has the same support, and maximal when no superset is frequent. These condensed representations dramatically shrink the result set without losing information about which itemsets are frequent.

Third, statistical significance deserves scrutiny. With millions of candidate rules, some will clear any lift threshold by chance alone, a multiple testing problem. Validation on a holdout period, or formal significance testing, guards against acting on noise.

155.7.4 7.4 Beyond the Supermarket

The transactional abstraction generalizes far past retail. Web usage mining treats page views in a session as a basket. Recommender systems use co-occurrence rules as a transparent, cold start friendly complement to collaborative filtering. In bioinformatics, items are genes or mutations and transactions are samples. Intrusion detection mines frequent patterns of network events. Healthcare analytics finds co-occurring diagnoses and prescriptions. In each case the same triad of support, confidence, and lift, computed by Apriori, FP-Growth, or their descendants, turns a pile of set valued records into ranked, human readable rules.

155.8 8. Summary

Frequent itemset mining and association rules provide an interpretable, unsupervised lens on co-occurrence structure. Support quantifies prevalence, confidence quantifies conditional reliability, and lift corrects confidence for baseline frequency, with leverage and conviction filling remaining gaps. The downward closure property is the theoretical key that makes the exponential itemset lattice searchable. Apriori realizes it through level wise candidate generation and Boolean pruning, while FP-Growth sidesteps candidate generation with a compressed prefix tree mined by recursive pattern growth. Both feed market-basket analysis, where careful threshold selection, redundancy removal, and skepticism about causation separate actionable insight from statistical mirage.

155.9 References

Agrawal, R., Imielinski, T., and Swami, A. (1993). Mining Association Rules between Sets of Items in Large Databases. ACM SIGMOD. https://dl.acm.org/doi/10.1145/170035.170072
Agrawal, R., and Srikant, R. (1994). Fast Algorithms for Mining Association Rules. VLDB. https://www.vldb.org/conf/1994/P487.PDF
Han, J., Pei, J., and Yin, Y. (2000). Mining Frequent Patterns without Candidate Generation. ACM SIGMOD. https://dl.acm.org/doi/10.1145/342009.335372
Zaki, M. J. (2000). Scalable Algorithms for Association Mining. IEEE TKDE. https://ieeexplore.ieee.org/document/846291
Tan, P. N., Steinbach, M., Karpatne, A., and Kumar, V. (2018). Introduction to Data Mining, 2nd ed. Pearson. https://www-users.cse.umn.edu/~kumar001/dmbook/index.php
Han, J., Kamber, M., and Pei, J. (2011). Data Mining: Concepts and Techniques, 3rd ed. Morgan Kaufmann. https://www.elsevier.com/books/data-mining-concepts-and-techniques/han/978-0-12-381479-1
Brin, S., Motwani, R., Ullman, J. D., and Tsur, S. (1997). Dynamic Itemset Counting and Implication Rules for Market Basket Data. ACM SIGMOD. https://dl.acm.org/doi/10.1145/253260.253325
Liu, B., Hsu, W., and Ma, Y. (1999). Mining Association Rules with Multiple Minimum Supports. ACM SIGKDD. https://dl.acm.org/doi/10.1145/312129.312274
mlxtend documentation. Frequent Patterns. https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/

# Association Rules and Pattern Mining Association rule mining is the workhorse of unsupervised pattern discovery over transactional data. Given a large collection of records, each a set of items, the goal is to surface regularities of the form "customers who buy diapers also tend to buy beer." This chapter develops the formal machinery of frequent itemsets, the interestingness measures of support, confidence, and lift, and the two canonical algorithms, Apriori and FP-Growth. We close with the practical realities of market-basket analysis, where these ideas earn their keep. ## 1. Problem Setting and Notation Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of $m$ distinct **items**. A **transaction** $t$ is a subset $t \subseteq I$, and a **dataset** $D = \{t_1, t_2, \ldots, t_n\}$ is a multiset of $n$ transactions. The canonical example is a supermarket, where each $t_k$ is the set of products in one customer's basket and $I$ is the store catalog. An **itemset** $X \subseteq I$ is any set of items, and a **$k$-itemset** is an itemset of cardinality $k$. The central counting primitive is the number of transactions that contain a given itemset. Define the **cover** of $X$ as $\text{cover}(X) = \{t \in D : X \subseteq t\}$, and its absolute frequency as $\sigma(X) = |\text{cover}(X)|$. This combinatorial framing is deceptively simple. With $m$ items there are $2^m - 1$ candidate non-empty itemsets, so brute force enumeration is hopeless even for a modest catalog of a few thousand SKUs. The art of pattern mining lies in pruning this exponential space without scanning every subset. ## 2. Support, Confidence, and Lift ### 2.1 Support The **support** of an itemset $X$ is the fraction of transactions that contain it: $$ \text{supp}(X) = \frac{\sigma(X)}{n} = \frac{|\{t \in D : X \subseteq t\}|}{n}. $$ Support is an estimate of the joint probability $P(X)$ that a random transaction contains all items in $X$. An itemset is called **frequent** when its support meets or exceeds a user supplied threshold $\text{minsup}$. The frequent itemset mining problem is to enumerate every $X$ with $\text{supp}(X) \ge \text{minsup}$. ### 2.2 Association Rules and Confidence An **association rule** is an implication $X \Rightarrow Y$ where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. The itemset $X$ is the **antecedent** and $Y$ the **consequent**. The rule does not assert logical implication. It asserts a statistical tendency, quantified by two numbers. The support of the rule is the support of the combined itemset, $\text{supp}(X \Rightarrow Y) = \text{supp}(X \cup Y)$. The **confidence** is the conditional probability of seeing $Y$ given $X$: $$ \text{conf}(X \Rightarrow Y) = \frac{\text{supp}(X \cup Y)}{\text{supp}(X)} \approx P(Y \mid X). $$ Confidence measures how often the rule is correct among transactions that already satisfy the antecedent. The classic mining task is to find all rules with support at least $\text{minsup}$ and confidence at least $\text{minconf}$. ### 2.3 Why Confidence Misleads, and Lift Confidence alone is a treacherous guide because it ignores the baseline frequency of the consequent. Suppose 75 percent of all customers buy bread. A rule $\{\text{butter}\} \Rightarrow \{\text{bread}\}$ with confidence 0.80 looks strong, yet it barely beats the 0.75 you would get by guessing bread for everyone. Confidence rewards popular consequents regardless of any real association. **Lift** corrects for this by comparing the observed co-occurrence against what independence would predict: $$ \text{lift}(X \Rightarrow Y) = \frac{\text{conf}(X \Rightarrow Y)}{\text{supp}(Y)} = \frac{\text{supp}(X \cup Y)}{\text{supp}(X)\,\text{supp}(Y)} = \frac{P(X, Y)}{P(X)\,P(Y)}. $$ Lift is symmetric in $X$ and $Y$. A value of $1$ means the antecedent and consequent are statistically independent, a value greater than $1$ indicates positive correlation (the items appear together more than chance), and a value less than $1$ indicates substitution or avoidance. In the bread example, $\text{lift} = 0.80 / 0.75 \approx 1.07$, revealing the rule to be nearly worthless. ### 2.4 Beyond Lift Lift has its own pathologies. It is unbounded and unstable for rare items, and it is sensitive to the **null transactions** that contain neither $X$ nor $Y$, inflating apparent associations in sparse data. Practitioners therefore supplement it with measures such as **leverage**, $\text{supp}(X \cup Y) - \text{supp}(X)\text{supp}(Y)$, which reports the absolute excess over independence; **conviction**, $\frac{1 - \text{supp}(Y)}{1 - \text{conf}(X \Rightarrow Y)}$, which is directional and penalizes incorrect predictions; and the **Kulczynski** and **imbalance ratio** pair, which together resist the null transaction problem. No single number captures interestingness, so reporting several is good practice. ## 3. The Apriori Algorithm ### 3.1 The Downward Closure Principle The decisive structural fact that tames the exponential search is **monotonicity of support**, also called the **Apriori property** or downward closure: $$ X \subseteq Y \;\Longrightarrow\; \text{supp}(X) \ge \text{supp}(Y). $$ Adding items to an itemset can only shrink its cover. The contrapositive is the engine of pruning: if any itemset is infrequent, then every superset of it is also infrequent and need not be examined. Equivalently, every subset of a frequent itemset is frequent. ### 3.2 Candidate Generation and Counting Apriori, introduced by Agrawal and Srikant in 1994, exploits this property with a breadth first, level wise sweep. It builds frequent $k$-itemsets ($L_k$) from frequent $(k-1)$-itemsets, scanning the database once per level. ```text L_1 = frequent 1-itemsets for (k = 2; L_{k-1} is not empty; k++): C_k = apriori_gen(L_{k-1}) # candidate generation for each transaction t in D: # one database pass for each candidate c in C_k that is a subset of t: count[c] += 1 L_k = { c in C_k : count[c]/n >= minsup } return union of all L_k ``` The `apriori_gen` step has two parts. The **join** step merges two frequent $(k-1)$-itemsets that share their first $k-2$ items, producing a candidate $k$-itemset. The **prune** step immediately discards any candidate having a $(k-1)$-subset that is not in $L_{k-1}$, since downward closure guarantees such a candidate cannot be frequent. This pruning is what keeps the candidate set manageable. ### 3.3 From Frequent Itemsets to Rules Frequent itemset mining and rule generation are separate phases. Once all frequent itemsets are known, rules are extracted from each frequent itemset $Z$ by partitioning it into $X$ and $Z \setminus X$: $$ \text{for each } X \subset Z, \quad \text{emit } X \Rightarrow (Z \setminus X) \text{ if } \frac{\text{supp}(Z)}{\text{supp}(X)} \ge \text{minconf}. $$ Confidence is itself monotone in the antecedent. If $X \Rightarrow (Z \setminus X)$ fails the confidence test, then any rule with a smaller antecedent (and correspondingly larger consequent) also fails, so the rule search within each itemset can prune analogously. ### 3.4 Cost and Limitations Apriori's appeal is its simplicity and its provably complete pruning. Its weaknesses are equally clear. It performs a full database pass for each level, so a dataset whose longest frequent itemset has length $\ell$ requires up to $\ell$ scans, which is costly on disk resident data. It can also generate enormous candidate sets. With $10^4$ frequent 1-itemsets, the join step alone can yield on the order of $10^7$ candidate 2-itemsets, and each must be counted. These two costs, repeated I/O and candidate explosion, motivated the next generation of algorithms. ## 4. The FP-Growth Algorithm ### 4.1 Motivation FP-Growth, proposed by Han, Pei, and Yin in 2000, mines frequent itemsets **without candidate generation** and with only **two database scans**. Its strategy is to compress the dataset into a compact tree that retains all itemset frequency information, then mine that tree recursively by a divide and conquer pattern growth. ### 4.2 Building the FP-Tree The first scan counts single item supports and discards infrequent items. The surviving frequent items are sorted by descending support into a global order $F$. The second scan inserts each transaction into the **FP-tree** after reordering its items according to $F$ and dropping the infrequent ones. The FP-tree is a prefix tree. Each node stores an item, a count, and a link. Transactions that share a frequent prefix share the same path from the root, and the node count records how many transactions pass through. A **header table** maintains, for each item, the head of a linked list threading together every node that holds that item, enabling fast traversal during mining. ```text Scan 1: count items, keep those with supp >= minsup, sort by support -> F Create root (null) Scan 2: for each transaction t: keep frequent items of t, sort them by F order insert the sorted list into the tree, incrementing node counts, creating nodes as needed and updating the header-table links ``` When transactions overlap heavily, the tree is far smaller than the raw data, which is the source of FP-Growth's efficiency. In the worst case of no shared prefixes the tree can be as large as the data, but real transactional datasets are usually compressible. ### 4.3 Mining by Pattern Growth Mining proceeds from the least frequent item upward. For a target item $a$, follow its header link to collect every path from $a$ back to the root. These paths, with their counts, form $a$'s **conditional pattern base**. From that base a smaller **conditional FP-tree** is built containing only items that remain frequent in the context of $a$. The algorithm then recurses on this conditional tree, growing the pattern $\{a\}$ by appending items found frequent within it. ```text FP-growth(Tree, alpha): if Tree is a single path P: for each subset beta of nodes in P: output pattern (beta union alpha) with support = min count in beta else: for each item a in header table (low to high support): output pattern (a union alpha) with support = a.support build conditional pattern base for a construct conditional FP-tree from that base if the conditional tree is non-empty: FP-growth(conditional_tree, a union alpha) ``` The single path optimization is important. When a conditional tree degenerates to one path, every combination of its nodes is frequent and can be enumerated directly without further recursion. ### 4.4 Trade-offs FP-Growth typically outperforms Apriori by an order of magnitude on dense data, since it avoids candidate generation entirely and reads the database only twice. The price is memory. The FP-tree and the cascade of conditional trees must reside in memory, which can be prohibitive when the tree fails to compress, for example under very low support thresholds on wide, sparse data. Apriori, being level wise and disk friendly, can remain competitive there. Other algorithms, notably **Eclat**, take a third route by representing each itemset as the vertical list of transaction IDs in its cover (a **tidset**) and computing supports through set intersection, which suits certain data shapes well. ## 5. A Worked Derivation: Why Downward Closure Holds The pruning that makes Apriori and FP-Growth tractable rests entirely on the monotonicity of support. It is worth proving rather than asserting, because the same argument extends to confidence and to the condensed representations of Section 7.3. **Claim.** For itemsets $X \subseteq Y \subseteq I$, $\sigma(X) \ge \sigma(Y)$, hence $\text{supp}(X) \ge \text{supp}(Y)$. **Proof.** Recall $\text{cover}(X) = \{t \in D : X \subseteq t\}$. Take any transaction $t \in \text{cover}(Y)$. By definition $Y \subseteq t$, and since $X \subseteq Y$ by hypothesis, transitivity of set inclusion gives $X \subseteq t$, so $t \in \text{cover}(X)$. Therefore $\text{cover}(Y) \subseteq \text{cover}(X)$, and taking cardinalities of nested finite sets, $\sigma(Y) = |\text{cover}(Y)| \le |\text{cover}(X)| = \sigma(X)$. Dividing both sides by the constant $n = |D|$ preserves the inequality, yielding $\text{supp}(Y) \le \text{supp}(X)$. $\blacksquare$ The contrapositive is the operational form. If $\text{supp}(X) < \text{minsup}$ then for every superset $Y \supseteq X$ we have $\text{supp}(Y) \le \text{supp}(X) < \text{minsup}$, so $Y$ is infrequent and need never be counted. This is exactly the prune step: a $k$-candidate is discarded the moment any of its $(k-1)$-subsets is found infrequent. **Confidence is monotone in the antecedent.** Fix a frequent itemset $Z$ and consider two splits with antecedents $A \subset A' \subset Z$ (so $A' \supseteq A$ keeps more of $Z$ in the antecedent). Because $A \subseteq A'$, downward closure gives $\text{supp}(A) \ge \text{supp}(A')$, hence $$ \text{conf}(A \Rightarrow Z \setminus A) = \frac{\text{supp}(Z)}{\text{supp}(A)} \le \frac{\text{supp}(Z)}{\text{supp}(A')} = \text{conf}(A' \Rightarrow Z \setminus A'). $$ So moving items from the consequent into the antecedent can only raise confidence. Equivalently, if a rule with antecedent $A'$ fails the confidence test, every rule whose antecedent is a subset of $A'$ (with the complementary, larger consequent) also fails. Rule generation within each frequent itemset can therefore prune from large antecedents downward, mirroring the itemset-level pruning. **A note on the count threshold.** Comparing real-valued supports to `minsup` invites floating-point boundary errors: an itemset with exactly $\text{minsup} \cdot n$ occurrences can land just below the threshold after rounding. The implementations below sidestep this by converting `minsup` to an integer count once, $c^\star = \max(1, \lceil \text{minsup} \cdot n - \varepsilon \rceil)$ with a tiny $\varepsilon$, and comparing integer counts $\sigma(X) \ge c^\star$. This makes Apriori, FP-Growth, and the three language implementations agree exactly. ## 6. Reference Implementation The companion library ships Apriori, FP-Growth, and rule extraction in Python, Julia, and Rust, validated against a shared fixture so all three agree to the last bit. The example below is the classic nine-transaction market basket from Han, Kamber, and Pei over items $\{1, 2, 3, 4, 5\}$ with $\text{minsup} = 2/9$. FP-Growth returns exactly the same frequent set as Apriori, which is the strongest correctness check available. ::: {.panel-tabset} ## Python ```{python} from aiinaction.ch150_association_rules import ( apriori, fpgrowth, association_rules, ) transactions = [ [1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3], [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3], ] itemsets = apriori(transactions, min_support=2 / 9) fg = fpgrowth(transactions, min_support=2 / 9) print("Apriori and FP-Growth agree:", sorted(itemsets) == sorted(fg)) print(f"{len(itemsets)} frequent itemsets") for z in sorted(itemsets, key=lambda x: (len(x), x)): print(f" {set(z)}: support {itemsets[z]:.3f}") print("\nHigh-confidence rules (conf >= 0.7):") for r in association_rules(itemsets, min_confidence=0.7): print( f" {set(r.antecedent)} => {set(r.consequent)} | " f"conf={r.confidence:.2f} lift={r.lift:.2f} " f"leverage={r.leverage:+.3f}" ) ``` ## Julia ```julia using AIInAction.Ch150AssociationRules transactions = [ [1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3], [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3], ] itemsets = apriori(transactions, 2 / 9) fg = fpgrowth(transactions, 2 / 9) println("Apriori and FP-Growth agree: ", sort(collect(keys(itemsets))) == sort(collect(keys(fg)))) println(length(itemsets), " frequent itemsets") for r in association_rules(itemsets, 0.7) println(" ", Set(r.antecedent), " => ", Set(r.consequent), " | conf=", round(r.confidence; digits = 2), " lift=", round(r.lift; digits = 2)) end ``` ## Rust ```rust use aiinaction::ch150_association_rules::{apriori, fpgrowth, association_rules}; fn main() { let transactions = vec![ vec![1, 2, 5], vec![2, 4], vec![2, 3], vec![1, 2, 4], vec![1, 3], vec![2, 3], vec![1, 3], vec![1, 2, 3, 5], vec![1, 2, 3], ]; let itemsets = apriori(&transactions, 2.0 / 9.0).unwrap(); let fg = fpgrowth(&transactions, 2.0 / 9.0).unwrap(); let agree: Vec<_> = itemsets.keys().collect(); let agree_fg: Vec<_> = fg.keys().collect(); println!("Apriori and FP-Growth agree: {}", agree == agree_fg); println!("{} frequent itemsets", itemsets.len()); for r in association_rules(&itemsets, 0.7).unwrap() { println!( " {:?} => {:?} | conf={:.2} lift={:.2}", r.antecedent, r.consequent, r.confidence, r.lift ); } } ``` ::: ## 7. Market-Basket Analysis in Practice ### 7.1 The Workflow Market-basket analysis is the original and still dominant application. The pipeline is direct. Transform point of sale logs into a transaction list, choose support and confidence thresholds, mine frequent itemsets and rules, then rank and filter by lift and the auxiliary measures of Section 2.4. A minimal scikit style invocation looks like this. ```python from mlxtend.frequent_patterns import fpgrowth, association_rules # df is a one-hot transaction matrix: rows = baskets, cols = items itemsets = fpgrowth(df, min_support=0.01, use_colnames=True) rules = association_rules(itemsets, metric="lift", min_threshold=1.2) strong = rules[(rules.confidence > 0.5) & (rules.lift > 1.5)] ``` ### 7.2 Choosing Thresholds Threshold selection is the practitioner's central lever. Set $\text{minsup}$ too high and you miss niche but lucrative associations. Set it too low and the algorithm drowns in spurious patterns and may exhaust memory. A common heuristic is to pick the lowest support that keeps runtime and result counts tractable, then rely on lift and confidence to filter. Because item popularity is heavily skewed, a single global $\text{minsup}$ is often wrong for both head and tail items. The **multiple minimum supports** framework assigns item specific thresholds so that rare but meaningful items are not pruned away by a threshold tuned for common ones. ### 7.3 Interpretation and Action The output of mining is a hypothesis generator, not a conclusion. Three caveats matter. First, correlation is not causation. The famous beer and diapers association, whatever its provenance, suggests a placement experiment, not a causal claim. Second, rules can be **redundant**. If $\{A\} \Rightarrow \{B\}$ and $\{A, C\} \Rightarrow \{B\}$ have the same confidence, the more specific rule adds nothing, and post processing should suppress such redundancy, often by restricting attention to **closed** or **maximal** frequent itemsets. An itemset is closed when no superset has the same support, and maximal when no superset is frequent. These condensed representations dramatically shrink the result set without losing information about which itemsets are frequent. Third, statistical significance deserves scrutiny. With millions of candidate rules, some will clear any lift threshold by chance alone, a multiple testing problem. Validation on a holdout period, or formal significance testing, guards against acting on noise. ### 7.4 Beyond the Supermarket The transactional abstraction generalizes far past retail. Web usage mining treats page views in a session as a basket. Recommender systems use co-occurrence rules as a transparent, cold start friendly complement to collaborative filtering. In bioinformatics, items are genes or mutations and transactions are samples. Intrusion detection mines frequent patterns of network events. Healthcare analytics finds co-occurring diagnoses and prescriptions. In each case the same triad of support, confidence, and lift, computed by Apriori, FP-Growth, or their descendants, turns a pile of set valued records into ranked, human readable rules. ## 8. Summary Frequent itemset mining and association rules provide an interpretable, unsupervised lens on co-occurrence structure. Support quantifies prevalence, confidence quantifies conditional reliability, and lift corrects confidence for baseline frequency, with leverage and conviction filling remaining gaps. The downward closure property is the theoretical key that makes the exponential itemset lattice searchable. Apriori realizes it through level wise candidate generation and Boolean pruning, while FP-Growth sidesteps candidate generation with a compressed prefix tree mined by recursive pattern growth. Both feed market-basket analysis, where careful threshold selection, redundancy removal, and skepticism about causation separate actionable insight from statistical mirage. ## References 1. Agrawal, R., Imielinski, T., and Swami, A. (1993). Mining Association Rules between Sets of Items in Large Databases. ACM SIGMOD. https://dl.acm.org/doi/10.1145/170035.170072 2. Agrawal, R., and Srikant, R. (1994). Fast Algorithms for Mining Association Rules. VLDB. https://www.vldb.org/conf/1994/P487.PDF 3. Han, J., Pei, J., and Yin, Y. (2000). Mining Frequent Patterns without Candidate Generation. ACM SIGMOD. https://dl.acm.org/doi/10.1145/342009.335372 4. Zaki, M. J. (2000). Scalable Algorithms for Association Mining. IEEE TKDE. https://ieeexplore.ieee.org/document/846291 5. Tan, P. N., Steinbach, M., Karpatne, A., and Kumar, V. (2018). Introduction to Data Mining, 2nd ed. Pearson. https://www-users.cse.umn.edu/~kumar001/dmbook/index.php 6. Han, J., Kamber, M., and Pei, J. (2011). Data Mining: Concepts and Techniques, 3rd ed. Morgan Kaufmann. https://www.elsevier.com/books/data-mining-concepts-and-techniques/han/978-0-12-381479-1 7. Brin, S., Motwani, R., Ullman, J. D., and Tsur, S. (1997). Dynamic Itemset Counting and Implication Rules for Market Basket Data. ACM SIGMOD. https://dl.acm.org/doi/10.1145/253260.253325 8. Liu, B., Hsu, W., and Ma, Y. (1999). Mining Association Rules with Multiple Minimum Supports. ACM SIGKDD. https://dl.acm.org/doi/10.1145/312129.312274 9. mlxtend documentation. Frequent Patterns. https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/