31  Graph Neural Networks and Network Credit Risk

Scope: both retail and corporate. Graph fundamentals are general; the chapter splits into retail loan-application graphs (LendingClub) and corporate supply-chain and counterparty networks for SME and corporate exposures.

Overview

Credit risk is relational. A factory that loses its only buyer fails even if its books looked clean the day before. A small supplier whose bank collapses cannot roll working-capital lines, no matter what its leverage ratio said. A bank lending into a tightly connected industrial cluster holds a portfolio whose defaults are far from independent. Treating each borrower as an IID row in a table, which is the implicit assumption behind every tabular model covered earlier in this book (from discriminant analysis in Chapter 6 through benchmarking in Chapter 16), throws away the structure that actually drives systemic and idiosyncratic credit losses.

This chapter develops tools that put the network first. We begin with credit as a graph problem (Section 31.1), formalize it with adjacency and Laplacian matrices, and then derive the three workhorse graph neural networks used in practice today (Section 31.3): the graph convolutional network (Kipf & Welling, 2017), the inductive GraphSAGE aggregator (Hamilton et al., 2017), and the graph attention network (Veličković et al., 2018). We connect these to default contagion models from the systemic-risk literature (Acemoglu et al., 2015; Eisenberg & Noe, 2001; Gai & Kapadia, 2010) (Section 31.6), show how supply-chain and counterparty exposures propagate losses, and implement node classification on a synthetic SME network using PyTorch Geometric. A logistic regression that ignores structure serves as the honest baseline. We close with explainability (GNNExplainer, PGExplainer) (Section 31.7), scalability for hundred-million-edge graphs (neighborhood sampling, Cluster-GCN, distributed training), and the regulatory posture a network model must take under SR 11-7 and the EU AI Act.

Emerging-market lenders have a second reason to take graph methods seriously. In markets with shallow bureau coverage, the relational data that fintech platforms collect about their own users (merchants, customers, wallet peers) is often the only signal available at scale. The Vietnam and emerging markets section at the end of this chapter walks through how a merchant-customer graph from MoMo or VNPay maps onto a GNN scoring problem.

The promise is concrete. When the label signal lives in community structure or neighborhood propagation, message-passing models can recover it where a tabular model cannot. The caution is equally concrete. Graph data leak between train and test in subtle ways. Explanations that are faithful to the model are not the same as explanations that are faithful to the data-generating process. And the largest production networks force engineers into sampling regimes that change the model’s effective receptive field. A practitioner needs to see all three.

Notation

Graphs are \(\mathcal{G} = (\mathcal{V}, \mathcal{E})\) with \(|\mathcal{V}|=n\) nodes and \(|\mathcal{E}|=m\) edges. The adjacency matrix \(A \in \mathbb{R}^{n \times n}\) has \(A_{ij}>0\) if there is an edge from \(i\) to \(j\), else zero. The degree matrix \(D\) is diagonal with \(D_{ii}=\sum_j A_{ij}\). Node features are rows of \(X \in \mathbb{R}^{n \times d}\). A node label \(y_i \in \{0,1\}\) is the default indicator over the next 12 months. Hidden representations at layer \(l\) form a matrix \(H^{(l)} \in \mathbb{R}^{n \times h_l}\) with \(H^{(0)}=X\). Trainable weights at layer \(l\) are \(W^{(l)}\). \(\sigma(\cdot)\) is a non-linearity (ReLU unless stated). \(\mathcal{N}(i)\) is the set of neighbors of \(i\).

31.1 Credit as a graph problem

Start with three graphs that dominate modern credit risk.

The borrower-firm-bank tripartite network. Consider a country’s aggregate credit register. Three node types coexist: individual borrowers, non-financial firms, and banks. Edges run from banks to firms (loans outstanding), from banks to households (consumer loans and mortgages), and from firms to households (payroll, shareholding). An edge from bank \(b\) to firm \(f\) carries a weight equal to the exposure at default, possibly conditional on time. The same firm may also owe wages to several households. When a bank suffers a shock, it tightens credit to the firms in its book (Hale et al., 2020; Iyer & Peydro, 2011). When those firms cut production, the households that work for them lose income and default on consumer loans. The bank’s next-quarter loss is therefore a function of its own balance sheet, its firm-side portfolio, and the household-side portfolios of the firms it lends to, all of which are distinct but not independent. No flat feature vector captures this.

Supplier-buyer networks. Production is organized as a directed graph. An edge \((u, v)\) means supplier \(u\) delivers inputs to buyer \(v\). Weights can be dollar sales, share of buyer’s inputs, share of supplier’s revenue, contractual specificity, or information links between listed firms (Cohen & Frazzini, 2008). When a supplier fails, its downstream buyers scramble for substitutes; if inputs are specific, substitution is slow and expensive (Barrot & Sauvagnat, 2016). The 2011 Tohoku earthquake disrupted supply chains far beyond the affected region, with propagation distances of two or three intermediaries (Carvalho et al., 2021). The network origin of aggregate fluctuations is the same logic at macro scale (Acemoglu et al., 2012). For SME scoring, the supplier-buyer graph provides features no financial statement carries: how concentrated is the buyer base, how long the chain upstream, how vulnerable is the firm to a single-point failure.

Social networks. For consumer credit in thin-file populations, the friendship and payment graph is an information source. Mobile-money transaction graphs in East Africa predict repayment (Björkegren & Grissen, 2020). Online P2P platforms in the early 2010s showed that social links reduce information asymmetry: a borrower’s friends’ repayment history predicts the borrower’s own default, controlling for observables (Lin et al., 2013). Peer screening is effective even for small unsecured loans (Iyer et al., 2016). The theoretical backbone is social collateral (Karlan et al., 2009), whereby enforcement through relationships substitutes for formal contracts. For the lender, the practical question is how to embed each borrower’s position in the graph into a score.

A fourth graph, the interbank network, appears throughout the systemic-risk literature (Allen & Gale, 2000; Cont et al., 2013; Freixas et al., 2000; Haldane & May, 2011). The nodes are banks and edges are interbank exposures. The regulator’s object of interest is contagion: a large bank’s failure cascades through claims, forcing fire sales and downstream defaults. We treat interbank networks as the closest analog to supplier-buyer networks for wholesale credit.

The common pattern is that the quantity we want to predict (default) depends on features of the node, features of the neighbors, and features of the neighbors’ neighbors. That is exactly what message passing computes.

31.2 Graph fundamentals

Fix notation that the rest of the chapter uses without comment.

31.2.1 Adjacency and its friends

For an undirected simple graph, \(A \in \{0,1\}^{n \times n}\) is symmetric with zero diagonal. For weighted graphs, \(A_{ij} \in \mathbb{R}_{\ge 0}\). A directed graph gives an asymmetric \(A\). Self-loops appear on the diagonal. Let \(\tilde{A} = A + I_n\) denote the adjacency with self-loops added. The degree matrix \(D\) is diagonal with \(D_{ii} = \sum_j A_{ij}\). The normalized adjacency and its symmetric cousin are \[ D^{-1} A, \qquad D^{-1/2} A D^{-1/2}, \tag{31.1}\] which play the roles of stochastic (random-walk) and symmetric normalization respectively.
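To make Eq. 31.1 concrete, the following is a minimal NumPy sketch on a three-node path graph. The toy graph is an assumption chosen for illustration, and the code presumes every node has positive degree so that the inverses exist.

```python
import numpy as np

# Toy undirected path graph 0 - 1 - 2 (illustrative only).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

deg = A.sum(axis=1)                      # D_ii = sum_j A_ij
A_rw = A / deg[:, None]                  # D^{-1} A: every row sums to one
d_inv_sqrt = 1.0 / np.sqrt(deg)
A_sym = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}

print(A_rw.sum(axis=1))                  # row sums of the random-walk matrix
print(np.allclose(A_sym, A_sym.T))       # symmetric normalization stays symmetric
```

The random-walk form is row-stochastic but generally asymmetric; the symmetric form keeps a real spectrum, which is why the spectral derivations later in the chapter use it.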

The Laplacian matrices are central to spectral methods: \[ L = D - A, \qquad L_{\text{rw}} = I - D^{-1} A, \qquad L_{\text{sym}} = I - D^{-1/2} A D^{-1/2}. \tag{31.2}\] \(L\) is symmetric positive semi-definite; its eigenvalues \(0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n\) encode connectivity. The multiplicity of \(\lambda_1=0\) equals the number of connected components (Chung, 1997). The second smallest eigenvalue \(\lambda_2\), the algebraic connectivity, measures how well a single cluster sticks together. For \(L_{\text{sym}}\), eigenvalues lie in \([0, 2]\).
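The spectral facts above are easy to check numerically. The sketch below builds a hypothetical five-node graph with two components (a triangle and a single edge), counts the zero eigenvalues of \(L\), and confirms that the spectrum of \(L_{\text{sym}}\) stays inside \([0, 2]\). The tolerance `1e-9` is an arbitrary choice.

```python
import numpy as np

# Two disconnected pieces: a triangle (0, 1, 2) and a single edge (3, 4).
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

deg = A.sum(axis=1)
L = np.diag(deg) - A                                    # combinatorial Laplacian
d_inv_sqrt = 1.0 / np.sqrt(deg)
L_sym = np.eye(5) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

eig_L = np.linalg.eigvalsh(L)                           # ascending, real (L is symmetric)
eig_sym = np.linalg.eigvalsh(L_sym)

n_components = int(np.sum(eig_L < 1e-9))                # multiplicity of eigenvalue 0
print("components:", n_components)
print("L_sym spectrum range:", eig_sym.min(), eig_sym.max())
```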

31.2.2 Centrality

Every practitioner encounters several node-level summary statistics. They are useful as features and as sanity checks.

  • Degree \(d_i = \sum_j A_{ij}\): local connectivity. For supplier graphs, in-degree is the number of suppliers, out-degree the number of buyers.
  • Eigenvector centrality \(v_i\) where \(A v = \lambda_{\max} v\): a node is central if its neighbors are central. Katz centrality (Katz, 1953) is a regularized variant, \((I - \alpha A)^{-1} \mathbf{1}\) with \(0 < \alpha < 1/\lambda_{\max}\), which stays well defined when eigenvector centrality degenerates.
  • PageRank (Page et al., 1999): the stationary distribution of a random walk with restart, \(\pi = \alpha P^\top \pi + (1-\alpha) \mathbf{1}/n\), where \(P = D^{-1} A\). PageRank underlies DebtRank, a systemic-importance measure (Battiston et al., 2012).
  • Betweenness (Freeman, 1977): fraction of all-pairs shortest paths passing through node \(i\). Expensive at scale.
  • Clustering coefficient \(C_i\): fraction of pairs of \(i\)’s neighbors that are themselves connected. Financial networks are typically high-clustering, low-diameter small worlds (Haldane & May, 2011).
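Two of the centralities above reduce to a few lines of linear algebra. The sketch below runs PageRank by power iteration and solves the Katz system exactly as the formulas are written in the list. The four-node directed graph, the damping value 0.85, and the Katz attenuation (half of \(1/\lambda_{\max}\)) are illustrative assumptions, and the code presumes every node has at least one outgoing edge.

```python
import numpy as np

# Toy directed graph with edges 0->1, 0->2, 1->2, 2->0, 3->2 (illustrative).
A = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]:
    A[i, j] = 1.0

n = A.shape[0]
P = A / A.sum(axis=1)[:, None]        # row-stochastic transition matrix D^{-1} A

alpha = 0.85
pi = np.full(n, 1.0 / n)
for _ in range(200):                  # iterate pi = alpha P^T pi + (1 - alpha) 1/n
    pi = alpha * P.T @ pi + (1.0 - alpha) / n

# Katz as written in the text: (I - a A)^{-1} 1, valid only for a < 1 / lambda_max(A).
lam_max = np.max(np.abs(np.linalg.eigvals(A)))
a = 0.5 / lam_max
katz = np.linalg.solve(np.eye(n) - a * A, np.ones(n))

print("PageRank:", np.round(pi, 3))
print("Katz    :", np.round(katz, 3))
```

Node 2 receives links from three of the four nodes and ends up with the largest PageRank mass. For production graphs a sparse solver or a library routine would replace the dense solve.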

31.2.3 Spectral filtering

Graph signal processing works in the eigenbasis of \(L\). Decompose \(L = U \Lambda U^\top\). For a node signal \(x \in \mathbb{R}^n\), the graph Fourier transform is \(\hat{x} = U^\top x\). A graph convolution is multiplication in the frequency domain by a filter \(g_\theta(\Lambda)\): \[ g_\theta \star x = U g_\theta(\Lambda) U^\top x. \tag{31.3}\] Full eigendecomposition is \(O(n^3)\), impossible at scale. Approximations by polynomials of \(L\) of degree \(K\) produce localized filters over \(K\)-hop neighborhoods. The ChebNet construction uses Chebyshev polynomials (Defferrard et al., 2016). Kipf and Welling’s GCN is a particular simplification: \(K=1\) and a clever normalization choice (Kipf & Welling, 2017). We derive it from scratch next.
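A small numerical sketch of Eq. 31.3 may help. It applies a low-pass filter to a random signal on an eight-node ring through an explicit eigendecomposition. The filter \(g(\lambda) = e^{-\lambda/2}\) and the ring graph are assumptions made for illustration; they are not the filters used by ChebNet or GCN.

```python
import numpy as np

# Ring graph with 8 nodes (illustrative).
n = 8
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(L)            # L = U diag(lam) U^T, eigenvalues ascending

rng = np.random.default_rng(0)
x = rng.normal(size=n)                # a noisy node signal
x_hat = U.T @ x                       # graph Fourier transform

g = np.exp(-0.5 * lam)                # hypothetical low-pass filter g(lambda)
x_filtered = U @ (g * x_hat)          # Eq. 31.3: U g(Lambda) U^T x

# Roughness measured by the Laplacian quadratic form x^T L x.
rough_before = x @ L @ x
rough_after = x_filtered @ L @ x_filtered
print(rough_before, rough_after)
```

Because \(g(0) = 1\), the filter leaves the mean of the signal untouched and damps only the high-frequency components, which is the sense in which graph convolution smooths.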

31.3 GCN, GraphSAGE, and GAT

31.3.1 Message passing as the common frame

Gilmer et al. introduced the neural message passing framework that unifies essentially every modern GNN (Gilmer et al., 2017). A message passing layer updates each node’s representation by aggregating messages from its neighbors: \[ m_i^{(l)} = \operatorname{AGGREGATE}\left( \{ \phi^{(l)}( h_j^{(l)}, h_i^{(l)}, e_{ji} ) : j \in \mathcal{N}(i) \} \right), \tag{31.4}\] \[ h_i^{(l+1)} = \operatorname{UPDATE}\left( h_i^{(l)}, m_i^{(l)} \right). \tag{31.5}\] \(\phi\) is a learnable message function, AGGREGATE is permutation invariant (sum, mean, max, attention), and UPDATE combines the node’s previous state with the aggregated message. Different choices of the three ingredients reproduce GCN, GraphSAGE, GAT, GIN (Xu et al., 2019), and every other major variant (Wu et al., 2021).
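Eqs. 31.4 and 31.5 can be sketched in plain NumPy. The version below makes simplifying assumptions that are not part of the general framework: the message function is linear and ignores both the receiver state and edge features, AGGREGATE is a sum, and UPDATE is a ReLU applied to a linear map of the concatenation. The function name and weight shapes are hypothetical.

```python
import numpy as np

def message_passing_layer(h, edges, W_msg, W_upd):
    """One message-passing layer with sum aggregation.

    h     : (n, d) node states
    edges : list of (j, i) pairs, meaning node j sends a message to node i
    W_msg : (d, d) weights of the linear message function phi(h_j) = h_j W_msg
    W_upd : (2d, d_out) weights of UPDATE applied to [h_i || m_i]
    """
    n, d = h.shape
    m = np.zeros((n, d))
    for j, i in edges:
        m[i] += h[j] @ W_msg                  # AGGREGATE: sum of incoming messages
    z = np.concatenate([h, m], axis=1) @ W_upd
    return np.maximum(z, 0.0)                 # ReLU

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]   # undirected path 0-1-2-3
W_msg = rng.normal(size=(3, 3))
W_upd = rng.normal(size=(6, 2))

out = message_passing_layer(h, edges, W_msg, W_upd)
out_reversed = message_passing_layer(h, edges[::-1], W_msg, W_upd)
print(np.allclose(out, out_reversed))         # edge order does not matter
```

The final check illustrates permutation invariance: because the aggregator is a sum, the order in which neighbors are visited has no effect on the result.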

31.3.2 Derivation of the GCN propagation rule

Kipf and Welling start from a first-order approximation of spectral graph convolutions (Kipf & Welling, 2017). Begin with the Chebyshev filter of degree \(K\): \[ g_\theta \star x \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{L}) x, \qquad \tilde{L} = \frac{2}{\lambda_{\max}} L_{\text{sym}} - I, \tag{31.6}\] where \(T_k\) is the degree-\(k\) Chebyshev polynomial. Set \(K=1\), recall that \(T_0(z) = 1\) and \(T_1(z) = z\), and approximate \(\lambda_{\max} \approx 2\) (the upper bound on the spectrum of \(L_{\text{sym}}\)), so that \(\tilde{L} \approx L_{\text{sym}} - I\). Then \[ g_\theta \star x \approx \theta_0 x + \theta_1 (L_{\text{sym}} - I) x = \theta_0 x - \theta_1 D^{-1/2} A D^{-1/2} x. \tag{31.7}\] Force a single free parameter \(\theta = \theta_0 = -\theta_1\) to reduce overparameterization: \[ g_\theta \star x \approx \theta \left( I + D^{-1/2} A D^{-1/2} \right) x. \tag{31.8}\] The operator \(I + D^{-1/2} A D^{-1/2}\) has eigenvalues in \([0, 2]\), which can destabilize deep networks via repeated multiplication. Add self-loops: let \(\tilde{A} = A + I\) and \(\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}\). Renormalize. This is the famous renormalization trick: \[ I + D^{-1/2} A D^{-1/2} \longrightarrow \hat{A} := \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}. \tag{31.9}\] Generalize from a scalar signal to a matrix \(X \in \mathbb{R}^{n \times d}\) and stack several filters per layer via a weight matrix \(W^{(l)} \in \mathbb{R}^{h_l \times h_{l+1}}\). The GCN layer is \[ H^{(l+1)} = \sigma \left( \hat{A} H^{(l)} W^{(l)} \right) = \sigma\left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} \right). \tag{31.10}\] Eq. 31.10 is the GCN propagation rule. Four properties matter for practice.

  1. Each layer aggregates strictly 1-hop information. A GCN with \(L\) layers has an \(L\)-hop receptive field. Two layers are the standard baseline and often the best, because deeper GCNs over-smooth node representations into a constant.
  2. \(\hat{A}\) is fixed; it is not learned. Only \(W^{(l)}\) is trained. The inductive bias is strong: nodes are encouraged to look like a weighted average of themselves and their neighbors.
  3. The normalization is symmetric. Each message from \(j\) to \(i\) is scaled by \(1/\sqrt{\tilde{d}_i \tilde{d}_j}\). High-degree neighbors are downweighted.
  4. The transformation \(W^{(l)}\) is shared across nodes. GCN is transductive: all nodes, both labeled and unlabeled, must appear in \(\hat{A}\) at training time.
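The propagation rule and the properties listed above can be reproduced in a few lines of NumPy. This is a sketch on a hypothetical four-node star graph with random, untrained weights; it is not the PyTorch Geometric implementation used later in the chapter, and the helper names `gcn_norm` and `gcn_layer` are made up for this example.

```python
import numpy as np

def gcn_norm(A):
    """Renormalized adjacency of Eq. 31.9: D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_tilde = A_tilde.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d_tilde)
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def gcn_layer(A_hat, H, W):
    """Eq. 31.10 with a ReLU non-linearity."""
    return np.maximum(A_hat @ H @ W, 0.0)

# Star graph: node 0 is linked to nodes 1, 2 and 3.
A = np.zeros((4, 4))
A[0, 1:] = A[1:, 0] = 1.0
A_hat = gcn_norm(A)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
W0 = rng.normal(size=(5, 8))
W1 = rng.normal(size=(8, 2))

H1 = gcn_layer(A_hat, X, W0)        # first layer: 1-hop information
logits = A_hat @ H1 @ W1            # second layer left linear: 2-hop receptive field

print(np.round(A_hat, 3))
```

The entry linking the hub to a leaf equals \(1/\sqrt{4 \cdot 2}\), the symmetric scaling from property 3, and the spectrum of \(\hat{A}\) lies in \((-1, 1]\), which is what the renormalization trick buys over the operator in Eq. 31.8.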

The last property is a problem for production credit scoring: portfolios churn, and new borrowers arrive daily. That is what GraphSAGE fixes.

31.3.3 GraphSAGE: inductive representation learning

GraphSAGE drops the global matrix \(\hat{A}\) and replaces it with a per-node neighborhood sampler (Hamilton et al., 2017). For each node \(i\) and each layer \(l\), sample a fixed number \(K_l\) of neighbors \(\mathcal{N}_s(i) \subset \mathcal{N}(i)\). Compute \[ h_{\mathcal{N}(i)}^{(l+1)} = \operatorname{AGG}_l\left( \{ h_j^{(l)} : j \in \mathcal{N}_s(i) \} \right), \tag{31.11}\] \[ h_i^{(l+1)} = \sigma\left( W^{(l)} \cdot \operatorname{CONCAT}\left( h_i^{(l)}, h_{\mathcal{N}(i)}^{(l+1)} \right) \right), \tag{31.12}\] followed by \(l_2\) normalization \(h_i^{(l+1)} \leftarrow h_i^{(l+1)} / \lVert h_i^{(l+1)} \rVert_2\). Three aggregators are standard:

  • Mean: \(\operatorname{MEAN}(\{h_j\}) = \frac{1}{|\mathcal{N}_s(i)|} \sum_j h_j\). Cheap, order-invariant, close in spirit to GCN.
  • LSTM: pass the neighbors in a random order through an LSTM, take the final hidden state. Not permutation-invariant by construction; randomized ordering is a workaround. Expressive but slow.
  • Pool: transform each neighbor by a shared MLP, then elementwise max, \(\operatorname{POOL}(\{h_j\}) = \max\left( \{ \sigma(W_{\text{pool}} h_j + b) : j \}\right)\). Good accuracy, fast.

Because neighbors are sampled, an unseen node can be scored at inference by sampling its own neighborhood and running the layers forward. That is what “inductive” means. It is also what makes GraphSAGE the default choice for large, churning graphs: drop in new borrowers without retraining.
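A sketch of one GraphSAGE layer with the mean aggregator, again in NumPy with random weights. The adjacency lists, the sample size `k`, and the choice to sample with replacement when a node has fewer than `k` neighbors are assumptions made for illustration; the function name is hypothetical.

```python
import numpy as np

def sage_mean_layer(h, neighbors, W, k, rng):
    """One GraphSAGE layer (Eqs. 31.11-31.12) with the mean aggregator.

    h         : (n, d) node states
    neighbors : dict mapping node id -> non-empty list of neighbor ids
    W         : (2d, d_out) weights applied to [h_i || mean of sampled neighbors]
    k         : number of neighbors sampled per node
    """
    out = []
    for i in range(h.shape[0]):
        nbrs = neighbors[i]
        # Sample without replacement when possible, with replacement otherwise.
        sampled = rng.choice(nbrs, size=k, replace=len(nbrs) < k)
        h_nb = h[sampled].mean(axis=0)
        z = np.maximum(np.concatenate([h[i], h_nb]) @ W, 0.0)
        out.append(z / (np.linalg.norm(z) + 1e-12))       # l2 normalization
    return np.vstack(out)

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 4))
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
W = rng.normal(size=(8, 3))
h_next = sage_mean_layer(h, neighbors, W, k=2, rng=rng)

# Inductive use: a node never seen in training is embedded with the same W,
# given only its own features and the states of the neighbors it links to.
h_new = rng.normal(size=4)
h_new_nb = h[[1, 3]].mean(axis=0)
z_new = np.maximum(np.concatenate([h_new, h_new_nb]) @ W, 0.0)
print(h_next.shape, z_new.shape)
```

Nothing in the layer refers to a global adjacency matrix, which is why the last three lines can embed a new borrower without touching the trained weights.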

31.3.4 GAT: attention on edges

GCN weights each neighbor’s message by the fixed scalar \(1 / \sqrt{\tilde{d}_i \tilde{d}_j}\). GraphSAGE averages or maxes within a sample. GAT learns the weight per edge (Veličković et al., 2018). For each pair \((i, j)\) with \(j \in \mathcal{N}(i) \cup \{i\}\), compute an unnormalized attention score \[ e_{ij} = \operatorname{LeakyReLU}\left( \mathbf{a}^\top \left[ W h_i \Vert W h_j \right] \right), \tag{31.13}\] where \(\mathbf{a} \in \mathbb{R}^{2 h'}\) is a learnable vector, \(W \in \mathbb{R}^{h' \times h}\) is the shared transform, and \(\Vert\) is concatenation. Normalize by softmax over \(i\)’s neighborhood: \[ \alpha_{ij} = \operatorname{softmax}_j\left( e_{ij} \right) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i) \cup \{i\}} \exp(e_{ik})}. \tag{31.14}\] The updated representation is \[ h_i^{(l+1)} = \sigma\left( \sum_{j \in \mathcal{N}(i) \cup \{i\}} \alpha_{ij} W h_j^{(l)} \right). \tag{31.15}\] Multi-head attention runs \(K\) independent copies and concatenates (or averages, at the final layer): \[ h_i^{(l+1)} = \operatorname{CONCAT}_{k=1}^K \sigma\left( \sum_{j \in \mathcal{N}(i) \cup \{i\}} \alpha_{ij}^{(k)} W^{(k)} h_j^{(l)} \right). \tag{31.16}\] Attention adapts weights to the task. In a supply-chain graph, \(\alpha_{ij}\) learns that certain buyer-supplier relationships are more informative than others, for example concentrated sole-supplier arrangements. The price is one attention score per edge per head, which adds compute and memory in dense neighborhoods, and reduced interpretability: the learned \(\alpha_{ij}\) depend on the loss and are not a model of dependence in the data.
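The attention coefficients of Eqs. 31.13 and 31.14 for a single node and a single head can be sketched as follows. Weights are random rather than trained, the LeakyReLU slope of 0.2 is an assumed default, and states are stored as row vectors, so the shared transform appears as `h @ W` with `W` of shape \((h, h')\) rather than \(W h_j\). The function name is hypothetical.

```python
import numpy as np

def gat_attention(h, nbrs_i, i, W, a, slope=0.2):
    """Attention weights alpha_ij for node i over N(i) plus i itself.

    h : (n, d) states, W : (d, d') shared transform, a : (2 d',) attention vector.
    """
    idx = list(nbrs_i) + [i]                          # neighborhood with self-loop
    Wh = h @ W                                        # row-vector form of W h_j
    scores = np.array([np.concatenate([Wh[i], Wh[j]]) @ a for j in idx])
    scores = np.where(scores > 0, scores, slope * scores)   # LeakyReLU
    scores = scores - scores.max()                    # numerical stability for softmax
    alpha = np.exp(scores) / np.exp(scores).sum()     # Eq. 31.14
    h_i_next = alpha @ Wh[idx]                        # Eq. 31.15 before the non-linearity
    return idx, alpha, h_i_next

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
a = rng.normal(size=4)
idx, alpha, h0_next = gat_attention(h, nbrs_i=[1, 2], i=0, W=W, a=a)
print(dict(zip(idx, np.round(alpha, 3))))
```

The coefficients are positive and sum to one over the neighborhood, so the update is a learned convex combination of transformed neighbor states.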

31.3.5 Which to use?

A practical rubric drawn from benchmarks and deployment experience.

  • Small transductive problem (the whole graph fits, labels sparse): GCN. The first thing to run.
  • Large, churning graph, new borrowers arrive daily: GraphSAGE with mean or pool aggregator.
  • Heterogeneous edges, concentrated structures, attention-worthy (syndicated loans, guarantee networks, concentrated counterparties): GAT.
  • Maximum discriminative power on structure (motifs), need to distinguish non-isomorphic neighborhood structures that mean or max aggregators confuse: GIN (Xu et al., 2019). Useful but overkill for most credit problems.

31.4 Supply chain and counterparty risk

Contagion on a graph is a dynamical process. Two stylized models cover the intuition.

31.4.1 Branching-process contagion

A defaulted firm triggers a contagious default at each counterparty with independent probability \(\beta\) per round. Starting from seed set \(\mathcal{S}_0\), round \(t\) produces \[ \Pr(i \text{ defaults at round } t \mid \text{history}) = 1 - (1 - \beta)^{k_{i,t-1}}, \tag{31.17}\] where \(k_{i,t-1}\) is the number of \(i\)’s neighbors that have already defaulted by round \(t-1\). This is a discrete-time SIR-style process without recovery. Total losses are the exposure-weighted sum of the infected set. The percolation threshold is \(\beta_c \approx 1/\langle k \rangle\) for locally tree-like graphs; above \(\beta_c\) a macroscopic cascade is possible (Newman, 2003). Real supplier networks have heavy-tailed degree distributions, which lowers \(\beta_c\) and fattens the loss tail.
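A short worked example of Eq. 31.17 shows how quickly exposure to several defaulted counterparties compounds. The value \(\beta = 0.10\) and the mean degree of 10 are assumptions picked for round numbers.

```python
def contagion_prob(beta: float, k: int) -> float:
    """Eq. 31.17: default probability given k already-defaulted neighbors."""
    return 1.0 - (1.0 - beta) ** k

beta = 0.10
probs = {k: round(contagion_prob(beta, k), 4) for k in range(5)}
print(probs)

# Heuristic cascade threshold for a locally tree-like graph with mean degree 10.
mean_degree = 10.0
beta_c = 1.0 / mean_degree
print("beta_c approx:", beta_c)
```

With two defaulted neighbors the probability is \(1 - 0.9^2 = 0.19\), slightly less than twice the single-neighbor value because the two transmission events overlap.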

31.4.2 Balance-sheet clearing

Eisenberg and Noe (Eisenberg & Noe, 2001) modeled interbank contagion as a fixed point. Let \(L_{ij}\) be the liability of bank \(i\) to bank \(j\), \(\bar{L}_i = \sum_j L_{ij}\) the total liability, \(\pi_{ij} = L_{ij}/\bar{L}_i\) the relative liability, \(e_i\) bank \(i\)’s external assets. Clearing payments \(p^* \in [0, \bar{L}]\) solve \[ p_i^* = \min\left( \bar{L}_i, e_i + \sum_j \pi_{ji} p_j^* \right). \tag{31.18}\] A unique clearing vector exists under mild conditions. A shock to \(e\) recursively reduces \(p^*\), matching the intuition that one bank’s payment failure starves others. Variants add bankruptcy costs, fire sales, and liquidity spirals (Bardoscia et al., 2021; Cont et al., 2013; Glasserman & Young, 2016). Gai and Kapadia gave a celebrated simulation framework where contagion is driven by a funding-liquidity channel and percolation thresholds mirror those of random graphs (Gai & Kapadia, 2010). Acemoglu et al. showed that dense, homogeneous networks absorb small shocks but transmit large shocks; sparser, more concentrated networks do the opposite (Acemoglu et al., 2015; Elliott et al., 2014).
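The fixed point in Eq. 31.18 can be found by simple iteration from the full-payment vector. The sketch below does this for a hypothetical three-bank chain in which bank 0 owes bank 1 and bank 1 owes bank 2. It assumes non-negative external assets and omits bankruptcy costs and fire sales; the function name and the numbers are made up for illustration.

```python
import numpy as np

def clearing_vector(Lmat, e, tol=1e-10, max_iter=1000):
    """Fixed-point iteration for Eq. 31.18, started from full payment."""
    L_bar = Lmat.sum(axis=1)                      # total liabilities per bank
    Pi = np.zeros_like(Lmat)
    owes = L_bar > 0
    Pi[owes] = Lmat[owes] / L_bar[owes, None]     # relative liabilities pi_ij
    p = L_bar.copy()
    for _ in range(max_iter):
        p_new = np.minimum(L_bar, e + Pi.T @ p)   # inflow to i is sum_j pi_ji p_j
        if np.max(np.abs(p_new - p)) < tol:
            break
        p = p_new
    return p_new

# Liabilities: bank 0 owes 10 to bank 1, bank 1 owes 10 to bank 2.
Lmat = np.array([[0.0, 10.0, 0.0],
                 [0.0, 0.0, 10.0],
                 [0.0, 0.0, 0.0]])

p_ok = clearing_vector(Lmat, e=np.array([12.0, 2.0, 5.0]))
p_shock = clearing_vector(Lmat, e=np.array([4.0, 2.0, 5.0]))
print("no shock:", p_ok)
print("shock   :", p_shock)
```

In the shocked case bank 0 can pay only 4 of the 10 it owes, so bank 1 receives 4, holds 2 of its own, and pays 6 of its 10. Bank 1 defaults purely because of its counterparty, which is the contagion channel the model formalizes.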

31.4.3 Default clustering, not just contagion

Empirically, US corporate defaults cluster beyond what observable covariates predict (Das et al., 2007). Part is contagion, part is a common frailty (Duffie et al., 2009). Distinguishing the two matters for capital: contagion implies structural interventions (firewalls, CCP mandates); frailty implies scenario-robust provisioning (Azizpour et al., 2018; Lando & Nielsen, 2010). GNNs can help, but they need a causal story. An encoder trained on contemporaneous features and outcomes will absorb both channels into weights; disentangling them requires instrumenting the graph structure or using natural experiments (Barrot & Sauvagnat, 2016; Carvalho et al., 2021).

31.5 SME network-based scoring

SME lending is where graphs add the most. A financial statement for a 10-employee firm is sparse and noisy; the firm’s position in a supply chain, its payment network with suppliers and buyers, its exposure to anchor customers, and the credit status of its main counterparties are all highly informative. Letizia and Lillo (Letizia & Lillo, 2019) showed that bank-payment network features improve credit rating predictions on Italian SME data. Cheng et al. (Cheng et al., 2019) applied high-order attention to guarantee networks, which are common in China where loan guarantees cross-secure firms into clusters that can cascade. The broader economic logic ties back to production networks (Acemoglu et al., 2012; Barrot & Sauvagnat, 2016; Carvalho et al., 2021).

Now we build an end-to-end example. We will:

  1. Simulate an SME supply-chain graph with two communities (risky and safe) and supplier-buyer edges.
  2. Attach noisy financial features to each firm.
  3. Label defaults by a latent that depends on the firm’s community and its features.
  4. Train a logistic-regression baseline on flat features.
  5. Train GCN, GraphSAGE, and GAT on the graph.
  6. Compare held-out AUC.
  7. Simulate default contagion and plot portfolio loss distributions.
  8. Run GNNExplainer on the riskiest test firm.

All code is deterministic and runs end-to-end in well under 90 seconds on a laptop.

31.5.1 Setup

from __future__ import annotations
import sys, os, random, warnings
warnings.filterwarnings("ignore")
sys.path.insert(0, "../code")

import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import torch
import torch.nn.functional as F
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from torch_geometric.nn import GCNConv, SAGEConv, GATConv
from torch_geometric.utils import from_networkx

from creditutils import ks_statistic

SEED = 0
np.random.seed(SEED)
random.seed(SEED)
torch.manual_seed(SEED)

31.5.2 Building a synthetic SME supply-chain graph

The generator combines two ingredients. First, a stochastic block model places firms into two communities: a risky cluster (industries exposed to a common shock) and a safe cluster. Within-community edge probability is higher than across, so firms mostly trade with peers in their own industry but occasionally sell across. Second, we add weights representing trade volume as fraction of the supplier’s revenue. Node-level features include leverage, return on assets (ROA), size, age, and a one-hot industry code. Defaults are driven by community membership (mimicking an industry shock), leverage, and ROA, with Gaussian noise. Features are then corrupted to represent reporting lag.

N = 300                             # number of firms
sizes = [150, 150]                  # two communities
p_in = 0.06                         # within-community edge prob
p_out = 0.01                        # across-community edge prob
rng = np.random.default_rng(SEED)

G = nx.stochastic_block_model(sizes, [[p_in, p_out], [p_out, p_in]], seed=1)
G = nx.DiGraph(G)                   # supplier -> buyer arcs; each undirected edge yields both directions

for u, v in list(G.edges()):
    G[u][v]["weight"] = float(rng.uniform(0.1, 1.0))

block = np.array([0] * sizes[0] + [1] * sizes[1])
leverage = rng.uniform(0.1, 0.9, size=N)
roa = rng.normal(0.04, 0.10, size=N)
size = rng.normal(0.0, 1.0, size=N)
age = rng.uniform(1.0, 30.0, size=N)
industry = rng.integers(0, 3, size=N)

latent = (
    0.5 * leverage
    - 1.0 * roa
    + 2.0 * block                 # industry shock via community
    + 0.3 * rng.normal(size=N)
)
y = (latent > np.quantile(latent, 0.70)).astype(int)

print(f"nodes: {G.number_of_nodes()}, edges: {G.number_of_edges()}")
print(f"default rate: {y.mean():.3f}")
print(f"default rate in risky block: {y[block == 1].mean():.3f}")
print(f"default rate in safe block:  {y[block == 0].mean():.3f}")
nodes: 300, edges: 3208
default rate: 0.300
default rate in risky block: 0.600
default rate in safe block:  0.000

The risky block concentrates defaults but features are noisy enough that a firm’s community is not obvious from its own numbers: that is precisely the regime where neighborhood structure beats a flat classifier.

obs_lev = leverage + 0.4 * rng.normal(size=N)
obs_roa = roa + 0.10 * rng.normal(size=N)
ind_oh = np.eye(3)[industry]
X = np.column_stack([obs_lev, obs_roa, size, age / 30.0, ind_oh])
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)
X = X.astype(np.float32)
print("feature matrix:", X.shape)
feature matrix: (300, 7)

Attach features to the NetworkX graph and convert to PyG’s Data container.

for i in range(N):
    G.nodes[i]["x"] = X[i].tolist()

data = from_networkx(G, group_node_attrs=["x"])
data.x = data.x.float()
data.y = torch.tensor(y, dtype=torch.long)

idx = np.arange(N)
rng.shuffle(idx)
n_tr, n_va = int(0.60 * N), int(0.20 * N)
tr_idx, va_idx, te_idx = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]


def mk_mask(ids):
    m = torch.zeros(N, dtype=torch.bool)
    m[ids] = True
    return m


data.train_mask = mk_mask(tr_idx)
data.val_mask = mk_mask(va_idx)
data.test_mask = mk_mask(te_idx)
print(data)
Data(edge_index=[2, 3208], block=[300], weight=[3208], partition=[2], name='stochastic_block_model', x=[300, 7], y=[300], train_mask=[300], val_mask=[300], test_mask=[300])

31.5.3 Tabular baseline: logistic regression

The honest baseline trains on node features only. If a GNN does not beat this, the graph is not adding information. If a GNN wins by a lot, the graph is where the signal lives.

lr = LogisticRegression(max_iter=500, random_state=SEED).fit(X[tr_idx], y[tr_idx])
p_lr = lr.predict_proba(X[te_idx])[:, 1]
auc_lr = roc_auc_score(y[te_idx], p_lr)
ks_lr = ks_statistic(y[te_idx], p_lr)
print(f"LR test AUC: {auc_lr:.3f}")
print(f"LR test KS : {ks_lr:.3f}")
LR test AUC: 0.538
LR test KS : 0.215
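The helper `ks_statistic` comes from the book's `creditutils` module, whose source is not shown here. Assuming it follows the conventional two-sample definition (the largest gap between the score distributions of defaulters and non-defaulters), a minimal stand-in could look like the sketch below. The name `ks_sketch` is hypothetical and its output is not guaranteed to match the helper's.

```python
import numpy as np

def ks_sketch(y_true, score):
    """Max gap between the empirical score CDFs of the two classes."""
    y_true = np.asarray(y_true)
    score = np.asarray(score, dtype=float)
    thresholds = np.unique(score)
    pos = np.sort(score[y_true == 1])             # scores of defaulters
    neg = np.sort(score[y_true == 0])             # scores of non-defaulters
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / len(pos)
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / len(neg)
    return float(np.max(np.abs(cdf_pos - cdf_neg)))

# Perfect separation gives 1.0; identical score distributions give 0.0.
ks_perfect = ks_sketch([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
ks_none = ks_sketch([0, 1, 0, 1], [0.3, 0.3, 0.7, 0.7])
print(ks_perfect, ks_none)
```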

31.5.4 GCN

Two layers of Eq. 31.10, trained with Adam and weight decay. The checkpoint with the best validation AUC, evaluated every 10 epochs, is kept.

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden, n_classes, p_drop=0.3):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, n_classes)
        self.p = p_drop

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        h = F.dropout(h, p=self.p, training=self.training)
        return self.conv2(h, edge_index)


def train_gnn(model, data, n_epochs=150, lr=1e-2, wd=5e-4, verbose=False):
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
    best_auc, best_state = 0.0, None
    y_val = data.y[data.val_mask].numpy()
    for ep in range(n_epochs):
        model.train()
        opt.zero_grad()
        out = model(data.x, data.edge_index)
        loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
        loss.backward()
        opt.step()
        if ep % 10 == 0:
            model.eval()
            with torch.no_grad():
                p = F.softmax(model(data.x, data.edge_index), dim=-1)[:, 1]
                auc = roc_auc_score(y_val, p[data.val_mask].numpy())
                if auc > best_auc:
                    best_auc = auc
                    best_state = {k: v.clone() for k, v in model.state_dict().items()}
                if verbose:
                    print(f"  ep {ep:3d}  loss {loss.item():.3f}  val AUC {auc:.3f}")
    if best_state is not None:
        model.load_state_dict(best_state)
    return model, best_auc


torch.manual_seed(SEED)
gcn = GCN(X.shape[1], hidden=32, n_classes=2)
gcn, gcn_va = train_gnn(gcn, data)
gcn.eval()
with torch.no_grad():
    p_gcn = F.softmax(gcn(data.x, data.edge_index), dim=-1)[:, 1]
auc_gcn = roc_auc_score(y[te_idx], p_gcn[te_idx].numpy())
print(f"GCN val AUC: {gcn_va:.3f}")
print(f"GCN test AUC: {auc_gcn:.3f}")
GCN val AUC: 0.829
GCN test AUC: 0.779

GCN’s lift over logistic regression on this synthetic graph quantifies the value of 2-hop smoothing when the label signal is community-driven and features are noisy.

31.5.5 GraphSAGE

class GraphSAGE(torch.nn.Module):
    def __init__(self, in_dim, hidden, n_classes, agg="mean", p_drop=0.3):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden, aggr=agg)
        self.conv2 = SAGEConv(hidden, n_classes, aggr=agg)
        self.p = p_drop

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        h = F.dropout(h, p=self.p, training=self.training)
        return self.conv2(h, edge_index)


torch.manual_seed(SEED)
sage = GraphSAGE(X.shape[1], hidden=32, n_classes=2, agg="mean")
sage, sage_va = train_gnn(sage, data)
sage.eval()
with torch.no_grad():
    p_sage = F.softmax(sage(data.x, data.edge_index), dim=-1)[:, 1]
auc_sage = roc_auc_score(y[te_idx], p_sage[te_idx].numpy())
print(f"GraphSAGE test AUC: {auc_sage:.3f}")
GraphSAGE test AUC: 0.739

31.5.6 GAT

class GAT(torch.nn.Module):
    def __init__(self, in_dim, hidden, n_classes, heads=4, p_drop=0.2):
        super().__init__()
        self.conv1 = GATConv(in_dim, hidden, heads=heads, dropout=p_drop)
        self.conv2 = GATConv(hidden * heads, n_classes, heads=1,
                             concat=False, dropout=p_drop)

    def forward(self, x, edge_index):
        h = F.elu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)


torch.manual_seed(SEED)
gat = GAT(X.shape[1], hidden=16, n_classes=2, heads=4)
gat, gat_va = train_gnn(gat, data)
gat.eval()
with torch.no_grad():
    p_gat = F.softmax(gat(data.x, data.edge_index), dim=-1)[:, 1]
auc_gat = roc_auc_score(y[te_idx], p_gat[te_idx].numpy())
print(f"GAT test AUC: {auc_gat:.3f}")
GAT test AUC: 0.798

31.5.7 Comparison

summary = pd.DataFrame(
    {
        "model": ["LR (tabular)", "GCN", "GraphSAGE", "GAT"],
        "test AUC": [auc_lr, auc_gcn, auc_sage, auc_gat],
    }
)
summary["test AUC"] = summary["test AUC"].round(3)
summary
          model  test AUC
0  LR (tabular)     0.538
1           GCN     0.779
2     GraphSAGE     0.739
3           GAT     0.798

The ordering (LR at the bottom, message-passing models well above) is the signature of a graph-dominant data-generating process. When the label is driven by a community-level industry shock and the features are noisy proxies, 2-hop smoothing over the supply chain injects the missing signal. Readers who replace the synthetic generator with a label dominated by leverage and ROA will find the ordering flip: LR wins, and GNNs add little. Keep this in mind whenever a colleague pitches GNNs for a problem that is really tabular.

31.5.8 Visualizing the graph and predictions

fig, ax = plt.subplots(figsize=(6, 5))
pos = nx.spring_layout(G.to_undirected(), seed=1, k=0.5)
cols = ["#377eb8" if yi == 0 else "#e41a1c" for yi in y]
nx.draw_networkx_edges(G, pos, alpha=0.12, width=0.4, arrows=False, ax=ax)
nx.draw_networkx_nodes(G, pos, node_color=cols, node_size=18, ax=ax)
ax.set_axis_off()
plt.tight_layout()
plt.show()
Figure 31.1: SME supply-chain graph. Nodes colored by realized default. Position via spring layout.

As shown in Figure 31.1, the two communities are visible as clusters, and defaults concentrate in one of them.

31.6 Default propagation and portfolio loss

Move from prediction to simulation. Seed an initial set of defaults based on the highest-leverage firms and propagate losses through the supply chain following the branching model in Eq. 31.17. The exposure of a supplier to its buyers is proxied by edge weight. Loss under a cascade is the exposure-weighted count of defaulted counterparties.

Show code
def simulate_contagion(DG, seeds, beta, pd_base=0.02, rounds=4,
                       n_sim=500, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    n = DG.number_of_nodes()
    # supplier-side exposure: sum of outgoing weights = revenue concentration
    exposure = np.array(
        [sum(d.get("weight", 1.0) for _, _, d in DG.out_edges(u, data=True))
         for u in range(n)]
    )
    losses = np.zeros(n_sim)
    for s in range(n_sim):
        d = np.zeros(n, dtype=bool)
        d[seeds] = True
        d |= rng.random(n) < pd_base
        for _ in range(rounds):
            newly = np.zeros(n, dtype=bool)
            for i in range(n):
                if d[i]:
                    continue
                preds = list(DG.predecessors(i))   # my suppliers
                k = sum(d[j] for j in preds)
                p = 1.0 - (1.0 - beta) ** k
                if rng.random() < p:
                    newly[i] = True
            if not newly.any():
                break
            d |= newly
        losses[s] = exposure[d].sum()
    return losses


seeds = list(np.argsort(-leverage)[:5])
losses_low = simulate_contagion(G, seeds, beta=0.03, n_sim=200,
                                rng=np.random.default_rng(1))
losses_mid = simulate_contagion(G, seeds, beta=0.08, n_sim=200,
                                rng=np.random.default_rng(1))
losses_hi = simulate_contagion(G, seeds, beta=0.15, n_sim=200,
                               rng=np.random.default_rng(1))

for name, ls in [("low", losses_low), ("mid", losses_mid), ("hi", losses_hi)]:
    print(
        f"beta {name}: mean {ls.mean():.2f} | "
        f"90% {np.quantile(ls, 0.90):.2f} | "
        f"99% {np.quantile(ls, 0.99):.2f}"
    )
beta low: mean 168.83 | 90% 242.64 | 99% 303.61
beta mid: mean 536.84 | 90% 706.54 | 99% 861.90
beta hi: mean 1171.45 | 90% 1388.07 | 99% 1511.83
Show code
fig, ax = plt.subplots(figsize=(6, 4))
bins = np.linspace(0, max(losses_hi.max(), 1), 40)
ax.hist(losses_low, bins=bins, alpha=0.55, label=r"$\beta = 0.03$")
ax.hist(losses_mid, bins=bins, alpha=0.55, label=r"$\beta = 0.08$")
ax.hist(losses_hi, bins=bins, alpha=0.55, label=r"$\beta = 0.15$")
ax.set_xlabel("portfolio loss (exposure-weighted)")
ax.set_ylabel("count")
ax.legend()
plt.tight_layout()
plt.show()
Figure 31.2: Portfolio loss distributions under three contagion coefficients. Higher beta fattens the right tail.

As shown in Figure 31.2, the 99th percentile rises roughly fivefold between beta=0.03 and beta=0.15 (303.61 to 1511.83), and the mean rises almost sevenfold (168.83 to 1171.45). That tail is where economic capital lives. Two practical takeaways:

  • A modest change in per-edge transmission probability reshapes the loss tail non-linearly. Stress tests that assume additive shocks badly misestimate systemic risk.
  • Seeding the simulation from the highest-leverage nodes (as opposed to random firms) produces much larger cascades. The identity of the initial shock matters. DebtRank-like systemic-importance weights (Battiston et al., 2012) and their interbank analogs (Cont et al., 2013; Upper, 2011) formalize this.
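
One ingredient of the cascade dynamics is the infection rule inside `simulate_contagion`: a firm with \(k\) defaulted suppliers defaults in a given round with probability \(1 - (1-\beta)^k\). The sketch below tabulates that probability for the three coefficients used above. The range of \(k\) and the four-round compounding with a constant \(k\) are illustrative assumptions, not part of the simulation itself.

```python
import numpy as np

betas = np.array([0.03, 0.08, 0.15])   # contagion coefficients used above
k = np.arange(1, 6)                    # defaulted suppliers (illustrative range)
rounds = 4                             # same default as simulate_contagion

# p_round[i, j]: one-round default probability with k[j] defaulted suppliers
p_round = 1.0 - (1.0 - betas[:, None]) ** k[None, :]

# p_multi[i, j]: probability of defaulting at some point in `rounds` rounds,
# ASSUMING k stays constant (in the simulation k grows as the cascade spreads)
p_multi = 1.0 - (1.0 - betas[:, None]) ** (k[None, :] * rounds)

for b, r1, r4 in zip(betas, p_round, p_multi):
    print(f"beta={b:.2f} | one round: {np.round(r1, 3)} | {rounds} rounds: {np.round(r4, 3)}")
```

Per-edge probabilities compound across defaulted suppliers and across rounds, which is why the loss distribution does not simply shift by a constant when beta changes.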

31.7 Graph SHAP and GNN explainability

Explainability is harder on graphs than on tabular data. A prediction depends on node features, on the subgraph of neighbors reached within the receptive field, on edge weights, and on the attention coefficients for GAT. Two methods dominate practice today.

31.7.1 GNNExplainer

GNNExplainer (Ying et al., 2019) seeks the subgraph and feature subset that best preserve the model’s prediction for a target node. Formally, for node \(i\) with prediction \(\hat{y}_i\), find a mask over edges \(M \in [0,1]^{|\mathcal{E}|}\) and a mask over features \(F \in [0,1]^{d}\) that solve \[ \max_{M, F}\ \operatorname{MI}\left( Y_i,\ (\mathcal{G}_s, X_s) \right) = H(Y_i) - H\left( Y_i \mid \mathcal{G}_s, X_s \right), \tag{31.19}\] where \(\mathcal{G}_s\) is the subgraph induced by \(M\) and \(X_s\) the features masked by \(F\). In practice the objective is relaxed to a cross-entropy against the model’s prediction plus \(L_1\) and entropy penalties on the masks. The explanation for node \(i\) is the small subgraph of edges with high \(M\) values and the features with high \(F\) values.
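
To make the relaxed objective concrete, the sketch below evaluates it for two candidate mask pairs on a toy graph. Everything here is an illustrative assumption: the one-layer mean-aggregation model, its random weights `W`, the penalty weights `lam_size` and `lam_ent`, and the helper names. The library implementation optimizes the masks by gradient descent; this sketch only computes the loss for fixed masks.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4
upper = np.triu((rng.random((n, n)) < 0.5).astype(float), 1)
A = upper + upper.T + np.eye(n)      # symmetric toy adjacency with self-loops
X = rng.normal(size=(n, d))
W = rng.normal(size=(d, 2))          # stand-in for frozen, already-trained weights


def model_logits(A_w, X_in):
    """One mean-aggregation layer followed by a linear readout (toy model)."""
    deg = np.maximum(A_w.sum(axis=1, keepdims=True), 1e-9)
    return (A_w / deg) @ X_in @ W


def bernoulli_entropy(m, eps=1e-9):
    """Mean element-wise entropy; small when mask entries are near 0 or 1."""
    m = np.clip(m, eps, 1 - eps)
    return -(m * np.log(m) + (1 - m) * np.log(1 - m)).mean()


def explainer_loss(M, Fm, target, y_hat, lam_size=0.05, lam_ent=0.1):
    """Cross-entropy against the model's own prediction plus mask penalties."""
    logits = model_logits(A * M, X * Fm)[target]
    log_probs = logits - np.log(np.exp(logits).sum())
    ce = -log_probs[y_hat]
    size = lam_size * ((A * M).sum() / A.sum() + Fm.mean())      # L1-style size penalty
    ent = lam_ent * (bernoulli_entropy(M[A > 0]) + bernoulli_entropy(Fm))
    return ce + size + ent


target = 0
y_hat = int(model_logits(A, X)[target].argmax())   # prediction to be preserved
loss_full = explainer_loss(np.ones((n, n)), np.ones(d), target, y_hat)
loss_soft = explainer_loss(np.full((n, n), 0.5), np.full(d, 0.5), target, y_hat)
print(f"all-ones masks: {loss_full:.3f} | all-0.5 masks: {loss_soft:.3f}")
```

The entropy term pushes mask entries toward 0 or 1, and the size term pushes them toward 0, so the optimizer is rewarded for keeping only the edges and features that the prediction actually needs.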

31.7.2 PGExplainer

PGExplainer (Luo et al., 2020) parameterizes a global explanation network that produces edge masks. Instead of optimizing a new mask for each instance, train one MLP to map edge endpoint embeddings to mask logits; the explanation at test time is a forward pass. This is faster, transfers across nodes, and gives smoother explanations, at the cost of lower per-instance fidelity on outlier cases.

31.7.3 Running GNNExplainer

Show code
from torch_geometric.explain import Explainer, GNNExplainer

gcn.eval()
explainer = Explainer(
    model=gcn,
    algorithm=GNNExplainer(epochs=100),
    explanation_type="model",
    node_mask_type="attributes",
    edge_mask_type="object",
    model_config=dict(
        mode="multiclass_classification",
        task_level="node",
        return_type="raw",
    ),
)

risky_rank = np.argsort(-p_gcn[te_idx].numpy())
risky_node = int(te_idx[risky_rank[0]])
print(f"risky node index: {risky_node} | pred p_default: {p_gcn[risky_node].item():.3f} | y: {int(y[risky_node])}")

expl = explainer(data.x, data.edge_index, index=risky_node)
feat_names = [
    "leverage", "roa", "size", "age",
    "ind_0", "ind_1", "ind_2",
]
feat_imp = expl.node_mask.abs().sum(0).numpy()
edge_imp = expl.edge_mask.abs().numpy()

imp = pd.Series(feat_imp, index=feat_names).sort_values(ascending=False)
print("Feature importance (GNNExplainer):")
print(imp.round(3))
print(f"Top-10 edge mask values: {np.sort(edge_imp)[-10:].round(3)}")
risky node index: 202 | pred p_default: 0.904 | y: 1
Feature importance (GNNExplainer):
roa         56.952000
size        56.016998
leverage    54.537998
ind_0       53.713001
ind_2       51.812000
age         46.575001
ind_1       37.823002
dtype: float32
Top-10 edge mask values: [0.73  0.731 0.734 0.751 0.752 0.753 0.757 0.76  0.784 0.786]

The explanation tells you, for this specific risky firm, which financial features the model leans on and which edges (which suppliers and buyers) were most influential. In a real deployment, a credit officer uses this to sanity-check the model’s reasoning against domain knowledge: does the model point at a single anchor buyer whose own credit is deteriorating? If yes, that is a coherent story. If the model points at a random clique of unrelated firms, the explanation flags possible spurious correlation.

31.7.4 Caveats that trip up first-time users

  • GNN explanations are model-local, not data-local. They tell you what the model relied on, not what the causal drivers are in the world. For a causal story, pair GNNExplainer with do-calculus or counterfactual analysis.
  • Explanations are not unique. Slightly different masks can yield similar predictions; stability under perturbation is not automatic.
  • Under oversmoothing (too many layers), explanations become diffuse: every neighbor matters equally, which means no neighbor matters much. Keep \(L \le 3\) for GCN-style models unless there is a specific reason.
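
The non-uniqueness caveat can be checked empirically by re-running the explainer and comparing which edges come out on top. A minimal sketch, assuming two edge-mask vectors from separate runs are available; the random masks below are stand-ins for `expl.edge_mask`, and `topk_jaccard` is a hypothetical helper name.

```python
import numpy as np


def topk_jaccard(mask_a, mask_b, k=10):
    """Jaccard overlap between the top-k edges of two explanation masks."""
    top_a = set(np.argsort(mask_a)[-k:].tolist())
    top_b = set(np.argsort(mask_b)[-k:].tolist())
    return len(top_a & top_b) / len(top_a | top_b)


rng = np.random.default_rng(0)
mask_run1 = rng.random(200)                                        # stand-in for one run
mask_run2 = np.clip(mask_run1 + rng.normal(0, 0.05, 200), 0, 1)    # perturbed re-run

print(f"self-overlap:   {topk_jaccard(mask_run1, mask_run1):.2f}")
print(f"re-run overlap: {topk_jaccard(mask_run1, mask_run2):.2f}")
```

A low overlap across seeds or small input perturbations is a signal that the explanation should not be quoted to a credit officer as if it were unique.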

31.8 Scalability

Real credit graphs are large. A single mid-sized bank may have tens of millions of retail customers and millions of SME counterparties; cross-institutional networks at the regulator level can reach hundreds of millions of nodes. Vanilla GCN requires \(\hat{A}\) in memory and a full-graph forward pass; that breaks beyond a few hundred thousand nodes on a single GPU. Three scaling strategies dominate practice.

31.8.1 Neighborhood sampling (GraphSAGE-style)

Train in mini-batches. For each target node, sample a fixed number of 1-hop neighbors, then a fixed number of 2-hop neighbors for each of those, and so on (Hamilton et al., 2017). An \(L\)-layer model therefore sees a sampled tree of depth \(L\) rooted at the target. The deepest level of that tree holds at most \(B \prod_l K_l\) nodes, where \(B\) is batch size and \(K_l\) the number of samples at layer \(l\), so memory scales with this product and not with the size of the graph. Accuracy is roughly preserved if \(K_l\) is 10 to 25 for 2-layer models. Bias from sampling can be corrected with importance-weighted sampling but is usually negligible when the graph is not too sparse.
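
The fan-out bound can be seen in a few lines of plain Python. This is a sketch under stated assumptions: the toy graph, the `fanouts` values, and the helper `sample_computation_tree` are illustrative, and neighbors are drawn uniformly without replacement, which is one of several possible sampling conventions.

```python
import random


def sample_computation_tree(adj, batch, fanouts, rng):
    """Return node lists per hop: layers[0] is the batch, layers[l] the sampled hop-l nodes."""
    layers = [list(batch)]
    for k in fanouts:
        nxt = []
        for u in layers[-1]:
            nbrs = adj.get(u, [])
            nxt.extend(rng.sample(nbrs, min(k, len(nbrs))))   # at most k neighbors per node
        layers.append(nxt)
    return layers


rng = random.Random(0)
n = 200
# toy graph: every node has 12 distinct neighbors
adj = {u: rng.sample([v for v in range(n) if v != u], 12) for u in range(n)}

batch = list(range(8))      # B = 8 target nodes
fanouts = [10, 5]           # K_1 = 10, K_2 = 5
layers = sample_computation_tree(adj, batch, fanouts, rng)

sizes = [len(layer) for layer in layers]
bound = [len(batch)]
for k in fanouts:
    bound.append(bound[-1] * k)     # B, B*K_1, B*K_1*K_2

print("sampled nodes per hop:", sizes)
print("upper bound per hop:  ", bound)
```

The per-hop counts never exceed the bound regardless of how many nodes the full graph has, which is the property that makes mini-batch training feasible on large graphs.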

31.8.2 Cluster-GCN

Cluster-GCN (Chiang et al., 2019) partitions the graph into clusters via METIS, then trains on mini-batches, each of which is the subgraph induced by a small set of clusters. This keeps dense intra-cluster edges intact, which preserves local structure. Edges between the clusters chosen for a batch are kept; the remaining cross-cluster edges are dropped for that batch, and reshuffling the cluster combinations across epochs lets different cross-cluster edges participate over time. Memory and computation scale linearly in the batch. On the Reddit graph (about 233k nodes), Cluster-GCN achieves accuracy comparable to full-batch training with much less memory.
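
The batching step can be sketched with NumPy alone. Note the substitution: the cluster assignment below is random, standing in for a METIS partition, and the edge list is a toy. `cluster_batch` is a hypothetical helper, and the training step is left as a comment.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_clusters = 300, 10
edge_index = rng.integers(0, n, size=(2, 1500))      # toy edge list (2 x E)
cluster_of = rng.integers(0, n_clusters, size=n)     # RANDOM stand-in for a METIS assignment


def cluster_batch(edge_index, cluster_of, chosen):
    """Induced subgraph on the nodes whose cluster is in `chosen`, with relabelled ids."""
    keep_node = np.isin(cluster_of, chosen)
    keep_edge = keep_node[edge_index[0]] & keep_node[edge_index[1]]
    nodes = np.flatnonzero(keep_node)
    relabel = -np.ones(len(cluster_of), dtype=int)
    relabel[nodes] = np.arange(len(nodes))
    return nodes, relabel[edge_index[:, keep_edge]]


for epoch in range(2):
    order = rng.permutation(n_clusters)              # reshuffle cluster combinations
    for start in range(0, n_clusters, 2):            # two clusters per batch
        nodes, sub_ei = cluster_batch(edge_index, cluster_of, order[start:start + 2])
        # a real loop would run the GNN forward/backward on (x[nodes], sub_ei) here

print(f"last batch: {len(nodes)} nodes, {sub_ei.shape[1]} edges")
```

With a real partitioner the clusters would be chosen to minimize cut edges, so far fewer edges would be lost per batch than under the random assignment used here.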

31.8.3 GraphSAINT

GraphSAINT (Zeng et al., 2020) samples subgraphs by node, by edge, or by random walks, and corrects the bias by importance weights in the loss. This avoids fixed layer-wise sampling bias and works well on deep GNNs.

31.8.4 Distributed training

For graphs that outgrow a single machine, frameworks like DGL-KE, Euler, and AliGraph shard the adjacency structure across machines. The standard pattern is to colocate nodes that are frequently co-sampled (via METIS or balanced partitioning), then use RPC-based neighbor fetching. Commercial banks with hundreds of millions of transactions typically run this stack on GPU clusters of 8 to 64 machines.

31.8.5 Empirical comparison on our small graph

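On \(n=300\) our synthetic network fits in memory full-batch. We still exercise a mini-batch training loop to confirm nothing breaks. Because the graph is tiny, the loop draws mini-batches of training nodes for the loss while keeping a full-graph forward pass; it is not true neighborhood sampling.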

Show code
# Full-graph forward pass with mini-batch loss over training nodes.
# NeighborLoader requires pyg-lib/torch-sparse which do not build on Apple Silicon;
# the graph is small (n=300), so we sample node indices directly.
torch.manual_seed(SEED)
sage_b = GraphSAGE(X.shape[1], hidden=32, n_classes=2, agg="mean")
opt_b = torch.optim.Adam(sage_b.parameters(), lr=1e-2, weight_decay=5e-4)
train_nodes = torch.where(data.train_mask)[0]
for ep in range(80):
    sage_b.train()
    perm = train_nodes[torch.randperm(len(train_nodes))]
    for i in range(0, len(perm), 64):
        batch_idx = perm[i:i + 64]
        opt_b.zero_grad()
        out = sage_b(data.x, data.edge_index)
        loss = F.cross_entropy(out[batch_idx], data.y[batch_idx])
        loss.backward()
        opt_b.step()

sage_b.eval()
with torch.no_grad():
    p_sage_b = F.softmax(sage_b(data.x, data.edge_index), dim=-1)[:, 1]
auc_sage_b = roc_auc_score(y[te_idx], p_sage_b[te_idx].numpy())
print(f"GraphSAGE (mini-batch node sampling) test AUC: {auc_sage_b:.3f}")
GraphSAGE (mini-batch node sampling) test AUC: 0.724

Mini-batch training comes within sampling variance of full-batch training on this small graph (test AUC 0.724 versus 0.739 for full-batch GraphSAGE).

31.9 Scalability in the pipeline sense: pandas to Spark

Building the graph is half the battle. Below is a pattern we use in practice.

  1. pandas for prototypes up to one or two million rows. NetworkX accepts edge lists directly; construction takes seconds.
  2. Polars for tens of millions of edges. It reads Parquet lazily, joins features fast, and emits edge lists as Arrow tables for PyG.
  3. Dask/Spark for hundreds of millions to billions of edges. Use the GraphFrames package for PySpark for neighbor aggregations, Laplacian eigenmaps via spectral methods, and path counts. For downstream model training, dump sampled subgraphs to Parquet, then fan out mini-batches on GPU workers.
  4. DGL + Spark integration: DGL ships a distributed graph-store that ingests Spark DataFrames. This is the typical production stack at large banks.

Keep feature engineering upstream in Spark or Polars. Keep training downstream in PyG or DGL. Do not try to train from Spark directly; the throughput is not there.

31.10 Deployment

A GNN in production differs from a tabular model in a couple of ways that touch SR 11-7 and MLOps directly.

Score a new borrower. For a transductive model like GCN, a naive design forces retraining each time a new node appears. That is impractical. Two options:

  1. Use an inductive model (GraphSAGE, GAT) that accepts novel nodes and their local neighborhoods at inference.
  2. Precompute embeddings for the entire graph nightly via batch training and serve scores from a feature store. For new borrowers without a neighborhood, start with a neighborhood-free fallback (scorecard or logistic regression) and graduate to the GNN score once the borrower’s edges materialize (first invoice, first payment, first loan).

Serving. A minimal FastAPI endpoint takes a node ID, fetches a 2-hop neighborhood from a feature store or a graph database (Neo4j, Memgraph, JanusGraph), runs a forward pass, and returns the PD.

# Pseudo-code for a production endpoint. Not executed in this chapter
# but reproduces the pattern we use for GNN serving.
# `graph_store.fetch_subgraph` is a placeholder for your own feature-store
# or graph-database client. It is assumed to return node features `x` and
# an edge index `ei` with the requested firm relabelled to row 0.
#
# from fastapi import FastAPI
# import torch
# from graph_store import fetch_subgraph
#
# app = FastAPI()
# model = torch.jit.load("gnn_sage.ts")  # TorchScript export of the trained model
# model.eval()
#
# @app.post("/score")
# def score(firm_id: str):
#     x, ei = fetch_subgraph(firm_id, depth=2, max_neigh=25)
#     with torch.no_grad():
#         # row 0 is the target firm; column 1 is the default class
#         p = torch.softmax(model(x, ei), dim=-1)[0, 1].item()
#     return {"firm_id": firm_id, "pd_12m": p}

MLflow. Log the adjacency fingerprint (graph hash, number of nodes, number of edges) alongside the usual model parameters and metrics. Retrain triggers on either a data drift in features or a graph drift in structure.
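
A minimal sketch of the adjacency fingerprint and a simple structural-drift measure, using only the standard library and NumPy. The function names, the SHA-256 choice, and the Jaccard drift metric are illustrative assumptions; the resulting dictionary is the kind of payload one would pass to an experiment tracker's parameter log.

```python
import hashlib

import numpy as np


def graph_fingerprint(edge_index, num_nodes):
    """Order-invariant hash of a directed edge list plus basic counts."""
    ei = np.asarray(edge_index, dtype=np.int64)
    order = np.lexsort((ei[1], ei[0]))              # sort by source, then target
    canon = np.ascontiguousarray(ei[:, order])
    digest = hashlib.sha256(canon.tobytes()).hexdigest()
    return {
        "graph_sha256": digest,
        "num_nodes": int(num_nodes),
        "num_edges": int(ei.shape[1]),
    }


def edge_jaccard(ei_old, ei_new):
    """Share of edges common to two snapshots; 1.0 means identical edge sets."""
    old = set(map(tuple, np.asarray(ei_old).T.tolist()))
    new = set(map(tuple, np.asarray(ei_new).T.tolist()))
    return len(old & new) / max(len(old | new), 1)


ei_v1 = np.array([[0, 1, 2, 2], [1, 2, 0, 3]])      # toy snapshot at training time
ei_v2 = np.array([[0, 1, 2], [1, 2, 0]])            # toy snapshot at scoring time

print(graph_fingerprint(ei_v1, num_nodes=4))
print(f"edge Jaccard between snapshots: {edge_jaccard(ei_v1, ei_v2):.2f}")
```

A retrain trigger could then fire when the Jaccard overlap between the training-time graph and the current graph falls below a threshold agreed with model validation.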

ONNX. PyG models are exportable to ONNX with some care; SAGEConv and GCNConv need to be called with dense or static-shape edge indices because ONNX does not love dynamic graph sizes. Alternatives: TorchScript for JIT-compiled serving, or a hand-written message-passing kernel for inference if latency matters.

31.11 Regulatory considerations

A GNN used to drive credit decisions is a high-stakes model under SR 11-7. The GNN literature (e.g. Kipf & Welling, 2017) does not address model risk, but regulators have written extensively on it. The network dimension raises problems that tabular models do not.

Model risk (SR 11-7). The usual validation components apply: conceptual soundness, ongoing monitoring (including process verification), and outcomes analysis. Extra attention goes to:

  • Graph construction as data, not as model. The construction pipeline (which edges, what weights, how stale) is part of the data layer and must be version-controlled, reproducible, and monitored for drift. A shifting graph is a shifting input.
  • Training/test leakage. When nodes share edges, random splits leak. Use community-aware splits (hold out whole clusters), temporal splits (hold out whole time windows), inductive splits (hide held-out nodes and their edges during training), or structured cross-validation. Report which.
  • Stability under adversarial perturbation. Small edge additions or deletions can flip predictions in some architectures; adversarial training or confidence calibration is appropriate for high-stakes decisions (Wu et al., 2021).
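
A community-aware split from the leakage bullet can be sketched as follows. The community labels, the toy edge list, and the helper `community_split` are illustrative assumptions; in practice the labels would come from a community-detection step or a known segment such as industry cluster.

```python
import numpy as np


def community_split(community, test_frac=0.3, seed=0):
    """Hold out WHOLE communities until roughly test_frac of nodes are in test."""
    rng = np.random.default_rng(seed)
    comms = rng.permutation(np.unique(community))
    sizes = np.array([(community == c).sum() for c in comms])
    cum = np.cumsum(sizes)
    n_hold = int(np.searchsorted(cum, test_frac * len(community), side="left")) + 1
    n_hold = min(n_hold, len(comms) - 1)            # keep at least one training community
    test_mask = np.isin(community, comms[:n_hold])
    return ~test_mask, test_mask


rng = np.random.default_rng(1)
community = rng.integers(0, 12, size=300)           # toy community labels
edge_index = rng.integers(0, 300, size=(2, 1500))   # toy edges

train_mask, test_mask = community_split(community)
crossing = train_mask[edge_index[0]] != train_mask[edge_index[1]]

print(f"train nodes: {train_mask.sum()}, test nodes: {test_mask.sum()}")
print(f"edges crossing the split: {crossing.sum()} of {edge_index.shape[1]}")
```

The count of crossing edges is worth reporting alongside the split: on a graph with real community structure it should be small, whereas the random toy edges here cross freely.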

ECOA / Fair lending. Network features can proxy for protected attributes through homophily: people tend to connect with similar people. Using an applicant’s friends’ or neighbors’ credit outcomes can trigger proxy discrimination even if nothing in the model nominally references a protected class. Fair-lending review must test for disparate impact on the network-derived score as well as the combined score, and adverse action notices must explain graph-based reasons in natural language. This is what PGExplainer and GNNExplainer are for in a compliance workflow.

Basel II/III IRB. PDs produced by a GNN can feed IRB capital if the model has a track record, is validated, and the institution’s risk-governance function owns it. Basel does not forbid graph models; it forbids opaque models without validation and documentation. The institution must be able to reproduce the model end-to-end, explain its inputs, and demonstrate stability under stress. Network models also interact with Pillar 2 concentration-risk requirements: a supply-chain-aware PD that already prices in network exposures may alter the institution’s internal economic capital allocation in ways the capital framework does not anticipate.

GDPR Article 22. Data subjects have the right not to be subject to decisions based solely on automated processing, including profiling, that produce legal or similarly significant effects; where such decisions are permitted, the safeguards include the right to obtain human intervention. Network models make the profiling question more salient because inputs include information about persons other than the subject. Ensure lawful basis for processing counterparties’ data and anonymize where possible.

EU AI Act. Credit scoring for natural persons is listed as high-risk. Requirements include risk-management system, data governance, documentation and logging, transparency and provision of information to users, human oversight, accuracy, robustness and cybersecurity. A GNN-based scorecard must document the graph construction (Annex IV of the Act), the training data and process, the explanations available to end users, and the cybersecurity posture, which for graphs includes resistance to adversarial-edge attacks.

31.12 Diagnostic: did the graph help?

A three-question checklist before deploying any GNN.

  1. Does a neighborhood-feature baseline beat the vanilla tabular baseline? Compute each node’s neighbor mean/max/min of each feature. Feed that into logistic regression. If this model already closes most of the GNN’s gap over tabular LR, a simple hand-crafted graph featurization is sufficient, and the GNN adds complexity and model risk without commensurate value.
  2. Do GCN, SAGE, and GAT agree in how they rank borrowers? If their rankings disagree wildly, the graph signal is weak or the architecture is dominant; prefer the simpler model.
  3. Does an explanation make business sense? Run GNNExplainer on a sample of ten true positives, ten false positives, and ten false negatives. A credit officer reviews. If the edges and features look arbitrary, the model is overfitting the graph.
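
One way to read question 2 is as rank agreement between the models' predicted PDs on the test set. The sketch below computes a Spearman rank correlation with NumPy only; the helper name `spearman` and the random stand-in scores are assumptions, and ties are ignored, which is reasonable for continuous scores.

```python
import numpy as np


def spearman(a, b):
    """Spearman rank correlation for continuous scores (no tie correction)."""
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


rng = np.random.default_rng(0)
latent = rng.normal(size=90)                       # shared signal (toy)
scores_a = latent + rng.normal(0, 0.3, size=90)    # stand-in for one model's PDs
scores_b = latent + rng.normal(0, 0.3, size=90)    # stand-in for another model's PDs
scores_c = rng.normal(size=90)                     # stand-in for an unrelated model

print(f"a vs b: {spearman(scores_a, scores_b):.2f}")
print(f"a vs c: {spearman(scores_a, scores_c):.2f}")
```

With the chapter's objects, the same function would be applied to pairs such as `p_gcn[te_idx].numpy()` and `p_gat[te_idx].numpy()`.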

We run the neighborhood-feature baseline on our synthetic problem.

Show code
und = G.to_undirected()
nbr_mean = np.zeros_like(X)
nbr_max = np.zeros_like(X)
for i in range(N):
    nb = list(und.neighbors(i))
    if nb:
        nbr_mean[i] = X[nb].mean(axis=0)
        nbr_max[i] = X[nb].max(axis=0)
    else:
        nbr_mean[i] = X[i]
        nbr_max[i] = X[i]

X_neigh = np.column_stack([X, nbr_mean, nbr_max]).astype(np.float32)
lr_n = LogisticRegression(max_iter=500, random_state=SEED).fit(
    X_neigh[tr_idx], y[tr_idx]
)
p_lrn = lr_n.predict_proba(X_neigh[te_idx])[:, 1]
auc_lrn = roc_auc_score(y[te_idx], p_lrn)

final = pd.DataFrame(
    {
        "model": [
            "LR tabular",
            "LR + neighbor-mean/max",
            "GCN",
            "GraphSAGE",
            "GAT",
        ],
        "test AUC": [auc_lr, auc_lrn, auc_gcn, auc_sage, auc_gat],
    }
)
final["test AUC"] = final["test AUC"].round(3)
final
                    model  test AUC
0              LR tabular     0.538
1  LR + neighbor-mean/max     0.671
2                     GCN     0.779
3               GraphSAGE     0.739
4                     GAT     0.798

Logistic regression with hand-crafted neighbor means and maxima closes about 55% of the gap to GCN on this graph (0.133 of 0.241 AUC points). The GNN adds the remaining lift by learning which neighbor features matter and by composing 2-hop views, but roughly half of the gain is recoverable with simple aggregates. That matters in regulated environments: where neighbor aggregates recover most of the gain, say 80%, many banks will ship the simpler model.
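
The share of each GNN's lift that the neighbor-aggregate baseline recovers follows directly from the table. A short worked calculation using the reported AUCs; the dictionary keys simply mirror the table labels.

```python
auc = {
    "LR tabular": 0.538,
    "LR + neighbor-mean/max": 0.671,
    "GCN": 0.779,
    "GraphSAGE": 0.739,
    "GAT": 0.798,
}

base = auc["LR tabular"]
neigh = auc["LR + neighbor-mean/max"]

shares = {}
for gnn in ("GCN", "GraphSAGE", "GAT"):
    # fraction of the GNN's lift over tabular LR already captured by neighbor aggregates
    shares[gnn] = (neigh - base) / (auc[gnn] - base)
    print(f"{gnn}: neighbor aggregates recover {shares[gnn]:.0%} of the lift")
# GCN: 55%, GraphSAGE: 66%, GAT: 51%
```

On a single small test set these shares carry wide uncertainty, so they are best read as a rough guide to whether the simpler model deserves serious consideration.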

31.13 Scorecard view

Regulated PDs are commonly reported on a points scale. For a GNN score \(\hat{p}_i = \sigma(f(G, x_i))\), conversion to points is identical to tabular scores: \[ \operatorname{points}(i) = \operatorname{offset} + \operatorname{factor} \cdot \log\left( \frac{1-\hat{p}_i}{\hat{p}_i} \right), \tag{31.20}\] with standard choices of a base score of 600 at base odds of 50:1 and a PDO of 20, so that \(\operatorname{factor} = 20/\ln 2 \approx 28.85\) and \(\operatorname{offset} = 600 - \operatorname{factor} \cdot \ln 50 \approx 487.1\) (see Chapter 7 for the derivation). Two quirks are specific to graphs. First, \(\hat{p}_i\) at inference depends on the current graph; if graph drift is significant between scoring runs, the same applicant’s points can change without any change in their own features. Second, because the GNN learned on a frozen graph during training, very new borrowers may have few or no edges, and their score may collapse toward a prior. Handle via a fallback scorecard for applicants with degree below a threshold.

Show code
from creditutils import scorecard_points

pts_te = scorecard_points(p_gcn[te_idx].numpy())
print(f"GCN score points on test set: min {pts_te.min():.1f}, "
      f"mean {pts_te.mean():.1f}, max {pts_te.max():.1f}")
print(f"KS on test set: {ks_statistic(y[te_idx], p_gcn[te_idx].numpy()):.3f}")
GCN score points on test set: min 422.3, mean 532.3, max 661.5
KS on test set: 0.490
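
The implementation of `scorecard_points` in `creditutils` is not shown here. The sketch below is a hypothetical stand-in, `points_from_pd`, that applies Eq. 31.20 with a base score of 600 at odds 50:1 and PDO 20; under those assumed defaults a PD of 0.904 maps to about 422 points, in line with the minimum reported above.

```python
import numpy as np


def points_from_pd(p, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map a PD to scorecard points; higher points mean lower risk."""
    p = np.clip(np.asarray(p, dtype=float), 1e-9, 1 - 1e-9)
    factor = pdo / np.log(2.0)                          # points per doubling of odds
    offset = base_score - factor * np.log(base_odds)    # anchors base_odds at base_score
    return offset + factor * np.log((1.0 - p) / p)


for pd_value in (1 / 51, 1 / 101, 0.5, 0.904):
    print(f"PD {pd_value:.4f} -> {float(points_from_pd(pd_value)):.1f} points")
```

Doubling the good-to-bad odds from 50:1 to 100:1 adds exactly one PDO (20 points), which is a quick sanity check for any scaling implementation.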

31.14 Vietnam and emerging markets

31.14.1 Market context

Vietnam is an unusually clean test case for graph-based credit scoring. The bureau (CIC) covers roughly half of the adult population (Credit Information Center of Vietnam, 2023), and the remaining half is thin-file or unbanked. At the same time, digital wallet penetration is high: MoMo, VNPay, ZaloPay, and ViettelPay collectively process a substantial share of retail payments. Each wallet operates a merchant-customer graph at national scale: every transaction is an edge, every merchant and every customer a node, and the adjacency matrix at quarter-end encodes a dense view of economic activity that no Vietnamese bureau captures. The same pattern holds in Indonesia with GoPay and OVO, in the Philippines with GCash, and in Kenya with M-Pesa, so the playbook travels beyond Vietnam.

For a lender, the attraction is information. A customer with no bureau tradeline but a year of consistent wallet payments to a set of merchants with stable repayment behavior is a scoreable customer under a GNN. A merchant with inconsistent payout patterns and a concentrated set of small-ticket customers is a different risk from a merchant with a diversified customer base. The tabular model misses both; the GNN captures both by message passing over the bipartite graph.

31.14.2 Application considerations

Three graph choices structure the Vietnamese pipeline. The first is the bipartite customer-merchant graph, with edges weighted by transaction volume and frequency. GraphSAGE handles this directly with two node types and the appropriate loss. The second is the customer-customer projection, with edges between customers who pay the same merchants within a window; this is a peer-similarity graph that supports fraud and default propagation signals but inherits homophily and fair-lending proxy risk. The third is the merchant-merchant projection, with edges between merchants who share customers; this is a supply-chain-adjacent graph that supports SME default scoring for the merchant side of the wallet.
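
The three graphs can be derived from a single transaction table. A minimal sketch on six toy transactions, assuming all of them fall inside one time window; the array layout and variable names are illustrative.

```python
import numpy as np

# toy transactions within one window: (customer_id, merchant_id, amount)
tx = np.array([
    [0, 0, 20.0],
    [0, 1, 5.0],
    [1, 0, 12.0],
    [2, 1, 7.0],
    [2, 2, 30.0],
    [3, 2, 9.0],
])
n_customers, n_merchants = 4, 3

# 1. Bipartite customer-merchant graph, weighted by transaction volume
B = np.zeros((n_customers, n_merchants))
np.add.at(B, (tx[:, 0].astype(int), tx[:, 1].astype(int)), tx[:, 2])

# Binary incidence: did customer i pay merchant j at all in the window?
shared = (B > 0).astype(int)

# 2. Customer-customer projection: number of merchants two customers share
cc = shared @ shared.T
np.fill_diagonal(cc, 0)

# 3. Merchant-merchant projection: number of customers two merchants share
mm = shared.T @ shared
np.fill_diagonal(mm, 0)

print("bipartite volume matrix:\n", B)
print("customer-customer projection:\n", cc)
print("merchant-merchant projection:\n", mm)
```

At national scale the same products would be computed with sparse matrices, and the projections are typically thresholded because popular merchants otherwise connect almost every pair of customers.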

Data access is the binding constraint. The wallet data sits with the wallet operator, not with the lender, and Decree 13/2023 on personal data protection (Government of Vietnam, 2023) requires a legal basis for processing. The practical pattern is a bank-wallet partnership, with the wallet operator running the GNN on its own infrastructure and exporting only the node-level score to the bank. Decree 53/2022 (Government of Vietnam, 2022) adds a localization constraint, so the GNN training pipeline runs inside Vietnam. Decree 94/2025 on the controlled testing mechanism (Government of Vietnam, 2025) gives the sandbox path for fintech-bank partnerships.

31.14.3 Rationalization

The case for a wallet-graph GNN in Vietnam rests on the gap the CIC does not fill. A consumer loan decision for an urban customer with three years of bureau history does not need a graph; a decision for a rural first-time borrower with two years of wallet activity does. The SME case is parallel: a merchant with thin bureau coverage but strong wallet throughput is scoreable from the merchant-merchant graph even when the financial statement is unavailable. The Basel II/III validation burden (Basel Committee on Banking Supervision, 2017) applies as much to a Vietnamese GNN as to a US one, and the SBV’s Circular 41/2016 on capital adequacy ratios, as amended by Circular 22/2023/TT-NHNN (29 Dec 2023), requires the lender to document the model’s inputs and stability (State Bank of Vietnam, 2023).

31.14.4 Practical notes

Build the graph on a defined time window, typically 90 to 180 days, and refresh the graph quarterly. Use GraphSAGE as the default because new customers and new merchants join continuously; GCN requires a fixed graph and is the wrong inductive bias. Validate with community-aware splits, not random node splits, because payment-homophilous communities leak labels. Run the neighborhood-feature baseline first, because a Vietnamese lender that can deploy simple neighbor aggregates under SR 11-7-style model risk standards and SBV supervision will have an easier model risk conversation than a lender that ships a black-box GNN. Monitor for graph drift at the wallet-operator level; a product change in MoMo or VNPay that alters transaction categorization will shift the adjacency matrix and move scores for reasons unrelated to borrower behavior. Run a fair-lending audit over the graph-derived score by gender, urban-rural, and region, because homophily in the customer-customer projection creates proxy risk that the underlying wallet operator does not see. Finally, document the data-sharing agreement and the cross-border-transfer posture in the model card, because Decree 13/2023, Decree 53/2022, and the SBV will each read it.

31.15 What we did not cover

Heterogeneous GNNs (R-GCN, HAN, HGT) handle multiple node and edge types natively and are the right choice for borrower-firm-bank tripartite networks; we did not build one because the synthetic example is single-type. Dynamic or temporal GNNs (TGAT, EvolveGCN, ROLAND) are the appropriate abstraction for time-stamped transaction graphs; we deferred that to Chapter 36. Knowledge graph embeddings (TransE, RotatE, ComplEx) and random-walk methods (DeepWalk (Perozzi et al., 2014), node2vec (Grover & Leskovec, 2016)) deliver competitive results when the label signal is primarily structural and features are few.

31.16 Takeaways

  • Graph neural networks belong in the credit toolbox when the data-generating process is network-driven: supplier-buyer cascades, community-level shocks, interbank contagion, social-collateral lending.
  • GCN gives the strongest inductive bias and is the right first thing to try. GraphSAGE is the right production default because it handles new borrowers. GAT wins when neighbor weighting is task-specific (syndicated loans, concentrated counterparties).
  • Always compare against both a tabular baseline and a neighborhood-aggregate baseline. If hand-crafted neighbor means close most of the gap, ship the simpler model.
  • Contagion simulations can exhibit sharp percolation thresholds: modest changes in edge-level transmission probability produce non-linear growth in the loss tail. Stress tests must probe for the threshold instead of assuming additive shocks.
  • GNN explainability is model-local, not causal. GNNExplainer and PGExplainer are necessary but not sufficient for compliance; pair them with counterfactual tests and domain review.
  • Network features can proxy for protected attributes through homophily. Fair-lending review must cover graph-derived scores.

31.17 Further reading

The foundational GNN trio: GCN (Kipf & Welling, 2017), GraphSAGE (Hamilton et al., 2017), GAT (Veličković et al., 2018). Earlier work establishing message passing (Gilmer et al., 2017; Scarselli et al., 2009) and spectral graph convolutions (Bronstein et al., 2017; Defferrard et al., 2016). Survey articles that map the landscape (Wu et al., 2021).

On explainability for GNNs (Luo et al., 2020; Ying et al., 2019). On scalability, sampling, and distributed training (Chiang et al., 2019; Zeng et al., 2020).

For network credit risk and contagion (Acemoglu et al., 2015; Allen & Gale, 2000; Bardoscia et al., 2021; Battiston et al., 2012; Eisenberg & Noe, 2001; Elliott et al., 2014; Gai & Kapadia, 2010; Glasserman & Young, 2016). For the economic logic of supply-chain propagation (Acemoglu et al., 2012; Barrot & Sauvagnat, 2016; Carvalho et al., 2021). For empirical default clustering (Azizpour et al., 2018; Das et al., 2007; Duffie et al., 2009; Lando & Nielsen, 2010). For SME and guarantee networks (Cheng et al., 2019; Letizia & Lillo, 2019). For the social-collateral and peer-screening logic behind network-based consumer scoring (Björkegren & Grissen, 2020; Iyer et al., 2016; Karlan et al., 2009; Lin et al., 2013). For bank contagion evidence (Hale et al., 2020; Iyer & Peydro, 2011). For foundational centrality concepts (Freeman, 1977; Katz, 1953; Newman, 2003; Page et al., 1999).