Chapter 23: Knowledge Graphs

Graph Centrality Measures for Financial Risk and Feature Engineering

Degree, betweenness, and eigenvector centrality quantify structural importance in financial networks and serve as risk indicators and ML features that price-based data alone cannot provide.

Why This Matters

Financial networks encode relationships that correlations and regressions cannot represent directly: who supplies whom, which institutions lend to each other, which directors sit on multiple boards, and which entities hold concentrated ownership positions. Graph centrality measures translate these topological structures into numerical scores that quantify how important, how connected, or how critical each node is within the network [ref:IMDV5XDV].

Graph-to-ML feature engineering typically organizes graph features into families: centrality, concentration, and crowding. This primer develops the specific centrality measures, their financial interpretations, and the conditions under which they produce reliable versus misleading signals.

The Core Centrality Measures

Consider a financial network $G = (V, E)$ where nodes $V$ represent entities (companies, institutions, securities) and edges $E$ represent relationships (supply-chain links, lending exposures, ownership stakes, board interlocks).

Degree Centrality

The simplest measure counts direct connections. For a node $i$ with degree $k_i$ in a network of $N$ nodes:

$$ C_D(i) = \frac{k_i}{N - 1} $$

In directed networks, in-degree and out-degree carry different meanings. In a supply-chain graph whose edges point from supplier to customer, high in-degree flags a company that depends on many suppliers, while high out-degree flags a critical supplier that many companies depend on. In ownership graphs, high degree signals a holding company or activist investor with positions across many firms [ref:IMDV5XDV].
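A minimal sketch of normalized in-degree and out-degree on a directed supply-chain graph follows. The company names and edges are hypothetical, and the `(supplier, customer)` edge orientation is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical SUPPLIES edges, written as (supplier, customer).
supplies = [
    ("ChipCo", "PhoneCo"),
    ("ChipCo", "LaptopCo"),
    ("ChipCo", "CarCo"),
    ("GlassCo", "PhoneCo"),
    ("BatteryCo", "PhoneCo"),
    ("BatteryCo", "CarCo"),
]


def degree_centrality(edges):
    """Return {node: (in_centrality, out_centrality)}, each divided by N - 1."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    nodes = set()
    for supplier, customer in set(edges):  # set() drops duplicate edges
        out_deg[supplier] += 1  # the supplier gains a customer
        in_deg[customer] += 1   # the customer gains a supplier
        nodes.update((supplier, customer))
    scale = len(nodes) - 1
    return {n: (in_deg[n] / scale, out_deg[n] / scale) for n in sorted(nodes)}


centrality = degree_centrality(supplies)
for node, (c_in, c_out) in centrality.items():
    print(f"{node:10s} in={c_in:.2f} out={c_out:.2f}")
```

In this toy graph ChipCo has the highest out-degree centrality (three of five possible customers) and PhoneCo the highest in-degree centrality (three of five possible suppliers), which are the two roles the paragraph above distinguishes.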

Betweenness Centrality

Betweenness measures how often a node lies on shortest paths between other nodes:

$$ C_B(i) = \sum_{s \neq i \neq t} \frac{\sigma_{st}(i)}{\sigma_{st}} $$

where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$, and $\sigma_{st}(i)$ is the number of those paths that pass through node $i$. High betweenness identifies bottleneck entities that many shortest routes depend on. In supply-chain stress testing, a node with high betweenness centrality is a candidate single point of failure: if that supplier is disrupted, a large share of the shortest buyer-to-source paths break, and where no alternative route exists the network fragments [ref:IMDV5XDV].
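To see how the ratio handles ties, consider a hypothetical four-node "diamond" (not taken from the source) in which $s$ links to $u$ and $v$, and both $u$ and $v$ link to $t$. There are two equally short routes from $s$ to $t$, one through each intermediary, so each intermediary receives half the credit for that pair:

$$ \sigma_{st} = 2, \qquad \sigma_{st}(u) = \sigma_{st}(v) = 1 \quad \Rightarrow \quad \frac{\sigma_{st}(u)}{\sigma_{st}} = \frac{\sigma_{st}(v)}{\sigma_{st}} = \frac{1}{2} $$

A node earns full credit for a pair only when every shortest path between that pair runs through it.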

Eigenvector Centrality

Eigenvector centrality captures recursive influence: a node is important if its neighbors are important. The centrality $x_i$ of node $i$ is proportional to the sum of the centralities of its neighbors:

$$ x_i = \frac{1}{\lambda} \sum_{j \in \mathcal{N}(i)} x_j $$

where $\lambda$ is the largest eigenvalue of the adjacency matrix $A$ and $\mathcal{N}(i)$ is the set of neighbors of node $i$. In matrix form, $Ax = \lambda x$, so the centrality vector is the leading eigenvector of the adjacency matrix.

In interbank lending networks, high eigenvector centrality identifies systemically important institutions: not just those with many connections, but those connected to other highly connected institutions. This recursive structure is what makes systemic risk difficult to assess from bilateral exposures alone [ref:XKME2FYI]. PageRank is a variant of eigenvector centrality that adds a damping factor and handles directed graphs, making it applicable to ownership and citation networks.
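The recursion can be sketched with power iteration. The example below is illustrative only: the bank names and links are hypothetical, the graph is assumed undirected and connected, and iterating on $A + I$ instead of $A$ is a convergence safeguard chosen here (it shifts eigenvalues by one but leaves eigenvectors unchanged), not something the source prescribes.

```python
import math


def eigenvector_centrality(adj, iterations=1000, tol=1e-10):
    """Power iteration on (A + I); returns (centrality dict, estimate of lambda).

    adj maps each node to the set of its neighbors (undirected, connected).
    """
    x = {n: 1.0 for n in adj}
    for _ in range(iterations):
        # One multiplication by (A + I): own score plus the neighbors' scores.
        nxt = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        norm = math.sqrt(sum(v * v for v in nxt.values()))
        nxt = {n: v / norm for n, v in nxt.items()}
        converged = max(abs(nxt[n] - x[n]) for n in adj) < tol
        x = nxt
        if converged:
            break
    # With x at unit length, the Rayleigh quotient x'Ax estimates lambda.
    lam = sum(x[n] * sum(x[m] for m in adj[n]) for n in adj)
    return x, lam


# Hypothetical interbank links: three core banks lend to one another,
# a regional bank connects to one core bank, a local bank to the regional.
adj = {
    "CoreA": {"CoreB", "CoreC", "Regional"},
    "CoreB": {"CoreA", "CoreC"},
    "CoreC": {"CoreA", "CoreB"},
    "Regional": {"CoreA", "Local"},
    "Local": {"Regional"},
}

x, lam = eigenvector_centrality(adj)
for bank in sorted(x, key=x.get, reverse=True):
    print(f"{bank:9s} degree={len(adj[bank])} eigenvector={x[bank]:.3f}")
```

CoreB and Regional both have degree 2, yet CoreB scores higher because its neighbors are themselves well connected. That gap is the recursive effect that degree alone does not capture.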

Closeness Centrality

Closeness measures how near a node is to all others:

$$ C_C(i) = \frac{N - 1}{\sum_{j \neq i} d(i, j)} $$

where $d(i, j)$ is the shortest path length between nodes $i$ and $j$. In information-propagation terms, high closeness identifies entities that receive or transmit shocks fastest through the network.
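A minimal sketch of closeness via breadth-first search follows, assuming an unweighted, undirected, connected graph. The four-node chain is hypothetical. The formula is undefined when some node is unreachable, so this sketch raises an error in that case instead of choosing a convention.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(adj):
    """C_C(i) = (N - 1) / sum_j d(i, j) for a connected graph with N >= 2."""
    n = len(adj)
    if n < 2:
        raise ValueError("closeness needs at least two nodes")
    scores = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        if len(dist) < n:
            raise ValueError("graph is not connected; some d(i, j) is undefined")
        scores[node] = (n - 1) / sum(dist.values())
    return scores


# Hypothetical chain a - b - c - d.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
closeness = closeness_centrality(chain)
print(closeness)  # interior nodes score 0.75, end nodes 0.5
```

The interior nodes sit closer to everyone else, so a shock starting there reaches the whole chain in fewer hops than one starting at either end.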

Financial Network Types

Centrality measures apply across several network types constructed from different data sources:

Network type	Nodes	Edges	Source	Centrality interpretation
Supply chain	Companies	SUPPLIES	10-K filings	Supplier criticality, dependence
Interbank lending	Institutions	LENDS_TO	Regulatory reports	Systemic importance, contagion
Ownership/control	Entities	HOLDS	13F filings	Concentration, control influence
Director interlock	Companies	SHARES_DIRECTOR	Proxy filings	Information flow, governance

Each network type has its own edge semantics, and the financial interpretation of centrality depends on those semantics. High betweenness in a supply chain means something different from high betweenness in a director-interlock network.

Worked Example

Consider a stylized supply-chain subgraph with five companies:

    B
   /|\
  A | D --- E
   \|/
    C

Company A sources from B and C. B and C supply each other (e.g., cross-licensing). Company D sources from B and C and supplies E. The graph has six edges: A--B, A--C, B--C, B--D, C--D, D--E.

Node	Degree	Betweenness	Interpretation
A	2	0	End buyer, no intermediary role
B	3	1	Intermediary on half the A-to-D and A-to-E shortest paths
C	3	1	Symmetric intermediary role to B
D	3	3	Highest betweenness: sole gateway from the B-C cluster to E
E	1	0	Terminal node, single supplier dependency

Betweenness here counts each unordered pair of companies once. Node D has the highest betweenness because every path to E passes through D: the pairs A--E, B--E, and C--E each contribute 1. Disrupting D isolates E entirely. This is the kind of structural risk signal that cannot be extracted from D's financial statements alone.
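The table can be checked with a short script. This is a sketch under stated assumptions: the graph is treated as undirected and unweighted, each unordered pair is counted once (summing over ordered pairs would double every betweenness value), and the helper names are illustrative.

```python
from collections import deque
from itertools import combinations

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def bfs_counts(adj, source, removed=None):
    """Return (dist, sigma): hop distance and shortest-path count from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v == removed:
                continue  # treat the removed node as disrupted
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Sum sigma_st(i) / sigma_st over unordered pairs {s, t}, with i not in {s, t}."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {n: 0.0 for n in adj}
    for s, t in combinations(sorted(adj), 2):
        (dist_s, sigma_s), (dist_t, sigma_t) = info[s], info[t]
        if t not in dist_s:
            continue  # no path between s and t
        for i in adj:
            on_path = i in dist_s and i in dist_t and dist_s[i] + dist_t[i] == dist_s[t]
            if i not in (s, t) and on_path:
                # Paths through i = (paths s -> i) * (paths i -> t).
                score[i] += sigma_s[i] * sigma_t[i] / sigma_s[t]
    return score


def components_without(adj, removed):
    """Connected components that remain after one node is disrupted."""
    remaining, comps = set(adj) - {removed}, []
    while remaining:
        dist, _ = bfs_counts(adj, min(remaining), removed=removed)
        comps.append(sorted(dist))
        remaining -= set(dist)
    return comps


degree = {n: len(adj[n]) for n in sorted(adj)}
bc = betweenness(adj)
print(degree)  # A 2, B 3, C 3, D 3, E 1
print(bc)      # A 0, B 1, C 1, D 3, E 0
print(components_without(adj, "D"))  # [['A', 'B', 'C'], ['E']]
```

Removing B or C leaves the remaining four companies connected. Only the removal of D splits the graph, which is why it stands out in the table.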

From Centrality to ML Features

To use centrality in supervised learning pipelines:

Construct temporal snapshots. Build the graph from filings available at each point in time. A supply-chain graph as of Q1 2024 uses only filings with filing dates before the end of Q1 2024.
Compute centrality per snapshot. Calculate degree, betweenness, and eigenvector centrality for each node in each snapshot.
Normalize within cross-sections. Rank or z-score centrality values within each snapshot to remove level effects from changing graph size over time.
First-difference for momentum. The change in a node's centrality between consecutive snapshots captures whether its structural importance is growing or declining.
Feed into tabular models. Centrality features enter gradient-boosting or other supervised models alongside price-based and fundamental features.

Temporal leakage risk: A supply-chain edge extracted from a 10-K filed on 2024-02-15 cannot be used for predictions made before that date. The filing date, not the fiscal year-end, determines when the information was available. Computing centrality from future filings is a direct form of lookahead bias [ref:IMDV5XDV].
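The steps above, including the filing-date rule, can be sketched as follows. Everything here is an illustrative assumption: the edge records, node names, and snapshot dates are hypothetical, and out-degree stands in for whichever centrality measure is being engineered.

```python
from datetime import date
from statistics import mean, pstdev

# Hypothetical extracted edges: (supplier, customer, filing date of the source 10-K).
edges = [
    ("S1", "C1", date(2023, 2, 20)),
    ("S1", "C2", date(2023, 3, 1)),
    ("S2", "C1", date(2023, 11, 5)),
    ("S1", "C3", date(2024, 2, 15)),
    ("S2", "C3", date(2024, 5, 10)),  # filed after Q1 2024, so invisible until then
]


def snapshot_out_degree(edges, as_of):
    """Out-degree per node from edges whose filing date is on or before as_of."""
    visible = {(s, c) for s, c, filed in edges if filed <= as_of}
    degree = {n: 0 for pair in visible for n in pair}
    for supplier, _customer in visible:
        degree[supplier] += 1
    return degree


def zscore(values):
    """Cross-sectional z-score within one snapshot."""
    if not values:
        return {}
    vals = list(values.values())
    mu, sigma = mean(vals), pstdev(vals)
    return {k: (v - mu) / sigma if sigma > 0 else 0.0 for k, v in values.items()}


def centrality_features(edges, snapshot_dates):
    """One row per (snapshot, node): normalized level and its first difference."""
    rows, previous = [], {}
    for as_of in sorted(snapshot_dates):
        z = zscore(snapshot_out_degree(edges, as_of))
        for node in sorted(z):
            # No change is defined the first time a node appears.
            change = z[node] - previous[node] if node in previous else None
            rows.append({"as_of": as_of, "node": node,
                         "z_out_degree": z[node], "d_z_out_degree": change})
        previous = z
    return rows


features = centrality_features(edges, [date(2023, 12, 31), date(2024, 3, 31)])
for row in features:
    print(row)
```

The edge filed on 2024-05-10 never enters either snapshot, so S2's out-degree stays at 1 in the Q1 2024 cross-section. The resulting rows can be joined to price-based and fundamental features on the snapshot date and node identifier.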

Practical Guidance

Centrality on incomplete graphs reflects extraction coverage, not true topology. If your extraction pipeline captures 60% of supply-chain relationships, raw degree will understate the connectedness of every node that has a missing edge, and if coverage is uneven across companies, centrality rankings will shift as well.
Entity-resolution errors distort centrality in both directions. Duplicate nodes for the same company split its edges, understating that entity's true centrality. Incorrectly merging distinct entities inflates degree. Fix entity resolution (Primer 01) before computing centrality.
Topological importance is not the same as economic importance. A node with high betweenness in the graph may represent a small company that happens to be the sole intermediary in a niche supply chain. Combine centrality with economic variables (revenue, market cap) for risk assessment.
Betweenness centrality is computationally expensive on large graphs: Brandes' algorithm, the standard exact method, runs in $O(|V||E|)$ time on unweighted graphs. For networks with tens of thousands of nodes, approximate algorithms or sampling-based estimates may be necessary.
Use directed centrality measures when edge direction is meaningful. In supply chains, the direction of the SUPPLIES edge matters: in-degree and out-degree tell different stories.
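One sampling-based estimate mentioned above can be sketched as follows: run the single-source step of Brandes' algorithm from a random subset of "pivot" nodes and rescale. This is an illustrative sketch, not a tuned implementation. The function names, the uniform pivot sampling, and the fixed seed are assumptions, and the graph is the five-company worked example treated as undirected and unweighted.

```python
import random
from collections import deque


def single_source_dependencies(adj, s):
    """Brandes accumulation: how much shortest paths from s depend on each node."""
    dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
    order, queue = [], deque([s])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in dist:
                dist[v], sigma[v], preds[v] = dist[u] + 1, 0, []
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
                preds[v].append(u)
    delta = {n: 0.0 for n in order}
    for w in reversed(order):  # farthest nodes first
        for u in preds[w]:
            delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
    delta[s] = 0.0  # a source is never an intermediary on its own paths
    return delta


def approx_betweenness(adj, n_pivots, seed=0):
    """Estimate betweenness from n_pivots sampled sources (exact when n_pivots = N)."""
    nodes = sorted(adj)
    pivots = random.Random(seed).sample(nodes, min(n_pivots, len(nodes)))
    score = {n: 0.0 for n in nodes}
    for s in pivots:
        for n, d in single_source_dependencies(adj, s).items():
            score[n] += d
    # Rescale to all N sources; halve because each undirected pair is seen twice.
    scale = len(nodes) / len(pivots) / 2.0
    return {n: v * scale for n, v in score.items()}


adj = {
    "A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "D"],
    "D": ["B", "C", "E"], "E": ["D"],
}
exact = approx_betweenness(adj, n_pivots=5)     # every node is a pivot
estimate = approx_betweenness(adj, n_pivots=3)  # cheaper but noisier
print(exact)
print(estimate)
```

With every node used as a pivot the result reproduces the exact values from the worked example. With fewer pivots the cost falls roughly in proportion, at the price of sampling noise that is most visible on small graphs like this one.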

Where It Fits in ML4T

Chapter 23 introduces centrality as one family of graph-derived features and discusses how centrality, concentration, and crowding measures feed into ML pipelines. This primer develops the mathematical definitions, financial interpretations, and failure modes that allow readers to use these features correctly. The measures defined here serve as inputs to the dynamic graph features covered in Primer 06, and the supply-chain and ownership graphs from which centrality is computed connect to the schema and provenance design covered in Primer 02. The statistical network construction methods in Primer 05 provide an alternative path from correlation data to graph topology where centrality analysis also applies.
