3rd Edition

The ML4T Primer

Foundations the book builds on and deeper technical treatments it references — standalone companion topics organized by chapter.

112 Topics

Cross-Chapter

Concepts referenced throughout the book

Foundational building blocks

ACF and PACF Interpretation foundational

How total and direct lag dependence leave different signatures in time-series plots, and why those signatures are useful but never definitive.

Bayes' Theorem and Posterior Distributions foundational

Bayesian inference updates a probability distribution over unknown quantities rather than replacing uncertainty with a single estimate.

Causality, Confounding, and Why Good Signals Can Be Misleading foundational

A predictive relationship can be real in the data and still fail as an explanation of what would happen under intervention.

Covariance Matrices, Estimation, and Why They Break foundational

How a covariance matrix summarizes co-movement, why the sample version becomes unstable in high dimensions, and why shrinkage is the default repair.

Eigenvalues, Eigenvectors, and the Geometry of Covariance foundational

How symmetric matrices reveal natural directions of variation, and why that matters for PCA and statistical risk factors.

Hypothesis Testing and P-Values foundational

How hypothesis tests turn noisy evidence into a structured decision, and how to read p-values without treating them as proof.

Markov Chains and the Markov Property foundational

How a state representation can compress history into current conditions, and why that matters for regime models and decision processes.

Momentum and Mean Reversion foundational

How return predictability changes with horizon, how cross-sectional and time-series momentum differ, and why the same signal that works for months can fail violently in a rebound.

Multiple Testing and the Researcher’s Trap foundational

Why searching many ideas makes false discoveries and overstated winners inevitable.

Point-in-Time Data and Decision-Time Correctness foundational

A value is usable only if it was actually knowable when the strategy had to decide.

Sharpe Ratio: Definition, Annualization, and Estimation Noise foundational

What the Sharpe ratio measures, when the usual scaling works, and why a backtest Sharpe is noisier than it looks.
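As a rough illustration of why a backtest Sharpe is noisy, the sketch below annualizes a per-period Sharpe ratio with the usual square-root scaling and attaches the textbook standard error that holds when returns are independent and identically distributed. The function name `sharpe_summary`, the 252-period year, and the toy return series are assumptions made for this example, not material from the primer.

```python
import math

def sharpe_summary(excess_returns, periods_per_year=252):
    """Annualized Sharpe ratio and its approximate standard error.

    Assumes the inputs are per-period excess returns that are i.i.d.;
    under autocorrelation the sqrt scaling and the error formula both change.
    """
    n = len(excess_returns)
    mean = sum(excess_returns) / n
    var = sum((r - mean) ** 2 for r in excess_returns) / (n - 1)
    sr = mean / math.sqrt(var)                   # per-period Sharpe
    se = math.sqrt((1.0 + 0.5 * sr ** 2) / n)    # i.i.d. large-sample approximation
    scale = math.sqrt(periods_per_year)          # square-root-of-time scaling
    return sr * scale, se * scale

# Hypothetical one-year daily series: mean 0.1% per day, roughly 1% daily volatility.
toy_returns = [0.011, -0.009] * 126
annual_sr, annual_se = sharpe_summary(toy_returns)
print(f"annualized Sharpe {annual_sr:.2f}, standard error {annual_se:.2f}")
```

With a single year of daily data the standard error is of the same order as the estimate itself, which is the sense in which the headline number overstates what the sample can support.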

Simple Returns vs Log Returns foundational

One aggregates exactly across assets, the other aggregates exactly across time. Most mistakes come from asking one definition to do both jobs.
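A small numeric check makes the two aggregation rules concrete. The prices and weights below are made-up illustrative values, and the variable names are chosen only for this sketch.

```python
import math

# Hypothetical prices over two periods for two assets.
prices = {"A": [100.0, 110.0, 99.0], "B": [50.0, 45.0, 54.0]}

def simple_returns(p):
    return [p[t] / p[t - 1] - 1.0 for t in range(1, len(p))]

def log_returns(p):
    return [math.log(p[t] / p[t - 1]) for t in range(1, len(p))]

# Across time: log returns sum exactly to the total log return.
total_log_a = math.log(prices["A"][-1] / prices["A"][0])
sum_log_a = sum(log_returns(prices["A"]))

# Simple returns compound instead, so their sum misses the total.
total_simple_a = prices["A"][-1] / prices["A"][0] - 1.0
sum_simple_a = sum(simple_returns(prices["A"]))

# Across assets (first period): the portfolio simple return is the
# weighted average of asset simple returns, using start-of-period weights.
weights = {"A": 0.5, "B": 0.5}
r_simple = {k: simple_returns(v)[0] for k, v in prices.items()}
r_log = {k: log_returns(v)[0] for k, v in prices.items()}
port_simple = sum(weights[k] * r_simple[k] for k in weights)

# The same weighted average applied to log returns is only an approximation.
port_log_true = math.log(1.0 + port_simple)
port_log_naive = sum(weights[k] * r_log[k] for k in weights)

print(total_log_a, sum_log_a)          # equal up to rounding
print(total_simple_a, sum_simple_a)    # differ
print(port_log_true, port_log_naive)   # differ
```

Asset A rises 10% and then falls 10%, so the simple returns sum to zero even though the price ends lower, while the log returns add up to the true total.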

Stationarity and Unit Roots foundational

Why time-series stability matters, why random walks mislead regressions, and what differencing fixes and destroys.

Stylized Facts of Financial Time Series for Simulation and Validation foundational

A synthetic market path is only useful if it reproduces the empirical pathologies that make financial returns hard to model in the first place.

The Bias-Variance Tradeoff foundational

Why a model that is deliberately a little wrong can generalize better than one that fits the past too closely.

The Information Coefficient foundational

How cross-sectional rank correlation measures signal quality, and why a small edge can still matter when it is applied repeatedly.

Trading Costs: Spread, Slippage, and Market Impact foundational

How execution costs arise, why the components are different, and how turnover turns a predictive signal into a net strategy.

Training Neural Networks foundational

How forward passes, losses, backpropagation, optimization, and regularization work together so the architecture

Volatility: Realized, Implied, and Why It Clusters foundational

Volatility is not one number but a family of related objects: what happened, what the options market prices, and what a model forecasts next.

Walk-Forward Validation for Time Series foundational

Why model evaluation must preserve temporal order, and how expanding or rolling splits approximate live deployment.

Part 1

Introduction & Financial Data

Part 2

Research Design and Feature Engineering

Defining the Learning Task

Label Overlap: Why Your Sample Is Smaller Than You Think intermediate

When labels share future price paths, nominal sample counts exaggerate the evidence available for inference — often by an order of magnitude. Diagnosing overlap before interpreting results is not optional; it is the difference between a credible signal evaluation and a statistical illusion.

Multiple Testing in Factor Research: The Search Tax on Discovery intermediate

Every variant you try without recording it borrows from the credibility of the winner. The statistical correction is straightforward; the organizational discipline of tracking what you searched is harder and more important.

Reading the Information Coefficient: Stability, ICIR, and Horizon Decay intermediate

IC is a learnability screen for continuous labels, not a compressed backtest.

Financial Feature Engineering

Carry, Basis, and Roll Yield Across Futures and Perpetuals intermediate

Carry features measure the term-structure conditions under which holding or rolling exposure may be favorable or costly. In derivatives, those conditions appear through spot basis, calendar roll, and perpetual funding.

Point-in-Time Feature Construction and Data Vintages intermediate

A slow-moving feature is admissible only from the moment your strategy could actually have known it.

Range-Based Volatility Estimators from OHLC Data intermediate

High and low prices reveal intrabar dispersion that the closing price alone cannot, but each OHLC estimator is only better when its assumptions match the bar structure you actually trade.

Model-Based Feature Extraction

Autoregressive, Moving-Average, and ARIMA Foundations for Feature Engineering intermediate

ARIMA is rarely the star predictor in liquid markets, but it is still one of the cleanest ways to separate level, persistence, shock, and forecast uncertainty before downstream models take over.

Bayesian Inference and MCMC for Time Series advanced

A Bayesian time-series model produces a posterior distribution, not just a fitted line, which is why posterior uncertainty can itself become a feature.

Fractional Differencing: Keeping Memory Without Keeping the Unit Root advanced

How to turn differencing from a blunt preprocessing step into a tunable filter that trades stationarity against memory retention.

Path Signatures and Log-Signatures for Financial Sequences advanced

Two windows can end at the same price with similar volatility and still trace different paths; signatures are one way to encode that ordered shape.

Regime Models for Feature Engineering: HMMs, Markov Switching, and Distributional Clustering advanced

Chapter 9 already makes the operational point about filtered regime probabilities. This primer narrows to the more technical layer: the filtering recursion, state uncertainty, and the identification problems that make regime models easy to misuse.

State-Space Models and Kalman Filtering for Feature Engineering advanced

How linear Gaussian state-space models turn noisy time series into point-in-time features such as level, trend, innovation, and uncertainty.

Stationarity Tests: ADF, KPSS, and Rolling Stability Signals intermediate

These tests are useful in trading because they summarize how stable a series looks right now, not because they can certify once and for all that a process is stationary.

Structural Break Diagnostics and Time-Since-Break Features intermediate

How to tell whether one stable model stopped fitting the series, and how to turn that evidence into live-safe features.

Uncertainty as a Feature advanced

Why two identical forecasts can imply very different decisions once you look at what the model does not know.

Volatility Models as Feature Extractors: GARCH, EGARCH, and HAR intermediate

Chapter 9 already teaches what volatility-model outputs to extract. This primer narrows to the harder layer underneath that recipe: persistence geometry, parameter interpretation, and how to tell when the fitted risk state is statistically meaningful rather than just mechanically smooth.

Wavelets for Multi-Scale Diagnostics and Causal Feature Design advanced

Wavelets are most useful in ML4T when they reveal which horizon matters and then disappear behind a trailing, auditable feature.

Text Feature Engineering

Coverage-Aware Evaluation and Event-Time Alignment for Text Signals intermediate

A text model is not useful because it predicts labels accurately. It is useful only if its signal is available when you trade, on enough names, at the horizon that matters.

Domain Adaptation vs. Task Fine-Tuning in Financial NLP intermediate

In financial NLP, the failure mode is often not the architecture but starting from the wrong checkpoint for the wrong stage.

Self-Attention and Contextual Embeddings intermediate

Self-attention lets the same token mean different things in different sentences because its representation is rebuilt from context instead of looked up once from a static table.

When Long-Context Encoders Earn Their Keep intermediate

Use full-document encoding only when the label depends on interactions between distant parts of the document; otherwise chunking is usually the better default.

Part 3

Model Development

The ML Pipeline

Classical Statistical Tests as Linear Models intermediate

OLS, t-tests, ANOVA, and Pearson correlation are not separate islands; for the cases covered here, they are linear models with different design matrices and different coefficient restrictions.

Conformal Prediction in Finance: Coverage, Exchangeability, and Drift advanced

Conformal prediction is attractive in finance because it gives finite-sample coverage without distributional assumptions, but its guarantee is only as good as the exchangeability you are willing to believe.

Loss Functions, Error Metrics, and What They Hide intermediate

A model is trained to optimize one quantity, selected on another, and traded on a third. Most confusion in predictive modeling starts when those three layers are blurred together.

Regularization Geometry: How Ridge, LASSO, and Elastic Net Actually Work intermediate

Regularization helps not by fitting the training sample better, but by refusing to trust unstable coefficient estimates — and the SVD of the feature matrix reveals exactly which directions it distrusts and why.

Selection Bias in Model Tuning: Why Your Best Validation Score Lies intermediate

Even with perfect chronological splits and no data leakage, repeated hyperparameter search overfits the validation set — and the winning score systematically overstates the performance you should expect out of sample.

Advanced Models for Tabular Data

Bayesian Hyperparameter Optimization Under Temporal Dependence intermediate

Hyperparameter search is part of the statistical design, not a software convenience layer.

Leakage-Safe Categorical Encoding for Financial ML intermediate

Categorical encoding becomes dangerous when a feature value quietly contains information from the target you are trying to predict.

Deep Learning for Time Series

State Space Models from Kalman Intuition to Mamba intermediate

A latent-state view of long-context sequence models, and what "selective" memory actually changes.

Uncertainty Estimation and Calibration for Deep Time-Series Models intermediate

A forecasting model is not uncertainty-aware because it emits a variance. It is uncertainty-aware only if that variance tracks future error under the validation protocol you actually trade.

Latent Factor Models

CAPM, APT, and Fama-French: From Beta to Multifactor Pricing intermediate

How classical asset-pricing models separate exposure from compensation, and why that distinction still frames latent factor methods.

Conditional Factor Structure: Why Characteristics Predict Loadings advanced

When firm characteristics predict factor loadings, the covariance matrix of returns is itself a function of observables -- and that conditional structure is what IPCA exploits and what static PCA ignores.

Inelastic Markets and Flow-Driven Prices advanced

Prices move not only because beliefs about cash flows change, but also because somebody has to absorb somebody else's trade.

Multiple Testing, Replication, and the Factor Zoo After the Replication Wars intermediate

The factor zoo is not just a story about too many predictors. It is a story about search, construction choices, and the gap between a published anomaly and an investable factor.

Random Matrix Theory for PCA in Finance advanced

In a high-dimensional return panel, a large eigenvalue may reflect latent structure, or it may just be the geometry of estimation noise.

Stochastic Discount Factors, No-Arbitrage Moments, and HJ Distance intermediate

A stochastic discount factor is the object that prices everything at once. If it fails, the failure shows up as a portfolio the model misprices.

Why Variance Rank Is Not Pricing Rank: The Math Behind RP-PCA advanced

The eigenvalue of a factor measures how much return variance it explains. Its risk premium measures how much expected return it commands. These are different numbers, and the gap between them is why standard PCA can miss the factors that matter most for asset pricing.

Causal Machine Learning

Backdoor Adjustment and Control Selection in Causal DAGs intermediate

Before you estimate a treatment effect, you need to know which variables identify it and which ones poison it.

Interference, Spillovers, and SUTVA Violations in Financial Markets intermediate

In markets, one unit's treatment rarely stays politely confined to that unit.

Potential Outcomes and the Rubin Causal Model intermediate

Causal inference starts with a missing-data problem: for each unit, the outcome you most want to compare is the one you never get to observe.

Sensitivity Analysis, Placebos, and Negative Controls in Observational Finance advanced

A causal estimate becomes usable only after you ask how it could be wrong.

What the BSTS Counterfactual Actually Learns intermediate

A BSTS event study builds a synthetic no-intervention world from two ingredients: the target series' own dynamics and its pre-event co-movement with controls. Understanding exactly what each component contributes -- and what breaks each one -- is essential for credible causal claims.

Why Orthogonalization Works: The Mechanism Behind Double Machine Learning advanced

DML does not succeed because it runs two models instead of one. It succeeds because the Neyman-orthogonal score is locally insensitive to first-order errors in the nuisance estimates, and cross-fitting breaks the dependence between estimation error and score evaluation.

Part 4

Strategy Implementation

Strategy Simulation

Deflated Sharpe Ratio and Search-Aware Backtest Inference advanced

Why the best backtest from a large search must clear a higher statistical bar than a pre-specified strategy.

Sharpe Ratio: Definition, Annualization, and Estimation Noise foundational

What the Sharpe ratio measures, when the usual scaling works, and why a backtest Sharpe is noisier than it looks.

Sharpe Ratio Under Autocorrelation and Non-Normal Returns advanced

How serial dependence and asymmetric tails change Sharpe annualization, sampling uncertainty, and the strength of the evidence in a backtest.

Turnover as a Cost-Survival Diagnostic intermediate

How turnover converts gross edge into cost burden, and how to compute a quick break-even frontier before doing full transaction-cost analysis.

White's Reality Check and SPA Bootstrap Inference for Strategy Families intermediate

How to test whether the best strategy found in a search beats a benchmark once the whole search process is counted.

Why Identical Weights Can Mean Different Strategies intermediate

A backtest is a time-indexed state transition, not a weight vector multiplied by returns.

Portfolio Construction

Black-Litterman and Views-Based Allocation advanced

Black-Litterman starts from the market's implied equilibrium returns, blends in investor views weighted by their confidence, and produces a posterior expected-return vector that is far more stable than raw sample estimates fed to a mean-variance optimizer.

Covariance Shrinkage for Portfolio Allocation intermediate

Mean-variance portfolios fail less often when the covariance matrix is regularized before it is inverted.

Estimation Error and the Markowitz Curse intermediate

Mean-variance optimization is not fragile because the quadratic program is hard; it is fragile because the optimizer is asked to invert noisy beliefs about returns and covariances.

Factor-Mimicking Portfolios advanced

A factor view becomes tradable only after you specify how to load on the target exposure while neutralizing the risks you do not want to own accidentally.

Hedging Under Parameter Uncertainty advanced

A perfect hedge ratio estimated imperfectly is often worse than a conservative hedge ratio estimated honestly.

Hierarchical Risk Parity: Clustering, Quasi-Diagonalization, and Recursive Bisection advanced

HRP replaces covariance-matrix inversion with a tree-based allocation that is more stable out of sample, at the cost of a long-only constraint and sensitivity to clustering choices.

Kelly Criterion and Fractional Kelly for Multi-Asset Portfolios intermediate

Kelly sizing maximizes long-run log growth, but the full-Kelly solution is usually too sensitive to errors in its estimated inputs to be deployed without a haircut.
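The haircut can be sketched in the simplest setting: a single asset under the continuous-time Gaussian approximation, where expected log growth at leverage f is f*mu - 0.5*f^2*sigma^2 and the full-Kelly leverage is mu/sigma^2. The single-asset simplification, the function name `kelly_growth`, and the values of `mu` and `sigma` are assumptions for illustration; the multi-asset case replaces the ratio with an inverse covariance matrix applied to the expected excess returns.

```python
def kelly_growth(f, mu, sigma):
    """Approximate expected log-growth rate at leverage f (Gaussian approximation)."""
    return f * mu - 0.5 * (f * sigma) ** 2

mu, sigma = 0.05, 0.20        # hypothetical annual excess return and volatility
f_star = mu / sigma ** 2      # full-Kelly leverage under the approximation

g_full = kelly_growth(f_star, mu, sigma)
g_half = kelly_growth(0.5 * f_star, mu, sigma)
g_double = kelly_growth(2.0 * f_star, mu, sigma)

print(f"full Kelly leverage: {f_star:.2f}")
print(f"growth at full / half / double Kelly: {g_full:.4f} / {g_half:.4f} / {g_double:.4f}")
```

Under this approximation half Kelly keeps three quarters of the full-Kelly growth rate with half the volatility, while betting twice the Kelly fraction drives growth to zero. Because an overestimated `mu` pushes the deployed leverage toward that second case, the asymmetry is one common argument for sizing below full Kelly.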

Risk Contribution, Risk Parity, and Why Capital Weights Mislead advanced

"Diversification is the only free lunch in finance" — but a portfolio that spreads capital equally can still concentrate nearly all its risk in one or two positions.

Transaction Costs

Almgren-Chriss Optimal Execution advanced

Almgren-Chriss matters because execution is never just a cost problem. It is always a cost versus risk problem.

Implementation Shortfall and Transaction Cost Analysis intermediate

"The total cost of trading is the difference between the initial book value and the capture." -- Almgren and Chriss (2001)

Square-Root Market Impact and Participation-Based Cost Models advanced

"Markets self-organize into a critical state where liquidity vanishes linearly near the current price." -- Toth et al. (2011)

Risk Management

Drift Detection and Trigger Design intermediate

A drift monitor is useful only when you know what statistic it watches, what kind of change it can see, and how many false alarms you are willing to tolerate.

Forecast Evaluation with Noisy Volatility Proxies intermediate

You cannot observe true volatility, so every forecast evaluation is really a comparison against a noisy proxy — and the wrong loss function will rank models incorrectly.

Stress Testing and Reverse Stress Testing for Systematic Portfolios intermediate

Forward stress tests ask how much a named scenario hurts; reverse stress tests ask what the smallest plausible scenario is that breaks the portfolio.

Tail-Risk Estimation Under Finite Samples intermediate

The deeper the tail you want to measure, the fewer observations you actually have to measure it.

Value-at-Risk, CVaR, and Expected Shortfall for Systematic Strategies intermediate

VaR tells you where the tail begins; CVaR tells you how bad losses are once you are already in the tail.
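A minimal historical-simulation sketch shows the distinction. The function name `historical_var_cvar`, the quantile convention, and the sample returns are assumptions for this example; other empirical quantile conventions give slightly different numbers on small samples.

```python
import math

def historical_var_cvar(returns, alpha=0.95):
    """Empirical VaR and CVaR of losses at confidence level alpha.

    Convention assumed here: losses are negated returns, VaR is the sorted
    loss at position ceil(alpha * n), and CVaR is the mean of all losses
    at or beyond that position.
    """
    losses = sorted(-r for r in returns)
    n = len(losses)
    k = max(0, math.ceil(alpha * n) - 1)   # zero-based index of the VaR loss
    var = losses[k]
    tail = losses[k:]
    cvar = sum(tail) / len(tail)
    return var, cvar

# Hypothetical sample: sixteen quiet days and four losing days.
sample = [0.01] * 16 + [-0.02, -0.03, -0.05, -0.10]
var_90, cvar_90 = historical_var_cvar(sample, alpha=0.90)
print(f"90% VaR: {var_90:.3f}, 90% CVaR: {cvar_90:.3f}")
```

The VaR marks the loss at the edge of the tail, while the CVaR averages everything beyond that edge, so it is never smaller than the VaR and reacts to the single worst day in a way the VaR does not.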

Volatility Forecasting Mechanics for Risk Control intermediate

Volatility is the one financial quantity that is genuinely forecastable at short horizons, and every risk overlay — from position sizing to drawdown limits — depends on getting that forecast roughly right.

Strategy Synthesis

From Information Coefficient to Information Ratio intermediate

"It takes only a modest amount of skill to win as long as that skill is deployed frequently and across a large number of stocks." -- Grinold and Kahn (1999)

Holdout Decay and Failure-Mode Diagnosis intermediate

"Does academic research destroy stock return predictability?" -- McLean and Pontiff (2016)

Instrument-Appropriate Transaction Cost Models intermediate

The first question in transaction-cost modeling is not “how many basis points?” but “basis points of what?”

Portfolio Construction as the Translation Layer intermediate

The same prediction vector can become three different portfolios, and Chapter 20 shows that this translation step can determine which model actually wins.

Part 5

Advanced AI

Reinforcement Learning

Distributional RL and Tail-Aware Action Selection advanced

Learn the return distribution, not just its mean, so execution and hedging policies can be chosen by tail risk as well as average performance.

Markov Decision Processes and Partial Observability intermediate

"Algorithm quality cannot rescue a poorly specified state/action/reward design." -- Chapter 21

Policy Gradient Theorem and Actor-Critic Architectures advanced

How direct policy optimization turns delayed, noisy rewards into learning signals for continuous trading actions.

Reward Shaping and Expected Utility Theory intermediate

"Reward hacking and misalignment are the dominant failure modes." -- Chapter 21

Temporal-Difference Learning and Bellman Equations intermediate

"TD methods are the most central and novel idea in reinforcement learning." -- Sutton and Barto (2018)

Knowledge Graphs

Graph Centrality as a Structural Risk Signal intermediate

Degree, betweenness, eigenvector, and closeness centrality describe different kinds of network importance; in finance they become useful features only when edge meaning, graph coverage, and timing are explicit.

Statistical Financial Networks and Filtered Correlation Graphs advanced

How a dense correlation matrix becomes a sparse market map, what the minimum spanning tree keeps, and why the PMFG is

Autonomous Agents

Proper Scoring Rules for Financial Event Forecasts intermediate

How to evaluate agent-generated probabilities so that honesty, calibration, and useful discrimination all show up in the score.

Part 6

Production

MLOps and Governance

Champion-Challenger Evaluation and Shadow Mode in Trading Systems intermediate

"When comparing two strategies' Sharpes, prioritize dependence-aware bootstrap tests." -- Ledoit and Wolf (2008)

Training-Serving Skew, Point-in-Time Joins, and Feature Stores intermediate

"ML systems have all of the maintenance problems of traditional code plus an additional set of ML-specific issues." -- Sculley et al. (2015)