- ML3T - Machine Learning for Trading

3

Market Microstructure

1 primer

Tick-level predictability often reflects how trades print through the spread, not an exploitable signal about the efficient price.

4

Fundamental and Alternative Data

3 primers

Time-Valid Security Masters and Identifier Histories

An identifier match is only useful if it resolves the right object at the right time.

Vintage Macroeconomic Data and Release-Calendar Alignment

A macro series is not known when its reference period ends. It is known when the release becomes public, and revised again when later vintages arrive.

XBRL Fundamentals in Practice

XBRL is not just tagged accounting data. It is the grammar that determines what a filing fact means, when it applies, and whether it can be compared across firms and time.

5

Synthetic Financial Data

2 primers

Bootstrap Methods for Dependent Financial Time Series

Bootstrap paths are useful only if they preserve the dependence structure your downstream metric actually cares about.

Stochastic Volatility, Jumps, and GARCH as Financial Simulation Baselines

A simulation baseline is useful when you know exactly which stylized facts it can generate and which ones it cannot.

7

Defining the Learning Task

1 primer

The Information Coefficient in Practice: What the Numbers Actually Mean

An IC of 0.04 sounds tiny, but with 500 stocks rebalanced monthly the Fundamental Law implies an IR near 0.9. An IC of 0.08 sounds better, but if it flips sign every other fold it is worthless. Interpreting IC requires understanding why the metric is inherently small, when it misleads, and how the horizon profile reveals the signal's economic mechanism.
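The first figure can be checked against the simplest form of the Fundamental Law of Active Management, IR ≈ IC × sqrt(breadth). The sketch below is illustrative only: the function name is hypothetical, and the breadth values are assumptions about how bets are counted, not something the primer blurb specifies.

```python
import math

def fundamental_law_ir(ic, breadth):
    """Approximate information ratio: IR ~= IC * sqrt(breadth)."""
    return ic * math.sqrt(breadth)

# Assumption: breadth is the 500 names in the cross-section.
ir_cross_section = fundamental_law_ir(0.04, 500)

# Alternative assumption: every monthly rebalance adds 500 independent bets.
ir_monthly_bets = fundamental_law_ir(0.04, 500 * 12)

print(round(ir_cross_section, 2), round(ir_monthly_bets, 2))
```

The "near 0.9" figure corresponds to a breadth of 500. Counting each monthly rebalance as a fresh set of independent bets gives a larger number, while correlated bets push effective breadth, and therefore IR, lower.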

8

Financial Feature Engineering

4 primers

Carry, Basis, and Roll Yield Across Futures and Perpetuals

Carry features measure the term-structure conditions under which holding or rolling exposure may be favorable or costly. In derivatives, those conditions appear through spot basis, calendar roll, and perpetual funding.

Point-in-Time Feature Construction and Data Vintages

A feature is only valid for trading if it was knowable at the decision timestamp, not merely true in hindsight.

Range-Based Volatility Estimators from OHLC Data

High and low prices reveal intrabar dispersion that the closing price alone cannot, but each OHLC estimator is only better when its assumptions match the bar structure you actually trade.
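As one concrete instance of a range-based estimator, the Parkinson estimator uses only highs and lows: variance ≈ mean(ln(H/L)²) / (4 ln 2). The sketch below assumes a driftless diffusion with no gaps between bars, which is exactly the kind of assumption that may not match real bar data. The function name and sample prices are hypothetical.

```python
import math

def parkinson_variance(highs, lows):
    """Per-bar variance estimate from high/low ranges (Parkinson estimator)."""
    if len(highs) != len(lows) or not highs:
        raise ValueError("highs and lows must be non-empty and of equal length")
    sq_log_ranges = [math.log(h / l) ** 2 for h, l in zip(highs, lows)]
    return sum(sq_log_ranges) / (4.0 * math.log(2.0) * len(sq_log_ranges))

# Made-up daily bars, for illustration only.
highs = [101.0, 102.5, 101.8]
lows = [99.5, 100.9, 100.2]
per_bar_vol = math.sqrt(parkinson_variance(highs, lows))
print(per_bar_vol)
```

Because the estimator ignores the open and close, overnight gaps and drift are invisible to it. Other OHLC estimators trade extra inputs for weaker assumptions.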

Residualization, Peer Sets, and Relative-Value Features

Neutralization is not cosmetic cleanup. It changes the hypothesis about what should count as an opportunity.

9

Model-Based Feature Extraction

8 primers

Autoregressive, Moving-Average, and ARIMA Foundations for Feature Engineering

ARIMA is rarely the star predictor in liquid markets, but it is still one of the cleanest ways to separate level, persistence, shock, and forecast uncertainty before downstream models take over.

Bayesian Inference and MCMC for Time Series

A Bayesian time-series model produces a posterior distribution, not just a fitted line, which is why posterior uncertainty can itself become a feature.

Fractional Differencing and Long Memory in Financial Features

Fractional differencing is easy to apply but harder to understand well. This primer covers the operator algebra, asymptotic weight decay, and the precise sense in which the transform preserves low-frequency dependence.
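The operator algebra reduces to a simple recursion: expanding (1 - B)^d as a power series in the backshift operator B gives weights w_0 = 1 and w_k = -w_{k-1} (d - k + 1) / k. A minimal sketch of that recursion follows. The function name is hypothetical, and any truncation length is a modeling choice.

```python
def fracdiff_weights(d, n_weights):
    """First n_weights coefficients of the binomial expansion of (1 - B)^d."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights

# Fractional order: weights decay slowly, so distant lags still contribute.
print(fracdiff_weights(0.4, 5))

# Integer order d = 1 collapses to the ordinary first difference.
print(fracdiff_weights(1.0, 4))
```

For non-integer d the weights never reach zero, so any finite window only approximates the transform, and the approximation error depends on how fast the weights decay.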

Path Signatures and Log-Signatures for Financial Sequences

Path signatures encode the ordered geometry of multivariate sequences through iterated integrals. This primer covers the algebra, Chen-style composition, and the embedding choices that decide whether the construction carries real information in finance.

State-Space Models and the Kalman Filter

Kalman filter outputs are widely used as trading features. This primer covers the deeper machinery underneath them: the innovation representation, Riccati recursion, and identification choices that determine what those features actually mean.

Structural Break Diagnostics and Time-Since-Break Features

A break test is not asking whether the series is "bad." It is asking whether one stable model is still a reasonable description of the whole sample.

Uncertainty as a Feature: Stochastic Volatility, Forecast Intervals, and Forecast Uncertainty

In trading, two models with the same point forecast are not equivalent if one is much less certain than the other.

Wavelets for Multi-Scale Diagnostics and Causal Feature Design

Wavelets are often best used to discover where the signal lives, then translated into safer causal proxies, rather than deployed naively as production features.

10

Text Feature Engineering

3 primers

Coverage-Aware Evaluation and Event-Time Alignment for Text Signals

A text model is not useful because it predicts labels accurately. It is useful only if its signal is available when you trade, on enough names, at the horizon that matters.

Long-Document Encoding for Filings and Transcripts

For long financial documents, the first design decision is not the model. It is how much context you can afford to preserve without mixing together information that arrives or matters at different times.

When Long-Context Encoders Are Worth the Cost

The decision between chunking and full-context encoding is a cost-accuracy tradeoff governed by document structure and task type -- principles that outlast any specific architecture generation.

11

The ML Pipeline

2 primers

Classical Statistical Tests as Linear Models: OLS, t-Tests, ANOVA, and Correlation

Many "different" statistical tests are the same linear-model object wearing different notation. Once you see the shared design-matrix view, the jump from classical inference to predictive regularization is much less mysterious.

Loss Functions, Error Metrics, and What They Hide

A model is trained to optimize one quantity, selected on another, and traded on a third. Most confusion in predictive modeling starts when those three layers are blurred together.

12

Advanced Models for Tabular Data

2 primers

Bayesian Hyperparameter Optimization Under Temporal Dependence

Hyperparameter search is part of the statistical design, not a software convenience layer.

Leakage-Safe Categorical Encoding for Financial ML

Categorical encoding becomes dangerous when a feature value quietly contains information from the target you are trying to predict.

13

Deep Learning for Time Series

3 primers

Making Transformers Time-Aware

A vanilla Transformer is good at flexible token interaction. Time-series forecasting needs more than that: it needs temporal and structural inductive bias.

State Space Models: From Kalman Intuition to Mamba

State space models compress the past into a latent state that is updated recursively, turning long-context sequence processing from a quadratic attention problem into a controlled linear dynamical system. Selective variants like Mamba let the model decide which inputs should update that memory and which should be forgotten.

Uncertainty Estimation and Calibration for Deep Time-Series Models

A forecasting model is not uncertainty-aware because it emits a variance. It is uncertainty-aware only if that variance tracks future error under the validation protocol you actually trade.

14

Latent Factor Models

4 primers

CAPM, APT, and Fama-French: From Beta to Multifactor Pricing

Asset-pricing models all ask the same question: which systematic risks deserve expected return? CAPM gives one answer, APT opens the door to many, and Fama-French turns that logic into an empirical benchmark family.

Inelastic Markets Hypothesis and Flow-Driven Prices

If demand curves for risky assets slope downward rather than staying flat, flows can move prices in persistent ways. That turns "who has to trade?" into part of the asset-pricing problem.

Random Matrix Theory for PCA in Finance

PCA always returns components. The question is whether those components reflect latent economic structure or the noise geometry of a high-dimensional covariance estimate. Random matrix theory provides the benchmark for answering that question.

Stochastic Discount Factors, No-Arbitrage Moments, and HJ Distance

A stochastic discount factor is the object that prices everything at once. If it fails, the failure shows up as a portfolio the model misprices.

15

Causal Machine Learning

1 primer

Interference, Spillovers, and SUTVA Violations in Financial Markets

In markets, one unit's treatment rarely stays politely confined to that unit.

16

Strategy Simulation

3 primers

Sharpe Ratio Under Autocorrelation and Non-Normal Returns

The Sharpe ratio is only easy to annualize and compare when returns behave far more cleanly than trading strategies usually do.
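One way to see the annualization problem is the scaling factor from Lo (2002): aggregating q periods multiplies the per-period Sharpe ratio by q / sqrt(q + 2 Σ_{k=1}^{q-1} (q - k) ρ_k), which equals sqrt(q) only when all autocorrelations ρ_k are zero. The sketch below uses a hypothetical function name, and the AR(1)-style autocorrelations are an illustrative assumption.

```python
import math

def sharpe_scaling_factor(q, autocorrs):
    """Lo (2002) factor; autocorrs[k-1] is the lag-k return autocorrelation.

    Lags beyond the supplied list are treated as zero.
    """
    adjustment = sum(
        (q - k) * rho for k, rho in enumerate(autocorrs[: q - 1], start=1)
    )
    return q / math.sqrt(q + 2.0 * adjustment)

iid_factor = sharpe_scaling_factor(12, [])

# Assumption: AR(1)-like monthly returns with lag-k autocorrelation 0.2**k.
smooth_factor = sharpe_scaling_factor(12, [0.2 ** k for k in range(1, 12)])

print(iid_factor, smooth_factor)
```

With positive autocorrelation the correct factor is smaller than sqrt(12), so multiplying a monthly Sharpe ratio by sqrt(12) overstates the annual figure.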

The Sharpe Ratio

The Sharpe ratio is the default language for comparing risk-adjusted performance, and most practitioners use it without understanding how noisy, fragile, and assumption-laden it really is.

White's Reality Check and Bootstrap Inference for Strategy Families

White's Reality Check asks a family-level question: after searching across many variants, is there evidence that any strategy truly beats the benchmark?

17

Portfolio Construction

4 primers

Benchmark-Relative Portfolio Evaluation: Tracking Error, Information Ratio, and Active Share

Once a benchmark exists, Sharpe ratio stops answering the whole question.

Covariance Shrinkage for Portfolio Allocation

Mean-variance portfolios fail less often when the covariance matrix is regularized before it is inverted.

Estimation Error and the Markowitz Curse

Mean-variance optimization is not fragile because the quadratic program is hard; it is fragile because the optimizer is asked to invert noisy beliefs about returns and covariances.

Kelly Criterion and Fractional Kelly for Multi-Asset Portfolios

Kelly sizing maximizes long-run log growth, but the full-Kelly solution is usually too sensitive to estimation error in its inputs to be deployed without a haircut.
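A two-asset sketch shows the size of the haircut. Under the usual small-return approximation, excess log growth is g(w) = w'μ - ½ w'Σw, so full Kelly is w* = Σ⁻¹μ. The function names are hypothetical and the expected excess returns and covariances are made-up numbers.

```python
def kelly_weights_2x2(mu, cov):
    """Full-Kelly weights inv(cov) @ mu for two assets (excess returns)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    return [(d * mu[0] - b * mu[1]) / det, (a * mu[1] - c * mu[0]) / det]

def approx_log_growth(weights, mu, cov):
    """Approximate excess log-growth rate: w'mu - 0.5 * w'cov w."""
    linear = sum(w * m for w, m in zip(weights, mu))
    quad = sum(
        weights[i] * cov[i][j] * weights[j] for i in range(2) for j in range(2)
    )
    return linear - 0.5 * quad

# Illustrative inputs only (annualized excess returns and covariances).
mu = [0.06, 0.04]
cov = [[0.04, 0.006], [0.006, 0.0225]]

full = kelly_weights_2x2(mu, cov)
half = [0.5 * w for w in full]
growth_full = approx_log_growth(full, mu, cov)
growth_half = approx_log_growth(half, mu, cov)
print(full, growth_half / growth_full)
```

Under this approximation, half Kelly keeps three quarters of the growth rate while halving leverage. That is the usual argument for a fractional haircut when μ and Σ are estimated with error.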

18

Transaction Costs

2 primers

Almgren-Chriss Optimal Execution

Almgren-Chriss matters because execution is never just a cost problem. It is always a cost versus risk problem.

Square-Root Market Impact and Participation-Based Cost Models

The square-root rule matters because market impact grows more slowly than linearly with size, but still fast enough to kill many strategies.
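A common form of the rule is impact ≈ Y · σ · sqrt(Q / V), where Q is order size, V is daily volume, and σ is daily volatility. The sketch below assumes the coefficient Y is 1.0, which is a placeholder rather than a calibrated value. The function name and inputs are hypothetical.

```python
import math

def sqrt_impact_bps(order_shares, adv_shares, daily_vol, coeff=1.0):
    """Expected impact in basis points under the square-root rule.

    coeff is an assumed order-one constant that must be calibrated in practice.
    """
    participation = order_shares / adv_shares
    return 1e4 * coeff * daily_vol * math.sqrt(participation)

# 1% of average daily volume at 2% daily volatility.
small = sqrt_impact_bps(10_000, 1_000_000, 0.02)

# Four times the size: impact per share only doubles.
large = sqrt_impact_bps(40_000, 1_000_000, 0.02)

print(small, large)
```

Per-share impact doubles when size quadruples, but total cost is impact times notional, so it grows with size to the power 1.5. That is why capacity runs out even though the per-share curve is concave.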

19

Risk Management

3 primers

Drift Detection and Trigger Design

A risk system that cannot detect when its own inputs have shifted is a system waiting to be surprised -- and the hardest part is not detecting drift but deciding what to do about it.

Stress Testing and Reverse Stress Testing for Systematic Portfolios

Forward stress testing asks "how bad does it get in this scenario?" Reverse stress testing asks "what scenario breaks us?" -- and the second question is usually more useful for a systematic portfolio.

Volatility Forecasting for Risk Control: EWMA, GARCH, QLIKE, and Proxy-Robust Evaluation

Returns are hard to forecast. Risk is not easy either, but volatility is one of the few market objects that is forecastable enough to run real controls on.
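The simplest of the named forecasters is the EWMA recursion σ²_t = λ σ²_{t-1} + (1 - λ) r²_{t-1}, and QLIKE in one common parameterization is log(h) + r²/h. The sketch below assumes λ = 0.94, a value commonly associated with daily data, and a user-supplied starting variance. The function names and return series are hypothetical.

```python
import math

def ewma_variance_forecasts(returns, init_var, lam=0.94):
    """One-step-ahead variance forecasts.

    forecasts[t] is made before returns[t] is observed, so it uses only
    init_var and returns[0..t-1].
    """
    var = init_var
    forecasts = []
    for r in returns:
        forecasts.append(var)
        var = lam * var + (1.0 - lam) * r * r
    return forecasts

def qlike(realized_var, forecast_var):
    """QLIKE loss (one common form): log(h) + realized / h."""
    return math.log(forecast_var) + realized_var / forecast_var

# Made-up daily returns, for illustration only.
returns = [0.01, -0.02, 0.005, 0.03]
forecasts = ewma_variance_forecasts(returns, init_var=1e-4)
losses = [qlike(r * r, h) for r, h in zip(returns, forecasts)]
print(forecasts, sum(losses) / len(losses))
```

For a fixed realized variance, QLIKE is minimized when the forecast equals it, and it penalizes under-forecasting more heavily than over-forecasting. Squared returns are a noisy proxy for true variance, which is where proxy-robust evaluation comes in.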

20

Strategy Synthesis

2 primers

From Model Scores to Portfolio Weights

Portfolio construction is the decision rule that maps model outputs, risk estimates, current holdings, and constraints into target weights. Every design choice in that mapping can amplify, dampen, or invert the signal's intended direction.

Instrument-Appropriate Transaction Cost Models

A single basis-point cost assumption applied uniformly across asset classes will either kill viable strategies or greenlight doomed ones -- cross-asset cost synthesis requires instrument-specific models.

21

Reinforcement Learning

2 primers

Distributional RL and Risk Measures

Distributional RL learns the full return distribution rather than its mean, enabling risk-sensitive policies that align with how execution and hedging desks actually measure performance.

Policy Gradient Theorem and Actor-Critic Architectures

Policy gradient methods optimize parameterized policies directly, enabling the continuous action spaces and stochastic behaviors that execution and hedging demand.

22

RAG for Financial Research

1 primer

Point-in-Time Integrity for Document AI and Financial RAG

In finance, the right document is not just the relevant one. It is the relevant one that was actually available at the historical decision time.

23

Knowledge Graphs

2 primers

Graph Centrality Measures for Financial Risk and Feature Engineering

Degree, betweenness, and eigenvector centrality quantify structural importance in financial networks and serve as risk indicators and ML features that price-based data alone cannot provide.

Statistical Financial Networks and Filtered Correlation Graphs

The Mantegna pipeline converts a noisy correlation matrix into a distance metric and extracts a minimum spanning tree that reveals market structure, sector relationships, and crisis dynamics that sector labels alone do not capture.
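The two steps can be sketched in plain Python: map each correlation to the distance d_ij = sqrt(2 (1 - ρ_ij)), then extract a minimum spanning tree, here with Prim's algorithm. The function names and the four-asset correlation matrix are hypothetical, and a production version would more likely use a graph library.

```python
import math

def correlation_to_distance(corr):
    """Distance d_ij = sqrt(2 * (1 - rho_ij)), clipped at zero for safety."""
    n = len(corr)
    return [
        [math.sqrt(max(0.0, 2.0 * (1.0 - corr[i][j]))) for j in range(n)]
        for i in range(n)
    ]

def minimum_spanning_tree(dist):
    """Prim's algorithm from node 0; returns (i, j, distance) edges."""
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = min(
            (
                (i, j, dist[i][j])
                for i in in_tree
                for j in range(n)
                if j not in in_tree
            ),
            key=lambda edge: edge[2],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges

# Made-up correlations: assets 0-1 and assets 2-3 form two tight pairs.
corr = [
    [1.0, 0.8, 0.3, 0.3],
    [0.8, 1.0, 0.4, 0.3],
    [0.3, 0.4, 1.0, 0.7],
    [0.3, 0.3, 0.7, 1.0],
]
tree = minimum_spanning_tree(correlation_to_distance(corr))
print([(i, j) for i, j, _ in tree])
```

The tree keeps only n - 1 of the n(n - 1)/2 pairwise links. In this toy example it retains both tight pairs plus the single strongest bridge between them.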

24

Autonomous Agents

1 primer

Proper Scoring Rules for Financial Event Forecasts

A scoring rule is proper when it rewards honest probability assessments and penalizes hedging -- the mathematical foundation for evaluating any agent or model that outputs probabilities over financial events.
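Propriety can be checked numerically. If the true event probability is p, the expected Brier score of a report q is p (1 - q)² + (1 - p) q², which is minimized at q = p. The expected absolute error is minimized at an extreme report instead. The sketch below uses a grid search with hypothetical names, and p = 0.7 is an arbitrary illustrative value.

```python
def expected_brier(report, true_prob):
    """Expected squared error of a probability report for a binary event."""
    return true_prob * (1 - report) ** 2 + (1 - true_prob) * report ** 2

def expected_abs_error(report, true_prob):
    """Expected absolute error; included as an example of an improper rule."""
    return true_prob * (1 - report) + (1 - true_prob) * report

grid = [i / 100 for i in range(101)]
true_prob = 0.7  # assumed true probability, for illustration

best_brier_report = min(grid, key=lambda q: expected_brier(q, true_prob))
best_abs_report = min(grid, key=lambda q: expected_abs_error(q, true_prob))
print(best_brier_report, best_abs_report)
```

Under the Brier score the forecaster does best by reporting the true 0.7. Under absolute error the best report is 1.0, so the rule rewards overstating confidence instead of honesty.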

25

Live Trading Systems

1 primer

Event-Driven Architecture and Deterministic Strategy Design

If the same strategy cannot behave the same way under replay and live events, the backtest and the production system are not really the same system.

26

MLOps and Governance

1 primer

Training-Serving Skew, Point-in-Time Joins, and Feature Stores

Many live model failures look like alpha decay until you discover that training and inference never computed the same feature in the first place.
