3rd Edition

9 Real-World Case Studies

Data-driven trading strategies across 7 asset classes, from ETFs to crypto, with complete ML4T workflow implementations.

9 Case Studies

7 Asset Classes

168 Notebooks

Pipeline Stages

ETFs

ETF Cross-Asset Exposures

Daily · Price Data · 114 notebooks

This case study applies the complete ML4T workflow to 100 exchange-traded funds covering equities, fixed income, commodities, currencies, and real estate. ETFs provide standardized pricing, deep liquidity, and broad asset-class coverage, making them an ideal laboratory for learning the end-to-end …

Linear models (Ridge, LASSO, ElasticNet) · LightGBM ranking · TabM (tabular deep learning) · LSTM and TSMixer

Cryptocurrency

Crypto Perpetuals Funding

8-Hour · Alternative Data · 108 notebooks

This case study explores a structural feature unique to crypto markets: the funding rate mechanism in perpetual futures contracts. Every 8 hours, longs and shorts exchange payments based on the gap between perpetual and spot prices. The question is whether …

Gradient boosting · LSTM for temporal patterns · Time-series feature engineering · Alternative data integration
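
The funding mechanism described above can be sketched in a few lines. This is a simplified illustration, not any exchange's actual formula: it assumes the rate equals the perpetual's premium over spot, clipped to a hypothetical cap, and it ignores the interest-rate component and the premium averaging that real venues apply. The function names and the cap value are assumptions made for this example.

```python
def funding_rate(perp_price: float, spot_price: float, cap: float = 0.0075) -> float:
    """Simplified funding rate for one 8-hour interval.

    Assumption: the rate is the perp premium over spot, clipped to +/- cap.
    The cap of 0.75% is a hypothetical value chosen for illustration.
    """
    premium = (perp_price - spot_price) / spot_price
    return max(-cap, min(cap, premium))


def funding_payment(position_notional: float, rate: float) -> float:
    """Cash flow received by the position holder at one funding timestamp.

    position_notional > 0 is a long, < 0 is a short.
    With a positive rate (perp above spot), longs pay and shorts receive.
    """
    return -position_notional * rate


if __name__ == "__main__":
    rate = funding_rate(perp_price=100.1, spot_price=100.0)
    print("rate:", rate)
    print("long 10k receives:", funding_payment(10_000, rate))
    print("short 10k receives:", funding_payment(-10_000, rate))
```

A short position collecting positive funding is the carry-like payoff that such a study would try to forecast; whether it survives costs and price risk is the empirical question.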

Equities

NASDAQ-100 Microstructure

15-Minute · Microstructure · 108 notebooks

This is the highest-frequency case study in the book, using AlgoSeek TAQ-derived 15-minute bars for 114 NASDAQ-100 constituents. Students learn to build features from order flow, quote staleness, relative spreads, and other microstructure indicators — the richest feature space in …

Order flow features · Microstructure indicators · Classification vs regression labels · GBM with large-scale data
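
Two of the microstructure features named above have compact textbook definitions, sketched here for a single bar. The function names and inputs are assumptions for this example; how the case study actually derives buy and sell volume from TAQ data is not stated on this page.

```python
def relative_spread(bid: float, ask: float) -> float:
    """Quoted spread divided by the midpoint price."""
    mid = 0.5 * (bid + ask)
    return (ask - bid) / mid


def order_flow_imbalance(buy_volume: float, sell_volume: float) -> float:
    """Signed volume imbalance in [-1, 1]; 0.0 when the bar has no volume."""
    total = buy_volume + sell_volume
    if total == 0:
        return 0.0
    return (buy_volume - sell_volume) / total


if __name__ == "__main__":
    print(relative_spread(bid=99.95, ask=100.05))
    print(order_flow_imbalance(buy_volume=600, sell_volume=400))
```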

Options

S&P 500 Equity + Option Analytics

Daily · Price Data · 108 notebooks

This case study uses options-derived signals to predict equity returns — not to trade options directly. Implied volatility surfaces, skew measurements, and term structure features from the S&P 500 options market are combined with standard equity features to predict 5-day …

IV surface feature engineering · Multi-source feature integration · Deep learning (CAE, NLinear) · Causal DML for confounding analysis

Fundamentals

US Firm Characteristics

Monthly · Fundamental Data · 114 notebooks

This case study applies ML to the canonical factor investing question: can machine learning improve on traditional long-short decile sorts when accounting lags, survivorship bias, and transaction costs are taken seriously? Working with 57 firm-level characteristics spanning valuation, profitability, momentum, …

GBM classification · Latent factor models (IPCA, CAE, SAE, SDF) · Label engineering (classification vs regression) · Cross-sectional prediction
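
For readers unfamiliar with the baseline, a long-short decile sort can be sketched as follows. This is a minimal, equal-weighted version for a single period, written in plain Python. It ignores the accounting lags, survivorship bias, and transaction costs that the case study emphasizes, and the function name is an assumption for this example.

```python
def long_short_decile_return(signals, next_returns):
    """Equal-weighted return of the top signal decile minus the bottom decile.

    signals[i] is the characteristic or model score for stock i,
    next_returns[i] is that stock's return over the following period.
    """
    n = len(signals)
    if n != len(next_returns):
        raise ValueError("signals and next_returns must have equal length")
    if n < 10:
        raise ValueError("need at least 10 stocks to form deciles")

    # Rank stocks from lowest to highest signal.
    order = sorted(range(n), key=lambda i: signals[i])
    k = n // 10  # number of stocks per decile

    bottom, top = order[:k], order[-k:]
    long_leg = sum(next_returns[i] for i in top) / k
    short_leg = sum(next_returns[i] for i in bottom) / k
    return long_leg - short_leg


if __name__ == "__main__":
    scores = list(range(20))
    rets = [0.001 * i for i in range(20)]
    print(long_short_decile_return(scores, rets))
```

An ML model enters this comparison by replacing a single characteristic with a predicted score as the sorting signal.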

Foreign Exchange

FX Spot Pairs

Daily · Price Data · 102 notebooks

This case study applies the ML4T workflow to 20 G10 currency pairs using daily data from OANDA. Foreign exchange presents a structurally challenging prediction problem: the cross-section is small (20 pairs dominated by a single USD factor), limiting diversification and …

Carry factor construction · Momentum signals across horizons · Causal DML for confounding · Horizon-sensitivity analysis

Futures

CME Futures

Daily · Price Data · 114 notebooks

This case study uses daily data from Databento for 30 CME futures products across 7 sectors — equity indices, treasuries, energy, metals, currencies, agriculture, and livestock. Futures have a unique return decomposition (spot return plus roll yield), natural sector groupings, …

Carry factor from term structure · Continuous contract construction · GBM with sector features · Data quality sensitivity analysis
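
One common way to express a carry signal from the term structure is the annualized price gap between two adjacent contracts. The sketch below uses that convention as an assumption; the case study's exact definition is not given on this page, and the function name and the 365-day year are illustrative choices.

```python
def annualized_carry(near_price: float, far_price: float, days_between: float) -> float:
    """Annualized roll yield implied by two points on the futures curve.

    Positive when the curve is in backwardation (near > far), so a long
    position gains as the far contract rolls up toward the near price,
    assuming the curve shape stays unchanged.
    """
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    return (near_price - far_price) / far_price * 365.0 / days_between


if __name__ == "__main__":
    # Near contract at 102, next contract 73 days later at 100.
    print(annualized_carry(near_price=102.0, far_price=100.0, days_between=73))
```

Under the decomposition mentioned above, this quantity approximates the roll-yield component, while the spot return is the part that has to be forecast.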

Options

S&P 500 Options (Straddles)

Daily · Price Data · 114 notebooks

Unlike the equity+options case study that uses options data to predict stocks, this case study trades options directly. It sells ATM straddles on S&P 500 constituents and delta-hedges daily, testing whether the variance risk premium — the persistent gap between …

IV-derived feature engineering · Delta-hedged return labels · Options-specific cost modeling · Multi-label backtesting
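
A one-day delta-hedged short straddle P&L can be sketched under Black-Scholes assumptions (European options, no dividends, zero interest rate by default). This illustrates the label's general shape only: the function names are hypothetical, and the case study's pricing model and cost treatment are not specified here.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_straddle(spot: float, strike: float, vol: float, t: float, r: float = 0.0):
    """Return (price, delta) of a long call plus long put under Black-Scholes."""
    sqrt_t = math.sqrt(t)
    d1 = (math.log(spot / strike) + (r + 0.5 * vol ** 2) * t) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discounted_strike = strike * math.exp(-r * t)
    call = spot * norm_cdf(d1) - discounted_strike * norm_cdf(d2)
    put = call - spot + discounted_strike  # put-call parity
    delta = 2.0 * norm_cdf(d1) - 1.0       # call delta N(d1) plus put delta N(d1) - 1
    return call + put, delta


def short_straddle_hedged_pnl(s0, s1, strike, vol0, vol1, t0, dt):
    """P&L over one step of length dt for a short straddle hedged at the initial delta."""
    v0, delta0 = bs_straddle(s0, strike, vol0, t0)
    v1, _ = bs_straddle(s1, strike, vol1, t0 - dt)
    option_pnl = -(v1 - v0)         # short the straddle
    hedge_pnl = delta0 * (s1 - s0)  # long delta0 shares offsets the short straddle's delta
    return option_pnl + hedge_pnl


if __name__ == "__main__":
    quiet = short_straddle_hedged_pnl(100, 100, 100, 0.2, 0.2, 30 / 365, 1 / 365)
    shock = short_straddle_hedged_pnl(100, 110, 100, 0.2, 0.2, 30 / 365, 1 / 365)
    print("quiet day:", quiet)
    print("10% move:", shock)
```

The position earns time decay on quiet days and loses on large moves, which is why its average return depends on how implied volatility compares with the volatility that is subsequently realized.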

Equities

US Equities Panel

Daily · Price Data · 126 notebooks

This is the broadest cross-sectional equity workflow in the book, using daily data for approximately 3,200 US stocks spanning 2000-2018. The case study tests whether individually weak per-stock signals become useful when scaled across thousands of names — the Fundamental …

Panel data prediction · PCA and IPCA (latent factors) · Cross-sectional feature engineering · Era-dependent cost modeling

Ready to Build Your Own Strategy?

Learn the complete ML4T workflow through our 27 chapters and 5 production libraries.

Explore Chapters · View Libraries