Methodology Highlight
Broadest model comparison in the book — all six families trained on the same universe under identical walk-forward evaluation, showing how architecture choice interacts with signal structure.

This case study applies the complete ML4T workflow to 100 exchange-traded funds covering equities, fixed income, commodities, currencies, and real estate. ETFs provide standardized pricing, deep liquidity, and broad asset-class coverage, making them an ideal laboratory for learning the end-to-end pipeline.
The ETF universe serves as the broadest model family comparison in the book. All six model families are trained and evaluated under identical conditions: linear models, gradient boosting, tabular deep learning, sequential deep learning, latent factor models, and causal inference. Monthly rebalancing with walk-forward validation across 8 folds provides the evaluation framework.
Students learn to build cross-asset features including momentum, volatility, regime indicators, and intermarket lead-lag signals. The case study demonstrates how different model architectures capture different aspects of the same signal, and how portfolio construction choices interact with prediction quality.

Strategy Summary

Long-only rank-and-rebalance across 100 ETFs spanning 9 asset classes. Monthly rebalancing at month-end, with top-N selection by predicted 21-day forward return. Walk-forward evaluation uses 8 folds with 10-year training and 1-year validation windows. Features include risk-adjusted momentum, cross-sectional rankings, volatility clustering, and regime detection via hidden Markov models.
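The fold scheme above can be sketched in a few lines. The fold count and window lengths come from the summary; the start date, helper name, and the choice to roll (rather than expand) the training window are illustrative assumptions:

```python
import pandas as pd

def walk_forward_folds(start="1999-01-01", n_folds=8,
                       train_years=10, val_years=1):
    """Rolling walk-forward splits: each fold trains on a 10-year window
    and validates on the following year; successive folds shift forward
    by the validation length so the 8 validation years are contiguous."""
    folds = []
    t0 = pd.Timestamp(start)
    for k in range(n_folds):
        train_start = t0 + pd.DateOffset(years=k * val_years)
        train_end = train_start + pd.DateOffset(years=train_years)
        val_end = train_end + pd.DateOffset(years=val_years)
        folds.append((train_start, train_end, train_end, val_end))
    return folds
```

Validation always follows training in calendar time, which is what makes the evaluation walk-forward rather than shuffled cross-validation.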

Data Sources

Yahoo Finance (ETF prices)
FRED (macro indicators)

ML Techniques

Linear models (Ridge, LASSO, ElasticNet)
LightGBM ranking
TabM (tabular deep learning)
LSTM and TSMixer
PCA and latent factors
Causal DML

ML Pipeline

Universe & Setup
1 notebook
100 ETFs across 9 asset classes (equities, bonds, commodities, currencies, real estate). Daily data with month-end decision dates. Point-in-time eligibility with $10M ADV threshold. 8 walk-forward folds (10Y train, 1Y val). Costs are material but manageable (5-15 bps per leg) thanks to deep ETF liquidity.
Universe & Protocol Setup Ch 6
Defines the ETF universe across multiple asset classes with point-in-time eligibility rules (ADV threshold). Establishes monthly rebalancing cadence, long-only cost model, and walk-forward evaluation splits. Writes setup.yaml and eligibility.csv consumed by all downstream notebooks.
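The point-in-time ADV screen can be sketched as below. The $10M threshold comes from the setup above; the 63-day averaging window, the wide price-by-ticker layout, and the function name are assumptions:

```python
import pandas as pd

def eligibility_mask(dollar_volume: pd.DataFrame,
                     adv_window: int = 63,
                     adv_min: float = 10e6) -> pd.DataFrame:
    """Point-in-time eligibility: an ETF is eligible on date t only if its
    trailing average dollar volume *through t* clears the $10M threshold.
    Only past data enters the rolling mean, so there is no look-ahead."""
    adv = dollar_volume.rolling(adv_window, min_periods=adv_window).mean()
    return adv >= adv_min  # NaN warm-up rows compare False (ineligible)
```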
Labels & Evaluation
2 notebooks
21-day forward return as primary label (matching monthly rebalance cadence), 5-day variant for horizon sensitivity. Walk-forward splits with 21-day purge buffer. Cross-sectional quintile labels for classification experiments. Evaluation covers 78 features (65 financial + 13 temporal) with HAC-adjusted IC and Benjamini-Hochberg FDR correction.
Label Engineering Ch 7
Computes 21-day (primary) and 5-day (variant) forward returns with walk-forward splits. Constructs cross-sectional quintile labels for relative ranking prediction, filtered to eligible ETFs at each point in time. Evaluates label quality including class balance and IC of raw momentum baseline.
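The two label constructions can be sketched as follows. The 21-day horizon and quintile scheme come from the description; the wide prices-by-ticker layout and helper names are assumptions:

```python
import numpy as np
import pandas as pd

def forward_return(prices: pd.DataFrame, horizon: int = 21) -> pd.DataFrame:
    """Primary label: r_fwd(t) = P(t + horizon) / P(t) - 1 per ETF.
    The last `horizon` rows are NaN -- they have no realized future yet."""
    return prices.shift(-horizon) / prices - 1

def quintile_labels(fwd: pd.DataFrame) -> pd.DataFrame:
    """Cross-sectional quintiles (0 = worst .. 4 = best) per date."""
    pct = fwd.rank(axis=1, pct=True)  # percentile rank across ETFs at each t
    return np.ceil(pct * 5) - 1       # NaN propagates where fwd is NaN
```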
Feature Evaluation & Triage Ch 7
Evaluates all financial features (Ch8) and temporal features (Ch9) against forward return labels using HAC-adjusted IC with Benjamini-Hochberg FDR correction. Assesses feature redundancy and family-level signal concentration. Produces triage ledger (PROCEED / REVISE / STOP) for downstream Ch11 modeling.
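The two statistical pieces of the triage, a Newey-West (HAC) t-statistic on the autocorrelated IC series and Benjamini-Hochberg step-up across features, can be sketched as below. The automatic lag rule and function names are assumptions:

```python
import numpy as np

def hac_tstat(ic, lags=None):
    """Newey-West (HAC) t-statistic for the mean of an autocorrelated
    IC series, using Bartlett-kernel weights."""
    ic = np.asarray(ic, float)
    n = len(ic)
    if lags is None:
        lags = int(np.floor(4 * (n / 100) ** (2 / 9)))  # common rule of thumb
    e = ic - ic.mean()
    s = e @ e / n  # lag-0 variance
    for k in range(1, lags + 1):
        w = 1 - k / (lags + 1)              # Bartlett weight
        s += 2 * w * (e[:-k] @ e[k:]) / n   # weighted autocovariance
    return ic.mean() / np.sqrt(s / n)

def benjamini_hochberg(pvals, alpha=0.10):
    """Boolean mask of hypotheses rejected under BH step-up FDR control."""
    p = np.asarray(pvals, float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    reject = np.zeros(m, bool)
    reject[order[:k]] = True
    return reject
```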
Feature Engineering
2 notebooks
Risk-adjusted momentum features with skip-recent (12-1) across multiple horizons, cross-sectional asset-class ranking, volatility clustering, and intermarket lead-lag. Temporal features from 2-state Gaussian HMM on SPY for market regime detection, fractional differencing for stationarity-preserving transforms, and walk-forward ARIMA for mean-reversion forecasts. 78 features total.
Feature Engineering Ch 8
Builds risk-adjusted momentum features including skip-recent (12-1) across multiple horizons, technical indicators (RSI, MACD, Stochastic, ADX, Aroon, CCI), regime indicators (yield curve slope, SPY-TLT correlation, Hurst exponent), and volatility features (NATR). Enforces point-in-time eligibility on the feature matrix and diagnoses feature quality via IC analysis.
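The core momentum feature can be sketched as below. The 12-1 skip-recent construction comes from the description; the volatility window and annualization convention are assumptions:

```python
import numpy as np
import pandas as pd

def skip_recent_momentum(prices: pd.DataFrame, lookback: int = 252,
                         skip: int = 21, vol_window: int = 252) -> pd.DataFrame:
    """12-1 momentum: return from t-252 to t-21, skipping the most recent
    month to avoid short-term reversal, scaled by trailing annualized vol."""
    mom = prices.shift(skip) / prices.shift(lookback) - 1
    vol = prices.pct_change().rolling(vol_window).std() * np.sqrt(252)
    return mom / vol
```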
Temporal Features Ch 9
Fits three temporal model families: 2-state Gaussian HMM on SPY returns for regime detection with k-means initialization, fractional differencing on reference ETFs for memory-preserving stationarity, and per-ETF GARCH(1,1) for conditional volatility forecasts. Combines date-level and per-asset features into a full panel.
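Fractional differencing, the memory-preserving transform above, reduces to a truncated weight sequence applied over a rolling window. This sketch uses the standard binomial-weight recursion; the truncation threshold is a hypothetical default:

```python
import numpy as np

def fracdiff_weights(d, threshold=1e-4, max_len=1000):
    """Fixed-width fractional differencing weights:
    w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k,
    truncated once |w_k| falls below the threshold."""
    w = [1.0]
    for k in range(1, max_len):
        wk = -w[-1] * (d - k + 1) / k
        if abs(wk) < threshold:
            break
        w.append(wk)
    return np.array(w)

def fracdiff(series, d, threshold=1e-4):
    """Apply fixed-width fractional differencing to a 1-D array.
    np.convolve with mode='valid' computes sum_k w_k * x_{t-k}."""
    w = fracdiff_weights(d, threshold)
    return np.convolve(series, w, mode="valid")
```

At d = 1 the weights collapse to [1, -1] (ordinary first differencing); fractional d keeps a long, slowly decaying tail of weights, which is where the preserved memory lives.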
Modeling
8 notebooks
All 6 model families trained -- the broadest comparison in the book. Linear baselines (Ridge, LASSO, ElasticNet), GBM (LightGBM with Optuna), TabM (rank-1 adapter MLP ensemble), LSTM and TSMixer for sequential patterns, PCA and latent factor models for cross-asset structure, and causal DML testing whether momentum causes returns or reflects confounders. IC champion: TabM (+0.041). Sharpe champion: CAE (+0.741).
Linear Models Ch 11
Trains Ridge, LASSO, and ElasticNet on walk-forward folds with all features across eligible ETFs. Tests whether supervised linear combination improves on single-feature IC given high feature redundancy. Generates backtesting-ready predictions for Ch16.
Gradient Boosting Ch 12
Searches LightGBM configurations across leaf-count profiles and objectives with IC evaluated at iteration checkpoints on walk-forward folds. Tests whether nonlinear HMM-momentum interactions improve on the linear baseline across the ETF cross-section. Registers results for downstream backtest.
Tabular Deep Learning (TabM) Ch 12
Trains TabM rank-1 adapter MLP ensemble (small/medium/large) on the same flat feature matrix as Ridge and GBM via walk-forward CV with IC checkpoints. Tests whether attention-style adapters capture cross-asset interactions (equity momentum predicting bond rotation) that linear models and tree splits miss. Registers predictions for backtest.
LSTM Ch 13
Trains LSTM with 60-day lookback on walk-forward folds, processing each ETF's feature history independently. Serves as a controlled temporal baseline: any improvement over Ridge must come from temporal dynamics within individual feature histories, not from cross-asset interactions. Loads prior linear and GBM baselines for comparison.
TSMixer Ch 13
Trains TSMixer with alternating time/feature mixing layers on 60-day lookback windows across walk-forward folds. Tests whether cross-asset feature mixing (TLT leading equity vol, GLD leading inflation expectations) explains the IC gap over LSTM. Compares MLP-based channel mixing vs recurrent temporal modeling.
Latent Factor Models (CAE) Ch 14
Runs conditional autoencoder and related latent factor models via walk-forward CV on a balanced panel of ETFs (filtered by date-coverage threshold). Applies rank normalization and extracts latent factors from the multi-asset-class cross-section. GPU recommended for training.
PCA Factor Extraction Ch 14
Applies PCA to rank-normalized cross-sectional characteristics of ETFs via walk-forward CV. Extracts and interprets cross-asset latent factors, visualizes loading structure across asset classes, and compares predictive IC at primary and variant horizons against linear and GBM baselines.
Causal DML Estimation Ch 15
Applies Double Machine Learning to the skip-recent momentum signal (skip_recent_6_1) across the ETF universe. Confounders include volatility at 21d and 126d horizons, HMM regime state, and yield curve slope. Runs placebo permutation tests and registers causal effect estimates.
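The partialling-out estimator behind DML can be sketched with cross-fitted nuisance models. OLS stands in here for the ML learners the notebook actually uses, and all variable names are illustrative:

```python
import numpy as np

def dml_effect(y, t, X, n_folds=2, seed=0):
    """Partialling-out DML: residualize outcome y and treatment t on
    confounders X with cross-fitted nuisance models (OLS here as a
    stand-in), then regress residual on residual for the effect."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    ry, rt = np.empty(n), np.empty(n)
    Xc = np.column_stack([np.ones(n), X])  # add intercept
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # nuisance models fit on train folds, residuals computed out-of-fold
        by, *_ = np.linalg.lstsq(Xc[train], y[train], rcond=None)
        bt, *_ = np.linalg.lstsq(Xc[train], t[train], rcond=None)
        ry[test] = y[test] - Xc[test] @ by
        rt[test] = t[test] - Xc[test] @ bt
    return (rt @ ry) / (rt @ rt)  # final-stage regression coefficient
```

Cross-fitting keeps the nuisance fits out-of-fold, which is what lets flexible ML learners be plugged in without biasing the effect estimate.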
Strategy Pipeline
5 notebooks
Score-weighted, risk-parity, and mean-variance allocation compared. Monthly rebalancing keeps turnover low. The strategy remains profitable at 50 bps per leg -- one of the most cost-robust results. Drawdown controls and regime-conditional position sizing tested. Holdout Sharpe decays 55% from +0.74 to +0.33 despite IC improving 5x.
Cross-Model Analysis Ch 11
Compares best-in-family IC across all model families trained on the ETF case study with expanding-window folds. Evaluates checkpoint sensitivity, fold stability, prediction bucket monotonicity, and the IC-Sharpe disconnect (TabM leads IC while CAE leads Sharpe). Produces backtest recommendations.
Signal-Stage Backtest Ch 16
Runs a plumbing test (random-signal verification), then sweeps all (prediction x entry scheme) combinations across ETFs at monthly cadence. Computes the Deflated Sharpe Ratio and tests whether the IC-Sharpe gap persists across configurations.
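The Deflated Sharpe Ratio builds on the Probabilistic Sharpe Ratio of Bailey and Lopez de Prado: PSR is the probability that the true Sharpe exceeds a benchmark, adjusting for sample length, skew, and kurtosis; DSR evaluates PSR at the expected maximum Sharpe implied by the number of configurations tried. A sketch of the PSR core (the moment conventions are an assumption):

```python
import math
import numpy as np

def probabilistic_sharpe(returns, sr_benchmark=0.0):
    """PSR: probability the true (per-period) Sharpe exceeds sr_benchmark,
    correcting the estimate's variance for skewness and kurtosis."""
    r = np.asarray(returns, float)
    n = len(r)
    sr = r.mean() / r.std(ddof=1)
    dev = r - r.mean()
    skew = (dev ** 3).mean() / r.std(ddof=0) ** 3
    kurt = (dev ** 4).mean() / r.std(ddof=0) ** 4
    z = (sr - sr_benchmark) * math.sqrt(n - 1) / math.sqrt(
        1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF
```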
Portfolio Allocator Sweep Ch 17
Sweeps top signal-stage predictions x TOP_K concentration levels x 6 allocators (equal-weight, score-weighted, inverse-vol, risk-parity, MVO, HRP) across ETFs. Tests whether allocator choice extracts further value from the CAE and TabM signals beyond what the signal stage achieves for a monthly strategy.
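Two of the six allocators reduce to one-liners on the selected top-K sleeve. This sketch shows inverse-volatility (risk parity under a diagonal covariance assumption) and score-weighting, with hypothetical inputs:

```python
import numpy as np

def inverse_vol_weights(returns: np.ndarray) -> np.ndarray:
    """Inverse-volatility weights over the sleeve: w_i proportional to
    1/sigma_i, from a (T x K) matrix of per-asset returns."""
    w = 1.0 / returns.std(axis=0)
    return w / w.sum()

def score_weights(scores: np.ndarray) -> np.ndarray:
    """Score-weighted allocation: weights proportional to clipped
    positive model scores; negative-score names get zero weight."""
    s = np.clip(scores, 0.0, None)
    return s / s.sum()
```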
Transaction Cost Analysis Ch 18
Sweeps a cost grid on top allocation-stage combinations for monthly ETF rotation. Measures the Sharpe decay curve from gross to net, identifies the breakeven cost level, and tests viability at standard institutional ETF execution costs. Monthly cadence makes this the most cost-favorable case study.
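The gross-to-net decay is affine in the cost level, so the breakeven falls out in closed form. The one-way (per-leg) turnover convention is an assumption made explicit in the comments:

```python
import numpy as np

def net_returns(gross, turnover, cost_bps):
    """Net period return = gross - turnover * per-leg cost.
    `turnover` is one-way (sum of absolute weight changes), so each
    unit of turnover pays the per-leg cost once."""
    return gross - turnover * cost_bps * 1e-4

def breakeven_cost_bps(gross, turnover):
    """Cost level (bps per leg) at which the mean net return hits zero."""
    return gross.mean() / (turnover.mean() * 1e-4)
```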
Risk Controls Ch 19
Applies position-level (stop-loss, trailing stop, time exit) and portfolio-level (drawdown breaker, daily loss limit) risk controls on top allocation combos. Calibrates trailing stops via MAE/MFE analysis. Tests whether risk overlays improve the drawdown profile without eroding the Sharpe that cost analysis confirmed for a monthly holding period.
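A trailing stop of the kind calibrated above reduces to a running-peak rule; the 10% default here is illustrative, not the calibrated value:

```python
def trailing_stop_exit(prices, stop_pct=0.10):
    """Return the index of the first bar where price has fallen stop_pct
    below its running peak since entry, or None if the stop never fires."""
    peak = prices[0]
    for i, p in enumerate(prices):
        peak = max(peak, p)          # ratchet the peak up only
        if p <= peak * (1 - stop_pct):
            return i                 # exit on this bar
    return None
```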
Synthesis & Verdict
1 notebook
The IC-Sharpe disconnect poster child. Better predictions produce worse portfolio returns -- portfolio construction method matters more than prediction quality. Verdict: Advance. Priority is ensemble work (GBM + CAE) rather than further model search.
Strategy Synthesis & Verdict Ch 20
Synthesizes signal, allocation, cost, and risk results into a structured deployment verdict. Traces the champion through all pipeline stages, computes search risk, and evaluates holdout performance. Runs factor attribution against benchmark ETFs and produces the per-case-study verdict consumed by Ch20. Verdict: Advance.