Methodology Highlight
Teaches horizon-sensitivity analysis and hypothesis revision — the same features behave differently across return horizons, requiring systematic evaluation rather than single-configuration testing.

This case study applies the ML4T workflow to 20 G10 currency pairs using daily data from OANDA. Foreign exchange presents a structurally challenging prediction problem: the cross-section is small (20 pairs dominated by a single USD factor), limiting diversification and effective breadth.
Students learn to build carry and momentum features for FX, and discover how prediction horizon matters — the same features can be informative at one horizon and uninformative at another. The case study teaches horizon-sensitivity analysis as a systematic diagnostic step.
The pipeline progresses from momentum to carry to combined signals, illustrating hypothesis revision as a natural part of the research process. Students also learn how to apply causal inference (DML) to assess confounding in momentum signals, and how to set pair-specific cost budgets in a market where execution costs vary significantly across pairs.

Strategy Summary

Long-short daily strategy on 20 G10 FX pairs ranked by momentum and carry signals. Daily NY 5PM close cadence with next-bar-open execution. Dollar-neutral constraints. The case study evaluates multiple return horizons (1-day, 5-day, 21-day) to understand horizon sensitivity. 8 CV folds with 5-year training and 1-year validation windows.

Data Sources

OANDA (spot rates)
FRED (interest rate differentials)

ML Techniques

Carry factor construction
Momentum signals across horizons
Causal DML for confounding
Horizon-sensitivity analysis

ML Pipeline

Universe & Setup
1 notebook
20 G10 currency pairs from OANDA. Daily NY 5PM close cadence. Carry decomposition (spot return + interest rate differential) tracked from setup. Dollar-neutral constraints with ~3 effective independent bets (USD dominates the cross-section). 8 CV folds (5Y train, 1Y val) with 2024-2025 holdout. Cost model: 1-3 bps for majors, 3-8 bps for crosses.
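One common way to put a number on "effective independent bets" is the participation ratio of the correlation-matrix eigenvalues. The sketch below is an illustration under that assumption; the case study does not state which estimator it uses, and the function name `effective_bets` is hypothetical.

```python
import numpy as np

def effective_bets(returns):
    """Participation ratio (sum of eigenvalues)^2 / sum of squared eigenvalues.

    `returns` has one row per date and one column per pair. The result equals
    the number of columns when they are uncorrelated and 1 when they all move
    together. This estimator is an assumption, not the case study's method.
    """
    corr = np.corrcoef(np.asarray(returns, dtype=float), rowvar=False)
    eig = np.clip(np.linalg.eigvalsh(corr), 0.0, None)  # guard tiny negatives
    return float(eig.sum() ** 2 / (eig ** 2).sum())
```

A universe in which one factor (here USD) explains most of the variance has one large eigenvalue and many small ones, which pulls the ratio far below the nominal pair count.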
Universe & Protocol Setup Ch 6
Defines the G10 FX universe with NY 5PM close convention, carry decomposition (spot return vs interest rate differential), and dollar-neutral constraints. Performs horizon feasibility analysis comparing 4-hour and daily bar moves against typical spreads. Builds walk-forward evaluation splits with purge/embargo rules.
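The walk-forward splits with a purge gap can be sketched as follows. This is a minimal illustration, not the notebook's code: it assumes rolling (not expanding) windows, 252 bars per year, and a purge equal to the longest label horizon (21 bars). The names `Fold` and `walk_forward_folds` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fold:
    train: range  # bar indices used for fitting
    val: range    # bar indices used for validation

def walk_forward_folds(n_bars, n_folds=8, train_len=5 * 252, val_len=252, purge=21):
    """Rolling folds; the last `purge` bars before each validation window are
    dropped from training so forward-looking labels cannot overlap it."""
    folds = []
    for k in range(n_folds):
        train_start = k * val_len
        val_start = train_start + train_len
        val_end = val_start + val_len
        if val_end > n_bars:
            break  # not enough history left for another full fold
        folds.append(Fold(range(train_start, val_start - purge),
                          range(val_start, val_end)))
    return folds

folds = walk_forward_folds(n_bars=13 * 252)
```

With 13 years of bars this yields eight folds, each validating on the year that follows its (purged) five-year training window.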
Labels & Evaluation
2 notebooks
Three forward return horizons (1-day, 5-day, 21-day) reveal a ninefold IC scaling from 1d to 21d -- the cross-sectional momentum effect operates at monthly, not daily frequency. Evaluation covers 61 features (51 financial + 10 temporal) with HAC-adjusted IC. The horizon discovery drives the subsequent modeling to focus on the 21-day label.
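The horizon comparison rests on a cross-sectional information coefficient computed per date and then averaged. A minimal sketch, assuming a Spearman (rank) IC, no ties within a date, and no missing values; the HAC adjustment mentioned above is not included, and `cross_sectional_rank_ic` is a hypothetical name.

```python
import numpy as np

def cross_sectional_rank_ic(features, fwd_returns):
    """Per-date Spearman correlation across pairs.

    Both inputs are 2-D arrays with rows = dates and columns = pairs.
    Returns one IC value per date.
    """
    f = np.asarray(features, dtype=float)
    r = np.asarray(fwd_returns, dtype=float)
    # Double argsort turns values into ranks within each row (assumes no ties).
    f_rank = np.argsort(np.argsort(f, axis=1), axis=1).astype(float)
    r_rank = np.argsort(np.argsort(r, axis=1), axis=1).astype(float)
    f_rank -= f_rank.mean(axis=1, keepdims=True)
    r_rank -= r_rank.mean(axis=1, keepdims=True)
    num = (f_rank * r_rank).sum(axis=1)
    den = np.sqrt((f_rank ** 2).sum(axis=1) * (r_rank ** 2).sum(axis=1))
    return num / den
```

Running the same feature against the 1-day, 5-day, and 21-day labels and comparing the mean IC per horizon is the kind of diagnostic the horizon-sensitivity step describes.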
Label Engineering Ch 7
Aggregates 4-hour bars to daily using NY 5PM rollover convention and computes forward return labels at three horizons (1-day, 5-day, 21-day). Applies session-bounded forward returns to prevent weekend gap leakage. Builds CV configuration respecting the FX calendar with appropriate purge windows per horizon.
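Forward return labels at several horizons can be sketched as below. The example assumes log returns on a single pair's daily closes and leaves out the session-bounding and weekend handling described above; `forward_log_returns` and `build_labels` are hypothetical names.

```python
import numpy as np

def forward_log_returns(close, horizon):
    """Label at bar t is log(close[t + horizon] / close[t]); the last
    `horizon` bars have no label and are left as NaN."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1 bar")
    close = np.asarray(close, dtype=float)
    out = np.full(close.shape, np.nan)
    out[:-horizon] = np.log(close[horizon:] / close[:-horizon])
    return out

def build_labels(close, horizons=(1, 5, 21)):
    """One label array per horizon, keyed like 'fwd_21d'."""
    return {f"fwd_{h}d": forward_log_returns(close, h) for h in horizons}
```

Because a label at bar t looks `horizon` bars ahead, the purge window between training and validation data has to be at least as long as the horizon in use.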
Feature Evaluation Ch 7
Evaluates all features (financial + temporal) against the 1-day forward return label using HAC-adjusted IC with fold-level stability analysis. Applies Benjamini-Hochberg FDR correction and screens for coverage and staleness. Triages features into PROCEED / REVISE / STOP categories for downstream modeling.
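The Benjamini-Hochberg step-up procedure used for the FDR correction can be written in a few lines. This is a generic sketch of the standard procedure; the significance level of 0.05 is an assumption, as the source does not state the level used.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected at FDR level `alpha`.

    Sort the p-values, find the largest rank k with p_(k) <= alpha * k / m,
    and reject the k smallest p-values.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])
        reject[order[: k + 1]] = True
    return reject
```

Note the step-up behaviour: a feature whose p-value misses its own threshold is still kept if a larger p-value further down the sorted list passes.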
Feature Engineering
2 notebooks
51 financial features spanning mean-reversion (z-scores, distance from extremes, OU half-life), multi-horizon momentum, risk-adjusted returns, and carry metrics. 10 temporal features from local linear trend Kalman filter with MLE noise estimation, HMM regime detection, and spectral analysis. 61 features total. The carry decomposition from setup feeds directly into carry-specific features unique to FX.
Feature Engineering Ch 8
Builds features spanning mean-reversion (z-scores, distance from extremes, OU half-life), multi-horizon momentum, risk-adjusted momentum, volatility (Garman-Klass at multiple windows), carry proxies, and cross-sectional rank features across the pair universe. Adds USD factor and rate-differential proxies unique to the FX setting.
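As an illustration of one of the volatility features, the Garman-Klass estimator combines the high-low range with the open-to-close move of each bar. The sketch below assumes daily OHLC arrays for one pair and a simple rolling mean of the per-bar variance; the window lengths and any annualisation used in the notebooks are not specified in the source.

```python
import numpy as np

def garman_klass_vol(open_, high, low, close, window=21):
    """Rolling Garman-Klass volatility (per bar, not annualised).

    Per-bar variance: 0.5 * ln(H/L)^2 - (2 ln 2 - 1) * ln(C/O)^2.
    The first `window - 1` values are NaN.
    """
    o, h, l, c = (np.asarray(x, dtype=float) for x in (open_, high, low, close))
    var_bar = 0.5 * np.log(h / l) ** 2 - (2.0 * np.log(2.0) - 1.0) * np.log(c / o) ** 2
    out = np.full(var_bar.shape, np.nan)
    for t in range(window - 1, var_bar.size):
        # Clip at zero in case inconsistent OHLC data makes the mean negative.
        out[t] = np.sqrt(max(var_bar[t - window + 1 : t + 1].mean(), 0.0))
    return out
```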
Temporal Features Ch 9
Fits walk-forward temporal models: local linear trend Kalman filter per pair with MLE noise estimation, HMM for USD volatility regime detection on aggregate currency movements, and ARIMA residual features. All models are fitted on training windows and applied forward without re-estimation.
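A local linear trend Kalman filter tracks a level and a slope for each pair. The sketch below shows only the filtering recursion with fixed noise variances; the MLE step that estimates those variances on the training window is omitted, and the default values of `q_level`, `q_slope`, and `r` are placeholders rather than values from the case study.

```python
import numpy as np

def local_linear_trend_filter(y, q_level=1e-4, q_slope=1e-6, r=1e-2):
    """Filtered [level, slope] for each observation in `y`.

    State model: level_t = level_{t-1} + slope_{t-1} + noise,
                 slope_t = slope_{t-1} + noise; observation = level + noise.
    """
    F = np.array([[1.0, 1.0], [0.0, 1.0]])  # state transition
    H = np.array([1.0, 0.0])                # only the level is observed
    Q = np.diag([q_level, q_slope])         # process noise (placeholder values)
    x = np.zeros(2)
    P = np.eye(2) * 1e6                     # near-diffuse prior
    states = np.empty((len(y), 2))
    for t, obs in enumerate(y):
        # Predict
        x = F @ x
        P = F @ P @ F.T + Q
        # Update
        innovation = obs - H @ x
        S = H @ P @ H + r
        K = P @ H / S
        x = x + K * innovation
        P = P - np.outer(K, H @ P)
        states[t] = x
    return states
```

In a walk-forward setting the noise variances would be estimated on the training window and then held fixed while the recursion runs over later bars, matching the "applied forward without re-estimation" rule above.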
Modeling
6 notebooks
GBM leads at IC +0.045 on the 21-day label. Deep learning produces 0% positive backtests -- the starkest DL-negative result across all nine case studies, likely because the 20-pair cross-section is too small for attention or convolution to add value. DML reveals 60% confounding bias -- naive momentum overstates the true effect by more than half.
Linear Models Ch 11
Trains Ridge, LASSO, and ElasticNet via walk-forward CV on the FX pair universe with all features. Generates predictions across all folds and evaluates cross-sectional IC per configuration. Compares regularization approaches and stores coefficients for feature importance analysis.
Gradient Boosting Ch 12
Trains LightGBM across regularization profiles and loss functions with IC evaluated at iteration checkpoints. Tests whether shallow trees with heavy regularization outperform deeper configurations given the narrow FX cross-section. Registers all predictions for downstream backtesting.
Tabular Deep Learning (TabM) Ch 12
Trains TabM rank-1 adapter MLP ensembles (small/medium/large configurations) via walk-forward CV on the FX feature matrix. Compares attention-based cross-pair interaction learning against GBM and linear baselines. Registers predictions for downstream backtesting.
TCN
Trains dilated causal TCN on 60-day lookback windows of the FX feature panel. Tests whether temporal evolution of features captures signal that flat models miss. Compares IC learning curves against linear, GBM, and TabM baselines.
NLinear Ch 13
Trains NLinear (last-value subtraction plus single linear layer) as the minimal temporal DL baseline for FX. Tests the Zeng et al. (2023) hypothesis that simple linear architectures match complex ones on time series forecasting. Compares against TCN, TabM, GBM, and linear results.
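The NLinear idea (subtract the last value of the window, apply one linear layer, add the last value back) can be illustrated without a deep learning framework. The sketch below swaps gradient training for a closed-form least-squares fit and handles a single series with a one-step target, so it is a simplified stand-in, not the notebook's model; the class name `NLinearLS` is hypothetical.

```python
import numpy as np

class NLinearLS:
    """Last-value subtraction plus one linear layer, fitted by least squares."""

    @staticmethod
    def _design(windows):
        X = np.asarray(windows, dtype=float)
        last = X[:, -1]
        # Centre each window on its last value and append a bias column.
        A = np.column_stack([X - last[:, None], np.ones(len(X))])
        return A, last

    def fit(self, windows, targets):
        A, last = self._design(windows)
        y = np.asarray(targets, dtype=float)
        self.coef_, *_ = np.linalg.lstsq(A, y - last, rcond=None)
        return self

    def predict(self, windows):
        A, last = self._design(windows)
        return A @ self.coef_ + last
```

The subtraction makes the model invariant to the level of the series, which matters for FX rates whose levels differ widely across pairs.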
Causal DML Ch 15
Applies DML to skip-recent momentum (mom_skip_recent) as treatment with three confounders -- Garman-Klass volatility at 21-day and 63-day horizons plus 21-day z-score -- across G10 FX pairs. Finds substantial amplification from naive to DML effect with significant confounding bias. Runs block-permutation placebo refutation confirming a genuine causal reversal consistent with the overshooting hypothesis.
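The partialling-out logic behind DML can be sketched on synthetic data. Everything below is an assumption made for illustration: the nuisance models are plain linear regressions, cross-fitting uses contiguous blocks, and the simulated coefficients are arbitrary and unrelated to the case study's results. The function names are hypothetical.

```python
import numpy as np

def _ols_fit_predict(X_train, y_train, X_test):
    """Fit OLS with intercept on the training rows, predict the test rows."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

def dml_partial_linear(y, t, W, n_splits=4):
    """Cross-fitted residual-on-residual estimate of the effect of t on y."""
    n = len(y)
    y_res, t_res = np.empty(n), np.empty(n)
    for test_idx in np.array_split(np.arange(n), n_splits):
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Remove the part of outcome and treatment explained by confounders.
        y_res[test_idx] = y[test_idx] - _ols_fit_predict(W[train], y[train], W[test_idx])
        t_res[test_idx] = t[test_idx] - _ols_fit_predict(W[train], t[train], W[test_idx])
    return float(t_res @ y_res / (t_res @ t_res))

# Synthetic example: three confounders drive both treatment and outcome.
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=(n, 3))
t = W @ np.array([0.8, 0.5, -0.3]) + rng.normal(size=n)
y = -0.5 * t + W @ np.array([1.0, 0.7, 0.0]) + rng.normal(size=n)  # true effect -0.5

naive = float(np.cov(t, y)[0, 1] / np.var(t, ddof=1))  # ignores confounders
theta = dml_partial_linear(y, t, W)                     # adjusts for them
```

In this simulation the naive slope and the adjusted estimate differ substantially because the confounders push treatment and outcome in the same direction; comparing the two is how a confounding bias figure is obtained.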
Strategy Pipeline
5 notebooks
The only case study in which holdout improved on validation: Sharpe rose from +0.19 in validation to +0.28 in holdout (+45%), likely reflecting a favorable carry regime. The strategy turns negative above 15 bps per leg, the tightest cost tolerance and the narrowest viable operating window among the case studies. Pair-specific cost budgets are needed rather than blanket assumptions. Position-level exit rules are also tested.
Model Analysis Ch 11
Compares all model families (linear, GBM, TabM, DL, causal) on the FX prediction task using registry metrics, fold stability diagnostics, feature importance, prediction correlation, and regime-conditional IC analysis. Produces per-family advancement recommendations for Ch 16 backtesting.
Backtest & Signal Evaluation Ch 16
Runs plumbing test (random signal verification), then sweeps all model predictions across signal methods and TOP_K configurations using the ml4t-backtest engine with daily FX prices. Computes DSR, family-level comparison, and IC-to-Sharpe translation statistics.
Portfolio: Allocator Sweep Ch 17
Sweeps top signal-stage predictions across TOP_K concentration levels and 6 allocators (equal-weight, score-weighted, inverse-vol, risk-parity, MVO, HRP) on the FX pair universe. Tests how concentration interacts with allocation method in a small cross-section with few independent risk factors.
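How TOP_K concentration and an allocator combine under a dollar-neutral constraint can be sketched for the inverse-vol case. The example assumes "dollar-neutral" means the long and short legs carry equal and opposite notional (0.5 each, gross 1.0); the function name and that scaling are illustrative choices, not taken from the notebooks.

```python
import numpy as np

def long_short_inverse_vol(scores, vols, top_k):
    """Long the `top_k` highest scores, short the `top_k` lowest, with
    inverse-volatility weights inside each leg. Legs sum to +0.5 and -0.5."""
    scores = np.asarray(scores, dtype=float)
    inv_vol = 1.0 / np.asarray(vols, dtype=float)
    if 2 * top_k > scores.size:
        raise ValueError("top_k too large for the universe")
    order = np.argsort(scores)
    shorts, longs = order[:top_k], order[-top_k:]
    w = np.zeros_like(scores)
    w[longs] = 0.5 * inv_vol[longs] / inv_vol[longs].sum()
    w[shorts] = -0.5 * inv_vol[shorts] / inv_vol[shorts].sum()
    return w
```

With only 20 pairs, raising `top_k` quickly exhausts the universe, which is one reason concentration and allocation method interact strongly here.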
Transaction Costs Ch 18
Runs a cost grid sweep on top allocation-stage combinations to find the breakeven cost level. Plots net Sharpe decay curves and identifies the viable cost window for FX majors. Compares breakeven against realistic interbank execution costs.
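A cost sweep of this kind can be sketched under a simple linear cost model, where each day's cost equals turnover times a per-unit cost in basis points. That model, the 252-day annualisation, and the function names are assumptions for illustration; the notebooks run the sweep through the backtest engine.

```python
import numpy as np

def net_sharpe(gross_returns, turnover, cost_bps, periods_per_year=252):
    """Annualised Sharpe after subtracting turnover * cost from each day."""
    net = np.asarray(gross_returns, dtype=float) - np.asarray(turnover, dtype=float) * cost_bps * 1e-4
    return float(np.sqrt(periods_per_year) * net.mean() / net.std(ddof=1))

def breakeven_cost_bps(gross_returns, turnover):
    """Cost level at which the mean net return (and so the Sharpe) hits zero."""
    return float(np.mean(gross_returns) / np.mean(turnover) * 1e4)

def sweep(gross_returns, turnover, grid_bps):
    """Net Sharpe at each cost level in the grid."""
    return {c: net_sharpe(gross_returns, turnover, c) for c in grid_bps}
```

Plotting the output of `sweep` against the grid gives the net Sharpe decay curve, and the breakeven level can then be compared with per-pair spread estimates.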
Risk Management Ch 19
Sweeps position-level (stop-loss, trailing stop, time exit) and portfolio-level (drawdown breaker, daily loss limit) risk controls on top allocation combos. Calibrates trailing stops using MAE/MFE distributions. Measures how each overlay modifies drawdown and Sharpe without adding excessive turnover.
Synthesis & Verdict
1 notebook
Genuine signal in the most efficient market on earth, but fragile. Verdict: Advance -- build pair-specific cost models from actual interbank spreads. The 21-day horizon is the only viable operating point. A story of hypothesis revision from momentum to carry to the conclusion that cost precision matters more than model complexity.
Strategy Analysis Ch 20
Assembles the full FX pipeline verdict by tracing the champion through signal, allocation, cost, and risk stages via BacktestExplorer. Computes holdout performance, search risk accounting, and operating profile. Produces a structured iterate/advance verdict with factor attribution for Ch 20 synthesis.