Learning Objectives
- Explain why the information coefficient is a useful entry metric for financial signals but does not translate directly into portfolio performance
- Distinguish signal quality, portfolio translation, cost survival, and temporal stability as separate stages in the path from prediction to deployment
- Compare how major model families perform after the full pipeline, and identify when robustness matters more than peak signal quality
- Diagnose holdout disappointment using distinct failure modes, including prediction decay, translation decay, and structural breaks
- Evaluate trading strategies under realistic implementation constraints, including instrument-appropriate cost models, capacity limits, and execution frictions
- Identify the highest-return next steps after a first research pass, including label redesign, ensembling, feature engineering, and hypothesis-driven iteration
- Apply a practitioner workflow that moves from data and diagnostics through signal generation, strategy construction, and validation with iteration
The Nine Case Studies: From Signal to Verdict
Each of the nine case studies receives a verdict -- advance, iterate, or reframe -- based on where the pipeline first imposes a binding constraint. The US firm characteristics study produced the strongest integrated result (validation Sharpe +3.03, holdout +2.52) but is capacity-constrained in small caps. FX is the only study where holdout performance exceeds validation. CME futures is the data-quality teaching case, where a single back-adjustment choice cascades through three failures. S&P 500 options has a real ML signal destroyed by a 1,091-basis-point median round-trip spread. Crypto shows complete holdout failure as a bull run reversed the learned funding patterns. The verdicts demonstrate that raw signal quality never settles the case on its own.
When IC Lies: Signal Quality Beyond the Information Coefficient
IC is dangerously incomplete as a verdict metric: NASDAQ-100 has the weakest IC in the book (0.008) but the highest Sharpe (4.22), while the ETF study sees IC improve fivefold in holdout even as Sharpe decays 55%. This section introduces a richer diagnostic bundle -- ICIR for consistency across folds, positive-fold share for regime dependence, and checkpoint sensitivity for deep learning selection risk. It also argues that label engineering (classification vs. regression, horizon choice, winsorization) may have a higher return on investment than model architecture research, citing US firm characteristics, where winsorizing extreme returns lifts GBM IC ninefold -- a larger effect than any model family difference.
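As a concrete sketch of the diagnostic bundle, the fold-level metrics can be computed in a few lines. The per-fold IC values below are hypothetical, not taken from any case study:

```python
import numpy as np
from scipy.stats import spearmanr

def fold_ic(preds, rets):
    """Rank information coefficient (Spearman) for one fold."""
    rho, _ = spearmanr(preds, rets)
    return rho

def diagnostic_bundle(fold_ics):
    """Summarise per-fold ICs into the bundle described above:
    mean IC, ICIR (mean over std across folds), positive-fold share."""
    ics = np.asarray(fold_ics, dtype=float)
    return {
        "mean_ic": ics.mean(),
        "icir": ics.mean() / ics.std(ddof=1),
        "pos_fold_share": (ics > 0).mean(),
    }

# Hypothetical per-fold ICs from a walk-forward backtest
bundle = diagnostic_bundle([0.02, 0.01, -0.005, 0.03, 0.015])
```

A signal with a modest mean IC but an ICIR above 1 and a high positive-fold share is usually a better candidate than a higher-IC signal that earns its average in two lucky folds.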
How Predictions Become Profits: The IC-to-Sharpe Translation
The Fundamental Law of Active Management provides the framework for understanding why IC alone does not determine strategy performance: breadth -- the number of independent bets per period -- mediates the translation. NASDAQ-100 converts near-zero IC into the highest Sharpe through enormous intraday breadth, while FX is capped by a 20-pair universe regardless of model quality. The section shows that rebalancing cadence is the hidden multiplier (the same signal can be worthless or profitable at different frequencies), that moderate selectivity raises median Sharpe but aggressive concentration gives back gains, and that portfolio construction is the neglected middle where lower-IC models can outperform higher-IC models through better score-to-weight translation.
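The Fundamental Law referenced above can be stated in one line: IR is approximately IC times the square root of breadth. A minimal sketch, using illustrative numbers rather than the book's exact figures, shows how a near-zero IC with enormous breadth can dominate a stronger IC on a 20-pair universe:

```python
import math

def expected_ir(ic, breadth):
    """Grinold's Fundamental Law: IR ~ IC * sqrt(breadth), where breadth
    is the number of independent bets per year. Independence is a strong
    assumption; correlated bets reduce effective breadth."""
    return ic * math.sqrt(breadth)

# Illustrative: a weak intraday signal over many names and rebalances
# vs. a stronger monthly signal capped at 20 currency pairs
intraday = expected_ir(0.008, breadth=114 * 26 * 252)  # names x bars x days
fx_monthly = expected_ir(0.05, breadth=20 * 12)        # pairs x months
```

The comparison also makes the FX capacity ceiling explicit: with breadth fixed at 240 bets per year, only IC improvements can raise the expected IR.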
Robustness Beats Peak Signal: The Model Family Synthesis
GBM is the downstream champion in six of nine case studies even when it does not lead the IC table, because its lower variance, absence of checkpoint selection, and implementation robustness compound through the multi-stage pipeline from prediction to deployment. Deep learning finds its niche where signals live in temporal structure or nonlinear interactions -- CME futures shows the strongest nonlinearity diagnostic (negative linear IC, positive GBM IC) -- but an important coverage caveat applies: deep learning was not tested on the two strongest results. The practical default is to start with GBM everywhere and invest in deep learning only when positive evidence of structural nonlinearity exists.
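The nonlinearity diagnostic (negative or near-zero linear IC alongside positive flexible-model IC) can be illustrated on synthetic data. In this dependency-free sketch a decile-bin learner stands in for a GBM, and the V-shaped target is constructed so that no linear model can see it:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
x = rng.normal(size=4000)
y = np.abs(x) + 0.1 * rng.normal(size=4000)  # V-shaped: zero linear relation

x_tr, y_tr, x_te, y_te = x[:2000], y[:2000], x[2000:], y[2000:]

# Linear learner: ordinary least-squares line
slope, intercept = np.polyfit(x_tr, y_tr, 1)
lin_pred = slope * x_te + intercept

# Flexible learner: decile-bin means as a stand-in for a GBM
edges = np.quantile(x_tr, np.linspace(0, 1, 11))
tr_bin = np.clip(np.searchsorted(edges, x_tr) - 1, 0, 9)
bin_mean = np.array([y_tr[tr_bin == b].mean() for b in range(10)])
te_bin = np.clip(np.searchsorted(edges, x_te) - 1, 0, 9)
flex_pred = bin_mean[te_bin]

lin_ic, _ = spearmanr(lin_pred, y_te)    # near zero
flex_ic, _ = spearmanr(flex_pred, y_te)  # clearly positive
```

When this gap appears on real features, it is the positive evidence of structural nonlinearity that justifies investing in deep learning; absent it, the GBM-first default stands.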
Trading Realism: Costs, Capacity, and Execution
Five cost-survival tiers emerge from breakeven analysis, ranging from extremely robust (US firm characteristics survives above 100 bps) to fatal (S&P 500 options is negative at zero assumed friction). Cost fragility is largely predictable from two inputs: rebalancing cadence and universe liquidity. The section also demonstrates that blanket basis-point cost models are structurally wrong for options (where costs scale with premium), micro-cap equities (where spreads vary by orders of magnitude), and high-frequency strategies (where per-share models are needed). The strongest paper signals tend to concentrate where capacity is most constrained, creating a fundamental tension between signal strength and deployability.
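The breakeven logic behind the cost tiers is simple arithmetic: the cost per unit of turnover at which net return reaches zero is gross return divided by turnover. A sketch with illustrative numbers (not the book's) shows why cadence alone moves a strategy across tiers:

```python
def breakeven_cost_bps(gross_annual_return, annual_turnover):
    """One-way cost per unit of turnover (in bps) at which the
    strategy's net annual return falls to zero."""
    return 1e4 * gross_annual_return / annual_turnover

# Illustrative: the same 12% gross return traded at two cadences
monthly = breakeven_cost_bps(0.12, annual_turnover=5)   # 240 bps: robust
daily = breakeven_cost_bps(0.12, annual_turnover=60)    # 20 bps: fragile
```

This is also why blanket basis-point assumptions mislead: the breakeven only answers the question once the cost model itself is right for the instrument (premium-scaled for options, per-share for high-frequency equity trading).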
Stability Across Time and Regimes
Holdout Sharpe change ranges from +45% (FX improves) to -247% (crypto reverses completely), with median decay near 50% across the nine studies. Three distinct failure modes are identified: prediction decay (the signal genuinely weakens, as in US equities), translation decay (IC improves but portfolio construction loses value, as in ETFs), and structural break (regime shift invalidates learned patterns, as in crypto). Risk overlays are shown to be conditional rather than universal -- they help when drawdowns correlate with observable risk signals (US firm characteristics: managed Sharpe +3.92) but hurt when the signal reversal is structural (crypto). The taxonomy directs practitioners to the right lever rather than applying a generic fix.
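The taxonomy lends itself to a mechanical first-pass triage: compare how IC and Sharpe each move from validation to holdout. The thresholds below are illustrative, not the book's:

```python
def classify_decay(val_ic, hold_ic, val_sharpe, hold_sharpe):
    """Rough failure-mode triage following the taxonomy above.
    Illustrative thresholds; real diagnosis needs fold-level detail."""
    if val_sharpe > 0 and hold_sharpe < 0:
        return "structural break"      # sign flip: learned pattern reversed
    if hold_ic < 0.5 * val_ic:
        return "prediction decay"      # the signal itself weakened
    if hold_sharpe < 0.5 * val_sharpe and hold_ic >= val_ic:
        return "translation decay"     # signal held, construction lost value
    return "stable"

# ETF-like pattern: IC improves fivefold while Sharpe decays roughly half
mode = classify_decay(val_ic=0.01, hold_ic=0.05, val_sharpe=1.8, hold_sharpe=0.8)
```

Each label points to a different remedy: prediction decay calls for signal research, translation decay for portfolio construction work, and a structural break for regime detection rather than either.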
Causal Credibility: What Can We Actually Claim?
Chapter 15's double machine learning analysis produces a mixed causal scorecard: two studies achieve robust causal evidence (ETFs at 30% confounding bias, FX at 60%), four are suggestive with substantial bias, and three are inconclusive. The section argues that confounding bias should be reported alongside Sharpe as a complementary fragility indicator -- high bias means the signal is heavily exposed to shifts in momentum, volatility, and market factors, making it more likely to break when those relationships change. Predictive signals without causal identification are usable with appropriate risk management, but high-bias strategies deserve tighter risk budgets and more frequent re-evaluation.
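The confounding-bias number has a concrete reading: it is the share of the naive signal-to-return slope that disappears once known factors are partialled out. A linear Frisch-Waugh sketch on synthetic data stands in for the chapter's double-ML procedure (which uses flexible learners with cross-fitting); all names and coefficients here are hypothetical:

```python
import numpy as np

def confounding_bias(signal, ret, controls):
    """Share of the naive signal->return slope attributable to controls:
    1 - theta_adjusted / theta_naive, via partialling-out. A linear
    stand-in for a double-ML estimate."""
    X = np.column_stack([np.ones(len(ret)), controls])
    res_sig = signal - X @ np.linalg.lstsq(X, signal, rcond=None)[0]
    res_ret = ret - X @ np.linalg.lstsq(X, ret, rcond=None)[0]
    theta_naive = np.polyfit(signal, ret, 1)[0]
    theta_adj = np.polyfit(res_sig, res_ret, 1)[0]
    return 1 - theta_adj / theta_naive

# Synthetic example: a "signal" that mostly repackages momentum
rng = np.random.default_rng(1)
n = 5000
momentum = rng.normal(size=n)
signal = 0.8 * momentum + rng.normal(size=n)
ret = 0.5 * momentum + 0.1 * signal + rng.normal(size=n)
bias = confounding_bias(signal, ret, momentum.reshape(-1, 1))
```

A bias around 70%, as in this construction, is exactly the fragility flag the section describes: most of the apparent edge rides on the momentum relationship holding.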
What We Deliberately Left on the Table
Every result in the book is a deliberately constrained baseline: single pipeline pass, standard features, single hyperparameter sweep, no ensembles, simplest allocators. This section inventories these constraints and assesses their likely impact, identifying iteration (5-10 hypothesis-driven cycles) as the highest-leverage improvement, followed by feature engineering (domain-specific and alternative data), ensemble methods (model, horizon, and checkpoint averaging), and label engineering. The ensemble opportunity is particularly concrete: checkpoint ensembling directly addresses the S&P 500 equity-plus-options problem where CAE IC swings 0.14 across epochs. The constraints are the feature, not the bug -- they make comparisons informative while leaving substantial room for improvement.
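Checkpoint ensembling, the most concrete of the listed opportunities, is mechanically trivial: average the predictions of the last few checkpoints instead of betting on one epoch. A minimal sketch on synthetic data (the oscillating noise schedule is hypothetical, meant to mimic epoch-to-epoch IC swings):

```python
import numpy as np

def checkpoint_ensemble(pred_by_epoch, last_k=5):
    """Average cross-sectional predictions over the last k training
    checkpoints instead of selecting a single epoch by validation IC."""
    return np.mean(pred_by_epoch[-last_k:], axis=0)

# Hypothetical: checkpoint quality oscillates across epochs
rng = np.random.default_rng(2)
truth = rng.normal(size=500)
preds = np.stack([truth + rng.normal(scale=1.0 + (e % 3), size=500)
                  for e in range(10)])
ens = checkpoint_ensemble(preds)

ens_ic = np.corrcoef(ens, truth)[0, 1]
single_ics = [np.corrcoef(p, truth)[0, 1] for p in preds[-5:]]
```

Because the ensemble averages out checkpoint-specific noise, its correlation with the target beats the worst single checkpoint by construction and removes the selection step entirely, which is the point when per-epoch IC swings by 0.14.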
The Practitioner's Playbook
The synthesis is distilled into a four-phase development sequence: data and diagnostics (validate quality, build cost models, choose labels deliberately), signal generation (run all model families, use the four-metric diagnostic bundle, check nonlinearity), strategy construction (test cadences, compare allocators, compute cost sensitivity, apply risk overlays), and validation with iteration (frozen holdout, Deflated Sharpe Ratio, failure-mode decomposition, 5-10 hypothesis-driven cycles). Per-case-study recommendations specify the next concrete step for each of the nine studies. The closing argument is that the pipeline is transferable but the results are not -- the reader's competitive advantage is iteration with their own data, features, and domain knowledge.
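The Deflated Sharpe Ratio in phase four corrects an observed Sharpe for the number of strategy variants tried before it was selected. A sketch of the Bailey and Lopez de Prado calculation follows; the book's exact procedure may differ in detail, and the trial counts below are hypothetical:

```python
import math
from statistics import NormalDist

def deflated_sharpe(sr, n_obs, skew=0.0, kurt=3.0, n_trials=1, var_trials=0.0):
    """Deflated Sharpe Ratio: probability that the observed per-period
    Sharpe `sr` exceeds the best Sharpe expected by chance from
    `n_trials` unskilled strategies with Sharpe variance `var_trials`."""
    nd = NormalDist()
    emc = 0.5772156649  # Euler-Mascheroni constant
    sr0 = 0.0
    if n_trials > 1:  # expected maximum Sharpe across the trials
        sr0 = math.sqrt(var_trials) * (
            (1 - emc) * nd.inv_cdf(1 - 1 / n_trials)
            + emc * nd.inv_cdf(1 - 1 / (n_trials * math.e)))
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return nd.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)

# A daily Sharpe of 0.1 over three years looks significant in isolation,
# but deflating for 20 hypothetical research trials changes the verdict
psr = deflated_sharpe(0.1, n_obs=756)
dsr = deflated_sharpe(0.1, n_obs=756, n_trials=20, var_trials=0.02)
```

The deflation term is why the playbook pairs the frozen holdout with a count of hypothesis-driven cycles: each cycle is a trial, and the DSR charges for all of them.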
Related Case Studies
See where these chapter concepts get applied in end-to-end trading workflows.
ETF Cross-Asset Exposures
All six model families compared across 100 ETFs spanning 9 asset classes
Crypto Perpetuals Funding
Alternative data and non-standard frequencies in 24/7 crypto markets
NASDAQ-100 Microstructure
Intraday microstructure signals across 114 stocks at 15-minute frequency
S&P 500 Equity + Option Analytics
Combining options-derived features with equity data for multi-source prediction
US Firm Characteristics
Classic factor investing with ML on monthly fundamental data
FX Spot Pairs
Momentum and carry factors in the world's most liquid market
CME Futures
Carry signals across 30 products -- data quality as the critical variable
S&P 500 Options (Straddles)
Direct options trading and why equity-style cost models fail for options
US Equities Panel
Large-scale cross-sectional prediction across 3,200 stocks with 16 walk-forward folds