Learning Objectives
- Place a strategy idea on the strategy map by linking it to a strategy family, a plausible source of edge, and the dominant feasibility constraints and failure modes.
- Define a versioned trading setup in decision-time terms: what is tradable, when decisions are made, what information is admissible, how scores become positions, and which constraints and costs are treated as material.
- Define "better" economically and keep model diagnostics, signal diagnostics, and strategy outcomes in distinct roles during research and evaluation.
- Design a time-series evaluation protocol that preserves chronology, prevents overlap leakage, and separates model selection from final performance estimation.
- Establish a narrow baseline checkpoint with timing, coverage, and trading-intensity sanity checks before expanding the search space.
- Keep search auditable, reproducible, and countable using a simple trial taxonomy and automatic run logging.
From Idea to Evidence with the ML4T Workflow
The live trading loop is defined as a five-step cycle (observe, score, map to positions, execute, monitor), alongside its research counterpart, which builds evidence about live behavior under the same assumptions. Nine case studies spanning ETFs, equities, crypto, futures, FX, and options serve as scaffolds for the remainder of the book. The reader takes away a concrete mental model of how research iteration mirrors deployment and why timing discipline is the single most important guardrail.
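The five-step cycle can be sketched as a single pass of plain functions. This is a minimal toy, not the book's API: the function names, the momentum feature, and the gross-exposure normalization are all illustrative assumptions.

```python
def observe(prices):
    """Step 1: gather admissible information (here, a toy 1-period return)."""
    return {sym: p[-1] / p[-2] - 1.0 for sym, p in prices.items()}

def score(features):
    """Step 2: turn features into per-instrument scores (identity here)."""
    return features

def map_to_positions(scores, gross=1.0):
    """Step 3: map scores to target weights, normalized to a fixed gross."""
    total = sum(abs(s) for s in scores.values()) or 1.0
    return {sym: gross * s / total for sym, s in scores.items()}

def run_cycle(prices):
    """One observe -> score -> map pass; execute and monitor (steps 4-5)
    touch broker and accounting state, so they are omitted from the sketch."""
    return map_to_positions(score(observe(prices)))
```

In research, the same three functions run against historical data; the timing discipline the chapter stresses amounts to guaranteeing that `observe` never reads anything dated after the decision time.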
Mapping Strategies and Sources of Edge
A two-lens framework evaluates strategy ideas before any model is built: strategy families (price-based, fundamental, microstructure, market mechanics) classify ideas by their dominant constraints, while sources of edge (risk compensation, liquidity provision, flow predictability, informational advantage) explain why returns might persist. The section documents why most published anomalies fail in practice — post-publication decay, implementation gaps, and definitional sensitivity — and teaches readers to answer four questions about any idea before committing to experimentation.
Defining the Rules of the Trading Game
The trading setup is specified as the fixed evaluation environment: universe rules, decision schedule, score-to-trade mapping, constraints, and cost model class. The section distinguishes mechanics changes (which require a new setup version) from parameter tuning (which stays within a version), using a detailed ETF momentum example. Comparability across experiments depends on keeping the trading setup invariant and versioned.
1 notebook
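One way to make the setup-versus-parameters distinction concrete is to freeze the setup as an immutable record, so any mechanics change is forced through an explicit version bump. This is a sketch under assumed field names, not the book's schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TradingSetup:
    """Immutable setup spec; mechanics changes require a new version.
    Field names and example values are illustrative."""
    version: str
    universe_rule: str        # e.g. "top-20 ETFs by 60d dollar volume"
    decision_schedule: str    # e.g. "monthly close"
    score_to_trade: str       # e.g. "long top quintile, equal weight"
    constraints: tuple = ()   # e.g. ("long-only", "max 10% per name")
    cost_model: str = "proportional_bps"

FIELDS = ("universe_rule", "decision_schedule",
          "score_to_trade", "constraints", "cost_model")

def bump_version(setup: TradingSetup, **mechanics) -> TradingSetup:
    """Any mechanics change yields a new, distinct setup version."""
    major = int(setup.version.lstrip("v").split(".")[0]) + 1
    return TradingSetup(version=f"v{major}.0", **{
        f: mechanics.get(f, getattr(setup, f)) for f in FIELDS
    })
```

Parameter tuning (model hyperparameters, feature lookbacks) never touches these fields, so results tagged with the same setup version remain directly comparable.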
Setting Objectives and Evaluation Metrics
A three-layer metric framework separates model diagnostics (can the model learn the label?), signal diagnostics (does the output behave like a tradable signal?), and strategy outcomes (does the process produce economic value under costs?). Using strategy-level outcomes to drive every micro-decision during development invites overfitting to simulator details. The reader learns to keep metric roles separate and reserve strategy-level evaluation for late-stage confirmation.
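The three layers can be kept in distinct roles simply by computing them as separate functions that are consulted at different stages. The specific metric choices below (sign hit rate, rank IC, annualized Sharpe) are illustrative stand-ins for each layer, not the book's prescribed set.

```python
import math

def model_diag(y_true, y_pred):
    """Layer 1, model diagnostic: can the model learn the label?
    Here: hit rate on the sign of the prediction."""
    hits = sum((a > 0) == (b > 0) for a, b in zip(y_true, y_pred))
    return hits / len(y_true)

def _ranks(xs):
    """Simple ranks (ties not handled; fine for a sketch)."""
    order = sorted(range(len(xs)), key=xs.__getitem__)
    r = [0.0] * len(xs)
    for i, j in enumerate(order):
        r[j] = float(i)
    return r

def signal_diag(scores, fwd_returns):
    """Layer 2, signal diagnostic: rank correlation (IC) between
    scores and forward returns."""
    rs, rr = _ranks(scores), _ranks(fwd_returns)
    n = len(rs)
    ms, mr = sum(rs) / n, sum(rr) / n
    cov = sum((a - ms) * (b - mr) for a, b in zip(rs, rr))
    var = math.sqrt(sum((a - ms) ** 2 for a in rs)
                    * sum((b - mr) ** 2 for b in rr))
    return cov / var if var else 0.0

def strategy_outcome(net_returns, periods_per_year=252):
    """Layer 3, strategy outcome: annualized Sharpe on net-of-cost
    returns. Reserved for late-stage confirmation, not micro-decisions."""
    n = len(net_returns)
    mu = sum(net_returns) / n
    sd = math.sqrt(sum((r - mu) ** 2 for r in net_returns) / (n - 1))
    return mu / sd * math.sqrt(periods_per_year) if sd else 0.0
```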
Evaluation Protocol for Time Series
Five forms of data leakage (label, standardization, threshold, survivorship, point-in-time) explain why standard k-fold cross-validation fails on financial data. The section covers walk-forward CV with expanding and rolling windows, temporal buffers to prevent overlap leakage, sealed holdout test sets, nested walk-forward for rolling retuning, and combinatorial methods like CPCV. Evaluation design commitments — window lengths, step size, buffer sizes, test periods — are non-negotiable protocol choices rather than tuning parameters.
1 notebook
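The core mechanics of walk-forward splitting with a temporal buffer can be sketched in a few lines. This is an illustrative generator, not the book's implementation; the parameter names are assumptions.

```python
def walk_forward_splits(n_obs, train_len, test_len, buffer=0, expanding=True):
    """Yield chronological (train_idx, test_idx) pairs.

    A `buffer` gap between the end of training and the start of testing
    prevents overlap leakage from multi-period labels. With
    expanding=False the training window rolls forward at fixed length.
    """
    start = 0
    train_end = train_len
    while train_end + buffer + test_len <= n_obs:
        train = list(range(start, train_end))
        test_start = train_end + buffer
        yield train, list(range(test_start, test_start + test_len))
        train_end += test_len
        if not expanding:          # rolling window: drop the oldest data
            start += test_len
```

Note that `train_len`, `test_len`, and `buffer` are exactly the kind of protocol commitments the section describes: they are fixed before any model comparison, not tuned alongside it.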
Establishing a Baseline Checkpoint
The baseline checkpoint is the smallest runnable specification that answers whether the trading setup supports enough stable structure to justify deeper work. Three preflight checks (timing sanity, coverage sanity, trading-intensity sanity) and a narrow first reference run earn the right to expand models and features in later chapters. A failed baseline usually calls for revising the setup rather than optimizing around a brittle definition.
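The three preflight checks are mechanical enough to express as small predicates. The thresholds below are illustrative assumptions; the book's point is that some explicit threshold is committed to before the first reference run.

```python
def timing_sanity(feature_times, decision_times):
    """Every feature must be observable at or before its decision time."""
    return all(f <= d for f, d in zip(feature_times, decision_times))

def coverage_sanity(n_scored, n_universe, min_frac=0.9):
    """Enough of the universe must receive scores each period."""
    return n_scored / n_universe >= min_frac

def trading_intensity_sanity(turnover, lo=0.05, hi=2.0):
    """Per-period turnover should land in a plausible band: near zero
    means the setup barely trades; very high values mean costs will
    dominate. The band (5% to 200%) is an illustrative choice."""
    return lo <= turnover <= hi
```

If any predicate fails, the section's advice applies: revise the setup definition rather than tuning models around a brittle one.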
Search Accounting and Run Logging
Every research iteration must be logged with provenance, configuration, artifact pointers, and decision gates to support comparability and reproducibility. A four-level trial taxonomy (strategy, trial family, trial, run) connects run logging to the Deflated Sharpe Ratio and pre-registration as defenses against selection bias. Without countable search, iteration becomes untraceable and selection bias becomes invisible.
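A minimal run log that respects the four-level taxonomy needs little more than an append-only list of records with provenance and a deterministic run identifier. The schema and helper names here are illustrative assumptions, not the book's tooling.

```python
import hashlib
import json
import time

def log_run(registry, strategy, trial_family, trial, config, artifacts):
    """Append one run record: strategy -> trial family -> trial -> run.
    The run_id is a hash of the configuration, so identical configs
    map to the same id and reruns are detectable."""
    record = {
        "strategy": strategy,
        "trial_family": trial_family,
        "trial": trial,
        "run_id": hashlib.sha1(
            json.dumps(config, sort_keys=True).encode()).hexdigest()[:12],
        "config": config,
        "artifacts": list(artifacts),   # pointers to outputs, not the data
        "logged_at": time.time(),
    }
    registry.append(record)
    return record

def count_trials(registry, strategy):
    """Countable search: the number of distinct trials is exactly the
    input a multiple-testing correction such as the Deflated Sharpe
    Ratio needs."""
    return len({(r["trial_family"], r["trial"])
                for r in registry if r["strategy"] == strategy})
```

Because every run passes through `log_run`, the trial count is a byproduct of normal iteration rather than a number reconstructed from memory after the fact.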
Related Case Studies
See where these chapter concepts get applied in end-to-end trading workflows.
ETF Cross-Asset Exposures
All six model families compared across 100 ETFs spanning 9 asset classes
Crypto Perpetuals Funding
Alternative data and non-standard frequencies in 24/7 crypto markets
NASDAQ-100 Microstructure
Intraday microstructure signals across 114 stocks at 15-minute frequency
S&P 500 Equity + Option Analytics
Combining options-derived features with equity data for multi-source prediction
US Firm Characteristics
Classic factor investing with ML on monthly fundamental data
FX Spot Pairs
Momentum and carry factors in the world's most liquid market
CME Futures
Carry signals across 30 products — data quality as the critical variable
S&P 500 Options (Straddles)
Direct options trading and why equity-style cost models fail for options
US Equities Panel
Large-scale cross-sectional prediction across 3,200 stocks with 16 walk-forward folds