Chapter 7

Defining the Learning Task

5 sections, 10 notebooks, 18 references

Learning Objectives

Build split-aware preprocessing pipelines that produce stable, auditable inputs for label and feature computation.
Define execution-consistent labels, including fixed-horizon and event-style constructions, and diagnose overlap, resolution behavior, and implied trading intensity.
Evaluate feature-label bundles fold by fold using appropriate diagnostics for continuous and discrete targets, including stability, shape, and feasibility.
Screen candidates for implementation feasibility using turnover, break-even cost, and liquidity or capacity checks.
Account for search bias by defining searched sets, separating exploration from confirmation, and applying appropriate multiple-testing adjustments to fold-level summaries.
Use mechanism plausibility checks to distinguish potentially stable signal channels from confounded proxies, timing artifacts, and aggregation effects.
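
Two of the objectives above, fixed-horizon labels and multiple-testing adjustments, can be made concrete with a minimal sketch. It rests on stated assumptions and is not the chapter's own implementation: the function names `fixed_horizon_labels` and `holm_adjust` are hypothetical, the label is taken to be a simple forward return over a fixed number of bars, and the Holm step-down procedure is chosen here only as one common family-wise adjustment. The chapter may use different label definitions or corrections.

```python
from typing import List, Optional, Sequence


def fixed_horizon_labels(
    prices: Sequence[float], horizon: int
) -> List[Optional[float]]:
    """Simple forward return over `horizon` bars.

    Returns None where the future price is not yet available, so the
    last `horizon` observations are left unlabeled rather than filled.
    """
    if horizon < 1:
        raise ValueError("horizon must be a positive integer")
    n = len(prices)
    labels: List[Optional[float]] = []
    for t in range(n):
        if t + horizon < n:
            labels.append(prices[t + horizon] / prices[t] - 1.0)
        else:
            labels.append(None)  # unresolved label: drop it downstream
    return labels


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity across the sorted sequence.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Illustrative inputs only; these numbers are made up for the example.
prices = [100.0, 102.0, 101.0, 104.0, 103.0]
labels = fixed_horizon_labels(prices, horizon=2)

# One p-value per searched candidate (hypothetical values).
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

With a two-bar horizon, consecutive labels share one bar of future price path, which is the kind of overlap the label objective asks you to diagnose. The adjustment is only meaningful if the list of p-values covers the full searched set, not just the candidates that looked promising.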

7.1

Data Preprocessing and Encodings

4 notebooks

7.2

Label Engineering

2 notebooks

7.3

Univariate Feature-Label Evaluation

2 notebooks

7.4

Search Accounting and Multiple Testing

1 notebook

7.5

From Correlation to Causality

1 notebook

Related Case Studies

See where these chapter concepts are applied in end-to-end trading workflows.

ETF Cross-Asset Exposures

All six model families compared across 100 ETFs spanning 9 asset classes

ETFs Daily

Crypto Perpetuals Funding

Alternative data and non-standard frequencies in 24/7 crypto markets

Cryptocurrency 8-Hour

NASDAQ-100 Microstructure

Intraday microstructure signals across 114 stocks at 15-minute frequency

Equities 15-Minute

S&P 500 Equity + Option Analytics

Combining options-derived features with equity data for multi-source prediction

Options Daily

US Firm Characteristics

Classic factor investing with ML on monthly fundamental data

Fundamentals Monthly

FX Spot Pairs

Momentum and carry factors in the world's most liquid market

Foreign Exchange Daily

CME Futures

Carry signals across 30 products, with data quality as the critical variable

Futures Daily

S&P 500 Options (Straddles)

Direct options trading and why equity-style cost models fail for options