Chapter 7

Defining the Learning Task

6 sections, 10 notebooks, 18 references

Learning Objectives

  • Build split-aware preprocessing pipelines that produce stable, auditable inputs for label and feature computation.
  • Define execution-consistent labels, including fixed-horizon and event-style constructions, and diagnose overlap, resolution behavior, and implied trading intensity.
  • Evaluate feature-label bundles fold by fold using appropriate diagnostics for continuous and discrete targets, including stability, shape, and feasibility.
  • Screen candidates for implementation feasibility using turnover, break-even cost, and liquidity or capacity checks.
  • Account for search bias by defining searched sets, separating exploration from confirmation, and applying appropriate multiple-testing adjustments to fold-level summaries.
  • Use mechanism plausibility checks to distinguish potentially stable signal channels from confounded proxies, timing artifacts, and aggregation effects.

Figure 7.1

7.1 Data Preprocessing and Encodings

7.2 Label engineering

7.3 Univariate Feature–Label Evaluation

7.4 Search accounting and multiple testing

7.5 From correlation to causality

7.6 Summary