US Firm Characteristics
Classic factor investing with ML on monthly fundamental data
This case study applies ML to the canonical factor investing question: can machine learning improve on traditional long-short decile sorts when accounting lags, survivorship bias, and transaction costs are taken seriously? With 57 firm-level characteristics spanning valuation, profitability, momentum, and risk across approximately 2,500 US stocks (1996-2016), it uses the most feature-rich fundamental dataset in the book.
Students learn point-in-time data management with accounting publication lags, the impact of label engineering choices (classification vs regression), and the effect of preprocessing decisions like winsorization on model performance. With 10 CV folds, the case study also has the strongest statistical foundation in the book, so students can assess how reliable the walk-forward results are.
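The two preprocessing steps named above, lagging characteristics and winsorizing the return label, can be sketched as follows. This is a minimal illustration, not the case study's actual pipeline: the column names permno and date, the gap-free monthly panel, and the 1%/99% winsorization cutoffs are all assumptions made for the example. The 6-month default lag matches the figure given in the Strategy Summary.

```python
import pandas as pd


def lag_characteristics(chars: pd.DataFrame, months: int = 6) -> pd.DataFrame:
    """Make characteristics visible only `months` rows after they are dated.

    Assumes one row per firm per month with no gaps, and hypothetical
    identifier columns "permno" and "date"; all other columns are features.
    """
    chars = chars.sort_values(["permno", "date"])
    feature_cols = [c for c in chars.columns if c not in ("permno", "date")]
    lagged = chars.copy()
    # Shift within each firm so values never leak across firms.
    lagged[feature_cols] = chars.groupby("permno")[feature_cols].shift(months)
    # The first `months` rows of every firm have no lagged value yet.
    return lagged.dropna(subset=feature_cols)


def winsorize_by_month(df: pd.DataFrame, col: str,
                       lower: float = 0.01, upper: float = 0.99) -> pd.Series:
    """Clip `col` to its cross-sectional quantiles within each month.

    The 1%/99% cutoffs are illustrative, not taken from the case study.
    """
    grouped = df.groupby("date")[col]
    lo = grouped.transform(lambda s: s.quantile(lower))
    hi = grouped.transform(lambda s: s.quantile(upper))
    return df[col].clip(lower=lo, upper=hi)
```

Shifting by rows only equals shifting by calendar months when the panel has no missing firm-months; with gaps, a date-based merge would be the safer choice.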
The fundamental universe is the natural home for latent factor models (IPCA, CAE, SAE, SDF). Students learn how these architectures extract time-varying factor loadings from cross-sectional data, and how capacity constraints in small-cap stocks limit the practical scalability of strategies that appear strong on paper.
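To make "time-varying factor loadings" concrete, the sketch below uses the IPCA parameterization, in which a stock's loadings in a given month are a linear function of its characteristics: beta_t = Z_t Gamma. The mapping Gamma is fixed over time, so the loadings move only because the characteristics do. All array names, shapes, and the random inputs are hypothetical, and nothing here estimates Gamma; it only shows the structure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stocks, n_chars, n_factors = 5, 3, 2

# Hypothetical inputs: Gamma maps characteristics to loadings and is
# constant over time; Z_t holds (lagged) characteristics for one month.
Gamma = rng.standard_normal((n_chars, n_factors))
Z_jan = rng.standard_normal((n_stocks, n_chars))
Z_feb = rng.standard_normal((n_stocks, n_chars))


def dynamic_loadings(Z: np.ndarray, Gamma: np.ndarray) -> np.ndarray:
    """Loadings for one month: one row per stock, one column per factor."""
    return Z @ Gamma


beta_jan = dynamic_loadings(Z_jan, Gamma)
beta_feb = dynamic_loadings(Z_feb, Gamma)

# Given a factor realization f (hypothetical), the model-implied return
# for each stock is its loading vector times the factor vector.
f = rng.standard_normal(n_factors)
fitted_jan = beta_jan @ f
```

The same Gamma yields different loadings in each month, which is what lets a small number of parameters describe a large, changing cross-section.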
Strategy Summary
Long-short decile sort on approximately 2,500 US stocks using ML predictions of winsorized forward returns. Monthly rebalancing with a 6-month characteristic lag enforced for point-in-time compliance. Equal-weight within deciles, dollar-neutral. 10 CV folds with 10-year training and 1-year validation windows provide the most robust statistical evaluation in the book.
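The portfolio rule and the fold layout summarized above can be sketched as below. Both functions are illustrative assumptions rather than the case study's code: the column names pred and fwd_ret are hypothetical, transaction costs are ignored, and the folds are assumed to roll forward one year at a time from a chosen first year, which is only one way to arrange 10 folds with 10-year training and 1-year validation windows.

```python
import pandas as pd


def long_short_decile_return(month: pd.DataFrame) -> float:
    """Equal-weight return of top-decile minus bottom-decile stocks.

    `month` holds one rebalancing date, with hypothetical columns
    "pred" (model prediction) and "fwd_ret" (realized forward return).
    """
    # Rank first so ties in predictions cannot produce duplicate bin edges.
    ranks = month["pred"].rank(method="first")
    decile = pd.qcut(ranks, 10, labels=False)  # 0 = lowest, 9 = highest
    long_leg = month.loc[decile == 9, "fwd_ret"].mean()
    short_leg = month.loc[decile == 0, "fwd_ret"].mean()
    # Equal dollars long and short gives a dollar-neutral spread.
    return long_leg - short_leg


def walk_forward_folds(first_year: int, n_folds: int = 10,
                       train_years: int = 10, val_years: int = 1):
    """Rolling (train, validation) year ranges, inclusive on both ends."""
    folds = []
    for k in range(n_folds):
        train_start = first_year + k
        train_end = train_start + train_years - 1
        val_start = train_end + 1
        val_end = val_start + val_years - 1
        folds.append(((train_start, train_end), (val_start, val_end)))
    return folds
```

Applying long_short_decile_return to each date's cross-section gives the monthly gross return series of the strategy. Because every validation window starts after its training window ends, no fold is evaluated on data the model has already seen.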