Chapter 7: Defining the Learning Task

Label Overlap: Why Your Sample Is Smaller Than You Think (Intermediate)

When labels share future price paths, nominal sample counts exaggerate the evidence available for inference — often by an order of magnitude. Diagnosing overlap before interpreting results is not optional; it is the difference between a credible signal evaluation and a statistical illusion.
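To make the size of the effect concrete, here is a minimal sketch of one common overlap diagnostic, the average uniqueness described in Lopez de Prado (2018). It rests on a simplifying assumption made only for this example: one label starts at every bar and each label spans a fixed number of bars. The function names `average_uniqueness` and `effective_sample_size` are hypothetical, and summing uniqueness is a rough proxy for the amount of independent evidence, not a formal estimator.

```python
import numpy as np


def average_uniqueness(n_labels: int, horizon: int) -> np.ndarray:
    """Average uniqueness of each label under a fixed-horizon assumption.

    Assumption for illustration: label i starts at bar i and spans
    bars i .. i + horizon - 1, so consecutive labels overlap heavily.
    """
    n_bars = n_labels + horizon - 1
    # concurrency[t] = number of labels whose span includes bar t
    concurrency = np.zeros(n_bars)
    for i in range(n_labels):
        concurrency[i:i + horizon] += 1
    # A label's uniqueness is the mean of 1 / concurrency over its own span
    return np.array(
        [np.mean(1.0 / concurrency[i:i + horizon]) for i in range(n_labels)]
    )


def effective_sample_size(n_labels: int, horizon: int) -> float:
    """Sum of average uniqueness: a rough count of non-redundant labels."""
    return float(average_uniqueness(n_labels, horizon).sum())


# 1,000 daily labels, each looking 10 bars ahead
nominal = 1000
effective = effective_sample_size(nominal, horizon=10)
print(f"nominal={nominal}, effective={effective:.1f}")
```

Under these assumptions the sum reduces to (n_labels + horizon - 1) / horizon, so 1,000 labels with a 10-bar horizon carry roughly the weight of 100 non-overlapping observations. With a horizon of 1 there is no overlap and the effective count equals the nominal count.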



References

Marcos Lopez de Prado (2018). Advances in Financial Machine Learning. John Wiley & Sons.
Ardian Harri, B. Wade Brorsen (1998). The Overlapping Data Problem. SSRN Electronic Journal.
Campbell R. Harvey, Yan Liu, Heqing Zhu (2016). ...and the Cross-Section of Expected Returns. Review of Financial Studies.