Chapter 11: The ML Pipeline

Selection Bias in Model Tuning: Why Your Best Validation Score Lies

Level: intermediate

Even with perfect chronological splits and no data leakage, repeated hyperparameter search overfits the validation set — and the winning score systematically overstates the performance you should expect out of sample.
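The effect can be illustrated with a small simulation. The sketch below is a hypothetical setup, not a method taken from the references: it assumes every hyperparameter configuration has the same true accuracy (0.5, i.e. no real skill) and that validation and test scores are independent binomial draws. Real configurations evaluated on a shared validation set would be correlated, so the size of the gap here is only indicative. The function name and all parameter values are illustrative.

```python
import numpy as np


def simulate_selection_bias(n_configs=50, n_val=500, n_test=500,
                            true_acc=0.5, n_trials=2000, seed=0):
    """Compare the winning validation score with the same config's test score.

    Assumption (illustrative): every configuration has identical true
    accuracy, so any difference in validation scores is pure noise.
    """
    rng = np.random.default_rng(seed)

    # Observed accuracy = number of correct predictions / sample size.
    val = rng.binomial(n_val, true_acc, size=(n_trials, n_configs)) / n_val
    test = rng.binomial(n_test, true_acc, size=(n_trials, n_configs)) / n_test

    # "Tuning": in each trial, pick the configuration with the best validation score.
    winner = val.argmax(axis=1)
    rows = np.arange(n_trials)

    best_val = val[rows, winner]      # the score the search reports
    winner_test = test[rows, winner]  # what that same config does on fresh data

    return best_val.mean(), winner_test.mean()


mean_best_val, mean_winner_test = simulate_selection_bias()
print(f"mean winning validation accuracy: {mean_best_val:.3f}")
print(f"mean test accuracy of the winner: {mean_winner_test:.3f}")
```

Under these assumptions the winning validation accuracy sits noticeably above 0.5 while the winner's test accuracy stays close to 0.5, even though no configuration is better than any other. Increasing `n_configs` widens the gap, and increasing `n_val` narrows it.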

References

Advances in Financial Machine Learning
Marcos Lopez de Prado (2018) — John Wiley & Sons
On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation
Gavin C Cawley, Nicola L C Talbot (2010)
The Probability of Backtest Overfitting
David H. Bailey, Jonathan Borwein, Marcos Lopez de Prado, Qiji Jim Zhu (2015)