Chapter 11: The ML Pipeline

Selection Bias in Model Tuning: Why Your Best Validation Score Lies (intermediate)

Even with perfect chronological splits and no data leakage, repeated hyperparameter search overfits the validation set, so the winning score systematically overstates the performance you should expect out of sample.
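The effect can be illustrated with a small simulation. The sketch below is not taken from the chapter; it assumes a deliberately simple setup in which every hyperparameter configuration has the same true accuracy, so any difference in validation scores is pure sampling noise. The function name `selection_bias_demo` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def selection_bias_demo(n_configs=50, n_val=500, n_test=500,
                        true_acc=0.55, n_trials=2000, seed=0):
    """Compare the winning validation score with the winner's test score.

    Assumption (illustrative only): all configurations share the same
    true accuracy, so the search can only ever pick up noise.
    """
    rng = np.random.default_rng(seed)

    # Observed accuracy = number of correct predictions / sample size.
    # Validation and test draws are independent of each other.
    val = rng.binomial(n_val, true_acc, size=(n_trials, n_configs)) / n_val
    test = rng.binomial(n_test, true_acc, size=(n_trials, n_configs)) / n_test

    # In each trial, pick the configuration with the best validation score.
    winner = val.argmax(axis=1)
    rows = np.arange(n_trials)

    best_val = val[rows, winner]      # the score the search reports
    winner_test = test[rows, winner]  # what the winner does on fresh data

    return best_val.mean(), winner_test.mean()


if __name__ == "__main__":
    best_val, winner_test = selection_bias_demo()
    print(f"mean winning validation accuracy: {best_val:.4f}")
    print(f"mean test accuracy of the winner: {winner_test:.4f}")
```

Under these assumptions the mean winning validation accuracy sits noticeably above the true accuracy, while the winner's accuracy on the independent test sample stays close to it. Increasing `n_configs` or shrinking `n_val` widens the gap, because the maximum of more (or noisier) scores drifts further from the mean.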


References

Advances in Financial Machine Learning

Marcos Lopez de Prado (2018) — John Wiley & Sons

On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation

Gavin C Cawley, Nicola L C Talbot (2010)

The Probability of Backtest Overfitting

David H. Bailey, Jonathan Borwein, Marcos Lopez de Prado, Qiji Jim Zhu (2015)
