Chapter 16: Strategy Simulation

White's Reality Check and Bootstrap Inference for Strategy Families

White's Reality Check asks a family-level question: after searching across many variants, is there evidence that any strategy truly beats the benchmark?

The Intuition

Backtest overfitting is not only about one noisy Sharpe ratio. It is about search.

If you test enough variants, one will often look excellent in sample even when none has a real edge. White's Reality Check reframes the question around the correct inferential object:

not "is the winner impressive?" but "does this searched family contain genuine benchmark-beating performance?"

That shift matters because the selected winner is already contaminated by search.

The Object Being Tested

Suppose you test M strategy variants. For strategy m, let

$$ d_{m,t} = r_{m,t} - r_{0,t} $$

be its return differential relative to a benchmark or null strategy.

The family-level null is:

$$ H_0:\ \max_m \mathbb{E}[d_{m,t}] \le 0. $$

This is the right null after search. It says that even the best strategy in the searched family has a non-positive expected differential, so no candidate is expected to outperform the benchmark.

The test statistic is based on the best observed member, often through a maximum over average differentials or a standardized version of that maximum.
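
One common way to write this statistic, in the spirit of White's original formulation, is given here. The symbols $T$ (number of return observations), $\bar d_m$, and $V$ are notation introduced for illustration and do not appear elsewhere in this primer.

$$ \bar d_m = \frac{1}{T}\sum_{t=1}^{T} d_{m,t}, \qquad V = \max_{1 \le m \le M} \sqrt{T}\,\bar d_m $$

A standardized variant divides each $\sqrt{T}\,\bar d_m$ by an estimate of its standard deviation before taking the maximum, so that volatile strategies do not dominate the statistic.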

Why the Winner Cannot Be Tested Naively

Suppose you tried twenty momentum lookbacks and one came out best.

A naive t-test on the winner ignores two facts:

you looked at nineteen alternatives that lost
the winner was chosen because it looked best in this sample

That selection bias is the whole problem. White's logic keeps the family together.
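
The size of that bias can be seen in a small simulation. The sketch below is an illustration under simplifying assumptions, not a model of real strategies: it draws twenty independent Gaussian noise series with zero true mean, picks the one with the largest t-statistic, and counts how often a naive one-sided 5% test on that winner rejects. Under independence the rejection rate should be roughly $1 - 0.95^{20} \approx 64\%$. The function names and parameter values are hypothetical.

```python
import math
import random

def winner_t_stat(n_strategies, n_obs, rng):
    """Largest t-statistic among pure-noise differential series (true mean is zero)."""
    best = -math.inf
    for _ in range(n_strategies):
        d = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        mean = sum(d) / n_obs
        var = sum((x - mean) ** 2 for x in d) / (n_obs - 1)
        best = max(best, mean / math.sqrt(var / n_obs))
    return best

def naive_rejection_rate(n_strategies=20, n_obs=250, n_trials=300, crit=1.645, seed=0):
    """Share of pure-noise searches in which the winner passes a one-sided 5% t-test."""
    rng = random.Random(seed)
    hits = sum(winner_t_stat(n_strategies, n_obs, rng) > crit for _ in range(n_trials))
    return hits / n_trials

rate = naive_rejection_rate()
print(f"Naive test on the winner rejects in {rate:.0%} of pure-noise searches")
```

Real strategy variants are correlated, so the exact rate in practice differs from the independent case, but the direction of the distortion is the same.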

Why Bootstrap Enters the Picture

The candidate strategies are dependent:

they use the same market history
they often share signals and positions
their return differentials are correlated through time

So the reference distribution for the maximum statistic is not simple. Bootstrap methods are used to approximate it under the family null while preserving dependence structure.

In time-series settings, that usually means some form of block bootstrap rather than IID resampling. The blocks preserve serial dependence that would be destroyed by IID reshuffling, and the chosen block length affects how much temporal structure survives.
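
To make the contrast concrete, the following sketch compares IID resampling with a circular block bootstrap on a simulated autocorrelated series. It is a toy illustration: the AR(1) coefficient, the sample size, the block length of 50, and all function names are assumptions chosen for the example, and the circular block scheme is only one of several block bootstraps in use.

```python
import random

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def circular_block_resample(x, block_len, rng):
    """Concatenate randomly placed blocks of consecutive observations, wrapping at the end."""
    n = len(x)
    out = []
    while len(out) < n:
        start = rng.randrange(n)
        out.extend(x[(start + k) % n] for k in range(block_len))
    return out[:n]

rng = random.Random(7)
phi, n_obs = 0.8, 2000

# Simulate an AR(1) series as a stand-in for a serially dependent differential.
series = [0.0]
for _ in range(n_obs - 1):
    series.append(phi * series[-1] + rng.gauss(0.0, 1.0))

iid_sample = [rng.choice(series) for _ in range(n_obs)]            # destroys time order
block_sample = circular_block_resample(series, block_len=50, rng=rng)

print(f"original lag-1 autocorrelation: {lag1_autocorr(series):.2f}")
print(f"IID resample:                   {lag1_autocorr(iid_sample):.2f}")
print(f"block resample:                 {lag1_autocorr(block_sample):.2f}")
```

Shorter blocks break the series at more join points and so retain less of the original dependence, which is the trade-off behind the block length choice.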

The Workflow

A good mental model is:

define the candidate strategy family
compute each strategy's benchmark-relative differential series $d_{m,t}$
form the observed family statistic from the best-performing member
recenter or otherwise null-adjust the differential series, then bootstrap the joint process under the null
compare the observed maximum to the bootstrap reference distribution

If the observed statistic is extreme relative to that null distribution, you reject the family null.
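
The steps above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions rather than a reference implementation: it uses the unstandardized maximum of scaled mean differentials, recenters each series at its own sample mean, and resamples dates with a stationary bootstrap (random block lengths with a chosen mean). The function names, the default of 200 bootstrap draws, the mean block length of 10, and the simulated data are all hypothetical choices.

```python
import math
import random

def stationary_indices(n_obs, mean_block_len, rng):
    """Time indices built from wrapped blocks whose lengths are geometric with the given mean."""
    idx = [rng.randrange(n_obs)]
    while len(idx) < n_obs:
        if rng.random() < 1.0 / mean_block_len:
            idx.append(rng.randrange(n_obs))       # start a new block at a random date
        else:
            idx.append((idx[-1] + 1) % n_obs)      # extend the current block
    return idx

def reality_check(diffs, n_boot=200, mean_block_len=10, seed=0):
    """diffs[m][t] holds d_{m,t}. Returns (observed statistic, bootstrap p-value)."""
    rng = random.Random(seed)
    n_obs = len(diffs[0])
    scale = math.sqrt(n_obs)
    means = [sum(d) / n_obs for d in diffs]
    observed = scale * max(means)                  # statistic from the best-performing member
    exceed = 0
    for _ in range(n_boot):
        # One draw of dates is shared by all strategies, so cross-strategy
        # correlation is preserved in every bootstrap sample.
        idx = stationary_indices(n_obs, mean_block_len, rng)
        boot_stat = scale * max(
            sum(d[i] for i in idx) / n_obs - m     # recenter each series at its own sample mean
            for d, m in zip(diffs, means)
        )
        exceed += boot_stat >= observed
    return observed, exceed / n_boot

# Simulated family: 12 correlated differential series with zero true mean.
rng = random.Random(1)
T, M = 300, 12
common = [rng.gauss(0.0, 1.0) for _ in range(T)]
noise_family = [[0.5 * c + rng.gauss(0.0, 1.0) for c in common] for _ in range(M)]

# Same family, but the first member is given a large artificial edge.
edge_family = [list(d) for d in noise_family]
edge_family[0] = [x + 0.5 for x in edge_family[0]]

stat_noise, p_noise = reality_check(noise_family)
stat_edge, p_edge = reality_check(edge_family)
print(f"noise family: statistic {stat_noise:.2f}, p-value {p_noise:.3f}")
print(f"edge family:  statistic {stat_edge:.2f}, p-value {p_edge:.3f}")
```

Resampling dates jointly, rather than resampling each strategy separately, is the design choice that keeps the family together in the reference distribution.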

A Worked Example

Suppose you test twelve momentum rules with different formation and holding periods.

Naive read

The best rule has an in-sample Sharpe ratio of 1.4, so you conclude the strategy works.

Reality-check read

You compute each rule's return differential versus an equal-weight benchmark and build the maximum family statistic. Then you recenter the joint differential series under the null and bootstrap it using time-respecting blocks.

Now two outcomes are possible:

the winner still looks unusually strong relative to the null family distribution
the winner sits well inside what search on noisy data can generate by luck

That second outcome is precisely what the test is designed to reveal.

What Rejection Does and Does Not Mean

If you reject the family null, you have evidence that the searched family contains some genuine benchmark-beating structure.

What you do not get is a license to say:

this specific selected strategy is now fully validated for live trading.

Reality Check is family-level inference, not deployment approval.

It corrects one layer of overstatement. It does not replace:

out-of-sample validation
protocol realism
cost modeling
structural robustness checks

White's original Reality Check is also known to be conservative: adding poor or irrelevant strategies to the family can weaken its power to detect a genuine one. Later refinements such as Hansen's Superior Predictive Ability test were designed to improve power by reducing the influence of clearly bad strategies on the null distribution.

Benchmark Choice Matters

The differential series is defined relative to a benchmark, so the null changes with the benchmark.

against cash, you ask whether any strategy earns a positive return in excess of cash
against a passive allocator, you ask whether any strategy adds active value
against a production baseline, you ask whether the searched family improves the current process

This is why benchmark choice is part of the inference, not bookkeeping.
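
A toy calculation shows how the same strategy can sit on either side of the null depending on the benchmark. The return numbers below are hypothetical and chosen only to produce a sign flip, and `mean_differential` is a helper name introduced for this example.

```python
def mean_differential(strategy, benchmark):
    """Average of d_t = r_t - r_{0,t} for one strategy against one benchmark."""
    return sum(r - b for r, b in zip(strategy, benchmark)) / len(strategy)

# Hypothetical per-period returns, chosen only to illustrate the sign flip.
strategy = [0.02, -0.01, 0.03, 0.00]
benchmarks = {
    "cash": [0.001, 0.001, 0.001, 0.001],
    "passive allocator": [0.02, 0.00, 0.02, 0.01],
}

results = {name: mean_differential(strategy, b) for name, b in benchmarks.items()}
for name, value in results.items():
    print(f"{name}: mean differential {value:+.4f}")
```

The strategy's mean differential is positive against cash and negative against the passive allocator, so the two benchmarks pose two different hypotheses about the same return series.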

In Practice

Use these rules:

define the strategy family before seeing the result you want to defend
bootstrap the full family jointly, not just the winner
use a time-series bootstrap that respects dependence
keep the benchmark explicit because the null is benchmark-relative
interpret rejection as family-level evidence, not as proof that the selected rule is ready to trade

Common Mistakes

Testing the selected winner as if it had been chosen in advance.
Using IID bootstrap on clearly dependent return series.
Forgetting that the strategies in the family are correlated.
Treating a family-level rejection as proof of one specific live rule.
Ignoring costs and implementation realism after passing the statistical test.

Connections

This primer supports Chapter 16's search-aware backtest inference. It connects directly to Deflated Sharpe logic, bootstrap methods for dependent data, protocol semantics, and the broader problem of turning one noisy historical path into defensible evidence.
