Chapter 16: Strategy Simulation

White's Reality Check and Bootstrap Inference for Strategy Families

White's Reality Check asks a family-level question: after searching across many variants, is there evidence that any strategy truly beats the benchmark?

The Intuition

Backtest overfitting is not only about one noisy Sharpe ratio. It is about search.

If you test enough variants, one will often look excellent in sample even when none has real edge. White's Reality Check turns that into the correct inferential object:

not "is the winner impressive?" but "does this searched family contain genuine benchmark-beating performance?"

That shift matters because the selected winner is already contaminated by search.

The Object Being Tested

Suppose you test M strategy variants. For strategy m, let

$$ d_{m,t} = r_{m,t} - r_{0,t} $$

be its return differential relative to a benchmark or null strategy.

The family-level null is:

$$ H_0:\ \max_m \mathbb{E}[d_{m,t}] \le 0. $$

This is the appropriate null after search: no candidate strategy has positive expected incremental performance over the benchmark. Equivalently, even the best strategy in the searched family has a non-positive expected differential.

The test statistic is built from the best observed member: typically the maximum, across strategies, of the scaled average differential, or the maximum of each strategy's standardized (studentized) average differential.
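For reference, these quantities can be written out in the notation above. Here $n$ is the number of time periods, $\bar d_m^{*(b)}$ is strategy $m$'s mean differential in bootstrap resample $b$ of $B$, and $\hat\omega_m$ is an estimate of the standard deviation of $\sqrt{n}\,\bar d_m$. These symbols are introduced here for illustration, and the recentered form is a sketch of White's construction rather than a full statement of its assumptions.

$$ \bar d_m = \frac{1}{n}\sum_{t=1}^{n} d_{m,t}, \qquad V_n = \max_{m=1,\dots,M} \sqrt{n}\,\bar d_m, \qquad T_n = \max_{m=1,\dots,M} \frac{\sqrt{n}\,\bar d_m}{\hat\omega_m}. $$

$$ V_n^{*(b)} = \max_{m=1,\dots,M} \sqrt{n}\,\big(\bar d_m^{*(b)} - \bar d_m\big), \qquad \hat p = \frac{1}{B}\sum_{b=1}^{B} \mathbf{1}\{V_n^{*(b)} \ge V_n\}. $$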

Why the Winner Cannot Be Tested Naively

Suppose you tried twenty momentum lookbacks and one came out best.

A naive t-test on the winner ignores two facts:

  • you looked at nineteen alternatives that lost
  • the winner was chosen because it looked best in this sample

That selection bias is the whole problem. White's logic keeps the family together.
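A small simulation makes the selection problem concrete. The sketch below assumes twenty independent strategies with zero true mean differential (so the family null is true), tests only the in-sample winner with a one-sided t-test at roughly the 5% level, and counts how often that naive test "finds" an edge. The function name and all parameter values are illustrative assumptions.

```python
import numpy as np

def naive_winner_rejection_rate(n_strategies=20, n_obs=500, n_trials=2000,
                                crit=1.645, seed=0):
    """Fraction of trials in which the best of n_strategies zero-edge
    differential series passes a naive one-sided t-test (~5% level).

    Assumption: strategies are independent IID normal series with mean 0,
    so every 'rejection' is a false discovery produced by selection.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_trials):
        d = rng.standard_normal((n_obs, n_strategies))   # d[t, m], true mean 0
        se = d.std(axis=0, ddof=1) / np.sqrt(n_obs)
        t_stats = d.mean(axis=0) / se
        # test only the in-sample winner, as if it had been chosen in advance
        if t_stats.max() > crit:
            hits += 1
    return hits / n_trials

rate = naive_winner_rejection_rate()
print(f"naive test 'rejects' in {rate:.0%} of trials (nominal level: 5%)")
```

With twenty independent null strategies the rate should sit near 1 - 0.95^20 ≈ 0.64 rather than 0.05. Correlated strategies, as in a real lookback grid, reduce the effective number of trials but do not remove the bias.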

Why Bootstrap Enters the Picture

The candidate strategies are dependent:

  • they use the same market history
  • they often share signals and positions
  • their return differentials are correlated through time

So the sampling distribution of the maximum statistic has no simple closed form. Bootstrap methods approximate it under the family null while preserving the dependence structure.

In time-series settings, that usually means some form of block bootstrap rather than IID resampling. The blocks preserve serial dependence that would be destroyed by IID reshuffling, and the chosen block length affects how much temporal structure survives.
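To see why IID resampling is inadequate, the sketch below compares the bootstrap standard deviation of a sample mean under IID resampling and under a circular block bootstrap (one simple block scheme among several) on a simulated AR(1) series. The AR coefficient of 0.6, the block length of 25, and the helper names are assumptions chosen for illustration.

```python
import numpy as np

def ar1(n, phi, rng):
    """Simulate a zero-mean AR(1) series x_t = phi * x_{t-1} + e_t."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0] / np.sqrt(1 - phi**2)      # start in the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

def bootstrap_mean_sd(x, block_len, n_boot, rng):
    """Std. dev. of the bootstrap sample mean under a circular block bootstrap.

    block_len=1 reduces to the ordinary IID bootstrap.
    """
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    means = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n, size=n_blocks)
        # each block is block_len consecutive indices, wrapping past the end
        idx = (starts[:, None] + np.arange(block_len)[None, :]) % n
        means[b] = x[idx.ravel()[:n]].mean()
    return means.std(ddof=1)

rng = np.random.default_rng(1)
x = ar1(2000, phi=0.6, rng=rng)
sd_iid = bootstrap_mean_sd(x, block_len=1, n_boot=500, rng=rng)
sd_block = bootstrap_mean_sd(x, block_len=25, n_boot=500, rng=rng)
print(f"IID bootstrap sd of mean:   {sd_iid:.4f}")
print(f"block bootstrap sd of mean: {sd_block:.4f}")
```

For this AR(1) process the true long-run standard deviation of the mean is about twice the IID value, so the IID bootstrap understates uncertainty by roughly half; the block version recovers most of the gap. Longer blocks keep more serial structure at the cost of fewer distinct blocks.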

The Workflow

A good mental model is:

  1. define the candidate strategy family
  2. compute each strategy's benchmark-relative differential series $d_{m,t}$
  3. form the observed family statistic from the best-performing member
  4. recenter or otherwise null-adjust the differential series, then bootstrap the joint process under the null
  5. compare the observed maximum to the bootstrap reference distribution

If the observed statistic is extreme relative to that null distribution, you reject the family null.
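The five steps can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes the differentials are already stacked in a (T, M) NumPy array `d` with one column per strategy, uses the unstandardized statistic max_m sqrt(T) * mean(d[:, m]), and resamples with a stationary bootstrap (random geometric block lengths), the scheme White's original paper paired with the test. The function names, the mean block length of 10, and the simulated data are hypothetical.

```python
import numpy as np

def stationary_bootstrap_indices(n, mean_block, rng):
    """Politis-Romano stationary bootstrap row indices for a length-n series.

    Block lengths are geometric with mean `mean_block`; blocks wrap around.
    """
    new_block = rng.random(n) < 1.0 / mean_block   # True where a block begins
    new_block[0] = True
    pos = np.arange(n)
    # for each t, the position at which its block began
    block_start = np.maximum.accumulate(np.where(new_block, pos, 0))
    # one random starting row per block (only entries at block starts are used)
    start_row = rng.integers(0, n, size=n)[block_start]
    return (start_row + pos - block_start) % n

def reality_check(d, n_boot=1000, mean_block=10.0, seed=0):
    """Reality Check sketch for a (T, M) matrix of differentials d[t, m].

    Returns the observed statistic and a bootstrap p-value. The null
    distribution recenters each resampled mean at the full-sample mean.
    """
    rng = np.random.default_rng(seed)
    T = d.shape[0]
    d_bar = d.mean(axis=0)
    v_obs = np.sqrt(T) * d_bar.max()
    v_boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = stationary_bootstrap_indices(T, mean_block, rng)
        # resample whole rows so cross-strategy correlation is preserved
        v_boot[b] = np.sqrt(T) * (d[idx].mean(axis=0) - d_bar).max()
    p_value = np.mean(v_boot >= v_obs)
    return v_obs, p_value

# Hypothetical family: 12 correlated rules with no true edge over the benchmark
rng = np.random.default_rng(42)
T, M = 1000, 12
benchmark = rng.standard_normal(T)
style = rng.standard_normal(T)          # exposure shared by all rules
rules = benchmark[:, None] + 0.5 * style[:, None] + rng.standard_normal((T, M))
d = rules - benchmark[:, None]
v_obs, p = reality_check(d, n_boot=500)
print(f"V = {v_obs:.2f}, Reality Check p-value = {p:.3f}")
```

Every rule in this simulated family has zero true edge, so a small p-value here would be a false alarm. Adding a positive constant to one column of `d` is an easy way to watch the test reject. Resampling whole rows (`d[idx]`) is what keeps the strategies' cross-correlation intact in each bootstrap draw.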

A Worked Example

Suppose you test twelve momentum rules with different formation and holding periods.

Naive read

The best rule has an in-sample Sharpe ratio of 1.4, so you conclude the strategy works.

Reality-check read

You compute each rule's return differential versus an equal-weight benchmark and form the family maximum statistic. Then you recenter the joint differential series under the null and bootstrap it with time-respecting blocks.

Now two outcomes are possible:

  • the winner still looks unusually strong relative to the null family distribution
  • the winner sits well inside what search on noisy data can generate by luck

That second outcome is precisely what the test is designed to reveal.
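How large can the winner's Sharpe ratio get by luck alone? The sketch below assumes twelve independent rules with zero true Sharpe, IID normal daily returns, 252 periods per year, and a two-year sample; none of these figures comes from the example above, and real rules that share signals are correlated, which shrinks the effect. It reports the average best annualized Sharpe and how often the best rule reaches 1.4.

```python
import numpy as np

def best_sharpe_under_null(n_rules=12, years=2.0, periods_per_year=252,
                           threshold=1.4, n_sims=1000, seed=0):
    """Luck-only benchmark for the in-sample winner's annualized Sharpe.

    Assumption: independent IID normal returns with true Sharpe 0.
    Returns (average best Sharpe, share of runs with best >= threshold).
    """
    rng = np.random.default_rng(seed)
    n = int(years * periods_per_year)
    best = np.empty(n_sims)
    for i in range(n_sims):
        r = rng.standard_normal((n, n_rules))
        sharpe = r.mean(axis=0) / r.std(axis=0, ddof=1) * np.sqrt(periods_per_year)
        best[i] = sharpe.max()              # the rule a naive search would pick
    return best.mean(), np.mean(best >= threshold)

avg_best, frac = best_sharpe_under_null()
print(f"average best Sharpe: {avg_best:.2f}; share of runs with best >= 1.4: {frac:.2f}")
```

Under these assumptions the average best Sharpe should land near E[max of 12 standard normals] / sqrt(years) ≈ 1.63 / sqrt(2) ≈ 1.15, so a winner at 1.4 is not unusual on its own.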

What Rejection Does and Does Not Mean

If you reject the family null, you have evidence that the searched family contains some genuine benchmark-beating structure.

What you do not get is a license to say:

this specific selected strategy is now fully validated for live trading.

Reality Check is family-level inference, not deployment approval.

It corrects one layer of overstatement. It does not replace:

  • out-of-sample validation
  • protocol realism
  • cost modeling
  • structural robustness checks

White's original Reality Check is also known to be conservative. Later refinements such as Hansen's Superior Predictive Ability test were designed to improve power by reducing the influence of clearly bad strategies on the null distribution.
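The SPA refinement changes how the bootstrap is recentered. The sketch below shows the "consistent" recentering rule from Hansen's test as we understand it: strategies whose standardized mean differential falls below -sqrt(2 log log n) are treated as clearly bad and are not recentered, while all others are recentered at their sample mean, as in White's test. The function name, inputs, and example numbers are illustrative assumptions; the full SPA test also studentizes the statistic, which is not shown.

```python
import numpy as np

def spa_recentering(d_bar, omega, n):
    """Recentering means for the 'consistent' variant of Hansen's SPA test.

    d_bar : sample mean differential per strategy, shape (M,)
    omega : estimated std of sqrt(n) * d_bar per strategy, shape (M,)
    n     : number of time periods (must be >= 3 so log(log(n)) >= 0)

    Clearly bad strategies get 0, so their resampled means stay negative
    and rarely drive the bootstrap maximum; the rest get d_bar.
    """
    threshold = -np.sqrt(2.0 * np.log(np.log(n)))
    t_stat = np.sqrt(n) * d_bar / omega
    return np.where(t_stat >= threshold, d_bar, 0.0)

d_bar = np.array([0.02, -0.001, -0.30])   # hypothetical mean differentials
omega = np.ones(3)                         # hypothetical std estimates
mu_c = spa_recentering(d_bar, omega, n=1000)
print(mu_c)
```

The third strategy sits far below the threshold, so it is left uncentered; the bootstrap would then use the resampled mean minus `mu_c[m]` in place of the resampled mean minus `d_bar[m]`. In White's test it would be recentered at its own mean like the others, which is what inflates the null distribution and costs power.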

Benchmark Choice Matters

The differential series is defined relative to a benchmark, so the null changes with the benchmark.

  • against cash, you ask whether any strategy adds positive raw excess return
  • against a passive allocator, you ask whether any strategy adds active value
  • against a production baseline, you ask whether the searched family improves the current process

This is why benchmark choice is part of the inference, not bookkeeping.
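Because the null is benchmark-relative, the same strategy returns can produce very different family statistics. The sketch below assumes a family of strategies that simply track a hypothetical passive allocator plus noise, and computes the observed statistic max_m sqrt(T) * mean differential against a zero cash return and against the passive series. All data and names are illustrative.

```python
import numpy as np

def family_statistic(strategy_returns, benchmark_returns):
    """Observed max_m sqrt(T) * mean differential for a (T, M) return matrix."""
    d = strategy_returns - benchmark_returns[:, None]
    T = d.shape[0]
    return np.sqrt(T) * d.mean(axis=0).max()

rng = np.random.default_rng(7)
T, M = 750, 8
passive = 0.0004 + 0.01 * rng.standard_normal(T)        # hypothetical passive allocator
strategies = passive[:, None] + 0.002 * rng.standard_normal((T, M))  # track it plus noise
cash = np.zeros(T)                                       # zero excess return

vs_cash = family_statistic(strategies, cash)
vs_passive = family_statistic(strategies, passive)
print(f"vs cash:    {vs_cash:.3f}")
print(f"vs passive: {vs_passive:.3f}")
```

Since the passive return is common to every strategy here, switching from cash to the passive benchmark shifts the statistic by exactly sqrt(T) times the mean passive return: against cash the family gets credit for the passive premium, against the passive allocator it gets credit only for what it adds.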

In Practice

Use these rules:

  • define the strategy family before seeing the result you want to defend
  • bootstrap the full family jointly, not just the winner
  • use a time-series bootstrap that respects dependence
  • keep the benchmark explicit because the null is benchmark-relative
  • interpret rejection as family-level evidence, not as proof that the selected rule is ready to trade

Common Mistakes

  • Testing the selected winner as if it had been chosen in advance.
  • Using IID bootstrap on clearly dependent return series.
  • Forgetting that the strategies in the family are correlated.
  • Treating a family-level rejection as proof of one specific live rule.
  • Ignoring costs and implementation realism after passing the statistical test.

Connections

This primer supports Chapter 16's search-aware backtest inference. It connects directly to Deflated Sharpe logic, bootstrap methods for dependent data, protocol semantics, and the broader problem of turning one noisy historical path into defensible evidence.
