Statistical Tests¶
ML4T Diagnostic implements statistical tests that guard against false discoveries and correct for multiple-testing bias.
Deflated Sharpe Ratio (DSR)¶
The DSR deflates the observed Sharpe ratio to account for the number of backtests tried, because the best result among many trials is biased upward:
from ml4t.diagnostic.evaluation.stats import deflated_sharpe_ratio
result = deflated_sharpe_ratio(
    returns=strategy_returns,
    n_trials=100,        # How many strategies were tested
    frequency='daily',   # Return frequency
    periods_per_year=252,
)
print(f"Observed Sharpe: {result.sharpe_ratio:.2f}")
print(f"Deflated Sharpe: {result.deflated_sharpe:.2f}")
print(f"p-value: {result.p_value:.4f}")
When to Use DSR¶
- After trying multiple strategy variations
- When selecting among several candidate strategies
- To report statistically honest performance
DSR Formula (López de Prado et al. 2025)¶
where \(\gamma\) accounts for:
- Number of trials
- Expected maximum Sharpe under the null hypothesis
- Autocorrelation in returns
See Deflated Sharpe Ratio for details.
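As a rough illustration of the deflation step, the sketch below follows the 2014 formulation by Bailey & López de Prado, which may differ in detail from the 2025 formula this section refers to. The names `expected_max_sharpe` and `deflated_sharpe` are hypothetical and are not part of the ML4T Diagnostic API. Sharpe ratios are assumed to be per-period (non-annualized), and `sharpe_variance` is assumed to be the variance of the Sharpe estimates across trials.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_MASCHERONI = 0.5772156649015329
_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def expected_max_sharpe(n_trials, sharpe_variance):
    """Approximate expected maximum Sharpe across n_trials when true skill is zero."""
    if n_trials < 2:
        return 0.0  # a single trial has no selection bias to correct
    g = EULER_MASCHERONI
    z = (1 - g) * _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
    z += g * _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * e))
    return sqrt(sharpe_variance) * z


def deflated_sharpe(sr, n_obs, n_trials, sharpe_variance, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe exceeds the expected maximum under the null.

    Hypothetical helper: sr and sharpe_variance are per-period values,
    skew and kurt describe the return distribution (kurt=3 is normal).
    """
    sr0 = expected_max_sharpe(n_trials, sharpe_variance)
    denom = sqrt(1 - skew * sr + (kurt - 1) / 4 * sr**2)
    return _STD_NORMAL.cdf((sr - sr0) * sqrt(n_obs - 1) / denom)


single = deflated_sharpe(sr=0.1, n_obs=1000, n_trials=1, sharpe_variance=0.001)
many = deflated_sharpe(sr=0.1, n_obs=1000, n_trials=100, sharpe_variance=0.001)
print(f"1 trial: {single:.3f}, 100 trials: {many:.3f}")
```

In this formulation the output is a probability, so the same observed Sharpe ratio looks less convincing as the number of trials grows.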
Rademacher Anti-Serum (RAS)¶
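RAS detects backtest overfitting by penalizing observed Sharpe ratios with the Rademacher complexity of the strategy set, a measure of how well the candidate strategies can fit pure noise: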
from ml4t.diagnostic.evaluation.stats import rademacher_complexity, ras_sharpe_adjustment
# returns_matrix shape: (n_periods, n_strategies)
complexity = rademacher_complexity(returns_matrix)
observed_sharpes = returns_matrix.mean(axis=0) / returns_matrix.std(axis=0)
result = ras_sharpe_adjustment(
    observed_sharpe=observed_sharpes,
    complexity=complexity,
    n_samples=returns_matrix.shape[0],
    n_strategies=returns_matrix.shape[1],
    return_result=True,
)
print(f"Number significant after RAS: {result.n_significant}")
print(f"Complexity penalty: {result.complexity:.4f}")
Interpretation¶
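| RAS-adjusted Sharpe | Interpretation |
|---|---|
| High | Performance survives the complexity penalty; overfitting is less likely |
| Low | Most of the observed Sharpe is absorbed by the penalty; the strategy may be overfit |
| Negative | The penalty exceeds the observed Sharpe; the strategy is likely spurious |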
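To make the complexity term more concrete, here is a minimal Monte Carlo sketch of empirical Rademacher complexity: the average, over random sign vectors, of the best signed mean that any strategy achieves. It uses only the standard library and takes a list of rows shaped `(n_periods, n_strategies)`. The name `empirical_rademacher` and its parameters are hypothetical, and the library's `rademacher_complexity` may be implemented differently.

```python
import random


def empirical_rademacher(returns_matrix, n_draws=1000, seed=0):
    """Monte Carlo estimate of E[ max_j (1/T) * sum_t eps_t * x[t][j] ].

    eps_t are independent random signs (+1 or -1). A set of strategies that
    can align with random signs is flexible enough to fit noise.
    """
    rng = random.Random(seed)
    n_periods = len(returns_matrix)
    n_strategies = len(returns_matrix[0])
    total = 0.0
    for _ in range(n_draws):
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n_periods)]
        # Best signed mean across strategies for this sign vector
        best = max(
            sum(signs[t] * returns_matrix[t][j] for t in range(n_periods)) / n_periods
            for j in range(n_strategies)
        )
        total += best
    return total / n_draws


base = [[0.01], [-0.02], [0.015], [0.005]]
mirrored = [[row[0], -row[0]] for row in base]  # a strategy plus its exact opposite
print(empirical_rademacher(base, n_draws=200, seed=1))
print(empirical_rademacher(mirrored, n_draws=200, seed=1))
```

Adding the mirrored strategy can only raise the estimate, which illustrates why a larger or more diverse set of candidates earns a larger penalty.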
Minimum Track Record Length (MinTRL)¶
Calculate how many observations a track record needs before its Sharpe ratio is statistically significant:
from ml4t.diagnostic.evaluation.stats import compute_min_trl
result = compute_min_trl(
    sharpe_ratio=1.5,
    target_pvalue=0.05,
    frequency='daily',
)
print(f"Minimum observations: {result.min_observations}")
print(f"Minimum years: {result.min_years:.1f}")
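For intuition, the closed-form MinTRL from Bailey & López de Prado can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: `min_track_record_length` is a hypothetical name, the Sharpe ratio is assumed to be per-period, and the annualized value of 1.5 is converted to daily units with 252 periods per year.

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_track_record_length(sr, sr_benchmark=0.0, alpha=0.05, skew=0.0, kurt=3.0):
    """Observations needed to reject 'true SR <= sr_benchmark' at level alpha.

    sr and sr_benchmark are per-period Sharpe ratios; skew and kurt describe
    the return distribution (kurt=3 corresponds to normal returns).
    """
    if sr <= sr_benchmark:
        raise ValueError("observed Sharpe must exceed the benchmark Sharpe")
    z = NormalDist().inv_cdf(1 - alpha)
    variance_term = 1 - skew * sr + (kurt - 1) / 4 * sr**2
    return 1 + variance_term * (z / (sr - sr_benchmark)) ** 2


daily_sr = 1.5 / sqrt(252)  # assumption: 1.5 is annualized, returns are daily
n_obs = ceil(min_track_record_length(daily_sr))
print(f"Minimum observations: {n_obs}")
print(f"Minimum years: {n_obs / 252:.1f}")
```

The required length grows with the inverse square of the gap between the observed and benchmark Sharpe ratios, so modest Sharpe ratios need much longer track records.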
MinTRL with Multiple Testing¶
To keep the family-wise error rate (FWER) controlled when selecting among multiple strategies:
from ml4t.diagnostic.evaluation.stats import min_trl_fwer
result = min_trl_fwer(
    sharpe_ratio=1.5,
    num_trials=50,
    alpha=0.05,
)
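How `min_trl_fwer` adjusts for the number of trials is not shown here, so the following is only a sketch of two standard FWER corrections, Bonferroni and Šidák, and of how they tighten the per-test critical value. The helper names are hypothetical.

```python
from statistics import NormalDist


def bonferroni_alpha(alpha, n_trials):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_trials


def sidak_alpha(alpha, n_trials):
    """Per-test significance level under the Sidak correction (independent tests)."""
    return 1 - (1 - alpha) ** (1 / n_trials)


alpha, n_trials = 0.05, 50
per_test = sidak_alpha(alpha, n_trials)
z_single = NormalDist().inv_cdf(1 - alpha)     # critical value for one test
z_family = NormalDist().inv_cdf(1 - per_test)  # critical value for 50 tests
print(f"Per-test alpha: {per_test:.5f}")
print(f"Critical value: {z_single:.2f} -> {z_family:.2f}")
```

Because the minimum track record scales with the square of the critical value, a stricter per-test threshold translates directly into a longer required track record.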
False Discovery Rate (FDR)¶
Control the expected proportion of false positives among the rejected hypotheses:
from ml4t.diagnostic.evaluation.stats import benjamini_hochberg_fdr
pvalues = [0.01, 0.03, 0.05, 0.08, 0.12]
rejected = benjamini_hochberg_fdr(p_values=pvalues, alpha=0.05)
# Identify discoveries
discoveries = rejected
Methods¶
| Method | Description |
|---|---|
| bh | Benjamini-Hochberg (controls FDR) |
| by | Benjamini-Yekutieli (conservative) |
| holm | Holm-Bonferroni (controls FWER) |
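The Benjamini-Hochberg step-up rule is short enough to write out. The sketch below is a plain-Python illustration of the procedure, not the library's `benjamini_hochberg_fdr`; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= alpha * k / m
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


example = [0.005, 0.031, 0.029, 0.035, 0.5]
print(benjamini_hochberg(example, alpha=0.05))
```

Note the step-up behavior: the second and third smallest p-values miss their own thresholds, but they are still rejected because the fourth smallest passes its threshold.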
HAC-Adjusted Statistics¶
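Heteroskedasticity- and autocorrelation-consistent (HAC) standard errors keep inference on the information coefficient (IC) valid when the IC series is serially correlated or has time-varying variance: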
from ml4t.diagnostic.evaluation.stats import hac_adjusted_ic
result = hac_adjusted_ic(
    predictions=predictions,
    returns=forward_returns,
    return_details=True,
)
print(f"HAC t-stat: {result['t_stat']:.2f}")
print(f"Bootstrap std error: {result['bootstrap_std']:.4f}")
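One common HAC estimator is Newey-West with Bartlett kernel weights. The sketch below applies it to the mean of a per-period IC series. It is an assumption that this resembles what `hac_adjusted_ic` does internally; the names `newey_west_se` and `hac_t_stat` and the sample series are hypothetical.

```python
from math import sqrt


def newey_west_se(series, max_lag):
    """HAC standard error of the sample mean using Bartlett kernel weights."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    # Start from the lag-0 autocovariance (the ordinary variance)
    long_run_var = sum(d * d for d in dev) / n
    for lag in range(1, max_lag + 1):
        weight = 1 - lag / (max_lag + 1)  # Bartlett weight, declines linearly
        autocov = sum(dev[t] * dev[t - lag] for t in range(lag, n)) / n
        long_run_var += 2 * weight * autocov
    return sqrt(long_run_var / n)


def hac_t_stat(series, max_lag):
    """t-statistic of the mean using the HAC standard error."""
    mean = sum(series) / len(series)
    return mean / newey_west_se(series, max_lag)


ic_series = [0.05, 0.04, 0.06, 0.03, 0.05, 0.02, 0.04, 0.03]
print(f"Naive t-stat: {hac_t_stat(ic_series, max_lag=0):.2f}")
print(f"HAC t-stat:   {hac_t_stat(ic_series, max_lag=2):.2f}")
```

With `max_lag=0` the estimator reduces to the usual standard error of the mean; positive autocorrelation inflates the standard error and shrinks the t-statistic.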
Probability of Backtest Overfitting (PBO)¶
Estimate the probability that the strategy selected as best in-sample (IS) underperforms the median strategy out-of-sample (OOS):
from ml4t.diagnostic.evaluation.stats import compute_pbo
result = compute_pbo(
    is_performance=is_returns_matrix,
    oos_performance=oos_returns_matrix,
)
print(f"PBO: {result.pbo:.1%}") # e.g., "32.5%"
Interpretation¶
| PBO | Interpretation |
|---|---|
| < 10% | Low overfitting risk |
| 10-30% | Moderate risk |
| > 30% | High overfitting risk |
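The core of the PBO calculation can be sketched as follows, assuming the inputs are already organized as one row per train/test split and one column per strategy. In the combinatorially symmetric cross-validation of Bailey et al., those splits come from partitioning the sample into blocks; that step is omitted here. The name `pbo_from_splits` and the input layout are hypothetical and may not match `compute_pbo`.

```python
from math import log


def pbo_from_splits(is_perf, oos_perf):
    """Share of splits where the in-sample winner ranks at or below the OOS median.

    is_perf, oos_perf: lists of rows, one row per split, one value per strategy.
    """
    n_strategies = len(is_perf[0])
    logits = []
    for is_row, oos_row in zip(is_perf, oos_perf):
        # Strategy that would have been selected on in-sample performance
        best = max(range(n_strategies), key=lambda j: is_row[j])
        # Relative OOS rank of that strategy, strictly between 0 and 1
        rank = sum(1 for value in oos_row if value <= oos_row[best])
        omega = rank / (n_strategies + 1)
        logits.append(log(omega / (1 - omega)))
    # A non-positive logit means the winner fell into the bottom half OOS
    return sum(1 for lam in logits if lam <= 0) / len(logits)


is_perf = [[1.0, 0.5, 0.2, 0.1], [0.3, 0.9, 0.1, 0.2]]
oos_perf = [[0.8, 0.1, 0.2, 0.3], [0.4, 0.05, 0.3, 0.2]]
print(f"PBO: {pbo_from_splits(is_perf, oos_perf):.1%}")
```

In the first split the in-sample winner is also best out-of-sample; in the second it is worst, so half of the splits count as overfit.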
See It In The Book¶
These statistical tests appear repeatedly in the book:
- FDR, DSR, MinTRL, and PBO: code/07_defining_learning_task/07_multiple_testing.py
- HAC-adjusted IC in causal and robustness checks: code/07_defining_learning_task/08_causal_sanity_checks.py
- HAC-adjusted IC plus FDR in the case studies: code/case_studies/*/05_evaluation.py
- DSR on real backtest returns: code/16_strategy_simulation/12_dsr_validation.py
- Sharpe inference and RAS workflow: code/16_strategy_simulation/11_sharpe_ratio_inference.py, code/16_strategy_simulation/13_ras_protocol.py
For the chapter-level map, see the Book Guide.
References¶
- López de Prado et al. (2025). "How to Use the Sharpe Ratio"
- Bailey & López de Prado (2014). "The Deflated Sharpe Ratio"
- Paleologo, G. (2024). Elements of Quantitative Investing