Learning Objectives
- Explain why trading research is path-limited and how adaptive search and multiple testing can inflate apparent backtest performance.
- Use classical simulation baselines, including bootstrap and stochastic volatility models, as interpretable benchmarks for synthetic data generation.
- Select a synthetic-data approach that matches the data structure and downstream objective, including learned generators for time series and tabular financial data.
- Diagnose generated data using stylized-fact, dependence, and task-based evaluation methods, including Train-Synthetic-Test-Real comparisons.
- Assess privacy and generator-specific risks, including leakage, bias amplification, overfitting to the generator, and limited scenario novelty.
The Quant's Dilemma
Quantitative strategy development must make inferences from a single realized history with limited crises, regime shifts, and correlation breakdowns. If a researcher tests 100 strategy configurations that all have zero true Sharpe ratio, the expected maximum in-sample Sharpe exceeds 2.5 (Bailey et al. 2015), making backtest overfitting near-certain. Synthetic data is positioned as simulation infrastructure that turns one realized history into a distribution of plausible histories — but the generator must reproduce tail outcomes and dependence structure, not just bulk distributional similarity.
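The multiple-testing effect is easy to reproduce numerically. This sketch (an illustration, not taken from the text) simulates 100 strategies with zero true edge over one year of daily data and reports the best in-sample annualized Sharpe:

```python
import numpy as np

# 100 strategy configurations, one year of daily returns each,
# all with zero true mean return (no real edge anywhere).
rng = np.random.default_rng(42)
n_strategies, n_days = 100, 252

returns = rng.standard_normal((n_strategies, n_days)) * 0.01
sharpe = returns.mean(axis=1) / returns.std(axis=1) * np.sqrt(252)

print(f"best in-sample Sharpe: {sharpe.max():.2f}")
# Each annualized Sharpe is roughly N(0, 1), so the expected maximum
# across 100 trials is about 2.5 — selection alone manufactures a
# seemingly excellent backtest.
```

Rerunning with different seeds keeps producing a "winner" near 2.5 despite every strategy being noise, which is the core of the quant's dilemma.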
Classical Simulation Baselines
Bootstrap methods and parametric stochastic models (GBM, jump-diffusion, Heston, GARCH) remain strong reference points for evaluating learned generators. Bootstrap variants range from the IID bootstrap, which preserves marginals but destroys temporal dependence, to block-based variants such as the stationary bootstrap, which preserve short-range dependence. Any deep generative model should at least outperform these baselines on the diagnostics that matter for the downstream task.
1 notebook
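The stationary bootstrap (Politis and Romano) is simple enough to sketch directly. This minimal version resamples blocks of geometrically distributed length from a hypothetical return series, preserving short-range dependence while randomizing the path:

```python
import numpy as np

def stationary_bootstrap(x, mean_block=20, rng=None):
    """Resample x in blocks of geometric mean length `mean_block`."""
    rng = rng or np.random.default_rng()
    n = len(x)
    out = np.empty(n)
    i = rng.integers(n)          # start of the first block
    p = 1.0 / mean_block         # per-step probability of a new block
    for t in range(n):
        out[t] = x[i]
        # With probability p, jump to a fresh random start (new block);
        # otherwise continue the current block, wrapping at the end.
        i = rng.integers(n) if rng.random() < p else (i + 1) % n
    return out

rng = np.random.default_rng(0)
returns = rng.standard_normal(1000) * 0.01   # hypothetical daily returns
sample = stationary_bootstrap(returns, mean_block=20, rng=rng)
```

Because each resampled value is drawn from the original series, marginals are preserved by construction; the block structure is what carries autocorrelation into the synthetic path.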
Generative Model Taxonomy
Four families of learned generators are introduced: variational autoencoders (stable but may oversmooth), GANs (sharp but unstable), diffusion models (stable with iterative denoising), and LLM-based tabular generators (serialize rows as text). The key distinction is that learned generators can represent complex dependence structures that are difficult to specify parametrically, but can also fail silently by smoothing tails or collapsing modes.
GANs for Financial Time Series
Four GAN variants address specific limitations: TimeGAN adds supervised temporal objectives (TSTR ratio 1.70), Tail-GAN augments with VaR/ES penalties (reducing VaR error from 102% to 11.5%), Sig-CWGAN uses path-signature kernels for temporal fidelity (TSTR ratio 0.97), and GT-GAN uses neural ODEs for irregularly sampled data. Honest results and shared challenges (mode collapse, training instability) are documented.
4 notebooks
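The TSTR ratios quoted above can be made concrete with a toy version of the protocol: fit the same model once on real data and once on synthetic data, score both on a held-out real test set, and take the ratio. Here the data is a hypothetical AR(1) process and "synthetic" is simply a fresh draw from the same process, standing in for generator output:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def ar1(n, phi=0.6, rng=None):
    """Simulate an AR(1) series x[t] = phi * x[t-1] + noise."""
    rng = rng or np.random.default_rng()
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def lagged_xy(x):
    """One-step-ahead prediction task: predict x[t] from x[t-1]."""
    return x[:-1].reshape(-1, 1), x[1:]

rng = np.random.default_rng(1)
real_train = ar1(1000, rng=rng)
real_test = ar1(500, rng=rng)
synthetic = ar1(1000, rng=rng)   # stand-in for a generator's output

Xr, yr = lagged_xy(real_train)
Xs, ys = lagged_xy(synthetic)
Xt, yt = lagged_xy(real_test)

trtr = LinearRegression().fit(Xr, yr).score(Xt, yt)  # train real,  test real
tstr = LinearRegression().fit(Xs, ys).score(Xt, yt)  # train synth, test real
print(f"TSTR ratio: {tstr / trtr:.2f}")  # near 1.0 for a faithful generator
```

A ratio near 1.0 (as reported for Sig-CWGAN) means models trained on synthetic data transfer to real data about as well as models trained on real data; ratios far above 1.0 usually indicate the synthetic task is easier than the real one, not that the generator is better.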
Diffusion Models for Financial Time Series
Diffusion-TS uses a Transformer encoder-decoder with trend-plus-seasonal decomposition, achieving a KS statistic of 0.06 and a TSTR ratio of 1.00 on 20 ETFs. Classifier-guided conditional generation enables regime-specific scenario production, with a 2.6x volatility ratio between generated high- and low-volatility samples. The key tradeoff is computational cost versus training stability.
1 notebook
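The forward (noising) half of any diffusion model has a closed form that is worth seeing once; the learned part is only the reverse denoiser. This sketch uses an assumed linear beta schedule — not necessarily the one Diffusion-TS uses — applied to a hypothetical return series:

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.standard_normal(252) * 0.01      # hypothetical daily returns

T = 100
betas = np.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)      # cumulative signal retention

def q_sample(x0, t, rng):
    """Closed-form forward step: x_t ~ N(sqrt(abar_t) x0, (1 - abar_t) I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

x_mid = q_sample(x0, T // 2, rng)
x_end = q_sample(x0, T - 1, rng)
# As t grows, the signal fraction sqrt(abar_t) shrinks, so x_T is close
# to pure Gaussian noise; generation runs this process in reverse,
# denoising step by step — the source of the computational cost noted above.
```

The iterative reverse pass is why sampling is slower than a single GAN forward pass, which is the cost side of the stability-versus-cost tradeoff.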
LLMs for Structured Financial Data
Autoregressive language models generate synthetic tabular data by serializing rows as text. Using the GReaT framework with distilgpt2, fine-tuned in ~10 minutes on a GPU, the approach achieves a TSTR AUC-ROC of 0.84 and KS statistics below 0.035. Specific failure modes include invalid records from autoregressive generation and numerical fidelity limitations from token-level optimization.
1 notebook
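The serialization step is the conceptual core of the approach. This sketch shows a GReaT-style textual encoding of a tabular row (the column names and values here are hypothetical, and the exact template is an assumption):

```python
import random

row = {"age": 42, "income": 55000, "default": "no"}

def serialize(row, rng=None):
    """Encode a row as a 'column is value' sentence in random column order."""
    rng = rng or random
    items = list(row.items())
    rng.shuffle(items)   # random order avoids baking in a fixed column position
    return ", ".join(f"{k} is {v}" for k, v in items)

print(serialize(row, random.Random(0)))
# The LM is fine-tuned on such strings; sampled completions are parsed
# back into rows. Completions that fail to parse (missing columns,
# malformed values) are the invalid-record failure mode noted above.
```

Numbers are handled as token sequences rather than continuous values, which is why token-level optimization limits numerical fidelity: the model has no built-in notion that 55000 and 55001 are close.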
The Fidelity–Utility–Privacy Framework
A three-dimensional evaluation framework: fidelity (marginal + temporal structure), utility (TSTR benchmarks), and privacy (empirical leakage + differential privacy). Applied to DP-GAN, strong privacy (epsilon=1) degrades fidelity by 6x while epsilon in [5,10] offers a practical sweet spot. The minimum validation checklist requires one distributional metric, one task benchmark, and one leakage test.
1 notebook
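Two of the three checklist items can be sketched in a few lines on a single numeric column: a distributional metric (the two-sample KS statistic) and an empirical leakage test (the rate of exact copies of training records). The data here is hypothetical, with a fresh Gaussian sample standing in for generator output:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
real = rng.standard_normal(2000)
synthetic = rng.standard_normal(2000)   # stand-in for generator output

# Fidelity: two-sample Kolmogorov-Smirnov statistic (0 = identical CDFs).
ks_stat, _ = stats.ks_2samp(real, synthetic)

# Privacy/leakage: fraction of synthetic points that exactly replicate
# a real training record (should be ~0 for continuous data).
copy_rate = np.isin(synthetic, real).mean()

print(f"KS={ks_stat:.3f}, exact-copy rate={copy_rate:.3f}")
```

The third item, a task benchmark such as TSTR on the actual downstream model, completes the minimum checklist; pass/fail thresholds are best set relative to the classical baselines rather than in the abstract.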