S&P 500 Options (Straddles)

Direct options trading and why equity-style cost models fail for options

Options Daily Price Data


Methodology Highlight

Teaches why equity-style cost frameworks are structurally wrong for options, and the critical difference between mid-price backtests and executable-price reality.

Unlike the equity+options case study that uses options data to predict stocks, this case study trades options directly. It sells ATM straddles on S&P 500 constituents and delta-hedges daily, testing whether the variance risk premium — the persistent gap between implied and realized volatility — can be harvested systematically.
The defining lesson is about cost modeling. Option bid-ask spreads are quoted in volatility points and scale with premium, not with notional value. This makes standard basis-point cost sweeps (which work for equities and futures) structurally misleading for options. Students learn to build executable-price backtests that incorporate actual bid-ask spreads, hedge costs, and margin economics.
The case study demonstrates that a statistically strong ML signal does not guarantee a viable strategy: the gap between mid-price and executable-price performance can be enormous in options markets. This teaches the critical distinction between prediction quality and economic viability.
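To make the cost-modeling point concrete, the sketch below restates two cost conventions on the same base (percent of premium) for a single straddle. The quotes, the spot price, and both function names are illustrative assumptions, not values or code from the case study.

```python
def spread_cost_pct_of_premium(bid: float, ask: float) -> float:
    """Round-trip cost of crossing the quoted spread, as a fraction of mid premium.

    Selling at the bid and buying back at the ask gives up one full spread.
    """
    mid = 0.5 * (bid + ask)
    return (ask - bid) / mid


def notional_cost_pct_of_premium(bps: float, spot: float, premium: float) -> float:
    """A cost of `bps` basis points on underlying notional, as a fraction of premium."""
    return (bps / 1e4) * spot / premium


# Hypothetical quotes: straddle bid 3.80 / ask 4.20 on a 100-dollar stock.
bid, ask, spot = 3.80, 4.20, 100.0
mid = 0.5 * (bid + ask)

print(f"quoted spread, round trip: {spread_cost_pct_of_premium(bid, ask):.1%} of premium")
for bps in (5, 10, 20):
    modeled = notional_cost_pct_of_premium(bps, spot, mid)
    print(f"{bps:>3} bps of notional: {modeled:.2%} of premium")
```

The same bps setting maps to a different share of premium for every contract, because the ratio of spot to premium changes with maturity and volatility. That is one way to see why a single bps grid is hard to interpret for options.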

Strategy Summary

Sell ATM straddles on S&P 500 constituents weekly, delta-hedge daily, exit after 10 days. 612 individual equity straddles with 51 IV-derived features. The ML signal is built on delta-hedged returns. Cost model includes option spread, hedge spread, commission, and margin opportunity cost — evaluated at mid-price and executable-price levels.

Data Sources

OptionMetrics (options data)
CBOE (VIX, VVIX)

ML Techniques

IV-derived feature engineering
Delta-hedged return labels
Options-specific cost modeling
Multi-label backtesting

ML Pipeline

Universe & Setup

2 notebooks

ATM straddles on S&P 500 constituents. The only case study that trades options directly. Weekly Friday entry, daily delta-hedge, 10-day hold. A unique `02_instruments` notebook resolves contract roll contamination: materialized straddles roll contracts almost daily (68% roll rate), so shift-based forward returns mix different contracts. Same-contract exit price lookups from raw option chains produce clean labels. Margin requirements: 15-20% initial.

Universe & Protocol Setup Ch 6

Defines contract selection rules for ATM straddles on S&P 500 constituents. Documents P&L accounting for delta-hedged short straddles, hedging protocol (daily delta rehedge), margin requirements, and the dominant cost structure. Builds walk-forward CV splits.
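A minimal sketch of walk-forward splits with a gap between training and test data follows. The function name, the expanding-window design, the fold sizes, and the choice to set the gap equal to the label horizon are all assumptions made for illustration; the notebook's actual split logic is not described here.

```python
def walk_forward_splits(n_obs: int, n_folds: int, test_size: int, gap: int):
    """Expanding-window walk-forward splits over time-ordered observation indices.

    `gap` drops the last observations before each test block so that training
    labels, which look `gap` periods ahead, cannot overlap the test period.
    """
    splits = []
    for k in range(n_folds):
        test_end = n_obs - (n_folds - 1 - k) * test_size
        test_start = test_end - test_size
        train_end = test_start - gap
        if train_end <= 0:
            raise ValueError("not enough history for the requested folds")
        splits.append((range(0, train_end), range(test_start, test_end)))
    return splits


# Hypothetical sizes: 100 trading days, 2 folds, 20-day test blocks, 10-day gap.
for train, test in walk_forward_splits(n_obs=100, n_folds=2, test_size=20, gap=10):
    print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

Each later fold trains on more history than the previous one, and no training label reaches into its test block.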

Contract Instruments Ch 6

Resolves a critical data quality issue: materialized straddles roll contracts frequently, so shift-based forward returns mix different contracts -- creating phantom returns from chaining contracts with different time values. Implements same-contract exit price lookups from the raw option chain to produce clean labels.
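The contamination described above can be reproduced on a two-day toy example. In this sketch the contract identifiers, dates, prices, and function names are made up for illustration, and the data layout (a list of tuples plus a dictionary keyed by date and contract) is an assumption rather than the notebook's schema.

```python
def shift_based_change(series, t, horizon):
    """Price change from a materialized straddle series, ignoring contract identity."""
    _, _, entry_mid = series[t]
    _, _, exit_mid = series[t + horizon]
    return exit_mid / entry_mid - 1.0


def same_contract_change(series, chain, t, horizon):
    """Price change of the contract held at entry, priced at exit from the raw chain."""
    _, entry_contract, entry_mid = series[t]
    exit_date, _, _ = series[t + horizon]
    exit_mid = chain.get((exit_date, entry_contract))
    if exit_mid is None:
        return None  # no quote for the entry contract on the exit date
    return exit_mid / entry_mid - 1.0


# Materialized series: (date, contract selected as ATM that day, straddle mid).
series = [
    ("2020-01-03", "K100_FEB", 5.00),
    ("2020-01-06", "K105_FEB", 6.00),  # selection rolled to a new strike
]
# Raw chain: mid price of every listed straddle on every date.
chain = {
    ("2020-01-03", "K100_FEB"): 5.00,
    ("2020-01-06", "K100_FEB"): 4.50,
    ("2020-01-06", "K105_FEB"): 6.00,
}

print("shift-based:  ", shift_based_change(series, 0, 1))
print("same-contract:", same_contract_change(series, chain, 0, 1))
```

The shift-based number compares two different contracts and reports a gain, while the contract actually held lost value over the same day.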

Labels & Evaluation

2 notebooks

Three label variants enable the defining Sharpe decomposition: unhedged 10-day (mid-to-mid), delta-hedged 10-day (primary), and executable (at actual bid-ask). Positive return = profitable short vol position before costs. 51 features (47 financial + 4 temporal) evaluated. The three-label structure reveals that mid-to-mid Sharpe +2.70 collapses to -1.05 at executable prices.

Label Engineering Ch 7

Constructs short straddle return labels using same-contract prices from 02_instruments. Implements three label variants: unhedged 10-day (mid-to-mid), delta-hedged 10-day (primary), and 5-day variants. Positive return means profitable short vol position before costs. These three labels enable the Sharpe decomposition that reveals the cost gap.
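One way to write the unhedged and delta-hedged labels is sketched below. It assumes returns are normalized by the entry premium, that the hedge is rebalanced once per day at that day's delta, and that financing, dividends, and all costs are ignored; the function name and inputs are hypothetical, and the notebook may use different conventions.

```python
def short_straddle_labels(straddle_mid, spot, straddle_delta):
    """Return (unhedged, delta_hedged) returns of a short straddle over the path.

    All inputs are daily lists for the SAME contract from entry to exit.
    `straddle_delta[t]` is the delta of the long straddle (call delta + put delta),
    so the hedge for a short position holds +straddle_delta[t] shares.
    """
    entry, exit_ = straddle_mid[0], straddle_mid[-1]
    option_pnl = entry - exit_  # short position gains when the premium falls
    hedge_pnl = sum(
        delta * (s_next - s_now)
        for delta, s_now, s_next in zip(straddle_delta, spot, spot[1:])
    )
    return option_pnl / entry, (option_pnl + hedge_pnl) / entry


# Made-up three-day path in which the stock rallies and the straddle gains value.
unhedged, hedged = short_straddle_labels(
    straddle_mid=[4.0, 4.5, 5.0],
    spot=[100.0, 102.0, 105.0],
    straddle_delta=[0.0, 0.2, 0.5],
)
print(unhedged, hedged)
```

In this path the share hedge recovers part of the loss on the short option position, which is why the delta-hedged label isolates the volatility component more cleanly than the unhedged one.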

Feature Evaluation Ch 7

Evaluates all features (financial + temporal) against delta-hedged straddle return labels using HAC-adjusted IC. Applies Benjamini-Hochberg FDR correction. Compares IC against unhedged vs delta-hedged returns. Assesses feature redundancy and produces triage decisions for Ch 11 modeling.
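The two statistical tools named above can be sketched in a few lines of standard-library Python. The sketch assumes the HAC adjustment is a Newey-West estimator with a Bartlett kernel applied to a per-period IC series, which is a common choice but is not confirmed by the source; the function names and sample inputs are hypothetical.

```python
import math


def newey_west_se_of_mean(x, lags):
    """Newey-West (Bartlett kernel) standard error of the mean of a series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]

    def autocov(lag):
        return sum(dev[t] * dev[t - lag] for t in range(lag, n)) / n

    long_run_var = autocov(0)
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)
        long_run_var += 2.0 * weight * autocov(lag)
    return math.sqrt(long_run_var / n)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= k_max
    return rejected


# Made-up IC series with positive autocorrelation, as overlapping labels tend to produce.
ic_series = [0.05, 0.06, 0.04, 0.01, 0.00, 0.02, 0.05, 0.07, 0.06, 0.02]
mean_ic = sum(ic_series) / len(ic_series)
print("HAC t-stat:", mean_ic / newey_west_se_of_mean(ic_series, lags=2))
print(benjamini_hochberg([0.001, 0.03, 0.035, 0.9]))
```

Because a 10-day label overlaps with the next weekly entry, consecutive IC observations are not independent, and the HAC standard error is typically wider than the naive one.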

Feature Engineering

2 notebooks

47 features organized into three groups unique to options: instrument state (straddle mid, Greeks, DTE, relative spread), surface features (IV level, skew, term structure, VRP, dynamics), and quality features (convergence codes, staleness proxies). VRP (IV minus RV) with per-symbol normalization. GJR-GARCH temporal volatility and Bayesian stochastic volatility produce 4 temporal features.

Feature Engineering Ch 8

Generates features organized into three groups: instrument state (straddle mid, Greeks, DTE, relative spread), surface features (IV level, skew, term structure, VRP, dynamics), and quality features (convergence codes, staleness proxies). Computes VRP (IV minus RV) with per-symbol normalization -- features unique to options case studies.
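A minimal sketch of the VRP feature with per-symbol normalization follows. The source states only that normalization is per symbol; the expanding z-score over strictly past values, the `min_obs` warm-up, and the function name are assumptions chosen to avoid look-ahead, and the IV and RV numbers are made up.

```python
from statistics import mean, pstdev


def vrp_features(iv, rv, min_obs=3):
    """Return (vrp, zscore) for ONE symbol's time-ordered IV and RV series.

    The z-score at time t uses only VRP values before t. It is None during
    the warm-up period or when the history has zero dispersion.
    """
    vrp = [i - r for i, r in zip(iv, rv)]
    zscores = []
    for t, value in enumerate(vrp):
        history = vrp[:t]  # strictly past values, so no look-ahead
        if len(history) < min_obs:
            zscores.append(None)
            continue
        scale = pstdev(history)
        zscores.append((value - mean(history)) / scale if scale > 0 else None)
    return vrp, zscores


# Normalizing each symbol separately keeps a structurally high-IV name
# from dominating the cross-section.
panel = {
    "AAA": ([0.20, 0.22, 0.24, 0.30], [0.18, 0.18, 0.18, 0.18]),
    "BBB": ([0.60, 0.58, 0.62, 0.61], [0.50, 0.52, 0.51, 0.55]),
}
for symbol, (iv, rv) in panel.items():
    print(symbol, vrp_features(iv, rv)[1])
```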

Temporal Features Ch 9

Fits walk-forward volatility models. GJR-GARCH (fitted for all symbols) estimates its parameters on the training data, then filters the full series with those parameters held fixed to produce daily conditional volatility, with no re-estimation and no look-ahead. Bayesian Stochastic Volatility provides a second temporal feature. Produces temporal features for downstream modeling.
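The fixed-parameter filtering step can be written as a plain recursion. This sketch assumes zero-mean returns and the standard GJR-GARCH(1,1) variance equation; the parameter values below are hypothetical stand-ins for estimates obtained on the training window, and the estimation itself (for example by maximum likelihood) is omitted.

```python
def gjr_garch_filter(returns, omega, alpha, gamma, beta, init_var):
    """Conditional volatility from a GJR-GARCH(1,1) recursion with fixed parameters.

    sigma2[t] = omega + (alpha + gamma * 1[r[t-1] < 0]) * r[t-1]**2 + beta * sigma2[t-1]

    The value for day t uses returns only through day t-1, so the output can be
    used as a feature known before day t begins.
    """
    variances = [init_var]
    for r in returns[:-1]:
        asym = gamma if r < 0 else 0.0  # extra response to negative shocks
        variances.append(omega + (alpha + asym) * r * r + beta * variances[-1])
    return [v ** 0.5 for v in variances]


# Hypothetical parameters, treated as already estimated on the training window.
params = dict(omega=1e-6, alpha=0.05, gamma=0.10, beta=0.90, init_var=1e-4)
daily_returns = [0.01, -0.02, 0.0, 0.015, -0.03]
print(gjr_garch_filter(daily_returns, **params))
```

Holding the parameters fixed means test-period data influences the filtered volatility only through past returns, never through the estimates themselves.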

Modeling

7 notebooks

The strongest signal environment in the book. LASSO achieves IC 0.215 (richest linear signal), GBM pushes to IC +0.068, PatchTST reaches IC 0.387 on delta-hedged returns. All model families positive in both CV folds. A dedicated IC diagnostic notebook investigates why IC is an order of magnitude above other case studies. Causal DML tests whether VRP causes higher delta-hedged returns.

Linear Models Ch 11

Trains Ridge, LASSO, and ElasticNet on IV-derived features across equity straddles to predict delta-hedged 10-day returns. LASSO achieves the richest single-family linear signal in the book. Establishes the regularized regression baseline for downstream GBM and deep learning comparisons.

GBM Grid Search Ch 12

Trains LightGBM across tree-depth profiles and loss functions on the options feature matrix. Searches for non-linear interactions among IV skew, term structure, VRP, and volatility regime features. GBM improves on the linear baseline on delta-hedged returns, confirming non-linear structure in the options signal.

Tabular DL (TabM) Ch 12

Trains a TabM rank-1 adapter MLP ensemble (small/medium/large variants) on the same feature matrix as linear and GBM. Compares parameter-efficient MLP ensembling against tree-based models for capturing multi-way interactions between IV term structure, skew, VRP, and cross-sectional positioning.

IC Diagnostic Ch 11

Decomposes the high IC via four controlled experiments: feature ablation (IV vs realized-vol features), lag decay (genuine VRP vs same-day leakage), PCA dimensionality reduction, and cross-sectional breadth analysis. Separates the VRP prediction component from equity directional signal.

LSTM Ch 13

Trains LSTM on 60-day lookback windows of IV features across equity straddles to predict delta-hedged returns. Evaluates whether sequential processing of volatility clustering and mean-reversion dynamics improves on flat-feature models (TabM, GBM) that see only the current snapshot.

PatchTST Ch 13

Trains PatchTST to group 60 days of IV features into patches and apply self-attention, predicting delta-hedged straddle returns from the evolution of volatility surfaces rather than their current snapshot. Achieves the highest single-model IC across all case studies.
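The patching step on its own is simple to illustrate. The sketch below shows only how a 60-day window is cut into patches before attention is applied; the patch length of 10, the stride values, and the function name are assumptions, since the source does not state the configuration used.

```python
def make_patches(window, patch_len, stride):
    """Split one lookback window (a list of per-day feature vectors) into patches."""
    last_start = len(window) - patch_len
    return [window[s:s + patch_len] for s in range(0, last_start + 1, stride)]


# Hypothetical 60-day window with two features per day.
window = [[day, 0.2 + 0.001 * day] for day in range(60)]

non_overlapping = make_patches(window, patch_len=10, stride=10)
overlapping = make_patches(window, patch_len=10, stride=5)
print(len(non_overlapping), len(overlapping))
```

With non-overlapping patches, self-attention operates over 6 patch tokens instead of 60 daily steps, which is the main structural difference from feeding the sequence day by day.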

Causal DML Ch 15

Applies Double Machine Learning to estimate the causal effect of vrp_21d on delta-hedged straddle returns, conditioning on rv_21d, vrp_mom_5d, and spread_pctl as confounders. Finds the VRP signal is predictive but not causal -- confounders absorb the entire treatment effect.
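The "predictive but not causal" outcome can be illustrated with a stripped-down partialling-out estimator. This is a sketch under strong simplifying assumptions: one confounder instead of three, ordinary least squares instead of flexible ML nuisance models, two cross-fitting folds, and deterministic toy data. All names are hypothetical and none of it reproduces the notebook's estimator.

```python
def fit_line(x, y):
    """Ordinary least squares of y on a single regressor x: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def dml_partial_out(y, t, x, folds):
    """Cross-fitted partialling-out estimate of the effect of t on y given x."""
    res_y, res_t = [0.0] * len(y), [0.0] * len(y)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        ay, by = fit_line([x[i] for i in train], [y[i] for i in train])
        at, bt = fit_line([x[i] for i in train], [t[i] for i in train])
        for i in held_out:  # residualize held-out rows with models fit elsewhere
            res_y[i] = y[i] - (ay + by * x[i])
            res_t[i] = t[i] - (at + bt * x[i])
    return sum(a * b for a, b in zip(res_t, res_y)) / sum(a * a for a in res_t)


# Toy data: the treatment tracks the confounder plus an alternating shock.
x = [float(i) for i in range(8)]
t = [xi + (1.0 if i % 2 == 0 else -1.0) for i, xi in enumerate(x)]
folds = [[0, 2, 4, 6], [1, 3, 5, 7]]

y_confounded = [2.0 * xi for xi in x]                       # outcome driven by x only
y_causal = [2.0 * xi + 3.0 * ti for xi, ti in zip(x, t)]    # true effect of t is 3

print("naive slope of y on t:", fit_line(t, y_confounded)[1])
print("DML, confounded case: ", dml_partial_out(y_confounded, t, x, folds))
print("DML, causal case:     ", dml_partial_out(y_causal, t, x, folds))
```

In the confounded case the treatment predicts the outcome in a naive regression, yet the estimate falls to zero once the confounder is partialled out, which mirrors the pattern the notebook reports for VRP.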

Strategy Pipeline

5 notebooks

The central tension: strongest IC vs dominant costs. Median round-trip bid-ask spread of 1,091 bps (10.9% of premium per trade) overwhelms the cross-sectional signal. Standard basis-point cost sweep shows Sharpe barely changing from 0 to 1,500 bps -- not cost robustness but model irrelevance, because options costs scale with premium, not notional. Gamma exposure limits and stop-loss protocols tested.

Model Analysis Ch 11

Compares IC and fold stability across all model families (linear, GBM, TabM, deep learning, latent factors, causal DML) on delta-hedged straddle returns. Confirms every family produces positive IC in both CV folds. GBM leads but single-stock option bid-ask spreads prevent the signal from surviving execution.

Backtest & Signal Evaluation Ch 16

Runs plumbing test, parametric sweep across all prediction-signal combinations, and statistical analysis (DSR, family comparison) for the straddle strategy. Executes premium-only backtests at mid-price, beginning the three-layer Sharpe decomposition (mid-unhedged, mid-delta-hedged, executable) that reveals the cost gap.

Portfolio Construction Ch 17

Sweeps top predictions x TOP_K concentration x 6 allocators (equal-weight through MVO) for the straddle strategy at mid-price with margin constraints. Evaluates whether any allocator or concentration level improves on the equal-weight baseline given the options-specific return distribution and embedded leverage.

Transaction Costs Ch 18

Applies an extended cost grid to the top allocation combinations, revealing negative executable Sharpe at actual bid-ask prices. Demonstrates that equity-style bps-of-notional cost models are structurally wrong for options where costs scale with premium, not notional -- the standard sweep barely changes Sharpe across the full grid.
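The difference between a mid-price and an executable-price return can be shown for a single short straddle. The sketch assumes the position is sold at the entry bid and bought back at the exit ask, normalizes by the entry mid, and leaves out hedge costs, commission, and margin opportunity cost; the quotes and the function name are made up.

```python
def short_straddle_return(entry_bid, entry_ask, exit_bid, exit_ask, executable):
    """Return of a short straddle, at mid prices or at executable prices."""
    entry_mid = 0.5 * (entry_bid + entry_ask)
    exit_mid = 0.5 * (exit_bid + exit_ask)
    if executable:
        sold_at, bought_at = entry_bid, exit_ask  # cross the spread on both legs
    else:
        sold_at, bought_at = entry_mid, exit_mid
    return (sold_at - bought_at) / entry_mid


# Hypothetical quotes: premium decays from a 4.00 mid to a 3.60 mid.
quotes = dict(entry_bid=3.80, entry_ask=4.20, exit_bid=3.40, exit_ask=3.80)
print("mid-to-mid:", short_straddle_return(**quotes, executable=False))
print("executable:", short_straddle_return(**quotes, executable=True))
```

Here a trade that looks like a 10% gain at mid prices earns nothing once both legs cross the spread, even before any other cost is charged.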

Risk Management Ch 19

Sweeps position-level (stop-loss, trailing stops, time exits) and portfolio-level (drawdown breakers, daily loss limits) risk controls on the top allocation combinations. Calibrates trailing stops via MAE/MFE distributions. Demonstrates that risk overlays cannot rescue a strategy where cost dominance makes the baseline executable Sharpe negative.
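A minimal sketch of maximum adverse excursion (MAE) and maximum favorable excursion (MFE) follows. The calibration rule shown, taking the deepest drawdown that any eventually winning trade went through, is only one simple heuristic and is an assumption; the source does not say how the MAE/MFE distributions are turned into stop levels. The trade paths are made up.

```python
def mae_mfe(pnl_path):
    """Maximum adverse and favorable excursion of one trade.

    `pnl_path` holds the trade's cumulative return at each mark after entry.
    """
    return min(0.0, min(pnl_path)), max(0.0, max(pnl_path))


def widest_winner_drawdown(trades):
    """Deepest MAE among trades that finished positive, or None if there are none."""
    winner_maes = [mae_mfe(path)[0] for path in trades if path[-1] > 0]
    return min(winner_maes) if winner_maes else None


# Hypothetical cumulative-return paths for three trades.
trades = [
    [-0.02, 0.01, 0.05],
    [-0.01, -0.06, -0.08],
    [-0.04, 0.02, 0.01],
]
for path in trades:
    print(mae_mfe(path))
print("stop no tighter than:", widest_winner_drawdown(trades))
```

A stop placed tighter than this level would have closed at least one trade in the sample that went on to finish positive.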

Synthesis & Verdict

1 notebook

The failure that teaches the most. Real ML signal (IC +0.068) completely neutralized by option bid-ask spreads. Only executable-label backtesting reveals the true burden. Verdict: Reframe -- the ML signal is a viable input for equity positioning, not a directly tradable options strategy. Demonstrates why equity-style cost frameworks are structurally wrong for options.

Strategy Analysis Ch 20

Synthesizes the three-label Sharpe decomposition (unhedged, delta-hedged, executable) with the dominant round-trip spread. Produces the "reframe" verdict: the ML signal is a viable input for equity positioning, not a directly tradable options strategy. Documents why equity-style cost frameworks are structurally wrong for options.

Quick Info

Asset Class: Options

Frequency: Daily

Data Type: Price Data

Notebooks: 19

Chapters: 13

Libraries Used

ML4T Engineer
ML4T Diagnostic
ML4T Backtest

Chapters

6 Strategy Research Framework
7 Defining the Learning Task
8 Financial Feature Engineering
9 Model-Based Feature Extraction
11 The ML Pipeline
12 Advanced Models for Tabular Data
13 Deep Learning for Time Series
15 Causal Machine Learning
16 Strategy Simulation
17 Portfolio Construction
18 Transaction Costs
19 Risk Management
20 Strategy Synthesis