S&P 500 Equity + Option Analytics

Combining options-derived features with equity data for multi-source prediction

Options Daily Price Data
Methodology Highlight
Teaches multi-source feature integration — extracting signals from options markets for equity prediction — and causal inference to disentangle genuine signal from factor confounding.

This case study uses options-derived signals to predict equity returns — not to trade options directly. Implied volatility surfaces, skew measurements, and term structure features from the S&P 500 options market are combined with standard equity features to predict 5-day stock returns across 634 constituents with listed options.
Students learn multi-source feature integration: how to extract information from one market (options) and apply it to predictions in another (equities). The case study covers IV surface construction, skew measurement, and term structure decomposition, and shows how to handle publication lags for point-in-time compliance.
The case study also demonstrates the challenges of deep learning in finance, including training instability across epochs and the sensitivity of results to checkpoint selection. Students learn causal inference techniques, specifically double machine learning (DML), to assess how much of an apparent signal reflects genuine predictive content versus confounding with known factors like momentum and volatility.

Strategy Summary

Long-only equal-weight strategy on S&P 500 stocks ranked by a composite signal from IV surface, skew, term structure, and equity momentum features. Weekly Friday-close decisions with Monday-open execution. IV publication lag of 1 day enforced for point-in-time compliance. 2 CV folds with 2-year training and 1-year validation windows.

Data Sources

OptionMetrics (IV surfaces)
CBOE (VIX term structure)

ML Techniques

IV surface feature engineering
Multi-source feature integration
Deep learning (CAE, NLinear)
Causal DML for confounding analysis

ML Pipeline

Universe & Setup
1 notebook
634 S&P 500 constituents with listed options. Weekly Friday-close decisions with Monday-open execution. IV publication lag of 1 day enforced for point-in-time compliance. The only multi-source case study: equity bars + option analytics. 2 CV folds (2Y train, 1Y val), 2021 holdout. Transaction costs are modeled at 3-10 bps, reflecting large-cap equity liquidity.
Universe & Protocol Setup Ch 6
Defines the trading universe (S&P 500 constituents with listed options), weekly Friday-close decision cadence, and IV publication lag rules. Validates options coverage thresholds against expected NYSE trading days and reconciles equity bars with options chains. Builds 2-fold walk-forward evaluation splits.
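The 2-fold walk-forward layout can be sketched as below. This is a minimal illustration, not the notebook's code: the function name, the calendar-year alignment, the rolling (rather than expanding) window, and the first training year are all assumptions chosen so that the last validation year ends just before the 2021 holdout.

```python
from datetime import date


def walk_forward_splits(first_train_year, n_folds=2, train_years=2, val_years=1):
    """Build rolling calendar-year splits; all boundary dates are inclusive."""
    splits = []
    for k in range(n_folds):
        start = first_train_year + k * val_years  # roll forward one validation window
        splits.append({
            "train": (date(start, 1, 1), date(start + train_years - 1, 12, 31)),
            "val": (date(start + train_years, 1, 1),
                    date(start + train_years + val_years - 1, 12, 31)),
        })
    return splits


# Hypothetical first training year (assumption, not stated in the text).
splits = walk_forward_splits(first_train_year=2017)
for fold, s in enumerate(splits):
    print(fold, s["train"], s["val"])
```

Each training window ends strictly before its validation window starts, so no validation observation is visible during fitting.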
Labels & Evaluation
2 notebooks
5-day forward equity return as primary label (matching weekly cadence), 10-day variant. Equity returns are the target -- options provide features, not labels. Evaluation covers 47 features (44 IV/equity + 3 GARCH temporal) with special focus on whether IV features add incremental IC beyond realized volatility.
Label Engineering Ch 7
Computes 5-day (primary) and 10-day forward equity returns as prediction targets using execution-consistent conventions (enter at t+1 open, hold for h days). Generates risk-adjusted variants normalized by 20-day realized volatility and binary direction labels for classification approaches.
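The execution-consistent convention (decide at t, enter at the t+1 open, hold for h days) can be sketched as follows. The function name and the choice to exit at the open h days after entry are assumptions; the text does not say whether the exit is taken at an open or a close.

```python
import pandas as pd


def forward_open_to_open_return(opens: pd.Series, horizon: int = 5) -> pd.Series:
    """Label at t: return from the open at t+1 to the open at t+1+horizon."""
    entry = opens.shift(-1)              # open of the next bar (t+1)
    exit_ = opens.shift(-(1 + horizon))  # open `horizon` bars after entry
    return exit_ / entry - 1.0


# Toy open prices for a single symbol (hypothetical values).
opens = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0])
labels = forward_open_to_open_return(opens, horizon=5)
print(labels)
```

Rows without a full holding window are left as NaN rather than filled. On a multi-symbol panel the same shifts would need to run per symbol (for example inside a groupby) so that one stock's prices never label another's.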
Feature Evaluation Ch 7
Evaluates IV/equity features and GARCH temporal features against 5-day forward equity returns using HAC-adjusted IC (Newey-West for overlapping returns) with Benjamini-Hochberg FDR correction. Screens for coverage and staleness. Triages features into PROCEED / REVISE / STOP categories.
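A rough sketch of the two statistical steps named above: a Newey-West (Bartlett kernel) t-statistic for a mean IC, and the Benjamini-Hochberg step-up rule. The helper names, the lag choice, the normal approximation for p-values, and all numbers are illustrative assumptions, not the notebook's settings.

```python
import numpy as np
from statistics import NormalDist


def newey_west_tstat(x, lags):
    """t-stat of the mean using a Bartlett-weighted long-run variance."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    lrv = d @ d / n  # lag-0 autocovariance
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1)
        lrv += 2.0 * weight * (d[k:] @ d[:-k]) / n
    return x.mean() / np.sqrt(lrv / n)


def two_sided_pvalue(t_stat):
    """Normal approximation to the two-sided p-value (an assumption)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        last = np.max(np.nonzero(passed)[0])  # largest rank meeting its threshold
        reject[order[: last + 1]] = True
    return reject


# Hypothetical per-period IC series for one feature.
rng = np.random.default_rng(1)
ic_series = rng.normal(0.02, 0.10, size=150)
t_stat = newey_west_tstat(ic_series, lags=4)  # lags = horizon - 1 (assumption)
p_value = two_sided_pvalue(t_stat)

# Hypothetical p-values for five features.
rejected = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.9], alpha=0.05)
print(t_stat, p_value, rejected)
```

With zero lags the statistic reduces to the ordinary t-statistic, which makes a convenient sanity check.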
Feature Engineering
2 notebooks
44 features across six families, with the options-derived families unique to this case study: IV level and dynamics (ATM IV, z-scores, percentiles, momentum), skew and term structure (risk reversal, butterfly, slope), VRP (IV minus RV with per-symbol normalization), cross-sectional ranks, equity momentum, and quality. GJR-GARCH(1,1) per stock provides temporal volatility estimates. The multi-source integration question: do options tell you something equities don't?
Feature Engineering Ch 8
Builds features across six families by merging equity bars with options analytics: IV level and dynamics (ATM IV, z-scores, percentiles, momentum), skew and term structure (risk reversal, term slope, convexity), variance risk premium (IV-RV spread), cross-sectional ranks, equity momentum, and quality. Uses delta-based surface point selection with 1-day IV lag enforcement.
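The 1-day IV lag and the IV-RV spread can be illustrated with a small panel. This is a sketch under assumptions: the column `rv_20d`, the suffix `_lag1`, the symbols, and all values are hypothetical, and shifting by one row equals a one-trading-day lag only if each symbol has a row for every trading day.

```python
import pandas as pd

# Toy long-format panel (hypothetical symbols and values).
panel = pd.DataFrame({
    "symbol": ["AAA"] * 4 + ["BBB"] * 4,
    "date": pd.to_datetime(["2020-01-06", "2020-01-07",
                            "2020-01-08", "2020-01-09"] * 2),
    "iv_30_atm": [0.20, 0.22, 0.21, 0.25, 0.30, 0.31, 0.29, 0.28],
    "rv_20d": [0.18, 0.18, 0.19, 0.19, 0.33, 0.32, 0.32, 0.31],
})
panel = panel.sort_values(["symbol", "date"]).reset_index(drop=True)

# Enforce the publication lag: the IV used on day t is the value from day t-1.
panel["iv_30_atm_lag1"] = panel.groupby("symbol")["iv_30_atm"].shift(1)

# Variance risk premium proxy built only from the lagged, point-in-time IV.
panel["ivrv_spread"] = panel["iv_30_atm_lag1"] - panel["rv_20d"]
print(panel)
```

Shifting inside the groupby keeps the first row of each symbol empty instead of borrowing the previous symbol's last value.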
Temporal Features (GARCH) Ch 9
Fits GJR-GARCH(1,1) per stock via walk-forward CV to produce temporal features: forward-looking conditional volatility, improved VRP signal (garch_ivrv_spread = iv_30_atm - garch_cond_vol), and volatility surprise (|return| / garch_cond_vol).
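To show how the three temporal features relate to the conditional variance, the sketch below runs the GJR-GARCH(1,1) recursion with fixed parameters. It does not fit anything: the parameter values, the sample-variance initialization, the 252-day annualization, the flat IV series, and the use of daily volatility in the surprise ratio are all assumptions made for illustration.

```python
import numpy as np


def gjr_garch_variance(returns, omega, alpha, gamma, beta):
    """One-step-ahead conditional variance; sigma2[t] uses returns up to t-1 only."""
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty(r.size)
    sigma2[0] = r.var()  # initialization choice (assumption)
    for t in range(1, r.size):
        negative_shock = 1.0 if r[t - 1] < 0 else 0.0  # leverage indicator
        sigma2[t] = (omega
                     + (alpha + gamma * negative_shock) * r[t - 1] ** 2
                     + beta * sigma2[t - 1])
    return sigma2


rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=250)  # synthetic daily returns

# Hypothetical parameters, not estimates from any data.
params = dict(omega=1e-6, alpha=0.03, gamma=0.08, beta=0.90)
sigma2 = gjr_garch_variance(returns, **params)

garch_cond_vol = np.sqrt(252.0 * sigma2)       # annualized to match IV units
iv_30_atm = np.full(returns.size, 0.20)        # hypothetical flat IV series
garch_ivrv_spread = iv_30_atm - garch_cond_vol
vol_surprise = np.abs(returns) / np.sqrt(sigma2)  # daily units on both sides
```

The gamma term is what distinguishes GJR from plain GARCH: a negative return raises next-period variance by more than a positive return of the same size.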
Modeling
7 notebooks
CAE achieves IC +0.073 (highest in book) but swings from -0.070 to +0.073 across checkpoints -- the most fragile result. NLinear wins Sharpe (+1.10) despite IC of only +0.008 (second IC-Sharpe paradox). DML reveals 88% confounding bias -- IV features are deeply entangled with momentum. A dedicated prediction ingestion notebook manages the multi-model result flow into Ch16-19.
Linear Models Ch 11
Trains Ridge, LASSO, and ElasticNet via walk-forward CV on S&P 500 symbols with IV and equity features. Tests whether supervised linear combination of individually weak features (none survive FDR) produces usable signal. Registers predictions for downstream backtesting.
Gradient Boosting Ch 12
Trains LightGBM across regularization profiles and loss functions (MSE, MAE, Huber). Evaluates IC at iteration checkpoints to detect overfitting with limited CV folds. Registers predictions for downstream backtesting.
Tabular Deep Learning (TabM) Ch 12
Trains TabM rank-1 adapter MLP ensembles (small/medium/large) via walk-forward CV on the combined IV + equity feature matrix. Tests whether the ensemble's learned cross-family interactions (IV term structure x VRP x momentum) outperform tree-based splitting on the 5-day label.
Prediction Ingestion Ch 16
Provides case-local prediction ingestion utilities that normalize column names, align timestamps, and merge predictions across model families for Ch16-19 strategy notebooks.
LSTM Ch 13
Trains LSTM with 60-day lookback on sequential IV and equity features via walk-forward CV. Tests whether gated memory captures temporal patterns in IV term structure shifts and VRP mean-reversion that point-in-time models miss. Compares against Linear, GBM, and TabM baselines.
PatchTST Ch 13
Trains PatchTST with multi-scale patch attention on daily feature windows. Tests whether patching captures hierarchical temporal patterns in IV dynamics (daily, weekly, monthly timescales) more effectively than LSTM's sequential gating. Compares against all prior model families.
Causal DML Ch 15
Applies DML to the IV-RV spread (ivrv_spread) as treatment across S&P 500 equities with confounders: 20-day realized volatility, 21-day equity momentum, and 25-delta risk-reversal skew. Runs placebo refutation tests and quantifies confounding bias between naive and DML causal effect estimates.
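The gap between a naive and a DML estimate can be illustrated on synthetic data with the partialling-out form of DML. Everything here is an assumption for illustration: linear nuisance models stand in for the flexible learners DML usually uses, folds are shuffled randomly (reasonable for i.i.d. toy data, not for time series), and the bias share is defined as 1 - DML/naive, which may differ from the notebook's definition.

```python
import numpy as np


def ols_fit_predict(X_train, y_train, X_test):
    """Fit OLS with intercept on the training fold, predict on the test fold."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def dml_partial_out(y, d, X, n_folds=2, seed=0):
    """Cross-fitted residual-on-residual estimate of the effect of d on y."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_res = np.empty(len(y))
    d_res = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        y_res[test_idx] = y[test_idx] - ols_fit_predict(X[train], y[train], X[test_idx])
        d_res[test_idx] = d[test_idx] - ols_fit_predict(X[train], d[train], X[test_idx])
    return (d_res @ y_res) / (d_res @ d_res)


# Synthetic data: three confounders drive both the treatment and the outcome.
rng = np.random.default_rng(42)
n = 5000
X = rng.normal(size=(n, 3))
d = X @ np.array([0.8, 0.5, -0.3]) + rng.normal(size=n)
true_theta = 0.1
y = true_theta * d + X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

dc, yc = d - d.mean(), y - y.mean()
naive = (dc @ yc) / (dc @ dc)      # slope from regressing y on d alone
dml = dml_partial_out(y, d, X)     # slope after removing confounder effects
bias_share = 1.0 - dml / naive     # one possible definition (assumption)
print(naive, dml, bias_share)
```

In this toy setup the naive slope absorbs the confounders' effect on the outcome, while the cross-fitted estimate lands near the true coefficient.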
Strategy Pipeline
5 notebooks
First case study where ML genuinely beats equal-weight. Score-weighted allocation exploits the large 634-stock cross-section. Holdout Sharpe is +0.46 despite negative prediction IC (-0.016): because the portfolio holds only the top-ranked names, it can preserve value even when the average cross-sectional correlation turns negative. Cost model benefits from liquid S&P 500 equities.
Model Analysis Ch 11
Compares all model families (linear, GBM, TabM, DL, latent factors, causal) on the equity+option universe using registry metrics, fold stability diagnostics, prediction correlation, and decile monotonicity analysis. Produces per-family advancement recommendations for Ch16 backtesting.
Backtest & Signal Evaluation Ch 16
Runs plumbing test (random signal verification), then sweeps all predictions across signal methods and TOP_K configurations using the ml4t-backtest engine with weekly rebalancing. Computes DSR, family comparison, and IC-to-Sharpe translation statistics.
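For the DSR step, the sketch below follows the commonly cited Bailey and Lopez de Prado form of the deflated Sharpe ratio. It is an assumption that the notebook uses this form, and the function names and inputs are hypothetical. Sharpe ratios here are per period (not annualized), and `n_trials` must be greater than 1.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329


def expected_max_sharpe(n_trials, sharpe_std):
    """Approximate expected maximum Sharpe across n_trials unskilled strategies."""
    z = NormalDist()
    return sharpe_std * ((1.0 - EULER_GAMMA) * z.inv_cdf(1.0 - 1.0 / n_trials)
                         + EULER_GAMMA * z.inv_cdf(1.0 - 1.0 / (n_trials * e)))


def deflated_sharpe_ratio(sharpe, n_obs, n_trials, sharpe_std, skew=0.0, kurt=3.0):
    """Probability that the observed Sharpe exceeds the selection-adjusted benchmark."""
    benchmark = expected_max_sharpe(n_trials, sharpe_std)
    denom = sqrt(1.0 - skew * sharpe + (kurt - 1.0) / 4.0 * sharpe ** 2)
    return NormalDist().cdf((sharpe - benchmark) * sqrt(n_obs - 1) / denom)


# Hypothetical inputs: weekly Sharpe 0.15 over 104 weeks, with a cross-trial
# Sharpe standard deviation of 0.05.
dsr_10 = deflated_sharpe_ratio(0.15, n_obs=104, n_trials=10, sharpe_std=0.05)
dsr_100 = deflated_sharpe_ratio(0.15, n_obs=104, n_trials=100, sharpe_std=0.05)
print(dsr_10, dsr_100)
```

The same observed Sharpe earns a lower DSR when more configurations were tried, which is the reason to apply it to a sweep over signal methods and TOP_K settings.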
Portfolio: Allocator Sweep Ch 17
Sweeps top signal-stage predictions across TOP_K concentration levels and 6 allocators (equal-weight, score-weighted, inverse-vol, risk-parity, MVO, HRP) on the weekly S&P 500 universe. Tests how concentration interacts with tail-selection signals on the largest equity cross-section in the book.
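Three of the simpler allocators can be sketched on a top-K selection as follows. The function name, the shift applied to make score weights positive, and the toy inputs are assumptions. The optimizer-based allocators (risk-parity, MVO, HRP) are not shown.

```python
import numpy as np


def top_k_weights(scores, vols, k, method="equal"):
    """Long-only weights over the k highest-scoring names; weights sum to 1."""
    scores = np.asarray(scores, dtype=float)
    vols = np.asarray(vols, dtype=float)
    top = np.argsort(scores)[::-1][:k]  # indices of the k largest scores
    raw = np.zeros(scores.size)
    if method == "equal":
        raw[top] = 1.0
    elif method == "score":
        # Shift by the universe minimum so selected scores are non-negative
        # (one possible convention, an assumption).
        raw[top] = scores[top] - scores.min()
    elif method == "inverse_vol":
        raw[top] = 1.0 / vols[top]
    else:
        raise ValueError(f"unknown method: {method}")
    if raw.sum() <= 0:  # degenerate case, e.g. all scores identical
        raw[top] = 1.0
    return raw / raw.sum()


# Toy cross-section of five names (hypothetical scores and volatilities).
scores = [0.03, -0.01, 0.02, 0.00, 0.01]
vols = [0.20, 0.30, 0.10, 0.25, 0.40]
w_equal = top_k_weights(scores, vols, k=3, method="equal")
w_score = top_k_weights(scores, vols, k=3, method="score")
w_ivol = top_k_weights(scores, vols, k=3, method="inverse_vol")
print(w_equal, w_score, w_ivol)
```

Lowering k concentrates the portfolio in the score tail, which is the interaction between concentration and signal that the sweep examines.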
Transaction Costs Ch 18
Runs a cost grid sweep on top allocation-stage combinations to find breakeven. Plots net Sharpe decay curves across cost levels for the weekly large-cap equity strategy. Tests sensitivity to spread, commission, and market impact assumptions.
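A cost grid and breakeven level can be sketched with a linear cost model, where each period's net return is the gross return minus turnover times the cost rate. The linear form, the one-way turnover convention, the 52-period annualization, and the synthetic inputs are assumptions. A fuller model would treat spread, commission, and impact separately.

```python
import numpy as np


def annualized_sharpe(returns, periods_per_year=52):
    r = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)


def net_sharpe_by_cost(gross, turnover, cost_bps_grid):
    """Net Sharpe for each cost level, charging cost_bps per unit of turnover."""
    gross = np.asarray(gross, dtype=float)
    turnover = np.asarray(turnover, dtype=float)
    return {bps: annualized_sharpe(gross - turnover * bps / 1e4)
            for bps in cost_bps_grid}


def breakeven_cost_bps(gross, turnover):
    """Cost level at which the mean net return, and so the Sharpe, is zero."""
    return 1e4 * np.mean(gross) / np.mean(turnover)


# Synthetic weekly gross returns and one-way turnover (hypothetical).
rng = np.random.default_rng(3)
gross = rng.normal(0.002, 0.02, size=260)
turnover = rng.uniform(0.2, 0.6, size=260)

grid = [0, 3, 5, 10, 20]
sharpe_curve = net_sharpe_by_cost(gross, turnover, grid)
breakeven = breakeven_cost_bps(gross, turnover)
print(sharpe_curve, breakeven)
```

Plotting `sharpe_curve` against the grid gives the decay curve; the breakeven is where it crosses zero.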
Risk Management Ch 19
Sweeps position-level (stop-loss, trailing stop, time exit) and portfolio-level (drawdown breaker, daily loss limit) risk controls on top allocation combos. Tests whether risk overlays improve or degrade Sharpe for a tail-selection signal with weekly rebalancing on S&P 500 stocks.
Synthesis & Verdict
1 notebook
Highest IC but most fragile. Verdict: Advance -- contingent on checkpoint ensembling to stabilize CAE. The recommendation is conditional, not confirmed. Teaches that single-checkpoint deep learning results can be misleading.
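Checkpoint ensembling can be illustrated with a toy model in which each checkpoint's prediction is a weak signal plus independent noise. The independence is a strong assumption: checkpoints from one training run are usually correlated, so real gains would be smaller. All names and numbers are hypothetical.

```python
import numpy as np


def rank_ic(pred, target):
    """Spearman rank correlation via ranks (assumes no ties)."""
    pred_rank = np.argsort(np.argsort(pred))
    target_rank = np.argsort(np.argsort(target))
    return float(np.corrcoef(pred_rank, target_rank)[0, 1])


rng = np.random.default_rng(7)
n_obs, n_ckpt = 5000, 5
target = rng.normal(size=n_obs)

# Each row is one checkpoint: weak shared signal plus independent noise.
ckpt_preds = 0.1 * target + rng.normal(size=(n_ckpt, n_obs))
ckpt_ics = [rank_ic(p, target) for p in ckpt_preds]

# Standardize each checkpoint's predictions, then average across checkpoints.
z = (ckpt_preds - ckpt_preds.mean(axis=1, keepdims=True)) \
    / ckpt_preds.std(axis=1, keepdims=True)
ensemble_pred = z.mean(axis=0)
ensemble_ic = rank_ic(ensemble_pred, target)

print("per-checkpoint IC:", np.round(ckpt_ics, 3))
print("ensemble IC:", round(ensemble_ic, 3))
```

Averaging reduces the noise that makes any single checkpoint's IC unstable, which is why the verdict is conditioned on ensembling rather than on one selected checkpoint.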
Strategy Analysis Ch 20
Assembles the full S&P 500 equity+options pipeline verdict by tracing the NLinear champion through signal, allocation, cost, and risk stages via BacktestExplorer. Computes holdout performance, search risk accounting, and Fama-French factor attribution. Produces a structured deployment verdict for Ch20 synthesis.