Crypto Perpetuals Funding

Alternative data and non-standard frequencies in 24/7 crypto markets

Cryptocurrency, 8-Hour, Alternative Data
Methodology Highlight
Teaches honest evaluation with minimal data — small universe, few CV folds, non-standard frequency — and how regime changes between periods affect model reliability.

This case study explores a structural feature unique to crypto markets: the funding rate mechanism in perpetual futures contracts. Every 8 hours, longs and shorts exchange payments based on the gap between perpetual and spot prices. The question is whether extreme funding conditions create predictable mean-reversion patterns.
With 19 Binance perpetual contracts, this case study has the smallest cross-section in the book and operates at a non-standard 8-hour frequency. Students learn to handle alternative data sources, round-the-clock markets, and the particular challenges of building features from funding rate dynamics and basis premiums.
The case study is designed as an honest evaluation exercise with intentionally limited statistical power — only 2 CV folds and a small asset universe. It teaches how to assess whether a signal is genuine or an artifact of limited data, and how regime changes between training and evaluation periods affect model reliability.

Strategy Summary

Long-short funding-aligned strategy across 19 crypto perpetual contracts. Rebalances every 8 hours at funding timestamps (00:00, 08:00, 16:00 UTC). Cost model distinguishes maker/taker fee tiers for majors vs altcoins. Walk-forward evaluation uses 2 folds with 2-year training and 1-year validation windows. Features are built from funding rate dynamics, basis premiums, and cross-sectional relative value.

Data Sources

Binance API, Glassnode (on-chain), CryptoQuant (exchange flows)

ML Techniques

Gradient boosting, LSTM for temporal patterns, time-series feature engineering, alternative data integration

ML Pipeline

Universe & Setup
1 notebook
19 perpetual futures pairs from Binance, trading on 8-hour funding-aligned cadence (00:00, 08:00, 16:00 UTC). Unbalanced panel -- assets enter at listing date with no backfill. Two-tier cost model: majors (BTC, ETH, BNB, SOL, XRP) at 2 bps maker, alts at 4 bps taker. Only 2 CV folds (2Y train, 1Y val), providing minimal stability evidence. The structural signal is funding rate mean-reversion, not price prediction.
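The funding-aligned cadence and the two-tier cost model can be sketched in a few lines. This is a minimal illustration, not the notebook's code: the function names `next_funding_time` and `fee_bps` are hypothetical, and the fee levels are simply the 2 bps and 4 bps figures quoted above.

```python
from datetime import datetime, timedelta, timezone

FUNDING_HOURS = (0, 8, 16)  # UTC funding timestamps stated in the text
MAJORS = {"BTC", "ETH", "BNB", "SOL", "XRP"}  # majors listed in the text


def next_funding_time(ts: datetime) -> datetime:
    """Return the first funding timestamp strictly after `ts` (tz-aware)."""
    ts = ts.astimezone(timezone.utc)
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    for hour in FUNDING_HOURS:
        candidate = day_start + timedelta(hours=hour)
        if candidate > ts:
            return candidate
    # Past 16:00 UTC: the next funding event is midnight of the next day.
    return day_start + timedelta(days=1)


def fee_bps(base_asset: str) -> float:
    """Two-tier cost assumption: 2 bps (maker) for majors, 4 bps (taker) for alts."""
    return 2.0 if base_asset in MAJORS else 4.0
```

Passing timezone-aware datetimes avoids an implicit local-time conversion in `astimezone`.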
Universe & Protocol Setup Ch 6
Defines the perpetual universe (top perps by volume) with unbalanced panel rules -- assets enter at listing date with no backfill. Aligns decision timing with 8-hour funding timestamps (00:00, 08:00, 16:00 UTC). Documents the return decomposition (price + funding - fees) and maker/taker fee tiers. Builds walk-forward splits respecting the 24/7 crypto calendar.
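The return decomposition (price + funding - fees) can be written out for a single position over one 8-hour bar. The sketch below assumes the Binance sign convention, where a positive funding rate means longs pay shorts; the function name and the `turnover` argument are illustrative, not taken from the notebook.

```python
def net_bar_return(side, price_return, funding_rate, fee_bps, turnover):
    """Net return of one position over one 8-hour bar.

    side:          +1 for long, -1 for short
    price_return:  simple price return of the contract over the bar
    funding_rate:  rate settled during the bar (positive: longs pay shorts)
    fee_bps:       fee in basis points per unit of traded notional
    turnover:      fraction of the position traded at this rebalance
    """
    price_pnl = side * price_return
    funding_pnl = -side * funding_rate  # shorts receive when the rate is positive
    fees = turnover * fee_bps / 1e4
    return price_pnl + funding_pnl - fees
```

For example, a short position in a contract that falls 1% while funding is +0.05% earns both components and pays the fee on whatever was traded.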
Labels & Evaluation
2 notebooks
8-hour forward return as primary label (aligned to Binance funding timestamps), 24-hour variant for horizon sensitivity. Labels use future mark prices, not funding-inclusive returns, isolating the price prediction signal. 44 features (39 financial + 5 temporal) evaluated with HAC-adjusted IC. The funding rate itself is a feature, not a label -- the question is whether extremes predict subsequent price moves.
Label Engineering Ch 7
Computes 8-hour (primary) and 24-hour (variant) forward returns from perpetual futures close prices aligned to Binance funding timestamps. Enforces point-in-time universe membership for late-listed tokens. Generates walk-forward CV splits and evaluates label quality with IC analysis against raw funding z-score.
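A forward-return label with no backfill might look like the following sketch. It assumes one list of close prices per symbol, sampled at consecutive funding timestamps, with `None` before the listing date; `forward_returns` is a hypothetical helper name.

```python
def forward_returns(closes, horizon=1):
    """Forward simple returns over `horizon` bars (1 bar = 8 hours).

    closes: close prices at consecutive funding timestamps, None before listing.
    Returns None wherever either endpoint is missing, so late-listed
    tokens get no labels before they actually trade.
    """
    labels = []
    for i, price in enumerate(closes):
        j = i + horizon
        if price is None or j >= len(closes) or closes[j] is None:
            labels.append(None)
        else:
            labels.append(closes[j] / price - 1.0)
    return labels
```

With this convention `horizon=1` gives the 8-hour primary label and `horizon=3` the 24-hour variant.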
Feature Evaluation & Triage Ch 7
Evaluates all features (financial + temporal) using HAC-adjusted IC with Benjamini-Hochberg FDR correction across perpetuals at 8-hour frequency. Diagnoses feature shape (quantile monotonicity) and redundancy (pairwise correlation). Assigns PROCEED / REVISE / STOP triage decisions for downstream model selection.
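The Benjamini-Hochberg step-up procedure used for the FDR correction is short enough to sketch directly. The version below takes a list of p-values (for instance, one per feature from the HAC-adjusted IC tests) and returns which hypotheses are rejected; the function name and the default `alpha` are illustrative choices.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    # Everything ranked at or below the cutoff is rejected, even if an
    # individual p-value missed its own threshold.
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

With 44 features and only 2 folds, the number of survivors can be very small, which is consistent with the case study's emphasis on limited statistical power.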
Feature Engineering
2 notebooks
Funding-specific features unique to crypto: premium z-scores, funding rate cumulative sums, carry metrics, and cross-symbol dispersion. GJR-GARCH(1,1) per symbol captures asymmetric leverage (crypto downside moves generate outsized volatility spikes). HMM regime switching on aggregate funding for market-wide sentiment state. 44 features total -- the most compact feature set in the book.
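The asymmetric leverage term in GJR-GARCH(1,1) is easiest to see in the variance recursion itself. The sketch below takes the parameters as given (all values in the usage note are hypothetical) and only illustrates the recursion; estimating the parameters, with Student-t innovations or otherwise, is a separate step not shown here.

```python
def gjr_garch_variance(residuals, omega, alpha, gamma, beta, init_var):
    """Conditional variances for a GJR-GARCH(1,1) process.

    var[t] = omega + (alpha + gamma * 1[eps[t-1] < 0]) * eps[t-1]**2 + beta * var[t-1]

    The returned list is aligned with `residuals`: var[t] uses only
    information available up to bar t-1.
    """
    var = [init_var]
    for eps in residuals[:-1]:
        leverage = gamma if eps < 0 else 0.0  # extra weight on negative shocks
        var.append(omega + (alpha + leverage) * eps ** 2 + beta * var[-1])
    return var
```

For instance, with `omega=1e-6, alpha=0.05, gamma=0.10, beta=0.90`, a shock of -2% raises the next bar's variance more than a shock of +2%, which is the asymmetry described above.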
Feature Engineering Ch 8
Engineers funding-specific features: premium z-scores, funding rate cumulative sums, carry metrics, and cross-symbol dispersion features for regime awareness. Uses hours-based lookback windows (8h to 720h) aligned to the 8H cadence. Includes momentum, volatility, and liquidity features adapted to 24/7 crypto markets, plus major/alt cost-tier indicators.
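Hours-based lookbacks map onto the 8-hour cadence by dividing by the bar length, so 720 hours corresponds to 90 bars. A trailing z-score built on that conversion could be sketched as follows; the helper names are hypothetical, and the use of a population standard deviation is an arbitrary choice for the illustration.

```python
from statistics import mean, pstdev

BAR_HOURS = 8  # one bar per funding interval


def hours_to_bars(hours):
    """Convert an hours-based lookback to a whole number of 8-hour bars."""
    if hours % BAR_HOURS != 0:
        raise ValueError("lookback must be a multiple of 8 hours")
    return hours // BAR_HOURS


def rolling_zscore(values, window_hours):
    """Trailing z-score; the window ends at the current bar (no look-ahead)."""
    n = hours_to_bars(window_hours)
    out = []
    for i in range(len(values)):
        if i + 1 < n:
            out.append(None)  # not enough history yet
            continue
        window = values[i + 1 - n : i + 1]
        sd = pstdev(window)
        out.append((values[i] - mean(window)) / sd if sd > 0 else 0.0)
    return out
```

Applied to a premium series, `rolling_zscore(premium, 336)` would give a 14-day z-score on 42 bars.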
Temporal Features Ch 9
Fits GJR-GARCH(1,1) per symbol for conditional volatility with asymmetric leverage effect and Student-t innovations for fat tails. Fits a 2-state Gaussian HMM on cross-sectional mean of premium z-scores to detect normal vs liquidation cascade regimes. Uses filtered (not smoothed) probabilities to avoid look-ahead. Produces 5 temporal features.
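The distinction between filtered and smoothed probabilities matters because smoothing conditions on the whole sample, including future observations. A forward filter for a 2-state Gaussian HMM, which only uses data up to the current bar, can be sketched as below. The model parameters are taken as given here (in practice they would be estimated on the training window), and every numeric value in the usage note is hypothetical.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def filtered_probs(obs, init, trans, means, sigmas):
    """Forward filter: P(state_t | obs_1..t) for each t.

    init:   prior state probabilities before the first observation
    trans:  trans[r][s] = P(state_{t+1} = s | state_t = r)
    """
    n_states = len(init)
    prior = list(init)
    out = []
    for x in obs:
        # Bayes update with the current observation only.
        joint = [prior[s] * gaussian_pdf(x, means[s], sigmas[s]) for s in range(n_states)]
        total = sum(joint)
        post = [j / total for j in joint]
        out.append(post)
        # One-step-ahead prediction becomes the prior for the next bar.
        prior = [sum(post[r] * trans[r][s] for r in range(n_states)) for s in range(n_states)]
    return out
```

A quick check of causality: running the filter on the first k observations gives the same first k rows as running it on the full series, which would not hold for smoothed probabilities.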
Modeling
7 notebooks
The only case study where deep learning leads on IC. LSTM achieves +0.030 vs GBM +0.023, capturing 8-hour temporal dependencies in funding rate dynamics that tree models miss. Linear models produce negative IC -- the signal is genuinely nonlinear. Autoencoder learns latent crypto market structure. TCN tests dilated causal convolutions on the short 8-hour cadence. Causal DML tests whether premium z-score causes returns or reflects confounders.
Linear Models Ch 11
Trains Ridge, LASSO, and ElasticNet on walk-forward folds across perpetual contracts at 8-hourly frequency. Tests whether regularization can isolate the mean-reversion component from noise given a negative baseline IC. Registers predictions and per-fold coefficients to the model registry.
Gradient Boosting Ch 12
Searches LightGBM configurations across leaf-count profiles and objectives with IC evaluated at iteration checkpoints. Tests whether tree-based nonlinearity captures funding rate dynamics that linear models miss on the narrow perpetual universe. Registers best checkpoint per config for downstream backtest.
Tabular Deep Learning (TabM) Ch 12
Trains TabM rank-1 adapter MLP ensemble (small/medium/large) on the same flat feature matrix as GBM via walk-forward CV with IC checkpoints. Tests whether a parameter-efficient ensemble of MLPs finds additional structure beyond tree splits on the perpetual universe. Registers predictions for backtest.
LSTM Ch 13
Trains LSTM with gated recurrence on 60-bar lookback windows of 8-hourly crypto data across perpetual contracts. Tests whether gated memory captures temporal decay patterns in funding rate shocks that tree ensembles cannot model. Includes MC Dropout uncertainty estimation. Loads prior linear and GBM baselines for comparison.
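At the 8-hour cadence, a 60-bar lookback spans 480 hours, or 20 days. Turning a per-symbol feature table into overlapping sequences for a recurrent model might look like the sketch below; `make_sequences` is a hypothetical helper, and the feature vectors are plain lists to keep the example dependency-free.

```python
def make_sequences(features, labels, lookback=60):
    """Build (sequence, label) pairs for one symbol.

    features: per-bar feature vectors in time order
    labels:   per-bar forward-return labels, None where undefined
    Each sequence covers bars t-lookback+1 .. t and is paired with the
    label at bar t, so inputs never extend past the decision time.
    """
    X, y = [], []
    for t in range(lookback - 1, len(features)):
        if labels[t] is None:
            continue  # e.g. the last bars, where no forward return exists
        X.append(features[t - lookback + 1 : t + 1])
        y.append(labels[t])
    return X, y
```

A symbol needs at least `lookback` bars of history before it contributes any training sample, which further thins the already small panel for late-listed tokens.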
Temporal Convolutional Network (TCN)
Trains a Temporal Convolutional Network with dilated causal convolutions on 8-hourly data across perpetuals. Tests whether hierarchical temporal receptive fields capture multi-timescale funding dynamics differently than LSTM's sequential gating. Compares against prior linear and GBM baselines.
Autoencoder (Latent Factors) Ch 14
Trains a vanilla autoencoder on hourly returns from major crypto assets with a 2D latent space. Uses reconstruction error as an anomaly signal. Visualizes latent space structure and examines the relationship between reconstruction error and volatility regimes.
Causal DML Estimation Ch 15
Applies Double Machine Learning to premium_zscore_14d (treatment) across perpetual contracts at 8-hour frequency. Confounders include price volatility (14d), the funding rate, and premium mean-reversion tendency. Runs placebo permutation tests and registers causal effect estimates.
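The partialling-out idea behind Double Machine Learning can be shown in a deliberately simplified form: residualize both the treatment and the outcome on the confounders using models fit on other folds, then regress residual on residual. The sketch below is an assumption-heavy toy, not the notebook's estimator: it uses a single confounder and plain linear nuisance models, whereas DML normally uses flexible learners and here would involve the three confounders listed above. All function names are hypothetical.

```python
def ols_fit(x, y):
    """Simple linear regression y ~ a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def dml_effect(treatment, outcome, confounder, n_folds=2):
    """Cross-fitted partialling-out estimate of the treatment effect."""
    n = len(outcome)
    bounds = [n * k // n_folds for k in range(n_folds + 1)]  # contiguous time blocks
    t_res, y_res = [0.0] * n, [0.0] * n
    for k in range(n_folds):
        test = range(bounds[k], bounds[k + 1])
        train = [i for i in range(n) if i < bounds[k] or i >= bounds[k + 1]]
        x_tr = [confounder[i] for i in train]
        a_t, b_t = ols_fit(x_tr, [treatment[i] for i in train])
        a_y, b_y = ols_fit(x_tr, [outcome[i] for i in train])
        for i in test:
            # Residuals come from models that never saw observation i.
            t_res[i] = treatment[i] - (a_t + b_t * confounder[i])
            y_res[i] = outcome[i] - (a_y + b_y * confounder[i])
    return sum(t * y for t, y in zip(t_res, y_res)) / sum(t * t for t in t_res)
```

A placebo test in this setting would shuffle the treatment series and confirm that the estimated effect shrinks toward zero.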
Strategy Pipeline
5 notebooks
Long-short perpetual portfolio with equal-weight sizing. Maker/taker fee distinction creates asymmetric cost profiles for majors vs alts. Positive Sharpe coexists with negative CAGR: under 8-hour rebalancing, the volatility drag from compounding outweighs the small positive average return. Risk controls tested but insufficient to prevent regime-driven drawdowns.
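A positive Sharpe ratio with a negative CAGR is arithmetically possible because Sharpe is built on the arithmetic mean return while CAGR reflects compounded growth, which is lower by roughly half the variance per period. The toy series below is entirely made up and exaggerated for clarity; it only demonstrates the mechanism.

```python
from statistics import mean, stdev


def summarize(returns):
    """Return (per-bar Sharpe, final wealth from compounding 1.0)."""
    wealth = 1.0
    for r in returns:
        wealth *= 1.0 + r
    sharpe_per_bar = mean(returns) / stdev(returns)
    return sharpe_per_bar, wealth


# Hypothetical bar returns: +20% then -18%, repeated 50 times.
bar_returns = [0.20, -0.18] * 50
sharpe, final_wealth = summarize(bar_returns)
# The mean return is +1% per bar, so the Sharpe ratio is positive,
# yet each pair of bars multiplies wealth by 1.20 * 0.82 = 0.984.
```

Here the average return is positive but wealth shrinks with every pair of bars, so the compounded growth rate is negative.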
Cross-Model Analysis Ch 11
Compares best-in-family IC across all model families trained on the crypto case study. With few rolling-window folds and a narrow universe, applies conservative evaluation emphasizing structural plausibility over headline numbers. Evaluates fold stability, prediction bucket monotonicity, and cross-family prediction correlation.
Signal-Stage Backtest Ch 16
Runs plumbing test (random signal verification), then sweeps all (prediction x entry scheme) combinations across perpetuals at 8-hour cadence. Computes Deflated Sharpe Ratio and visualizes the IC-vs-Sharpe scatter. Registers all signal-stage backtest results to the registry.
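The Deflated Sharpe Ratio adjusts an observed Sharpe for the number of configurations tried and for non-normal returns. The sketch below follows the commonly cited Bailey and Lopez de Prado formulation as best recalled here; treat it as an approximation to check against the original paper, with hypothetical function and argument names.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329


def expected_max_sharpe(n_trials, sharpe_variance):
    """Approximate expected best Sharpe among n_trials (>= 2) if all true Sharpes are zero."""
    z = NormalDist()
    return sqrt(sharpe_variance) * (
        (1 - EULER_GAMMA) * z.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * z.inv_cdf(1 - 1 / (n_trials * e))
    )


def deflated_sharpe(sharpe, n_obs, skew, kurt, n_trials, sharpe_variance):
    """Probability that the true Sharpe exceeds the multiple-testing benchmark.

    sharpe and sharpe_variance are per-bar (not annualized);
    kurt is raw kurtosis (3.0 for a normal distribution).
    """
    benchmark = expected_max_sharpe(n_trials, sharpe_variance)
    denom = sqrt(1 - skew * sharpe + (kurt - 1) / 4 * sharpe ** 2)
    return NormalDist().cdf((sharpe - benchmark) * sqrt(n_obs - 1) / denom)
```

Holding everything else fixed, raising `n_trials` raises the benchmark and lowers the deflated value, which is why a wide (prediction x entry scheme) sweep needs this correction.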
Portfolio Allocator Sweep Ch 17
Sweeps top signal-stage predictions x TOP_K concentration levels x 6 allocators across the long-short portfolio. Tests how concentration choices interact with the narrow universe where small TOP_K creates high concentration risk and large TOP_K dilutes the mean-reversion signal.
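The simplest of the allocators, equal weighting within the top and bottom TOP_K names, can be sketched as follows. The function name and the choice of 50% gross exposure per side are assumptions made for the illustration.

```python
def top_k_long_short(predictions, top_k):
    """Equal-weight, dollar-neutral weights from a dict of symbol -> score.

    Longs the top_k highest scores and shorts the top_k lowest,
    with 50% gross exposure on each side.
    """
    if top_k < 1 or 2 * top_k > len(predictions):
        raise ValueError("top_k must leave distinct long and short legs")
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    weights = {symbol: 0.0 for symbol in predictions}
    for symbol in ranked[:top_k]:
        weights[symbol] = 0.5 / top_k
    for symbol in ranked[-top_k:]:
        weights[symbol] = -0.5 / top_k
    return weights
```

With 19 contracts, TOP_K can be at most 9 under this rule, and a TOP_K of 1 or 2 puts a quarter to a half of gross exposure in a single name.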
Transaction Cost Analysis Ch 18
Sweeps a cost grid on top allocation-stage combinations with maker/taker tier distinction. Measures the Sharpe decay curve for a strategy with high rebalancing frequency. Identifies the breakeven cost level and quantifies the extreme cost sensitivity of high-frequency rebalancing.
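If costs are modeled as proportional to turnover, the mean net return is linear in the cost level, so the breakeven cost has a closed form. The sketch below assumes that linear cost model and uses hypothetical helper names and inputs.

```python
def net_mean_return(gross_returns, turnovers, cost_bps):
    """Mean per-bar return after charging cost_bps on each bar's turnover."""
    net = [g - t * cost_bps / 1e4 for g, t in zip(gross_returns, turnovers)]
    return sum(net) / len(net)


def breakeven_cost_bps(gross_returns, turnovers):
    """Cost level (bps per unit turnover) at which the mean net return is zero."""
    return 1e4 * sum(gross_returns) / sum(turnovers)
```

Because the Sharpe ratio takes the sign of the mean return, it crosses zero at the same cost level. With three rebalances a day, even a small average turnover per bar leaves little room between the breakeven and the 2 to 4 bps fee tiers.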
Risk Controls Ch 19
Applies position-level (stop-loss, trailing stop, time exit) and portfolio-level (drawdown breaker, daily loss limit) risk controls on top allocation combos. Calibrates trailing stops via MAE/MFE analysis. Tests whether risk overlays can limit losses from regime shifts in the funding rate signal.
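A portfolio-level drawdown breaker can be sketched as a filter over the strategy's bar returns. The version below is a simplified assumption: it goes flat permanently once the threshold is breached, with no re-entry rule, and the 10% default is arbitrary.

```python
def apply_drawdown_breaker(bar_returns, max_drawdown=0.10):
    """Zero out returns after peak-to-trough drawdown reaches max_drawdown."""
    equity, peak, tripped = 1.0, 1.0, False
    controlled = []
    for r in bar_returns:
        if tripped:
            controlled.append(0.0)  # flat after the breaker fires
            continue
        equity *= 1.0 + r
        peak = max(peak, equity)
        controlled.append(r)
        # The breach is only observed after the bar closes, so the
        # realized drawdown can exceed the threshold.
        if equity / peak - 1.0 <= -max_drawdown:
            tripped = True
    return controlled
```

The breaker caps further losses but cannot undo the bar that triggered it, which is one reason such overlays may not fully contain a regime-driven drawdown.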
Synthesis & Verdict
1 notebook
The book's clearest negative result. Validation Sharpe +0.80 collapses to -1.17 in holdout (-247% decay). The 2024-2025 crypto bull run reversed the cross-sectional funding patterns. Verdict: Reframe -- viable only with explicit regime detection.
Strategy Synthesis & Verdict Ch 20
Synthesizes signal, allocation, cost, and risk results into a structured deployment verdict. Traces the champion through all pipeline stages, then documents its collapse in holdout. Quantifies search risk across the signal backtest sweep and produces the forensic trail of how a plausible funding-rate hypothesis failed in a changed regime. Verdict: Reframe.