Chapter 1
3rd Edition
Stage 1: Hypothesis Formulation
Both Case Studies

Process Is Your Edge

Getting started with machine learning in financial markets

Core Thesis

In modern quantitative finance, sustainable advantage emerges not from a single secret model, but from an adaptive, industrial-grade process for creating, validating, and managing strategies. This "alpha factory" workflow is the only reliable way to navigate post-2020 market shocks and leverage new tools like Generative AI without succumbing to their risks.

Learning Outcomes
  • Identify the recent structural breaks and market shocks that render older, static trading models obsolete
  • Describe the seven distinct stages of the ML4T Workflow, from ideation to live monitoring
  • Explain how the workflow addresses real-world constraints imposed by team structure, regulations, and research economics

Deep Dive & Practical Insights

1.1 Markets In Flux

This section traces three distinct market shocks that systematically broke model assumptions:
  • **2020 Pandemic**: Correlation breakdown as asset classes that typically moved together diverged dramatically. Traditional diversification failed.
  • **2021-23 Inflation Surge**: Volatility regime shift invalidated decade-long assumptions about market stability. The 60/40 portfolio suffered historic losses.
  • **2023-25 AI Concentration Rally**: Sector leadership inversion as mega-cap tech dominated, rendering factor-based strategies ineffective.

Two new imperatives emerged: the push to **causality** (understanding economic drivers rather than chasing correlations) and the rise of **Generative AI** (which can augment every workflow stage but introduces risks like hallucination and data leakage).
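The correlation breakdown described for the 2020 shock can be made concrete with a rolling correlation. The sketch below is illustrative only: the function name `rolling_pairwise_correlation`, the 60-day window, and the synthetic data are assumptions, not taken from the chapter.

```python
import numpy as np
import pandas as pd


def rolling_pairwise_correlation(returns_a, returns_b, window=60):
    """Rolling Pearson correlation between two daily return series."""
    return returns_a.rolling(window).corr(returns_b)


# Synthetic returns (illustrative only): the two series share a common
# driver for the first 250 days, then the second series flips sign.
rng = np.random.default_rng(0)
common = rng.normal(0, 0.01, 500)
noise = rng.normal(0, 0.002, 500)
asset_a = pd.Series(common)
asset_b = pd.Series(np.concatenate([common[:250], -common[250:]]) + noise)

corr = rolling_pairwise_correlation(asset_a, asset_b)
print(corr.iloc[[249, 499]])  # strongly positive before the break, strongly negative after
```

A model calibrated on the first half of this sample would assume the two assets hedge nothing and diversify nothing; after the break the opposite holds, which is the kind of regime change a static model cannot absorb.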

1.2 Seven Stage Workflow

The 7-Stage ML4T Workflow is not a linear checklist but a **cyclical system** for industrializing quant research:
  • **Research & Development**: (1) Strategy Definition → (2) Data Sourcing → (3) Feature Engineering → (4) Model Prototyping → (5) Portfolio Backtesting
  • **Production Lifecycle**: (6) Incubation & Deployment → (7) Monitoring & Review

Critical feedback loops connect stage 7 back to stages 1 and 4, enabling continuous iteration and model retraining. The workflow addresses key challenges: avoiding lookahead bias, handling missing data, preventing overfitting, managing transaction costs, and ensuring regulatory compliance.
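Of the challenges listed above, lookahead bias is the easiest to show in a few lines. The following is a minimal sketch, assuming signals and returns share the same date index; the helper `lagged_strategy_returns` and the toy data are hypothetical. A signal computed from day t data is only tradable from day t+1, so positions are shifted before they are multiplied by returns.

```python
import numpy as np
import pandas as pd


def lagged_strategy_returns(signals, asset_returns, lag=1):
    """Apply signals with a delay so each position uses only past information."""
    positions = signals.shift(lag)
    # min_count=1 keeps rows with no valid position as NaN instead of 0
    return (positions * asset_returns).sum(axis=1, min_count=1)


# Toy data: the "signal" is the sign of the same day's return, which is
# pure lookahead if traded without a lag.
asset_returns = pd.DataFrame({"A": [0.01, -0.02, 0.03, -0.01]})
signals = np.sign(asset_returns)

leaky = lagged_strategy_returns(signals, asset_returns, lag=0)
honest = lagged_strategy_returns(signals, asset_returns, lag=1)
print(f"no lag: {leaky.sum():.2f}, one-day lag: {honest.sum():.2f}")
```

Without the lag the toy strategy never loses, which is the typical signature of leaked information in a backtest; with the lag the same signal loses money on this sample.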

1.3 Industry Context

A technically perfect workflow is useless if it cannot function within organizational realities:
  • **Team Structures**: Research scientists, quantitative developers, and portfolio managers have different objectives. The workflow must coordinate handoffs and prevent silos.
  • **Regulatory Realities**: MiFID II best execution, GDPR data privacy, and model risk management frameworks impose hard constraints on data usage and model documentation.
  • **Research Economics**: Every experiment costs time and compute. The workflow must balance thoroughness with velocity, using stage gates to kill bad ideas early.

1.4 Case Studies

Two contrasting strategies illustrate why a flexible process is essential:

**Cross-Asset ETF Rotational Momentum** (Low-Frequency):
  • Daily rebalancing across equity/bond/commodity ETFs
  • 12-month momentum signals, 20-day volatility scaling
  • Target: 12% annual return, 10% volatility, Sharpe 1.2

**Crypto Funding Rate Reversal** (High-Frequency):
  • Hourly rebalancing on BTC/ETH perpetual futures
  • 8-hour funding rate mean reversion
  • Target: 25% annual return, 30% volatility, Sharpe 0.8

These opposing characteristics (frequency, asset class, signal type) stress-test the workflow's adaptability.
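The Sharpe targets quoted for the two case studies are consistent with dividing the return target by the volatility target. The check below assumes a zero risk-free rate, which the chapter summary does not state; the helper name `implied_sharpe` is hypothetical.

```python
def implied_sharpe(annual_return, annual_vol, risk_free=0.0):
    """Annualized Sharpe ratio implied by return and volatility targets."""
    return (annual_return - risk_free) / annual_vol


etf_sharpe = implied_sharpe(0.12, 0.10)     # ETF rotational momentum targets
crypto_sharpe = implied_sharpe(0.25, 0.30)  # crypto funding rate reversal targets

print(f"ETF: {etf_sharpe:.2f}, crypto: {crypto_sharpe:.2f}")  # ETF: 1.20, crypto: 0.83
```

With a positive risk-free rate both ratios would be lower, so the stated figures are best read as return-to-volatility targets.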

Code Example

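# Calculating 12-Month Momentum
import pandas as pd


def calculate_momentum(prices, lookback=252):
    """
    Calculate a trailing 12-month momentum signal.

    Parameters:
    - prices: pd.DataFrame of daily adjusted close prices (rows: dates, columns: assets)
    - lookback: int, number of trading days (252 ≈ 12 months)

    Returns:
    - pd.DataFrame of cross-sectional momentum scores (percentile rank minus 0.5)
    """
    # Total return over the lookback window; the first `lookback` rows are NaN
    returns = prices.pct_change(lookback)
    # Rank across assets on each date; pct ranks lie in (0, 1], so scores lie in (-0.5, 0.5]
    momentum = returns.rank(axis=1, pct=True) - 0.5
    return momentum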

Code Example

# Volatility-Scaled Position Sizing
import numpy as np
import pandas as pd


def volatility_scale_positions(signals, returns, target_vol=0.10, lookback=20):
    """
    Scale each asset's position by its realized volatility.

    Parameters:
    - signals: pd.DataFrame of raw signals (-1 to 1)
    - returns: pd.DataFrame of daily returns
    - target_vol: float, annualized volatility target applied per asset
    - lookback: int, days for volatility estimation

    Returns:
    - pd.DataFrame of volatility-scaled positions, capped at ±1 per asset
    """
    # Annualize the rolling standard deviation of daily returns (252 trading days)
    realized_vol = returns.rolling(lookback).std() * np.sqrt(252)
    # Treat zero volatility as missing to avoid infinite scale factors
    scale = target_vol / realized_vol.replace(0, np.nan)
    positions = signals * scale
    return positions.clip(-1, 1)  # Limit per-asset leverage
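To see how the two functions above fit together, the sketch below runs them on synthetic prices. The tickers, the random-walk price model, and the one-day execution lag are assumptions made for illustration; the two functions are condensed copies of the ones shown above so the block runs on its own.

```python
import numpy as np
import pandas as pd


def calculate_momentum(prices, lookback=252):
    # Condensed copy of the momentum function above
    return prices.pct_change(lookback).rank(axis=1, pct=True) - 0.5


def volatility_scale_positions(signals, returns, target_vol=0.10, lookback=20):
    # Condensed copy of the volatility-scaling function above
    realized_vol = returns.rolling(lookback).std() * np.sqrt(252)
    return (signals * (target_vol / realized_vol)).clip(-1, 1)


# Synthetic daily prices for three hypothetical ETFs (geometric random walks)
rng = np.random.default_rng(42)
dates = pd.bdate_range("2020-01-01", periods=400)
tickers = ["EQ", "BOND", "CMDTY"]  # hypothetical labels, not real symbols
log_returns = rng.normal(0.0003, [0.012, 0.004, 0.015], size=(400, 3))
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(log_returns, axis=0)), index=dates, columns=tickers
)

signals = calculate_momentum(prices)
positions = volatility_scale_positions(signals, prices.pct_change())

# Assumed execution convention: trade on the next bar to avoid lookahead
tradable_positions = positions.shift(1)
print(tradable_positions.dropna().tail(3))
```

Two properties are worth noting. The first 252 rows carry no signal because the lookback window is not yet full. With three assets the raw scores on any date are -1/6, 1/6, and 1/2, which sum to 0.5, so this ranking is net long before scaling; a dollar-neutral variant would need to subtract the cross-sectional mean instead of a fixed 0.5.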

Practical Tips & Best Practices

  • Start with causality: understand why a signal should work before testing it
  • Use the workflow's feedback loops to iterate quickly—don't wait for perfection
  • Stage gates are your friend: kill bad ideas early to preserve research budget
  • Document regulatory constraints upfront to avoid wasted work
  • GenAI is powerful but introduces risks—validate everything it produces

Introduction

Welcome to ML3T! This introductory chapter covers the fundamentals of applying machine learning to trading.