Chapter 3

Market Microstructure

5 sections, 17 notebooks, 22 references

Learning Objectives

  • Explain how liquidity, order types, market design, and intraday trading regimes shape observed market data and execution quality.
  • Distinguish among major market data products, including L1, L2, L3, TAQ, and enriched bar datasets, and choose data that matches a research or trading objective.
  • Parse message-based exchange data and reconstruct a venue-local limit order book while enforcing core lifecycle and accounting invariants.
  • Interpret key order-book measures and empirical microstructure patterns, while recognizing the limits of visible single-venue data.
  • Build and compare time-, activity-, and information-driven bars, including when trade-direction classification and Lee-Ready alignment are required.
  • Apply intraday data-quality and sessionization checks that prevent sequencing, timestamp, and calendar errors from contaminating downstream analysis.
Figure 3.1
3.1

Microstructure: The DNA of Price Formation

Market data is the observable output of matching-engine rules applied to order flow, so realistic strategy design depends on understanding those mechanics. The section decomposes liquidity into spread, depth, and resiliency, explains how adverse selection and inventory risk shape quote formation (grounded in Kyle 1985 and Glosten-Milgrom 1985), and catalogs order types from basic limit and market orders through hidden, pegged, and post-only variants. It also covers intraday seasonality regimes (opening, midday, closing) and explains why the same signal can mean different things at different times of day.
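As a rough illustration of the spread and depth components of liquidity, the sketch below computes a few standard measures from a single top-of-book quote. The `Quote` structure and the sample numbers are hypothetical, chosen only for illustration; resiliency is not shown because it needs a time series of quotes rather than one snapshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Top-of-book snapshot (hypothetical structure, for illustration only)."""
    bid: float
    ask: float
    bid_size: int
    ask_size: int


def midprice(q: Quote) -> float:
    """Average of best bid and best ask."""
    return (q.bid + q.ask) / 2.0


def quoted_spread(q: Quote) -> float:
    """Absolute spread in price units."""
    return q.ask - q.bid


def relative_spread_bps(q: Quote) -> float:
    """Quoted spread as a fraction of the midprice, in basis points."""
    return 1e4 * quoted_spread(q) / midprice(q)


def top_of_book_depth(q: Quote) -> int:
    """Total size displayed at the best bid and best ask."""
    return q.bid_size + q.ask_size


# Made-up sample quote
q = Quote(bid=99.98, ask=100.02, bid_size=300, ask_size=500)
print(quoted_spread(q), relative_spread_bps(q), top_of_book_depth(q))
```

These are visible-liquidity measures only; hidden orders and other venues are outside what a single quote can show.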

3.2

The Anatomy of Modern Market Data Feeds

The data hierarchy runs from Level 1 top-of-book quotes through Level 2 price-level depth to Level 3 order-level messages, and each feed reveals and conceals different aspects of market state. The section covers trade-and-quote (TAQ) packages, price conventions (midprice vs last trade vs close), message protocols (FIX, binary multicast, crypto REST/WebSocket), and five sample datasets spanning the full hierarchy. Feed selection is a strategy design decision: it determines what can be observed, researched, and backtested.
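One way to see what each level reveals and conceals is to derive the coarser views from the finer one. The sketch below aggregates a hypothetical order-level (L3) snapshot into price levels (L2) and then into the top of book (L1). The tuple layout, the side codes `"B"`/`"S"`, and the use of integer tick prices are assumptions made for this example, not any vendor's actual format.

```python
from collections import defaultdict

# Hypothetical L3 snapshot: (order_id, side, price_in_ticks, size)
l3_orders = [
    (1, "B", 10000, 100),
    (2, "B", 10000, 200),
    (3, "B", 9999, 50),
    (4, "S", 10002, 300),
    (5, "S", 10003, 400),
]


def to_l2(orders):
    """Aggregate individual orders into (price, total_size) levels per side."""
    bids, asks = defaultdict(int), defaultdict(int)
    for _order_id, side, price, size in orders:
        (bids if side == "B" else asks)[price] += size
    # Bids sorted best (highest) first, asks best (lowest) first
    return sorted(bids.items(), reverse=True), sorted(asks.items())


def to_l1(l2_bids, l2_asks):
    """Keep only the best level on each side (None if a side is empty)."""
    best_bid = l2_bids[0] if l2_bids else None
    best_ask = l2_asks[0] if l2_asks else None
    return best_bid, best_ask


l2_bids, l2_asks = to_l2(l3_orders)
l1 = to_l1(l2_bids, l2_asks)
print(l2_bids, l2_asks, l1)
```

Each step discards information: L2 no longer distinguishes the two orders resting at 10000, and L1 drops everything behind the best prices.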

3.3

From Raw Messages to the Limit Order Book

Limit order book (LOB) reconstruction is presented as a state-machine engineering problem, using NASDAQ TotalView-ITCH as a case study with 423 million messages per day. The section covers binary parsing at scale (about 26 minutes in Python versus under 3 minutes in Rust), the order registry and price-level view that must be maintained, and the integrity invariants that must be enforced. Empirical findings include a 97.6% cancellation rate, with 41% of cancellations occurring within 500 ms, along with recurring patterns such as top-of-book depth imbalance (mean correlation of roughly 0.1 with returns) and U-shaped intraday volume.
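The state-machine view can be sketched as follows: an order registry keyed by order ID, a price-level view derived from it, and checks that reject messages violating lifecycle or accounting invariants. This is a deliberately simplified model, not the ITCH message set; the class name `OrderBook`, the method names, and the three event kinds (add, reduce, delete) are assumptions for illustration.

```python
class BookError(Exception):
    """Raised when a message would violate a book invariant."""


class OrderBook:
    def __init__(self):
        self.orders = {}                  # order_id -> [side, price, remaining]
        self.levels = {"B": {}, "S": {}}  # side -> {price: total resting size}

    def add(self, order_id, side, price, size):
        if order_id in self.orders:
            raise BookError(f"duplicate add for order {order_id}")
        if side not in self.levels or size <= 0:
            raise BookError(f"malformed add for order {order_id}")
        self.orders[order_id] = [side, price, size]
        self.levels[side][price] = self.levels[side].get(price, 0) + size

    def reduce(self, order_id, size):
        """Partial cancel or execution: shrink an order, removing it at zero."""
        if order_id not in self.orders:
            raise BookError(f"unknown order {order_id}")
        order = self.orders[order_id]
        side, price, remaining = order
        if size <= 0 or size > remaining:
            raise BookError(f"invalid reduction {size} for order {order_id}")
        order[2] = remaining - size
        level = self.levels[side]
        level[price] -= size
        if level[price] == 0:
            del level[price]
        if order[2] == 0:
            del self.orders[order_id]

    def delete(self, order_id):
        """Full cancel: remove whatever size remains."""
        if order_id not in self.orders:
            raise BookError(f"unknown order {order_id}")
        self.reduce(order_id, self.orders[order_id][2])

    def best_bid(self):
        return max(self.levels["B"], default=None)

    def best_ask(self):
        return min(self.levels["S"], default=None)

    def check_invariants(self):
        # Accounting: the level view must equal the sum over the registry.
        totals = {"B": {}, "S": {}}
        for side, price, remaining in self.orders.values():
            totals[side][price] = totals[side].get(price, 0) + remaining
        if totals != self.levels:
            raise BookError("price-level view disagrees with order registry")
        # Assumption: continuous trading on one venue, so bid < ask.
        bid, ask = self.best_bid(), self.best_ask()
        if bid is not None and ask is not None and bid >= ask:
            raise BookError("book is crossed or locked")


book = OrderBook()
book.add(1, "B", 10000, 100)
book.add(2, "B", 10000, 200)
book.add(3, "S", 10002, 150)
book.reduce(2, 50)   # partial cancel or execution
book.delete(1)       # full cancel
book.check_invariants()
print(book.levels, book.best_bid(), book.best_ask())
```

Raising on the first violation, rather than silently repairing the book, keeps feed gaps and parser bugs visible. A real feed also needs handling for replaces, auctions, and trading halts, which this sketch omits.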

16 notebooks

3.4

The Art of Sampling: From Ticks to Bars

The section compares bar sampling methods for converting raw tick data into analytical time series: time bars, tick bars, volume bars, dollar bars, and information-driven bars (tick imbalance, volume imbalance, run bars). Lee-Ready trade classification achieves 96% accuracy versus 84% for the tick test alone. Dollar bars consistently produce returns closest to normal (Jarque-Bera statistic of 84.7 vs 3,838 for time bars on NVDA, where lower is closer to normal) and are the recommended default for most ML workflows because of their good statistical properties and natural price-level scaling.
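A minimal sketch of dollar-bar construction follows: trades accumulate until the traded notional (price times size) reaches a threshold, at which point the bar closes. The function name `dollar_bars`, the `(price, size)` input layout, the choice to drop the trailing incomplete bar, and the sample trades are all assumptions made for this example.

```python
def dollar_bars(trades, threshold):
    """Group (price, size) trades into bars of at least `threshold` notional.

    Returns a list of dicts with open/high/low/close/volume/dollar.
    The trailing incomplete bar, if any, is dropped.
    """
    bars = []
    bar = None
    for price, size in trades:
        if bar is None:
            # Start a new bar at the first trade after the previous close
            bar = {"open": price, "high": price, "low": price,
                   "close": price, "volume": 0, "dollar": 0.0}
        bar["high"] = max(bar["high"], price)
        bar["low"] = min(bar["low"], price)
        bar["close"] = price
        bar["volume"] += size
        bar["dollar"] += price * size
        if bar["dollar"] >= threshold:
            bars.append(bar)
            bar = None
    return bars


# Made-up trades: (price, size)
trades = [(10.0, 100), (10.5, 100), (10.2, 50), (9.8, 200), (10.0, 10)]
bars = dollar_bars(trades, threshold=2000)
for b in bars:
    print(b)
```

Because the threshold is in currency units, bars form faster when activity or price is high and slower when either is low, which is the price-level scaling the text refers to. In this sketch a large trade that overshoots the threshold is assigned entirely to the bar it closes; splitting such trades across bars is an alternative design.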

4 notebooks

3.5

Microstructure Data Quality and Sessionization

Microstructure errors are typically inconsistencies (out-of-sequence events, invalid book transitions, stale quotes) rather than extreme values, so invariant checks are more appropriate than statistical outlier filters. The section prescribes layered validation: event sequencing first, then trade/quote sanity, then LOB reconstruction invariants, then cross-source reconciliation. Session handling is a correctness requirement: misclassifying weekends, holidays, or early closes creates artificial returns. A microstructure-aware backtest checklist covers executable prices, maker/taker fees, partial fills, auction handling, and venue scope.
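The first validation layer and the session check might look like the sketch below. It assumes a feed whose sequence numbers increase by exactly one per message and a regular session of 09:30 to 16:00 in exchange-local time. The function names, the holiday set, and the early-close map are hypothetical inputs that a real pipeline would take from an exchange calendar.

```python
from datetime import date, datetime, time

REGULAR_OPEN = time(9, 30)    # assumed regular-session open, exchange-local
REGULAR_CLOSE = time(16, 0)   # assumed regular-session close, exchange-local


def check_sequence(seq_nums):
    """Return (index, issue) pairs for gaps and out-of-order or duplicate events."""
    issues = []
    highest = None
    for i, seq in enumerate(seq_nums):
        if highest is not None:
            if seq <= highest:
                issues.append((i, "out_of_order"))
            elif seq > highest + 1:
                issues.append((i, "gap"))
        highest = seq if highest is None else max(highest, seq)
    return issues


def in_regular_session(ts, holidays=frozenset(), early_closes=None):
    """True if a naive exchange-local timestamp falls inside the regular session."""
    early_closes = early_closes or {}
    day = ts.date()
    if ts.weekday() >= 5 or day in holidays:  # Saturday/Sunday or holiday
        return False
    close = early_closes.get(day, REGULAR_CLOSE)
    return REGULAR_OPEN <= ts.time() < close


# Illustrative calendar inputs (not an authoritative holiday list)
holidays = {date(2024, 7, 4)}
early_closes = {date(2024, 7, 3): time(13, 0)}

print(check_sequence([1, 2, 4, 3, 5]))
print(in_regular_session(datetime(2024, 7, 3, 13, 30), holidays, early_closes))
```

Both checks flag or classify rather than delete: downstream layers can then decide whether a gap invalidates the book state or whether an out-of-session print should be excluded from return calculations.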