Training-Serving Skew, Point-in-Time Joins, and Feature Stores
Many live model failures look like alpha decay until you discover that training and inference never computed the same feature in the first place.
The Intuition
A model can be perfectly fine and still fail in production because the live system feeds it a different feature than the one used in training. This is training-serving skew.
The dangerous part is that skew often looks statistical. Performance drops, feature importance shifts, and the team starts blaming regime change. But the root cause is technical: the offline and online pipelines implemented different definitions.
In trading systems, this usually happens through:
- point-in-time joins that are correct offline but impossible or mis-specified online
- different missing-value rules
- different normalization windows
- late-arriving data that is silently included in training but unavailable live
That is why Chapter 26 frames feature stores as an infrastructure answer to a modeling problem. The problem is not just "where do we keep features?" It is "how do we guarantee that training and serving mean the same thing by a given feature name?"
A Precise Definition
Let a feature be a function
$$ x_t = \phi(\mathcal{D}_{\le t_{\text{decision}}}), $$
where the feature at decision time $t_{\text{decision}}$ is computed only from data available by that time.
Training-serving skew appears when the research stack and the live stack implement different functions:
$$ \phi_{\text{train}} \neq \phi_{\text{serve}}. $$
The model may be unchanged. The feature is not.
That is enough to create a fake model-decay story.
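The definition above can be made concrete with a small sketch. A decision-time-correct feature filters its inputs by availability timestamp before computing anything; the data and values below are hypothetical, chosen to echo the worked scenario later in this primer.

```python
from datetime import datetime

# Hypothetical timestamped observations: (availability_ts, value).
# The last entry is a revision that was not yet available on 2024-03-15.
data = [
    (datetime(2024, 3, 1), 0.8),
    (datetime(2024, 3, 10), -0.4),
    (datetime(2024, 3, 20), 0.1),
]

def phi(observations, t_decision):
    # Decision-time-correct feature: the latest value whose availability
    # timestamp is at or before t_decision. Later revisions are invisible.
    visible = [v for ts, v in observations if ts <= t_decision]
    return visible[-1] if visible else None

phi(data, datetime(2024, 3, 15))  # -> -0.4; the 3/20 revision is excluded
```

A serving path that reads "the latest vendor snapshot" instead of applying this filter is implementing a different $\phi$, even if every other line of code matches.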
Where Skew Actually Comes From
Point-in-time joins
The classic bug is a join that looks innocent offline:
```sql
select *
from prices p
join fundamentals f
  on p.symbol = f.symbol
 and f.report_date <= p.date
```
but is not decision-time correct. The feature may need the latest available filing, not the latest filing by report date. In production, that distinction matters immediately.
Window definitions
A 20-day volatility feature in training may be computed on adjusted closes through yesterday's end of day, while the serving path accidentally includes today's partial bar or excludes the latest confirmed close. The names match. The semantics do not.
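The window mismatch described above is easy to reproduce. A minimal sketch with a short hypothetical price series (a 5-return window for brevity; the 20-day case is identical in shape):

```python
import statistics

# Hypothetical confirmed daily closes, plus today's incomplete bar.
confirmed = [100.0, 101.0, 99.5, 102.0, 103.0, 101.5]
partial_bar = 90.0  # today's partial price, not yet a confirmed close

def realized_vol(closes, window=5):
    # Stdev of the last `window` simple returns.
    rets = [closes[i] / closes[i - 1] - 1 for i in range(1, len(closes))]
    return statistics.stdev(rets[-window:])

# Offline definition: volatility through yesterday's confirmed close.
vol_train = realized_vol(confirmed)
# Online bug: the partial bar slips into the window.
vol_serve = realized_vol(confirmed + [partial_bar])
# Same feature name, different window contents, different values.
```

Nothing here fails loudly; both paths return a plausible number, which is exactly why this class of skew survives code review.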
Missing-value defaults
Offline code may forward-fill a macro series. Online code may use zero, stale cache, or missing. Again, the model artifact matches while the feature meaning changes.
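The forward-fill-versus-zero divergence can be shown in a few lines. A sketch with a hypothetical gappy macro series:

```python
macro = [2.1, None, None, 2.4, None]  # hypothetical series with gaps

def fill_offline(series):
    # Offline rule: forward-fill from the last observed value.
    out, last = [], None
    for v in series:
        last = v if v is not None else last
        out.append(last)
    return out

def fill_online(series):
    # Online rule: substitute zero for anything missing.
    return [0.0 if v is None else v for v in series]

fill_offline(macro)  # [2.1, 2.1, 2.1, 2.4, 2.4]
fill_online(macro)   # [2.1, 0.0, 0.0, 2.4, 0.0]
```

Three of the five values disagree, yet both pipelines pass their own unit tests, because each is internally consistent with its own rule.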
A Worked Scenario
Suppose the research pipeline trains a model on daily ETF features including:
- 20-day realized volatility
- latest available macro surprise
- sector-relative z-score
Offline, the macro feature uses a clean point-in-time table with release timestamps. Online, a quick implementation reads the latest vendor snapshot from cache. On revision days, the cache holds data that was not yet available at the decision timestamp used in training.
The result:
- research feature says macro surprise = -0.4
- live feature says macro surprise = +0.1
- the model prediction shifts enough to change the trade
Nothing is wrong with the model weights. The live system is not serving the same feature.
This is why Chapter 26 ties skew directly to technical failure rather than to statistical decay.
Why Feature Stores Help
A feature store is useful when it enforces three things:
- one canonical feature definition
- point-in-time retrieval semantics
- reproducible offline and online access paths
The offline store supports training and backfills. The online store supports low-latency serving. The value is not that these are separate databases. The value is that both are derived from the same declared feature logic and keyed by the same entity-plus-time semantics.
In practice, a feature store helps because it forces teams to answer questions that ad hoc pipelines avoid:
- what is the event timestamp?
- what is the availability timestamp?
- what is the entity key?
- what is the default for missing values?
- what is the freshness requirement for serving?
Those are modeling questions wearing infrastructure clothing.
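One way to force those answers into one place is a declarative feature specification shared by both paths. A minimal sketch; all field and variable names here are illustrative, not any particular feature-store API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureSpec:
    # One canonical answer per question, used offline *and* online.
    name: str
    entity_key: str           # e.g. "symbol" or "country"
    event_ts_col: str         # when the underlying event happened
    availability_ts_col: str  # when it became usable for decisions
    missing_default: float    # single default for missing values
    max_staleness_sec: int    # freshness requirement for serving

# Hypothetical declaration for the macro feature from the worked scenario.
macro_surprise = FeatureSpec(
    name="macro_surprise",
    entity_key="country",
    event_ts_col="release_period_end",
    availability_ts_col="release_ts",
    missing_default=0.0,
    max_staleness_sec=3600,
)
```

The point is not the dataclass itself but that a serving path reading `macro_surprise.missing_default` cannot quietly invent a different default than training used.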
Feature Store Scope
Not every project needs a heavyweight platform. Chapter 26 is right to frame the stack as right-sized.
For a small team, the essential pieces are:
- versioned feature definitions
- reproducible offline materialization
- point-in-time retrieval for training
- an online serving path that uses the same semantics
That can be a disciplined internal pipeline before it becomes a full platform. The lesson is not "buy a feature store." The lesson is "do not let training and serving invent their own feature definitions."
Minimal Diagnostic Workflow
When live performance drops, ask these questions before retraining:
- did the live feature values match the offline replay at the same decision timestamps?
- were late-arriving data or revisions handled identically?
- did missing-value and normalization rules match?
- did the online system use the same entity and timestamp keys?
If the answer to any of these is no, the problem may be skew, not model decay.
In Practice
Three implementation rules matter most.
Track availability time, not just event time
A filing may describe quarter-end conditions but only become usable weeks later. The online path must respect the same availability semantics as training.
Materialize the hard features
Features that are expensive, fragile, or timestamp-sensitive should usually be materialized from the same canonical logic rather than reimplemented ad hoc in a separate serving path.
Make parity checks routine
Chapter 25's parity testing and Chapter 26's skew prevention are the same defense at different layers. Compare offline and online feature values regularly on known timestamps.
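A routine parity check can be very small. A sketch, assuming you log online feature values and can replay the offline computation at the same decision timestamps; the function and key names are illustrative:

```python
def parity_report(offline, online, tol=1e-9):
    # offline/online: dicts mapping (entity, decision_ts) -> feature value.
    # Returns the keys where the two paths disagree beyond tolerance.
    mismatches = {}
    for key in offline.keys() & online.keys():
        if abs(offline[key] - online[key]) > tol:
            mismatches[key] = (offline[key], online[key])
    return mismatches

# Hypothetical values from the worked scenario above.
offline = {("XYZ", "2024-03-15"): -0.4}
online = {("XYZ", "2024-03-15"): 0.1}
parity_report(offline, online)  # flags the skewed macro feature
```

Run on a schedule against known timestamps, a report like this turns skew from a forensic investigation into a routine alert.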
Common Mistakes
- Treating feature names as if they guaranteed feature equivalence.
- Joining on report dates instead of decision-time availability.
- Letting online defaults differ from offline defaults for missing or late data.
- Rewriting feature logic separately for training and serving "for convenience."
- Interpreting skew-induced degradation as evidence that the model needs retraining.
Connections
This primer supports Section 26.6's infrastructure argument and connects directly to point-in-time data construction, parity testing, model lineage, and Chapter 25's deployment verification. It also explains why some apparent live decay is really a technical failure in feature semantics.