Home / Libraries / Docs
ML4T Libraries
Documentation
Skip to content

Latent-Factor Pipelines

The core latent-factor abstraction in ml4t-models is:

structural model -> latent factor state -> factor forecaster -> asset mapper

This keeps three distinct economic objects separate:

  1. exposures and factor realizations
  2. ex ante factor-premium forecasts
  3. asset-level expected returns

Two-Step Latent-Factor Forecasting

Why The Split Matters

For models like IPCA and CAE, the in-sample fitted return uses realized factor returns. That is useful for reconstruction and attribution, but not directly for trading.

The implementable forecast replaces realized factor returns with a forecast or estimate of factor premia.

This distinction is the whole point of the two-step design:

  • the structural model explains realized returns
  • the forecaster produces an ex ante premium estimate
  • the mapper turns today's exposures and that premium estimate into asset-level forecasts

That is the paper-faithful interpretation of IPCA and CAE.

Pipeline Class

LatentFactorForecastPipeline composes the three stages:

from ml4t.models import (
    BetaLambdaMapper,
    ExpandingMeanFactorForecaster,
    IPCAConfig,
    IPCAModel,
    LatentFactorForecastPipeline,
)

pipeline = LatentFactorForecastPipeline(
    model=IPCAModel(IPCAConfig(n_factors=3)),
    forecaster=ExpandingMeanFactorForecaster(),
    mapper=BetaLambdaMapper(),
)

Objects At Each Stage

Structural Model

Implements:

  • fit(batch) -> FitSummary
  • extract(batch, checkpoint=None) -> LatentFactorState

The extracted state contains:

  • asset_betas
  • optional factor_returns
  • timestamps and asset IDs
  • metadata such as the selected checkpoint

Factor Forecaster

Implements:

  • fit(state) -> FitSummary
  • predict(state) -> FactorForecastResult

Current forecasters:

  • ExpandingMeanFactorForecaster
  • AR1FactorForecaster
  • EWMABaseFactorForecaster

Asset Mapper

Implements:

  • predict(state, factor_forecast) -> AssetForecastResult

Current mapper:

  • BetaLambdaMapper

The Default Predictive Baseline

The simplest predictive latent-factor workflow is:

fit structural model on training data
estimate historical factor returns
forecast factor premia by the training-sample mean
map betas × premia back to assets

This is the baseline represented by:

  • ExpandingMeanFactorForecaster
  • BetaLambdaMapper

The point of this baseline is clarity. It makes explicit what is structural estimation and what is factor-premium forecasting.

Beyond The Mean Baseline

The mean-premium forecast is a baseline, not a ceiling. The scalable-CAE literature argues that better forecasts of the latent factor series can improve the final asset-level signal.

The library therefore keeps the forecaster modular. You can swap in:

  • AR1FactorForecaster
  • EWMABaseFactorForecaster

without changing the structural estimator.

Factor Forecaster Menu

The architectural point is more important than the specific forecaster list: structural estimation and factor-premium forecasting are separate layers.

Checkpoints

Neural structural models such as CAEModel expose configurable checkpoints:

  • checkpoint_interval
  • checkpoint_epochs
  • default_checkpoint

This lets you:

  • extract structural states at multiple training horizons
  • fit and evaluate downstream factor forecasters at those checkpoints
  • choose reporting checkpoints explicitly rather than hard-coding hidden "best epoch" behavior

Diagram

flowchart LR
    A[Batch] --> B[Structural Model]
    B --> C[LatentFactorState]
    C --> D[Factor Forecaster]
    D --> E[FactorForecastResult]
    C --> F[Asset Mapper]
    E --> F
    F --> G[AssetForecastResult]
    G --> H[PredictionsFrame]
    H --> I[ml4t-backtest / ml4t-diagnostic]

When Not To Use This Pipeline

Do not force:

  • StochasticDiscountFactorModel
  • portfolio learners
  • direct signal predictors

through this latent-factor composition. Those families solve different problems and have different native outputs.