Latent-Factor Pipelines
The core latent-factor abstraction in ml4t-models is a three-stage composition: structural model -> factor forecaster -> asset mapper.
This keeps three distinct economic objects separate:
- exposures and factor realizations
- ex ante factor-premium forecasts
- asset-level expected returns

Why The Split Matters
For models like IPCA and CAE, the in-sample fitted return uses realized factor returns.
That is useful for reconstruction and attribution, but not directly for trading.
The implementable forecast replaces realized factor returns with a forecast or estimate of factor premia.
This distinction is the whole point of the two-step design:
- the structural model explains realized returns
- the forecaster produces an ex ante premium estimate
- the mapper turns today's exposures and that premium estimate into asset-level forecasts
That is the paper-faithful interpretation of IPCA and CAE.
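The distinction can be written compactly. The notation below is generic and assumed for illustration (it is not taken from the library): \hat\beta_{i,t} denotes asset i's estimated exposures at time t, \hat f_{t+1} the realized factor returns over the next period, and \hat\lambda_t a premium estimate built only from information available at t.

```latex
\begin{aligned}
\text{in-sample fit:} \quad & \hat r_{i,t+1} = \hat\beta_{i,t}^{\top} \hat f_{t+1} \\
\text{implementable forecast:} \quad & \widehat{\mathbb{E}}_t\!\left[r_{i,t+1}\right] = \hat\beta_{i,t}^{\top} \hat\lambda_t
\end{aligned}
```

The first line needs \hat f_{t+1}, which is unknown until period t+1 has ended. The second line uses only quantities available at time t, which is what makes it tradable.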
Pipeline Class
LatentFactorForecastPipeline composes the three stages:
from ml4t.models import (
    BetaLambdaMapper,
    ExpandingMeanFactorForecaster,
    IPCAConfig,
    IPCAModel,
    LatentFactorForecastPipeline,
)

pipeline = LatentFactorForecastPipeline(
    model=IPCAModel(IPCAConfig(n_factors=3)),
    forecaster=ExpandingMeanFactorForecaster(),
    mapper=BetaLambdaMapper(),
)
Objects At Each Stage
Structural Model
Implements:
- fit(batch) -> FitSummary
- extract(batch, checkpoint=None) -> LatentFactorState
The extracted state contains:
- asset_betas
- optional factor_returns
- timestamps and asset IDs
- metadata such as the selected checkpoint
Factor Forecaster
Implements:
- fit(state) -> FitSummary
- predict(state) -> FactorForecastResult
Current forecasters:
- ExpandingMeanFactorForecaster
- AR1FactorForecaster
- EWMABaseFactorForecaster
Asset Mapper
Implements:
predict(state, factor_forecast) -> AssetForecastResult
Current mapper:
BetaLambdaMapper
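To make the hand-offs between stages concrete, here is a minimal sketch of the three interfaces as Python protocols. Everything in it is an illustrative stand-in rather than the library's own definitions: the `LatentFactorState` dataclass only mirrors the fields listed above, the result types are simplified to plain lists and dicts, and `run_stages`, `ToyModel`, `LastValueForecaster`, and `DotMapper` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Protocol


@dataclass
class LatentFactorState:
    """Stand-in for the state a structural model hands downstream."""
    asset_betas: dict[str, list[float]]   # asset ID -> loading on each factor
    timestamps: list[str]
    asset_ids: list[str]
    factor_returns: Optional[list[list[float]]] = None  # one row per timestamp
    metadata: dict[str, Any] = field(default_factory=dict)


class StructuralModel(Protocol):
    def fit(self, batch: Any) -> Any: ...
    def extract(self, batch: Any, checkpoint: Optional[int] = None) -> LatentFactorState: ...


class FactorForecaster(Protocol):
    def fit(self, state: LatentFactorState) -> Any: ...
    def predict(self, state: LatentFactorState) -> list[float]: ...


class AssetMapper(Protocol):
    def predict(self, state: LatentFactorState, factor_forecast: list[float]) -> dict[str, float]: ...


def run_stages(model: StructuralModel, forecaster: FactorForecaster,
               mapper: AssetMapper, batch: Any) -> dict[str, float]:
    """Chain the stages: structural fit -> state -> premium forecast -> asset forecast."""
    model.fit(batch)
    state = model.extract(batch)
    forecaster.fit(state)
    factor_forecast = forecaster.predict(state)
    return mapper.predict(state, factor_forecast)


class ToyModel:
    """Stub structural model returning a fixed two-asset, one-factor state."""
    def fit(self, batch):
        return None

    def extract(self, batch, checkpoint=None):
        return LatentFactorState(
            asset_betas={"A": [1.0], "B": [0.5]},
            timestamps=["t1", "t2"],
            asset_ids=["A", "B"],
            factor_returns=[[0.01], [0.03]],
            metadata={"checkpoint": checkpoint},
        )


class LastValueForecaster:
    """Stub forecaster: carries the most recent factor return forward."""
    def fit(self, state):
        return None

    def predict(self, state):
        return list(state.factor_returns[-1])


class DotMapper:
    """Stub mapper: dot product of each asset's betas with the factor forecast."""
    def predict(self, state, factor_forecast):
        return {
            asset: sum(b * lam for b, lam in zip(betas, factor_forecast))
            for asset, betas in state.asset_betas.items()
        }


forecasts = run_stages(ToyModel(), LastValueForecaster(), DotMapper(), batch=None)
```

Because each stage depends only on the object produced by the previous one, any stub here can be replaced without touching the other two.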
The Default Predictive Baseline
The simplest predictive latent-factor workflow is:
1. fit structural model on training data
2. estimate historical factor returns
3. forecast factor premia by the training-sample mean
4. map betas × premia back to assets
This is the baseline represented by:
- ExpandingMeanFactorForecaster
- BetaLambdaMapper
The point of this baseline is clarity. It makes explicit what is structural estimation and what is factor-premium forecasting.
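The arithmetic behind this baseline is small enough to write out by hand. The sketch below uses made-up numbers and two hypothetical helpers, `expanding_mean` and `beta_times_lambda`; it shows the calculation the baseline describes and is not the library's implementation of `ExpandingMeanFactorForecaster` or `BetaLambdaMapper`.

```python
def expanding_mean(factor_returns):
    """Per-factor mean over every period in the training sample."""
    n_periods = len(factor_returns)
    n_factors = len(factor_returns[0])
    return [
        sum(row[k] for row in factor_returns) / n_periods
        for k in range(n_factors)
    ]


def beta_times_lambda(asset_betas, premia):
    """Expected return per asset: dot product of its betas with the premia."""
    return {
        asset: sum(b * lam for b, lam in zip(betas, premia))
        for asset, betas in asset_betas.items()
    }


# Historical factor returns from the structural model (made-up values):
# rows are periods, columns are factors.
factor_returns = [
    [0.02, -0.01],
    [0.00, 0.01],
    [0.04, 0.03],
]

# Latest exposures for two assets (made-up values).
asset_betas = {"AAA": [1.0, 0.5], "BBB": [0.5, -1.0]}

premia = expanding_mean(factor_returns)            # roughly [0.02, 0.01]
expected_returns = beta_times_lambda(asset_betas, premia)
```

With these inputs, asset AAA gets roughly 1.0 × 0.02 + 0.5 × 0.01 = 0.025, while BBB's two exposures offset each other and its forecast is approximately zero.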
Beyond The Mean Baseline
The mean-premium forecast is a baseline, not a ceiling. The scalable-CAE literature argues that better forecasts of the latent factor series can improve the final asset-level signal.
The library therefore keeps the forecaster modular. You can swap in:
- AR1FactorForecaster
- EWMABaseFactorForecaster
without changing the structural estimator.

The architectural point is more important than the specific forecaster list: structural estimation and factor-premium forecasting are separate layers.
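For intuition about what the alternative forecasters might compute, here is a textbook sketch of a one-step AR(1) forecast fitted by ordinary least squares and an exponentially weighted moving average, each applied to a single factor-return series. The function names, the `alpha` parameter, and the sample series are assumptions for illustration; the library's forecasters may be specified differently.

```python
def ar1_forecast(series):
    """One-step forecast from f_t = c + phi * f_{t-1} + e_t, fitted by OLS."""
    if len(series) < 2:
        raise ValueError("need at least two observations to fit an AR(1)")
    x = series[:-1]  # lagged values
    y = series[1:]   # next-period values
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    if var_x == 0.0:
        return y_bar  # no variation in the lag: fall back to the mean
    phi = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / var_x
    c = y_bar - phi * x_bar
    return c + phi * series[-1]


def ewma_forecast(series, alpha=0.2):
    """Exponentially weighted level; larger alpha weights recent periods more."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return level


# Made-up returns for one latent factor.
factor_series = [0.010, 0.014, 0.012, 0.016, 0.013]

premium_forecasts = {
    "ar1": ar1_forecast(factor_series),
    "ewma": ewma_forecast(factor_series, alpha=0.3),
}
```

Either number could stand in for the mean-based premium estimate; the betas and the mapping step are unaffected by the choice.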
Checkpoints
Neural structural models such as CAEModel expose configurable checkpoints:
- checkpoint_interval
- checkpoint_epochs
- default_checkpoint
This lets you:
- extract structural states at multiple training horizons
- fit and evaluate downstream factor forecasters at those checkpoints
- choose reporting checkpoints explicitly rather than hard-coding hidden "best epoch" behavior
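The workflow above can be sketched with a toy model. `ToyCheckpointedModel` is a hypothetical stand-in, not `CAEModel`: only the `extract(batch, checkpoint=None)` call shape comes from this page, and the way the stub stores snapshots and resolves its default is an assumption. The "training" loop is fake and exists only so that snapshots differ across epochs.

```python
class ToyCheckpointedModel:
    """Hypothetical structural model that saves a snapshot at chosen epochs."""

    def __init__(self, n_epochs, checkpoint_epochs, default_checkpoint):
        self.n_epochs = n_epochs
        self.checkpoint_epochs = list(checkpoint_epochs)
        self.default_checkpoint = default_checkpoint
        self._snapshots = {}  # epoch -> factor-premium estimate at that epoch

    def fit(self, batch):
        target = sum(batch) / len(batch)
        estimate = 0.0
        for epoch in range(1, self.n_epochs + 1):
            # Fake training step: move halfway toward the sample mean.
            estimate += 0.5 * (target - estimate)
            if epoch in self.checkpoint_epochs:
                self._snapshots[epoch] = estimate

    def extract(self, batch, checkpoint=None):
        # An explicit checkpoint wins; otherwise use the configured default.
        epoch = self.default_checkpoint if checkpoint is None else checkpoint
        if epoch not in self._snapshots:
            raise KeyError(f"no snapshot saved for epoch {epoch}")
        return {"checkpoint": epoch, "factor_premium": self._snapshots[epoch]}


batch = [0.02, 0.04]  # made-up factor returns
model = ToyCheckpointedModel(n_epochs=4, checkpoint_epochs=[1, 2, 4], default_checkpoint=4)
model.fit(batch)

# Extract the state at every saved checkpoint, then compare downstream results.
by_checkpoint = {
    epoch: model.extract(batch, checkpoint=epoch)["factor_premium"]
    for epoch in model.checkpoint_epochs
}
default_state = model.extract(batch)
```

Every extracted state records the checkpoint it came from, so a report can state which epoch produced a given number.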
Diagram
flowchart LR
A[Batch] --> B[Structural Model]
B --> C[LatentFactorState]
C --> D[Factor Forecaster]
D --> E[FactorForecastResult]
C --> F[Asset Mapper]
E --> F
F --> G[AssetForecastResult]
G --> H[PredictionsFrame]
H --> I[ml4t-backtest / ml4t-diagnostic]
When Not To Use This Pipeline
Do not force:
- StochasticDiscountFactorModel
- portfolio learners
- direct signal predictors
through this latent-factor composition. Those families solve different problems and have different native outputs.