ML4T Models

Build finance-native latent-factor, stochastic discount factor, direct signal, and portfolio-learning models without collapsing everything into one generic trainer.

ml4t.models is the modeling layer in the ML4T stack. It packages model families that matter in empirical asset pricing and portfolio construction while keeping the contracts explicit:

- what kind of data each model expects
- what object it estimates
- what must still happen before you have an implementable forecast or tradable weight vector

If you are new to the library, start with the Quickstart. If you are coming from Machine Learning for Trading, the Book Guide maps the chapter implementations to the production API.

Latent Factors Done Explicitly

Structural extraction, factor-premium forecasting, and asset mapping are separate stages. This keeps PCA, RP-PCA, IPCA, and CAE conceptually clean.
See: Latent-Factor Pipelines

No-Arbitrage SDF Modeling

The stochastic discount factor family is weight-native and phase-aware. It is not forced into the same beta × lambda contract as latent-factor models.
See: Stochastic Discount Factor

End-To-End Portfolio Learning

Learn allocations directly with deterministic, LSTM, and DeePM-style portfolio models. Keep allocation objectives separate from return forecasting logic.
See: Portfolio Learning

Built For The ML4T Stack

Emit prediction and weight frames for ml4t-backtest and ml4t-diagnostic without duplicating evaluation logic inside the model library.
See: Integration

Architecture At A Glance

Figure: Model Family Map

Why This Library Exists

Many finance models look similar at the tensor level but behave very differently conceptually:

- PCA and RP-PCA estimate persistent-panel latent factors
- IPCA and CAE estimate conditional exposures from dated cross-sections
- StochasticDiscountFactorModel learns a no-arbitrage pricing object through weight-native training
- SAEModel is a direct supervised predictor
- portfolio models learn allocations directly rather than first forecasting returns

The library reflects those differences instead of hiding them behind one catch-all fit/predict story.
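
To make "weight-native" concrete, here is a minimal NumPy sketch; it is not the library's implementation. It assumes a linear SDF of the form M_t = 1 - w'R_t on excess returns and chooses w so that the in-sample pricing errors E[M R] vanish. The function name `fit_linear_sdf_weights` and the simulated data are hypothetical, and `StochasticDiscountFactorModel` may parameterize and train its weights quite differently.

```python
import numpy as np


def fit_linear_sdf_weights(excess_returns: np.ndarray) -> np.ndarray:
    """Solve E[(1 - w'R) R] = 0 for w using sample moments.

    excess_returns has shape (T, N). Illustrative closed form only,
    not the training procedure of any ml4t.models class.
    """
    n_periods = excess_returns.shape[0]
    mean_returns = excess_returns.mean(axis=0)                     # E[R], shape (N,)
    second_moment = excess_returns.T @ excess_returns / n_periods  # E[RR'], shape (N, N)
    return np.linalg.solve(second_moment, mean_returns)


rng = np.random.default_rng(0)
excess_returns = 0.01 + 0.05 * rng.standard_normal((240, 5))  # simulated data

sdf_weights = fit_linear_sdf_weights(excess_returns)
sdf = 1.0 - excess_returns @ sdf_weights                        # M_t, shape (T,)
pricing_errors = (sdf[:, None] * excess_returns).mean(axis=0)   # sample E[M R], shape (N,)

# Largest absolute pricing error; zero up to floating-point precision.
print(np.abs(pricing_errors).max())
```

The estimated object here is a weight vector, and the fit is judged by pricing errors rather than forecast errors. No beta × lambda decomposition appears anywhere in the sketch, which is why this family gets its own contract.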

Quick Example

import numpy as np

from ml4t.models import (
    BetaLambdaMapper,
    CrossSectionBatch,
    ExpandingMeanFactorForecaster,
    IPCAConfig,
    IPCAModel,
    LatentFactorForecastPipeline,
)

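# Synthetic data: 24 dates, 150 assets, 10 characteristics per asset.
batch = CrossSectionBatch(
    characteristics=np.random.randn(24, 150, 10),
    returns=np.random.randn(24, 150),
    timestamps=tuple(range(24)),
)

# Each stage is a separate component: model, factor forecaster, asset mapper.
pipeline = LatentFactorForecastPipeline(
    model=IPCAModel(IPCAConfig(n_factors=3)),
    forecaster=ExpandingMeanFactorForecaster(),
    mapper=BetaLambdaMapper(),
)
# Fit and predict on the same batch (in-sample, for illustration only).
pipeline.fit(batch)
prediction = pipeline.predict(batch)

# Inspect the estimated exposures and the mapped asset-level forecast.
print(prediction.state.asset_betas.shape)
print(prediction.asset_forecast.expected_returns.shape)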
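
The three stages that the pipeline keeps separate can also be sketched in plain NumPy. This is an illustration under simplifying assumptions (static PCA betas on a dense panel, an expanding mean evaluated at the last date), not how `IPCAModel`, `ExpandingMeanFactorForecaster`, or `BetaLambdaMapper` are implemented. All function names below are hypothetical.

```python
import numpy as np


def extract_factors(returns: np.ndarray, n_factors: int):
    """Stage 1: structural extraction via PCA on a (T, N) return panel."""
    demeaned = returns - returns.mean(axis=0)
    # Rows of vt are orthonormal principal directions in asset space.
    _, _, vt = np.linalg.svd(demeaned, full_matrices=False)
    betas = vt[:n_factors].T          # (N, K) asset exposures
    factor_returns = returns @ betas  # (T, K) factor realizations
    return betas, factor_returns


def forecast_premia(factor_returns: np.ndarray) -> np.ndarray:
    """Stage 2: expanding-mean forecast of factor premia (lambda), shape (K,)."""
    return factor_returns.mean(axis=0)


def map_to_assets(betas: np.ndarray, premia: np.ndarray) -> np.ndarray:
    """Stage 3: beta x lambda mapping to expected asset returns, shape (N,)."""
    return betas @ premia


rng = np.random.default_rng(0)
returns = rng.standard_normal((24, 150))  # simulated (T, N) panel

betas, factor_returns = extract_factors(returns, n_factors=3)
premia = forecast_premia(factor_returns)
expected_returns = map_to_assets(betas, premia)

print(betas.shape)             # (150, 3)
print(expected_returns.shape)  # (150,)
```

Because each stage is its own function, the premium forecaster can be swapped without touching the extraction step. That is the separation the pipeline object makes explicit.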

Three Core Contracts

| Contract | Used by | What it represents |
| --- | --- | --- |
| `PersistentPanelBatch` | `PCAModel`, `RPPCAModel` | stable-entity return panel |
| `CrossSectionBatch` | `IPCAModel`, `CAEModel`, `SAEModel`, `StochasticDiscountFactorModel` | dated observed cross-sections, ragged by construction |
| `PortfolioSequenceBatch` | `LinearFeaturePortfolioModel`, `LSTMPortfolioModel`, `DeepPortfolioModel` | sequence-to-allocation learning |
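
The difference between the three contracts is largely a difference in data layout. The sketch below uses hypothetical stand-in dataclasses (`PanelSketch`, `CrossSectionSketch`, `SequenceSketch`) to show one plausible shape convention for each. The field names and shapes are assumptions made for illustration and should not be read as the actual signatures of `PersistentPanelBatch`, `CrossSectionBatch`, or `PortfolioSequenceBatch`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class PanelSketch:
    """Stable-entity panel: the same N assets at every date."""
    returns: np.ndarray  # (T, N)


@dataclass(frozen=True)
class CrossSectionSketch:
    """Dated cross-sections: the asset count may differ per date (ragged)."""
    characteristics: tuple  # one (N_t, K) array per date
    returns: tuple          # one (N_t,) array per date
    timestamps: tuple       # one label per date


@dataclass(frozen=True)
class SequenceSketch:
    """Sequence-to-allocation: lookback windows paired with next-period returns."""
    features: np.ndarray      # (B, L, N, K): batch, lookback, assets, features
    next_returns: np.ndarray  # (B, N): returns an allocation would be scored on


rng = np.random.default_rng(0)

panel = PanelSketch(returns=rng.standard_normal((24, 150)))

assets_per_date = (140, 150, 145)  # the universe changes from date to date
cross_sections = CrossSectionSketch(
    characteristics=tuple(rng.standard_normal((n, 10)) for n in assets_per_date),
    returns=tuple(rng.standard_normal(n) for n in assets_per_date),
    timestamps=(0, 1, 2),
)

sequences = SequenceSketch(
    features=rng.standard_normal((8, 12, 5, 4)),
    next_returns=rng.standard_normal((8, 5)),
)

print(panel.returns.shape)                                 # (24, 150)
print([x.shape for x in cross_sections.characteristics])  # [(140, 10), (150, 10), (145, 10)]
print(sequences.features.shape)                            # (8, 12, 5, 4)
```

A dense panel cannot represent a universe that changes from date to date without padding or masking, which is one reason to keep the cross-section contract distinct from the panel contract.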

Model Families

latent_factors
├── PCAModel
├── RPPCAModel
├── IPCAModel
└── CAEModel

stochastic_discount_factor
└── StochasticDiscountFactorModel

asset_prediction
└── SAEModel

portfolio
├── LinearFeaturePortfolioModel
├── LSTMPortfolioModel
└── DeepPortfolioModel