
User Guide

ml4t-models is organized around model semantics, not around one generic trainer abstraction.

The fastest way to understand the library is to ask four questions:

  1. What data contract does the model require?
  2. What object does it estimate?
  3. What is its native output?
  4. What still has to happen before you can trade or backtest that output?

Model Families

The core design choice is simple: do not force structurally different finance models through one artificial fit/predict story. A latent-factor model, a stochastic discount factor model, a direct supervised predictor, and an end-to-end portfolio learner may all use neural networks, but they are not estimating the same object.

The Four Main Workflows

1. Latent-Factor Forecasting

Used by:

  • PCAModel
  • RPPCAModel
  • IPCAModel
  • CAEModel

Workflow:

batch -> structural model -> latent factor state -> factor forecaster -> asset mapper

This is the right abstraction for models where:

  • exposures and factor realizations are the structural objects
  • expected returns come from a separate premium forecast
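
The four stages of this workflow can be sketched in plain NumPy. This is an illustration under stated assumptions, not the ml4t-models API: the simulated panel, the SVD-based PCA step, the historical-mean premium forecast, and every variable name are chosen here for the example. The library's PCAModel, RPPCAModel, IPCAModel, and CAEModel may estimate exposures differently.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, K = 120, 8, 2  # periods, assets, latent factors (illustrative sizes)

# Simulated balanced return panel (T x N); stands in for a real batch.
true_loadings = rng.normal(size=(N, K))
true_factors = rng.normal(loc=0.01, scale=0.05, size=(T, K))
returns = true_factors @ true_loadings.T + rng.normal(scale=0.01, size=(T, N))

# 1. Structural model: plain PCA via SVD on the demeaned panel.
mean_returns = returns.mean(axis=0)
_, _, vt = np.linalg.svd(returns - mean_returns, full_matrices=False)
loadings = vt[:K].T  # (N, K) exposures with orthonormal columns

# 2. Latent factor state: factor realizations as projections of returns.
factors = returns @ loadings  # (T, K)

# 3. Factor forecaster: a separate step; here simply the historical mean premium.
premium_forecast = factors.mean(axis=0)  # (K,)

# 4. Asset mapper: translate forecast premia into expected asset returns.
expected_returns = loadings @ premium_forecast  # (N,)
```

Because step 3 is separate from steps 1 and 2, the historical mean could be swapped for any other premium forecaster without re-estimating the exposures.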

2. Stochastic Discount Factor Estimation

Used by:

  • StochasticDiscountFactorModel

Workflow:

cross-section batch -> phase-aware no-arbitrage training -> asset weights + SDF series

This family is intentionally separate because the native object is a traded pricing-kernel proxy, not a latent factor plus a premium forecast.
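
To make the "asset weights + SDF series" output concrete, the sketch below builds an SDF series from per-period asset weights and evaluates unconditional no-arbitrage moment conditions. It assumes the common parameterization M_{t+1} = 1 - w_t' R^e_{t+1}; whether StochasticDiscountFactorModel uses exactly this form is not confirmed by this guide. The weights are random placeholders rather than trained output, and the phase-aware training procedure itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 60, 5  # periods, assets (illustrative sizes)

excess_returns = rng.normal(0.005, 0.04, size=(T, N))  # R^e_{t+1}

# Placeholder asset weights w_t; a trained model would produce these.
weights = rng.normal(size=(T, N))
weights /= np.abs(weights).sum(axis=1, keepdims=True)  # unit gross exposure per period

# Traded portfolio return and SDF series: M_{t+1} = 1 - w_t' R^e_{t+1}.
portfolio_returns = (weights * excess_returns).sum(axis=1)  # (T,)
sdf = 1.0 - portfolio_returns  # (T,)

# Unconditional no-arbitrage moments: E[M_{t+1} R^e_{t+1,i}] = 0 for each asset i.
pricing_errors = (sdf[:, None] * excess_returns).mean(axis=0)  # (N,)

# A training loss would drive these pricing errors toward zero.
loss = np.mean(pricing_errors ** 2)
```

The native outputs here are the weight matrix and the SDF series. No factor premium is forecast at any point, which is what separates this family from the latent-factor workflow.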

3. Direct Asset Prediction

Used by:

  • SAEModel

Workflow:

cross-section batch -> supervised autoencoder -> asset signals

This is where the library puts supervised models that predict asset-level signals directly.
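
As a rough illustration of what a supervised autoencoder optimizes, the sketch below computes a combined reconstruction and prediction loss for one cross-section. The single tanh bottleneck, the mixing parameter alpha, and all names are assumptions made for this example; SAEModel's actual architecture and loss weighting are not described in this guide.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, H = 50, 6, 3  # assets in one cross-section, features, bottleneck width

features = rng.normal(size=(N, K))
targets = rng.normal(scale=0.02, size=N)  # e.g. next-period returns (simulated)

# Hypothetical, untrained parameters.
w_enc = rng.normal(scale=0.1, size=(K, H))
w_dec = rng.normal(scale=0.1, size=(H, K))
w_out = rng.normal(scale=0.1, size=H)


def supervised_autoencoder_loss(x, y, w_enc, w_dec, w_out, alpha=0.5):
    """Return (asset signals, combined loss) for one cross-section."""
    code = np.tanh(x @ w_enc)          # shared bottleneck representation
    reconstruction = code @ w_dec      # decoder head: rebuild the features
    signals = code @ w_out             # supervised head: one signal per asset
    recon_loss = np.mean((reconstruction - x) ** 2)
    pred_loss = np.mean((signals - y) ** 2)
    return signals, alpha * recon_loss + (1.0 - alpha) * pred_loss


signals, loss = supervised_autoencoder_loss(features, targets, w_enc, w_dec, w_out)
```

The native output is the per-asset signal vector. Turning those signals into positions is left to downstream steps.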

4. End-to-End Portfolio Learning

Used by:

  • LinearFeaturePortfolioModel
  • LSTMPortfolioModel
  • DeepPortfolioModel

Workflow:

portfolio sequence batch -> allocation model -> target weights -> optional postprocessing

These models optimize allocation decisions directly rather than first estimating returns.
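
A minimal sketch of this workflow, assuming a linear scoring rule, softmax long-only weights, a Sharpe-ratio objective, and shrinkage toward equal weight as the optional postprocessing step. None of these choices are confirmed as the behavior of LinearFeaturePortfolioModel or the other models listed; the function names and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N, K = 100, 4, 3  # periods, assets, features per asset (illustrative sizes)

features = rng.normal(size=(T, N, K))
next_returns = rng.normal(0.001, 0.02, size=(T, N))
theta = rng.normal(scale=0.1, size=K)  # hypothetical allocation parameters


def allocate(features, theta):
    """Allocation model: linear scores mapped to long-only weights via softmax."""
    scores = features @ theta                       # (T, N)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


def negative_sharpe(weights, next_returns):
    """Objective defined on portfolio returns; no return forecast is involved."""
    portfolio_returns = (weights * next_returns).sum(axis=1)
    return -portfolio_returns.mean() / portfolio_returns.std()


def shrink_to_equal_weight(weights, shrinkage=0.2):
    """Optional postprocessing: blend with equal weights; rows still sum to one."""
    n_assets = weights.shape[1]
    return (1.0 - shrinkage) * weights + shrinkage / n_assets


weights = allocate(features, theta)
objective = negative_sharpe(weights, next_returns)  # what a trainer would minimize
target_weights = shrink_to_equal_weight(weights, shrinkage=0.2)
```

The loss is computed on realized portfolio returns, so the parameters are tuned for the allocation itself rather than for forecast accuracy.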

Design Rules

  • Stable-ID panel models and ragged cross-sectional models use different contracts.
  • Neural checkpoints are configurable rather than hard-coded.
  • Forecasting is kept outside structural latent-factor estimation.
  • Evaluation belongs in ml4t-diagnostic, not in this library.
  • Execution belongs in ml4t-backtest, not in this library.
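
The first rule above is easier to see with data in hand. The sketch below contrasts a stable-ID panel, where the same assets appear every period, with ragged cross-sections, where the asset universe changes by date. The container shapes and dictionary keys are assumptions for illustration only; the library's actual batch types are covered in Data Contracts.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 3  # characteristics per asset

# Stable-ID panel: identical asset identifiers every period -> one dense array.
asset_ids = ["A", "B", "C", "D"]
panel_features = rng.normal(size=(12, len(asset_ids), K))  # (T, N, K)

# Ragged cross-sections: the universe differs by date -> one array per date.
ragged_batches = []
for n_assets in [4, 6, 5]:
    ids = rng.choice(100, size=n_assets, replace=False)
    ragged_batches.append(
        {
            "asset_ids": [f"id_{i}" for i in ids],
            "features": rng.normal(size=(n_assets, K)),
        }
    )

# Panel contract: per-asset time-series operations are well defined.
per_asset_mean = panel_features.mean(axis=0)  # (N, K)

# Ragged contract: only within-date operations are safe, e.g. ranking the
# first characteristic inside each cross-section.
cross_sectional_rank = [
    batch["features"][:, 0].argsort().argsort() for batch in ragged_batches
]
```

A model that indexes assets by position across time needs the panel form, while a model that treats each date as an independent cross-section can consume the ragged form directly.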

A Good Reading Strategy

If you want the economic logic first:

  1. Latent-Factor Pipelines
  2. Stochastic Discount Factor
  3. Portfolio Learning

If you want to wire the library into a workflow quickly:

  1. Data Contracts
  2. Training Procedures
  3. Integration

Reading Order