ML4T Engineer

Features, labels, alternative bars, and leakage-safe dataset preparation
Turn OHLCV and tick data into model-ready features, labels, and sampling schemes without rewriting the same research code in every notebook and pipeline.

ml4t-engineer is the feature-engineering layer in the ML4T stack. It sits between ml4t-data, which prepares canonical datasets, and ml4t-diagnostic, which evaluates signals and models. Start here to get a working workflow quickly; use the Book Guide to map notebooks to production APIs, and the API Reference when you need exact interfaces.

Chapters 7-10 of Machine Learning for Trading, Third Edition develop many of these methods manually in notebooks. This library packages those computations as tested, reusable functions. See the Book Guide to map notebook code to library calls.

  • 120 Indicators, One Call --- Momentum, volatility, microstructure, trend, and 8 more categories through compute_features(df, indicators). Features

  • 60 TA-Lib Validated --- Indicators tested against TA-Lib to 1e-6 tolerance so notebook and pipeline outputs stay aligned. Quickstart

  • Labels, Bars, and Leakage Control --- Triple-barrier labels, alternative bars, preprocessing, and dataset splitting in the same workflow. Labeling

  • Book to Production --- The book teaches the methods step by step. This library turns them into reusable calls for research and scheduled pipelines. Book Guide

Quick Example

You have OHLCV data and need a feature matrix:

import polars as pl
from ml4t.engineer import compute_features

df = pl.read_parquet("spy_daily.parquet")
features = compute_features(df, ["rsi", "macd", "atr"])

print(features.select("rsi_14", "macd", "atr_14").tail(3))

That single call appends validated indicator columns to the same DataFrame you will pass downstream into labeling, preprocessing, and model training.

Core Workflows

1. Add supervised labels

You have features. You need targets for a classification model:

from ml4t.engineer.config import LabelingConfig
from ml4t.engineer.labeling import triple_barrier_labels

config = LabelingConfig.triple_barrier(
    upper_barrier=0.02, lower_barrier=0.01, max_holding_period=20,
)
labels = triple_barrier_labels(features, config=config)

This produces standardized label columns such as label, label_return, and barrier_hit for supervised learning workflows.
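Once labels exist, the first sanity check is usually class balance. The sketch below assumes only the column semantics described above (label of -1/0/+1, with 0 meaning the time barrier expired) and uses plain Python rather than the library's API:

```python
from collections import Counter

# Hypothetical label column from a triple-barrier run: +1 = upper
# barrier hit first, -1 = lower barrier hit first, 0 = max holding
# period expired before either barrier.
labels = [1, -1, 0, 1, 0, -1, 1]

# Class balance: a heavy skew toward 0 often means the barriers are
# too wide for the asset's volatility over the holding period.
balance = Counter(labels)
print(balance)  # Counter({1: 3, -1: 2, 0: 2})
```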

2. Build train/test data without leakage

You have features and labels. You need train/test splits with train-only scaling:

from ml4t.engineer import create_dataset_builder

builder = create_dataset_builder(
    features=labels.select(["rsi_14", "macd", "atr_14"]),
    labels=labels["label"],
    dates=labels["timestamp"],
    scaler="robust",
)
X_train, X_test, y_train, y_test = builder.train_test_split(train_size=0.8)

This keeps preprocessing statistics on the training window only, which is the default you want for time-series ML.
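To see why train-only statistics matter, here is a hand-rolled robust scaling sketch (median and IQR, the same idea as scaler="robust", but not the library's implementation). The outlier sits in the test window, so it never contaminates the fitted statistics:

```python
import numpy as np

# Toy feature series; the split point mimics train_test_split(train_size=0.8).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0, 9.0])
split = int(len(x) * 0.8)
train, test = x[:split], x[split:]

# Fit on the TRAIN window only: median and IQR come from train,
# then both windows are transformed with those frozen statistics.
med = np.median(train)
q1, q3 = np.percentile(train, [25, 75])
iqr = q3 - q1
train_scaled = (train - med) / iqr
test_scaled = (test - med) / iqr  # test never influences the statistics
```

Fitting on the full series instead would let the test-window outlier (100.0) shift the median and widen the IQR, quietly leaking future information into every training-set feature.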

3. Use non-time bars when time bars are the wrong abstraction

You have trade data and need bars tied to market activity instead of clock time:

from ml4t.engineer.bars import VolumeBarSampler

sampler = VolumeBarSampler(volume_per_bar=50_000)
volume_bars = sampler.sample(trades_df)

This turns raw trade prints into OHLCV bars that are easier to use in downstream feature and labeling pipelines.
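The mechanism behind volume bars is simple enough to sketch in a few lines. This is an illustrative reimplementation, not VolumeBarSampler itself: accumulate trades until cumulative size crosses the threshold, emit one OHLCV bar, reset:

```python
# Minimal volume-bar sketch (not the library's implementation):
# trades is an iterable of (price, size) tuples in time order.
def volume_bars(trades, volume_per_bar):
    bars, prices, volume = [], [], 0
    for price, size in trades:
        prices.append(price)
        volume += size
        if volume >= volume_per_bar:
            bars.append({
                "open": prices[0], "high": max(prices),
                "low": min(prices), "close": prices[-1],
                "volume": volume,
            })
            prices, volume = [], 0  # start accumulating the next bar
    return bars

ticks = [(100.0, 200), (100.5, 300), (99.8, 600), (100.2, 150)]
print(volume_bars(ticks, volume_per_bar=500))
```

Note that bars close when activity arrives, not on a clock: the single 600-lot print closes a bar on its own, while quiet stretches produce no bars at all. That is the property that makes volume bars closer to equal-information sampling than time bars.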

4. Preserve memory while making series stationary

You need a stationary series but do not want to erase signal with first differences:

from ml4t.engineer.features.fdiff import find_optimal_d, ffdiff

result = find_optimal_d(df["close"])
ffd_close = ffdiff(df["close"], d=result["optimal_d"])

This is the standard bridge from Chapter 9’s fractional differencing workflow to a reusable production transform.
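The core of fractional differencing is its weight sequence, which the book derives from the binomial expansion of (1 - B)^d: w_0 = 1 and w_k = -w_(k-1) * (d - k + 1) / k. A sketch of just the weights (illustrative, not the ffdiff internals):

```python
# Fractional-differencing weights: w_0 = 1, w_k = -w_{k-1}(d - k + 1)/k.
# Small d keeps long memory (weights decay slowly); d = 1 recovers
# plain first differences (1, -1, 0, 0, ...).
def ffd_weights(d, n):
    w = [1.0]
    for k in range(1, n):
        w.append(-w[-1] * (d - k + 1) / k)
    return w

print(ffd_weights(0.4, 5))  # slowly decaying weights: memory is preserved
print(ffd_weights(1.0, 5))  # degenerates to first differencing
```

The differenced series is then the dot product of these weights with a rolling window of prices, which is why find_optimal_d searches for the smallest d that passes a stationarity test: smaller d means slower weight decay and more retained memory.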

Documentation Entry Points

Feature Catalog

Use this as reference once you know what kind of signal you want to build.

| Category        | Count | Examples                                |
|-----------------|-------|-----------------------------------------|
| Momentum        | 31    | RSI, MACD, Stochastic, CCI, ADX, MFI    |
| Microstructure  | 15    | Kyle Lambda, VPIN, Amihud, Roll spread  |
| Volatility      | 15    | ATR, Bollinger, Yang-Zhang, Parkinson   |
| Statistics      | 14    | Variance, Linear Regression, Correlation|
| ML              | 14    | Fractional Diff, Entropy, Lag features  |
| Trend           | 10    | SMA, EMA, WMA, DEMA, TEMA, KAMA         |
| Cross-Asset     | 10    | Beta, Correlation, Cointegration        |
| Risk            | 6     | Max Drawdown, Sortino, CVaR             |
| Price Transform | 5     | Typical Price, Weighted Close           |
| Regime          | 4     | Hurst Exponent, Choppiness Index        |
| Volume          | 3     | OBV, AD, ADOSC                          |
| Math            | 3     | MAX, MIN, SUM                           |

Installation

pip install ml4t-engineer

If you use uv, uv pip install ml4t-engineer is equivalent. See Installation for environment details and optional TA-Lib setup.

Part of the ML4T Library Suite

ml4t-data → ml4t-engineer → ml4t-diagnostic → ml4t-backtest → ml4t-live

ml4t-engineer is where raw market data becomes reusable research inputs: features, labels, alternative bars, and leakage-safe training datasets.