ML4T Engineer¶
Turn OHLCV and tick data into model-ready features, labels, and sampling schemes without rewriting the same research code in every notebook and pipeline.
ml4t-engineer is the feature-engineering layer in the ML4T stack. It sits between
ml4t-data, which prepares canonical datasets, and ml4t-diagnostic, which
evaluates signals and models. Start here if you want a working workflow quickly, use
the Book Guide to map notebooks to production APIs, and use the
API Reference when you need exact interfaces.
Chapters 7-10 of Machine Learning for Trading, Third Edition develop many of these methods manually in notebooks. This library packages those computations as tested, reusable functions. See the Book Guide to map notebook code to library calls.
-
120 Indicators, One Call --- Momentum, volatility, microstructure, trend, and 7 more categories through
compute_features(df, indicators). Features -
60 TA-Lib Validated --- Indicators tested against TA-Lib to
1e-6tolerance so notebook and pipeline outputs stay aligned. Quickstart -
Labels, Bars, and Leakage Control --- Triple-barrier labels, alternative bars, preprocessing, and dataset splitting in the same workflow. Labeling
-
Book to Production --- The book teaches the methods step by step. This library turns them into reusable calls for research and scheduled pipelines. Book Guide
Quick Example¶
You have OHLCV data and need a feature matrix:
import polars as pl
from ml4t.engineer import compute_features
df = pl.read_parquet("spy_daily.parquet")
features = compute_features(df, ["rsi", "macd", "atr"])
print(features.select("rsi_14", "macd", "atr_14").tail(3))
That single call appends validated indicator columns to the same DataFrame you will pass downstream into labeling, preprocessing, and model training.
Core Workflows¶
1. Add supervised labels¶
You have features. You need targets for a classification model:
from ml4t.engineer.config import LabelingConfig
from ml4t.engineer.labeling import triple_barrier_labels
config = LabelingConfig.triple_barrier(
upper_barrier=0.02, lower_barrier=0.01, max_holding_period=20,
)
labels = triple_barrier_labels(features, config=config)
This produces standardized label columns such as label, label_return, and
barrier_hit for supervised learning workflows.
2. Build train/test data without leakage¶
You have features and labels. You need train/test splits with train-only scaling:
from ml4t.engineer import create_dataset_builder
builder = create_dataset_builder(
features=labels.select(["rsi_14", "macd", "atr_14"]),
labels=labels["label"],
dates=labels["timestamp"],
scaler="robust",
)
X_train, X_test, y_train, y_test = builder.train_test_split(train_size=0.8)
This keeps preprocessing statistics on the training window only, which is the default you want for time-series ML.
3. Use non-time bars when time bars are the wrong abstraction¶
You have trade data and need bars tied to market activity instead of clock time:
from ml4t.engineer.bars import VolumeBarSampler
sampler = VolumeBarSampler(volume_per_bar=50_000)
volume_bars = sampler.sample(trades_df)
This turns raw trade prints into OHLCV bars that are easier to use in downstream feature and labeling pipelines.
4. Preserve memory while making series stationary¶
You need a stationary series but do not want to erase signal with first differences:
from ml4t.engineer.features.fdiff import find_optimal_d, ffdiff
result = find_optimal_d(df["close"])
ffd_close = ffdiff(df["close"], d=result["optimal_d"])
This is the standard bridge from Chapter 9’s fractional differencing workflow to a reusable production transform.
Documentation Entry Points¶
- Quickstart for a working feature and labeling run
- Features for the core computation API
- Labeling for supervised targets and sample weighting
- Book Guide for chapter, notebook, and case-study mapping
- API Reference for exact interfaces
Feature Catalog¶
Use this as reference once you know what kind of signal you want to build.
| Category | Count | Examples |
|---|---|---|
| Momentum | 31 | RSI, MACD, Stochastic, CCI, ADX, MFI |
| Microstructure | 15 | Kyle Lambda, VPIN, Amihud, Roll spread |
| Volatility | 15 | ATR, Bollinger, Yang-Zhang, Parkinson |
| Statistics | 14 | Variance, Linear Regression, Correlation |
| ML | 14 | Fractional Diff, Entropy, Lag features |
| Trend | 10 | SMA, EMA, WMA, DEMA, TEMA, KAMA |
| Cross-Asset | 10 | Beta, Correlation, Cointegration |
| Risk | 6 | Max Drawdown, Sortino, CVaR |
| Price Transform | 5 | Typical Price, Weighted Close |
| Regime | 4 | Hurst Exponent, Choppiness Index |
| Volume | 3 | OBV, AD, ADOSC |
| Math | 3 | MAX, MIN, SUM |
Installation¶
If you use uv, uv pip install ml4t-engineer is equivalent. See
Installation for environment details and optional
TA-Lib setup.
Part of the ML4T Library Suite¶
ml4t-engineer is where raw market data becomes reusable research inputs: features,
labels, alternative bars, and leakage-safe training datasets.