ML4T Data
ML4T Data Documentation
Unified market data acquisition from 19+ providers


High-performance market data acquisition for quantitative finance.

  • 5-Minute Setup


    Get market data in 3 lines of code. No API keys required for basic usage.

    Quickstart

  • 20+ Provider Adapters


    Equities, crypto, forex, futures, macro, prediction markets, and factors.

    Provider Guide

  • 10-100x Faster


    Polars-based processing with async batch loading for maximum throughput.

    Performance

  • Pipeline Ready


    Circuit breakers, rate limiting, OHLC validation, and incremental updates.

    Features
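OHLC validation comes down to a few invariants that every bar should satisfy. The sketch below checks them on plain Python dicts. The helper name `find_invalid_bars` and its exact rule set are illustrative assumptions, not ml4t-data's validator API. In a Polars pipeline, the same rules would be written as column expressions. The second bar is made-up data with a deliberately broken high.

```python
from typing import Iterable


def find_invalid_bars(bars: Iterable[dict]) -> list[tuple[int, str]]:
    """Return (row_index, reason) pairs for bars that break basic OHLC invariants.

    Hypothetical helper: these are common sanity checks, not necessarily
    the exact rules ml4t-data applies.
    """
    problems: list[tuple[int, str]] = []
    for i, bar in enumerate(bars):
        op, hi, lo, cl, vol = bar["open"], bar["high"], bar["low"], bar["close"], bar["volume"]
        if min(op, hi, lo, cl) <= 0:
            problems.append((i, "non-positive price"))
        if hi < max(op, cl, lo):  # high must be the largest price in the bar
            problems.append((i, "high below open, close, or low"))
        if lo > min(op, cl, hi):  # low must be the smallest price in the bar
            problems.append((i, "low above open, close, or high"))
        if vol < 0:
            problems.append((i, "negative volume"))
    return problems


bars = [
    {"open": 187.15, "high": 188.44, "low": 183.89, "close": 185.64, "volume": 82488700.0},
    # Made-up bar: high (183.00) is below both open and close
    {"open": 184.22, "high": 183.00, "low": 182.50, "close": 184.25, "volume": 58414500.0},
]
print(find_invalid_bars(bars))
# [(1, 'high below open, close, or low')]
```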

Quick Example

from ml4t.data.providers import YahooFinanceProvider

# Fetch OHLCV data (no API key needed)
provider = YahooFinanceProvider()
df = provider.fetch_ohlcv("AAPL", "2024-01-01", "2024-12-31")

print(df.shape)
# (252, 7)

print(df.head(1))
# shape: (1, 7)
# ┌─────────────────────┬────────┬────────┬────────┬────────┬────────┬──────────┐
# │ timestamp           ┆ symbol ┆ open   ┆ high   ┆ low    ┆ close  ┆ volume   │
# │ ---                 ┆ ---    ┆ ---    ┆ ---    ┆ ---    ┆ ---    ┆ ---      │
# │ datetime[μs, UTC]   ┆ str    ┆ f64    ┆ f64    ┆ f64    ┆ f64    ┆ f64      │
# ╞═════════════════════╪════════╪════════╪════════╪════════╪════════╪══════════╡
# │ 2024-01-02 00:00:00 ┆ AAPL   ┆ 187.15 ┆ 188.44 ┆ 183.89 ┆ 185.64 ┆ 82488700 │
# └─────────────────────┴────────┴────────┴────────┴────────┴────────┴──────────┘

Async Batch Loading (3-10x Faster Than Sequential Fetching)

import asyncio
from ml4t.data.managers.async_batch import async_batch_load
from ml4t.data.providers import YahooFinanceProvider

async def fetch_portfolio():
    async with YahooFinanceProvider() as provider:
        return await async_batch_load(
            provider,
            symbols=["AAPL", "MSFT", "GOOGL", "AMZN", "META"],
            start="2024-01-01",
            end="2024-12-31",
            max_concurrent=10,
        )

df = asyncio.run(fetch_portfolio())
print(f"Fetched {len(df)} rows for {df['symbol'].n_unique()} symbols")
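A parameter like `max_concurrent` typically caps how many requests are in flight at once. One common way to enforce such a cap is an `asyncio.Semaphore`. The sketch below assumes that design and uses a fake `fetch_symbol` coroutine in place of a real provider call. It illustrates the technique and is not a description of `async_batch_load` internals.

```python
import asyncio


async def fetch_symbol(symbol: str) -> list[dict]:
    """Hypothetical stand-in for a provider request: waits, then returns one row."""
    await asyncio.sleep(0.1)
    return [{"symbol": symbol, "close": 100.0}]


async def bounded_batch_load(symbols: list[str], max_concurrent: int) -> tuple[list[dict], int]:
    """Fetch all symbols with at most `max_concurrent` requests running at once.

    Returns the combined rows (in input order) and the peak number of
    concurrent requests observed, so the cap can be checked.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def worker(symbol: str) -> list[dict]:
        nonlocal in_flight, peak
        async with semaphore:  # blocks once max_concurrent workers are inside
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fetch_symbol(symbol)
            finally:
                in_flight -= 1

    # gather preserves input order even though requests finish in any order
    per_symbol = await asyncio.gather(*(worker(s) for s in symbols))
    rows = [row for chunk in per_symbol for row in chunk]
    return rows, peak


rows, peak = asyncio.run(
    bounded_batch_load(["AAPL", "MSFT", "GOOGL", "AMZN", "META"], max_concurrent=2)
)
print(len(rows), peak)
# 5 2
```

A semaphore bounds concurrency without limiting total throughput, so one slow symbol does not hold up the others. A per-second request budget would need a separate rate limiter.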

Installation

pip install ml4t-data                       # with pip
uv add ml4t-data                            # with uv
pip install "ml4t-data[yahoo,databento]"    # with optional provider extras

Provider Comparison

Provider    | Asset Class          | Free Tier   | Async  | Best For
----------- | -------------------- | ----------- | ------ | ---------------------
Yahoo       | Stocks, ETFs, Crypto | Unlimited   | Thread | Learning, backtesting
CoinGecko   | Crypto               | 10K+ coins  | Native | Crypto historical
EODHD       | Global Stocks        | 500/day     | Native | Global coverage
DataBento   | Futures, Options     | $10 credits | Thread | Institutional data
Fama-French | Factors              | Unlimited   | Thread | Academic research

Full provider guide
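The Async column in the provider comparison separates providers with natively asynchronous clients from those marked Thread. A plausible reading of Thread, which is an assumption here, is a synchronous client run off the event loop in a worker thread. In plain asyncio, that pattern looks like the sketch below. `fetch_ohlcv_sync` is a hypothetical blocking function, not a real provider method.

```python
import asyncio
import time


def fetch_ohlcv_sync(symbol: str) -> dict:
    """Hypothetical blocking client call, e.g. a synchronous HTTP request."""
    time.sleep(0.2)  # simulated network latency that would block the event loop
    return {"symbol": symbol}


async def fetch_many_via_threads(symbols: list[str]) -> list[dict]:
    # asyncio.to_thread (Python 3.9+) runs each blocking call in the default
    # thread pool, so the calls overlap instead of running back to back.
    return await asyncio.gather(*(asyncio.to_thread(fetch_ohlcv_sync, s) for s in symbols))


start = time.perf_counter()
results = asyncio.run(fetch_many_via_threads(["AAPL", "MSFT", "GOOGL"]))
elapsed = time.perf_counter() - start
# elapsed should be close to one call's latency rather than the sum of three
print([r["symbol"] for r in results], f"{elapsed:.2f}s")
```

Thread-based wrapping still gives overlapping I/O, but each in-flight request occupies a pool thread. A natively async client avoids that overhead, which is one reason the two modes can scale differently.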

For ML4T Book Readers

This library is the reference implementation for Machine Learning for Trading (Third Edition). The book uses ml4t-data across 6 chapters and 25 notebooks, covering 14 of 20 providers.

  • Chapter-Feature Mapping


    • Ch 2: DataManager, Universe, HiveStorage, gap detection, data quality
    • Ch 4: FRED, CoinGecko, Kalshi, Polymarket, COT data
    • Ch 16-19: Binance, Fama-French, AQR for backtesting and risk

    Full book guide

  • Recurring Workflows


    Graduate from notebooks to automated pipelines with download_all.py --update, incremental updates, and CLI automation.

    Incremental updates
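An incremental update generally fetches only what is missing. It reads the last timestamp already stored, then requests data from the next period through today. The helper below is a hypothetical sketch of that windowing step for daily bars. `next_fetch_window` is not an ml4t-data function, and the library's own logic may handle holidays, intraday frequencies, or deliberate overlap differently. It returns ISO date strings in the same format the `fetch_ohlcv` example uses.

```python
from datetime import date, timedelta
from typing import Optional


def next_fetch_window(
    last_stored: Optional[date], today: date, default_start: date
) -> Optional[tuple[str, str]]:
    """Return (start, end) ISO dates to request, or None if already up to date.

    Hypothetical helper for daily data: resumes the day after the last
    stored bar, or from `default_start` when nothing is stored yet.
    """
    start = default_start if last_stored is None else last_stored + timedelta(days=1)
    if start > today:
        return None  # store already covers today; skip the request
    return start.isoformat(), today.isoformat()


print(next_fetch_window(None, date(2025, 1, 10), date(2024, 1, 1)))
# ('2024-01-01', '2025-01-10')
print(next_fetch_window(date(2024, 12, 31), date(2025, 1, 10), date(2024, 1, 1)))
# ('2025-01-01', '2025-01-10')
print(next_fetch_window(date(2025, 1, 10), date(2025, 1, 10), date(2024, 1, 1)))
# None
```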

Next Steps

  • Installation

    Detailed installation guide with all optional dependencies.

  • User Guide

    Complete documentation for all features.

  • API Reference

    Auto-generated documentation from source code.

  • Contributing

    Create your own provider or contribute to the project.