ML4T Data
High-performance market data acquisition for quantitative finance.
- 5-Minute Setup: Get market data in 3 lines of code. No API keys required for basic usage.
- 20+ Provider Adapters: Equities, crypto, forex, futures, macro, prediction markets, and factors.
- 10-100x Faster: Polars-based processing with async batch loading for maximum throughput.
- Pipeline Ready: Circuit breakers, rate limiting, OHLC validation, and incremental updates.
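Of the pipeline features listed above, the circuit breaker is the least self-explanatory. The sketch below shows the general pattern in plain Python under stated assumptions: `CircuitBreaker`, `CircuitOpenError`, and the `failure_threshold`, `reset_timeout`, and `clock` parameters are hypothetical names, not ml4t-data's API. After `failure_threshold` consecutive failures the breaker opens and rejects calls immediately; once `reset_timeout` seconds pass it lets a trial call through, closing again on success or re-opening on failure.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical circuit breaker (illustration only, not the ml4t-data class)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        # Open while the cool-down window since the last trip has not elapsed.
        return (self._opened_at is not None
                and self._clock() - self._opened_at < self.reset_timeout)

    def call(self, fn: Callable[[], T]) -> T:
        if self.is_open:
            raise CircuitOpenError("provider temporarily disabled")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # trip (or re-trip) the breaker
            raise
        # Success: close the breaker and reset the failure count.
        self._failures = 0
        self._opened_at = None
        return result


now = [0.0]  # fake clock so the demo runs instantly
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])


def flaky_fetch() -> str:
    raise ConnectionError("provider timed out")


for _ in range(2):
    try:
        breaker.call(flaky_fetch)
    except ConnectionError:
        pass
print(breaker.is_open)  # True: two consecutive failures tripped the breaker
now[0] = 11.0
print(breaker.call(lambda: "ok"))  # ok: the trial call after the timeout succeeds
```

Injecting `clock` keeps the example deterministic; a real provider wrapper would rely on the default `time.monotonic`.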
Quick Example
from ml4t.data.providers import YahooFinanceProvider
# Fetch OHLCV data (no API key needed)
provider = YahooFinanceProvider()
df = provider.fetch_ohlcv("AAPL", "2024-01-01", "2024-12-31")
print(df.head())  # first 5 of the 252 trading-day rows
# shape: (5, 7)  (only the first row is reproduced below)
# ┌─────────────────────┬────────┬────────┬────────┬────────┬────────┬──────────┐
# │ timestamp ┆ symbol ┆ open ┆ high ┆ low ┆ close ┆ volume │
# │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ datetime[μs, UTC] ┆ str ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
# ╞═════════════════════╪════════╪════════╪════════╪════════╪════════╪══════════╡
# │ 2024-01-02 00:00:00 ┆ AAPL ┆ 187.15 ┆ 188.44 ┆ 183.89 ┆ 185.64 ┆ 82488700 │
# └─────────────────────┴────────┴────────┴────────┴────────┴────────┴──────────┘
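To make "OHLC validation" concrete, here is a minimal sketch of the consistency rules such a check typically enforces, applied to the AAPL bar from the output above. The function `validate_ohlc_row` is hypothetical and only illustrates the idea; it is not ml4t-data's validator, whose exact rules may differ.

```python
from typing import List


def validate_ohlc_row(open_: float, high: float, low: float,
                      close: float, volume: float) -> List[str]:
    """Return the rule violations for one OHLCV bar (an empty list means valid)."""
    errors = []
    if high < max(open_, close):
        errors.append("high is below open or close")
    if low > min(open_, close):
        errors.append("low is above open or close")
    if low > high:
        errors.append("low exceeds high")
    if min(open_, high, low, close) <= 0:
        errors.append("non-positive price")
    if volume < 0:
        errors.append("negative volume")
    return errors


# The 2024-01-02 AAPL bar from the output above passes every rule.
print(validate_ohlc_row(187.15, 188.44, 183.89, 185.64, 82488700))  # []
# A bar whose high is below its close is flagged.
print(validate_ohlc_row(10.0, 10.5, 9.8, 11.0, 1_000))  # ['high is below open or close']
```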
Async Batch Loading (3-10x Faster)
import asyncio
from ml4t.data.managers.async_batch import async_batch_load
from ml4t.data.providers import YahooFinanceProvider
async def fetch_portfolio():
    async with YahooFinanceProvider() as provider:
        return await async_batch_load(
            provider,
            symbols=["AAPL", "MSFT", "GOOGL", "AMZN", "META"],
            start="2024-01-01",
            end="2024-12-31",
            max_concurrent=10,
        )
df = asyncio.run(fetch_portfolio())
print(f"Fetched {len(df)} rows for {df['symbol'].n_unique()} symbols")
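The `max_concurrent` argument caps how many requests are in flight at once. One standard way to enforce such a cap in asyncio is a semaphore, sketched below with a stand-in coroutine instead of a real provider; `fetch_one` and `bounded_batch` are hypothetical, and this is not necessarily how `async_batch_load` is implemented. The function also reports the peak number of concurrent fetches so the cap can be observed.

```python
import asyncio
from typing import Dict, List, Tuple


async def fetch_one(symbol: str) -> List[float]:
    """Hypothetical stand-in for a provider request: waits briefly, returns fake closes."""
    await asyncio.sleep(0.01)
    return [100.0, 101.0]


async def bounded_batch(symbols: List[str],
                        max_concurrent: int = 10) -> Tuple[Dict[str, List[float]], int]:
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = peak = 0

    async def guarded(symbol: str) -> Tuple[str, List[float]]:
        nonlocal in_flight, peak
        async with semaphore:  # at most max_concurrent fetches run at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                data = await fetch_one(symbol)
            finally:
                in_flight -= 1
        return symbol, data

    # Launch every symbol; the semaphore, not gather, limits concurrency.
    pairs = await asyncio.gather(*(guarded(s) for s in symbols))
    return dict(pairs), peak


results, peak = asyncio.run(bounded_batch([f"SYM{i}" for i in range(25)], max_concurrent=10))
print(len(results), peak)  # 25 10
```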
Installation
Provider Comparison
| Provider | Asset Class | Free Tier | Async | Best For |
|---|---|---|---|---|
| Yahoo | Stocks, ETFs, Crypto | Unlimited | Thread | Learning, backtesting |
| CoinGecko | Crypto | 10K+ coins | Native | Crypto historical |
| EODHD | Global Stocks | 500/day | Native | Global coverage |
| DataBento | Futures, Options | $10 credits | Thread | Institutional data |
| Fama-French | Factors | Unlimited | Thread | Academic research |
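The free-tier limits in the table are the kind of quota a client-side rate limiter enforces. A minimal token-bucket sketch follows, assuming the goal is to allow short bursts while holding a long-run average rate; `TokenBucket`, `try_acquire`, and `wait_time` are hypothetical names, not ml4t-data's rate-limiting API. A daily quota of N requests would translate to `rate = N / 86_400` tokens per second.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter: refills `rate` tokens/second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Consume tokens if available; return False instead of blocking."""
        self._refill()
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False

    def wait_time(self, tokens: float = 1.0) -> float:
        """Seconds until `tokens` would be available (0.0 if available now)."""
        self._refill()
        return max(0.0, (tokens - self._tokens) / self.rate)


now = [0.0]  # fake clock so the demo is deterministic
bucket = TokenBucket(rate=2.0, capacity=5, clock=lambda: now[0])
print([bucket.try_acquire() for _ in range(6)])  # [True, True, True, True, True, False]
now[0] = 0.5  # half a second refills one token at 2 tokens/s
print(bucket.try_acquire())  # True
```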
For ML4T Book Readers
This library is the reference implementation for Machine Learning for Trading (Third Edition). The book uses ml4t-data across 6 chapters and 25 notebooks, covering 14 of 20 providers.
Chapter-Feature Mapping
- Ch 2: DataManager, Universe, HiveStorage, gap detection, data quality
- Ch 4: FRED, CoinGecko, Kalshi, Polymarket, COT data
- Ch 16-19: Binance, Fama-French, AQR for backtesting and risk
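Gap detection, listed under Chapter 2, can be illustrated with a simple weekday check. The helper `find_weekday_gaps` below is hypothetical and deliberately simplified: it assumes a Monday-to-Friday calendar, so exchange holidays would show up as false positives; a fuller check would compare against an exchange trading calendar.

```python
from datetime import date, timedelta
from typing import Iterable, List


def find_weekday_gaps(dates: Iterable[date]) -> List[date]:
    """Return weekdays missing between the first and last observed date.

    Assumes a Monday-Friday calendar; real exchange calendars also skip holidays.
    """
    observed = sorted(set(dates))
    if not observed:
        return []
    present = set(observed)
    missing = []
    day = observed[0]
    while day <= observed[-1]:
        if day.weekday() < 5 and day not in present:  # 0-4 = Monday-Friday
            missing.append(day)
        day += timedelta(days=1)
    return missing


# 2024-01-04 (a Thursday) is absent from this series; the weekend of Jan 6-7 is not a gap.
bars = [date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 5), date(2024, 1, 8)]
print(find_weekday_gaps(bars))  # [datetime.date(2024, 1, 4)]
```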
Recurring Workflows
Graduate from notebooks to automated pipelines with download_all.py --update, incremental updates, and CLI automation.
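The idea behind incremental updates is to download only the bars newer than what is already stored. The hypothetical helper `incremental_range` below sketches that decision for daily data; it is not taken from download_all.py or ml4t-data, and it assumes the newest stored date is already known (for example, read from existing files).

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def incremental_range(last_stored: Optional[date], today: date,
                      default_start: date) -> Optional[Tuple[date, date]]:
    """Return the (start, end) dates still to download, or None if already up to date.

    last_stored: date of the newest bar on disk (None if nothing is stored yet).
    default_start: where a full backfill begins when there is no stored data.
    """
    start = default_start if last_stored is None else last_stored + timedelta(days=1)
    if start > today:
        return None  # nothing new to fetch
    return start, today


# First run: full backfill from default_start.
print(incremental_range(None, date(2024, 12, 31), date(2024, 1, 1)))
# Later run: resume the day after the newest stored bar.
print(incremental_range(date(2024, 12, 20), date(2024, 12, 31), date(2024, 1, 1)))
# Already current: nothing to download.
print(incremental_range(date(2024, 12, 31), date(2024, 12, 31), date(2024, 1, 1)))  # None
```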
Next Steps
- Detailed installation guide with all optional dependencies.
- Complete documentation for all features.
- Auto-generated documentation from source code.
- Create your own provider or contribute to the project.