ML4T Data
ML4T Data Documentation
Unified market data acquisition from 19+ providers

API Reference

Complete API documentation for the ml4t-data library, auto-generated from source docstrings via mkdocstrings.


DataManager

The primary entry point for all data operations. DataManager is a facade that delegates to focused manager classes for configuration, fetching, storage, metadata, and batch operations.

from ml4t.data import DataManager

# Fetch-only (no storage)
manager = DataManager()
df = manager.fetch("AAPL", "2024-01-01", "2024-12-31", provider="yahoo")

# With storage for load/update workflows
from ml4t.data.storage import HiveStorage, StorageConfig

storage = HiveStorage(StorageConfig(base_path="./data"))
manager = DataManager(storage=storage, use_transactions=True)
key = manager.load("AAPL", "2024-01-01", "2024-12-31")
key = manager.update("AAPL")

DataManager

DataManager(
    config_path=None,
    output_format="polars",
    providers=None,
    storage=None,
    use_transactions=False,
    enable_validation=True,
    progress_callback=None,
    **kwargs,
)

Unified interface for financial data access and storage.

The DataManager provides a single, consistent API for fetching and managing data from multiple providers. It handles:

Data Fetching:

  • Provider selection based on symbol patterns
  • Configuration management (YAML, environment, parameters)
  • Connection pooling and session management
  • Output format conversion (Polars, pandas, lazy)
  • Batch fetching with error handling

Storage Operations (when storage configured):

  • Initial data loading with validation
  • Incremental updates with gap detection and filling
  • Transaction support for ACID guarantees
  • Progress callbacks for UI integration
  • Data validation (OHLCV, cross-validation)

Usage:

Fetch only (no storage):

>>> manager = DataManager()
>>> df = manager.fetch("AAPL", "2024-01-01", "2024-12-31", provider="yahoo")

With storage for load/update:

>>> from ml4t.data.storage.hive import HiveStorage
>>> from ml4t.data.storage.backend import StorageConfig
>>> storage = HiveStorage(StorageConfig(base_path="./data"))
>>> manager = DataManager(storage=storage, use_transactions=True)
>>> key = manager.load("AAPL", "2024-01-01", "2024-12-31")
>>> key = manager.update("AAPL")  # Incremental update

Initialize DataManager.

Parameters:

Name Type Description Default
config_path str | None

Path to YAML configuration file

None
output_format str

Output format ('polars', 'pandas', 'lazy')

'polars'
providers dict[str, dict[str, Any]] | None

Provider-specific configuration overrides

None
storage Any | None

Optional storage backend for load/update operations

None
use_transactions bool

Enable transactional storage for ACID guarantees

False
enable_validation bool

Enable data validation during load/update

True
progress_callback Callable[[str, float], None] | None

Optional callback for progress updates (message, progress)

None
**kwargs

Additional configuration parameters

{}
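The progress_callback parameter accepts any callable that takes a message string and a float. The sketch below shows one way such a callback might look. It assumes the float is a fraction between 0 and 1, which this reference does not state, and ProgressLog is a hypothetical helper name, not part of the library.

```python
class ProgressLog:
    """Hypothetical collector matching Callable[[str, float], None]."""

    def __init__(self) -> None:
        self.events: list[tuple[str, float]] = []

    def __call__(self, message: str, progress: float) -> None:
        # Assumption: progress is a fraction in [0, 1]; clamp defensively.
        clamped = min(max(progress, 0.0), 1.0)
        self.events.append((message, clamped))

    def render(self) -> list[str]:
        # Format each event as a right-aligned percentage plus the message.
        return [f"{progress:6.1%} {message}" for message, progress in self.events]


log = ProgressLog()
log("fetching AAPL", 0.25)
log("writing AAPL", 1.0)
```

An instance like this could then be passed as DataManager(progress_callback=log), since the constructor signature above accepts that keyword.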

config property

config

Get configuration dictionary.

output_format property

output_format

Get output format.

storage property

storage

Get storage backend.

fetch

fetch(
    symbol,
    start,
    end,
    frequency="daily",
    provider=None,
    **kwargs,
)

Fetch data for a symbol.

Parameters:

Name Type Description Default
symbol str

Symbol to fetch

required
start str

Start date (YYYY-MM-DD)

required
end str

End date (YYYY-MM-DD)

required
frequency str

Data frequency (daily, hourly, etc.)

'daily'
provider str | None

Optional provider override

None
**kwargs

Additional provider-specific parameters

{}

Returns:

Type Description
DataFrame | LazyFrame | Any

Data in configured output format

Raises:

Type Description
ValueError

If no provider found or data fetch fails

fetch_batch

fetch_batch(
    symbols, start, end, frequency="daily", **kwargs
)

Fetch data for multiple symbols.

Parameters:

Name Type Description Default
symbols list[str]

List of symbols to fetch

required
start str

Start date (YYYY-MM-DD)

required
end str

End date (YYYY-MM-DD)

required
frequency str

Data frequency

'daily'
**kwargs

Additional parameters

{}

Returns:

Type Description
dict[str, DataFrame | LazyFrame | Any | None]

Dictionary mapping symbols to data (or None if fetch failed)
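Because failed symbols map to None rather than raising, callers usually need to separate the two cases. A minimal sketch follows; split_batch_results is a hypothetical helper, and placeholder lists stand in for the frames a real call would return.

```python
from typing import Any


def split_batch_results(results: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Separate successful fetches from symbols whose value is None."""
    # Compare against None explicitly: DataFrames do not support plain truthiness.
    succeeded = {symbol: data for symbol, data in results.items() if data is not None}
    failed = sorted(symbol for symbol, data in results.items() if data is None)
    return succeeded, failed


# Placeholder values stand in for real DataFrames.
example = {"AAPL": ["row1", "row2"], "MSFT": None, "GOOG": ["row1"]}
succeeded, failed = split_batch_results(example)
```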

batch_load

batch_load(
    symbols,
    start,
    end,
    frequency="daily",
    provider=None,
    max_workers=4,
    fail_on_partial=False,
    **kwargs,
)

Fetch data for multiple symbols and return it in a stacked multi-asset format.

batch_load_universe

batch_load_universe(
    universe,
    start,
    end,
    frequency="daily",
    provider=None,
    max_workers=4,
    fail_on_partial=False,
    **kwargs,
)

Fetch data for all symbols in a pre-defined universe.

batch_load_from_storage

batch_load_from_storage(
    symbols,
    start,
    end,
    frequency="daily",
    asset_class="equities",
    provider=None,
    fetch_missing=True,
    max_workers=4,
    **kwargs,
)

Load multiple symbols from storage with optional fetch fallback.

load

load(
    symbol,
    start,
    end,
    frequency="daily",
    asset_class="equities",
    provider=None,
    bar_type="time",
    bar_threshold=None,
    exchange="UNKNOWN",
    calendar=None,
)

Load data from provider and store it.

import_data

import_data(
    data,
    symbol,
    provider,
    frequency="daily",
    asset_class="equities",
    bar_type="time",
    bar_threshold=None,
    exchange="UNKNOWN",
    calendar=None,
)

Import external data into storage with metadata.

update

update(
    symbol,
    frequency="daily",
    asset_class="equities",
    lookback_days=7,
    fill_gaps=True,
    provider=None,
)

Update existing data with incremental fetch.

list_symbols

list_symbols(
    provider=None,
    asset_class=None,
    exchange=None,
    bar_type=None,
)

List all symbols in storage, optionally filtered by metadata.

get_metadata

get_metadata(
    symbol, asset_class="equities", frequency="daily"
)

Get metadata for a specific symbol.

assign_sessions

assign_sessions(df, exchange=None, calendar=None)

Assign session_date column to DataFrame based on exchange calendar.

complete_sessions

complete_sessions(
    df,
    exchange=None,
    calendar=None,
    fill_gaps=True,
    fill_method="forward",
    zero_volume=True,
)

Complete sessions by filling gaps.

update_all

update_all(provider=None, asset_class=None, exchange=None)

Update all stored data matching the filters.

list_providers

list_providers()

List available providers.

get_provider_info

get_provider_info(provider_name)

Get information about a provider.

clear_cache

clear_cache()

Clear routing cache and close provider connections.


Storage

StorageConfig

Dataclass configuring the storage backend. Controls partitioning strategy, compression, locking, and metadata tracking.

from ml4t.data.storage import StorageConfig

# Hive-partitioned storage for minute data
config = StorageConfig(
    base_path="./market_data",
    strategy="hive",
    partition_granularity="day",
    compression="zstd",
)

# Flat storage for small datasets
config = StorageConfig(
    base_path="./data",
    strategy="flat",
    compression="snappy",
)

StorageConfig dataclass

StorageConfig(
    base_path,
    strategy="hive",
    compression="zstd",
    partition_granularity="month",
    partition_cols=None,
    atomic_writes=True,
    enable_locking=True,
    metadata_tracking=True,
    generate_profile=True,
)

Configuration for storage backends.

Attributes:

Name Type Description
base_path Path

Base directory for storage.

strategy str

Storage strategy ("hive" or "flat").

compression str | None

Compression type for Parquet files.

partition_granularity PartitionGranularityType

Time-based partition granularity for Hive storage.

  • "year": Best for daily data (~252 rows/partition for stocks)
  • "month": Best for hourly data (~720 rows/partition)
  • "day": Best for minute data (~1,440 rows/partition)
  • "hour": Best for second/tick data (~3,600 rows/partition)

partition_cols list[str] | None

Deprecated. Use partition_granularity instead.

atomic_writes bool

Use atomic writes with temp file rename.

enable_locking bool

Enable file locking for concurrent access.

metadata_tracking bool

Track metadata in manifest files.

__post_init__
__post_init__()

Validate and set defaults.
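The atomic_writes attribute above refers to the temp-file-then-rename pattern. The following is a generic sketch of that pattern using only the standard library, not the library's own implementation; atomic_write_bytes is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(target: Path, payload: bytes) -> Path:
    """Write payload to target so readers never observe a partial file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, target)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

Writing the temporary file next to the target matters: a rename across filesystems degrades to copy-and-delete and loses atomicity.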

StorageBackend

Abstract base class defining the storage interface. All backends (Hive, Flat) implement this contract.

StorageBackend

StorageBackend(config)

Bases: ABC

Abstract base class for storage backends.

Initialize storage backend with configuration.

Parameters:

Name Type Description Default
config StorageConfig

Storage configuration

required
write abstractmethod
write(data, key, metadata=None)

Write data to storage.

Parameters:

Name Type Description Default
data LazyFrame

Polars LazyFrame to write

required
key str

Storage key (e.g., "BTC-USD", "SPY")

required
metadata dict[str, Any] | None

Optional metadata to store alongside data

None

Returns:

Type Description
Path

Path to written file

read abstractmethod
read(key, start_date=None, end_date=None, columns=None)

Read data from storage.

Parameters:

Name Type Description Default
key str

Storage key

required
start_date datetime | None

Optional start date filter

None
end_date datetime | None

Optional end date filter

None
columns list[str] | None

Optional columns to select

None

Returns:

Type Description
LazyFrame

Polars LazyFrame with requested data

list_keys abstractmethod
list_keys()

List all available keys in storage.

Returns:

Type Description
list[str]

List of storage keys

exists abstractmethod
exists(key)

Check if a key exists in storage.

Parameters:

Name Type Description Default
key str

Storage key to check

required

Returns:

Type Description
bool

True if key exists

delete abstractmethod
delete(key)

Delete data for a key.

Parameters:

Name Type Description Default
key str

Storage key to delete

required

Returns:

Type Description
bool

True if deletion was successful

get_metadata
get_metadata(key)

Get metadata for a key.

Parameters:

Name Type Description Default
key str

Storage key

required

Returns:

Type Description
dict[str, Any] | None

Metadata dict or None
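To make the contract concrete, here is a sketch of a dictionary-backed test double. BackendContract is a local stand-in that mirrors the documented method names, not the library's StorageBackend, and plain row lists replace Polars LazyFrames so the example has no dependencies.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any

Rows = list[dict[str, Any]]  # plain rows stand in for a Polars LazyFrame


class BackendContract(ABC):
    """Stand-in mirroring the documented abstract methods (not the library class)."""

    @abstractmethod
    def write(self, data: Rows, key: str, metadata: dict[str, Any] | None = None) -> str: ...

    @abstractmethod
    def read(self, key: str) -> Rows: ...

    @abstractmethod
    def list_keys(self) -> list[str]: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...

    @abstractmethod
    def delete(self, key: str) -> bool: ...


class InMemoryBackend(BackendContract):
    """Dictionary-backed implementation, useful as a test double."""

    def __init__(self) -> None:
        self._data: dict[str, Rows] = {}
        self._meta: dict[str, dict[str, Any]] = {}

    def write(self, data, key, metadata=None):
        self._data[key] = list(data)
        if metadata is not None:
            self._meta[key] = dict(metadata)
        return key

    def read(self, key):
        return list(self._data[key])  # raises KeyError if the key was never written

    def list_keys(self):
        return sorted(self._data)

    def exists(self, key):
        return key in self._data

    def delete(self, key):
        self._meta.pop(key, None)
        return self._data.pop(key, None) is not None

    def get_metadata(self, key):
        return self._meta.get(key)
```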

HiveStorage

Hive-partitioned storage with configurable time-based partitioning. Partition pruning lets time-range queries skip partitions outside the requested window; the library reports a 7x query performance improvement for such queries.

from ml4t.data.storage import HiveStorage, StorageConfig

config = StorageConfig(
    base_path="./data",
    partition_granularity="month",  # year, month, day, or hour
)
storage = HiveStorage(config)

# Write data (partitions by timestamp automatically)
storage.write(df, "equities/daily/AAPL")

# Read with partition pruning
from datetime import datetime
lf = storage.read(
    "equities/daily/AAPL",
    start_date=datetime(2024, 6, 1),
    end_date=datetime(2024, 12, 31),
    columns=["timestamp", "close", "volume"],
)
df = lf.collect()

HiveStorage

HiveStorage(config)

Bases: StorageBackend

Hive partitioned storage with configurable time-based partitioning.

This implementation provides:

  • 7x query performance improvement for time-based queries
  • Configurable partition granularity (year, month, day, hour)
  • Atomic writes with temp file pattern
  • Metadata tracking in JSON manifests
  • File locking for concurrent access safety
  • Polars lazy evaluation throughout

Partition Granularity

Configure via StorageConfig.partition_granularity:

  • "year": Best for daily data (~252 rows/partition)
  • "month": Best for hourly data (~720 rows/partition) [default]
  • "day": Best for minute data (~1,440 rows/partition)
  • "hour": Best for second/tick data (~3,600 rows/partition)

Example

from ml4t.data.storage import HiveStorage, StorageConfig

# For minute data, use day-level partitioning
config = StorageConfig(base_path="./data", partition_granularity="day")
storage = HiveStorage(config)
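The granularity recommendations can be expressed as a lookup, and the resulting partition layout sketched as key=value directories. The directory naming below follows the common Hive convention and is an assumption; the reference does not show the on-disk layout this library uses. The frequency strings and both names are illustrative.

```python
from datetime import datetime

# Frequency -> granularity pairs taken from the recommendations in this section.
RECOMMENDED_GRANULARITY = {
    "daily": "year",
    "hourly": "month",
    "minute": "day",
    "second": "hour",
    "tick": "hour",
}

_LEVELS = ("year", "month", "day", "hour")


def partition_path(ts: datetime, granularity: str) -> str:
    """Build a Hive-style key=value path (assumed naming, for illustration only)."""
    if granularity not in _LEVELS:
        raise ValueError(f"unknown granularity: {granularity!r}")
    depth = _LEVELS.index(granularity) + 1
    values = (ts.year, ts.month, ts.day, ts.hour)
    widths = (4, 2, 2, 2)
    # Keep only the levels down to the requested granularity.
    return "/".join(
        f"{name}={value:0{width}d}"
        for name, value, width in zip(_LEVELS[:depth], values, widths)
    )
```

A time-range query then only needs to open directories whose key values overlap the requested window, which is the essence of partition pruning.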

Initialize Hive storage backend.

Parameters:

Name Type Description Default
config StorageConfig

Storage configuration

required
write
write(data, key=None, metadata=None)

Write data using Hive partitioning.

Parameters:

Name Type Description Default
data LazyFrame | DataFrame | DataObject

Data to write (DataFrame, LazyFrame, or DataObject)

required
key str | None

Storage key (e.g., "BTC-USD" or "equities/daily/AAPL"). Optional if data is DataObject.

None
metadata dict[str, Any] | None

Optional metadata dict

None

Returns:

Type Description
Path | str

Path to base directory (old API) or storage key string (new DataObject API)

read
read(key, start_date=None, end_date=None, columns=None)

Read data from Hive partitions.

Parameters:

Name Type Description Default
key str

Storage key

required
start_date datetime | None

Optional start date filter

None
end_date datetime | None

Optional end date filter

None
columns list[str] | None

Optional columns to select

None

Returns:

Type Description
LazyFrame

LazyFrame with requested data

list_keys
list_keys()

List all keys in storage.

Returns:

Type Description
list[str]

List of storage keys

exists
exists(key)

Check if key exists.

Parameters:

Name Type Description Default
key str

Storage key

required

Returns:

Type Description
bool

True if key exists

delete
delete(key)

Delete all data for a key.

Parameters:

Name Type Description Default
key str

Storage key

required

Returns:

Type Description
bool

True if successful

get_latest_timestamp
get_latest_timestamp(symbol, provider)

Get the latest timestamp for a symbol from a provider.

Parameters:

Name Type Description Default
symbol str

Symbol identifier

required
provider str

Data provider name

required

Returns:

Type Description
datetime | None

Latest timestamp in the dataset, or None if no data exists

save_chunk
save_chunk(data, symbol, provider, start_time, end_time)

Save an incremental data chunk.

Parameters:

Name Type Description Default
data DataFrame

DataFrame with OHLCV data

required
symbol str

Symbol identifier

required
provider str

Data provider name

required
start_time datetime

Start time of this chunk

required
end_time datetime

End time of this chunk

required

Returns:

Type Description
Path

Path to the saved chunk file

update_combined_file
update_combined_file(data, symbol, provider)

Update the main combined file with new data.

Parameters:

Name Type Description Default
data DataFrame

New data to append

required
symbol str

Symbol identifier

required
provider str

Data provider name

required

Returns:

Type Description
int

Number of new records added (after deduplication)
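The return value counts only rows that survive deduplication. A minimal sketch of that bookkeeping is shown below. It assumes rows are deduplicated on the timestamp field and that existing rows win on conflict; neither detail is confirmed by this reference, and append_deduplicated is a hypothetical name.

```python
def append_deduplicated(existing: list[dict], new: list[dict]) -> tuple[list[dict], int]:
    """Merge rows keyed by timestamp; return (combined_rows, records_added)."""
    seen = {row["timestamp"] for row in existing}
    added = []
    for row in new:
        # Assumption: a row is a duplicate when its timestamp is already stored.
        if row["timestamp"] not in seen:
            seen.add(row["timestamp"])
            added.append(row)
    combined = sorted(existing + added, key=lambda row: row["timestamp"])
    return combined, len(added)


stored = [{"timestamp": "2024-01-02", "close": 10.0}]
incoming = [
    {"timestamp": "2024-01-02", "close": 10.5},  # overlaps the stored row
    {"timestamp": "2024-01-03", "close": 11.0},
]
combined, records_added = append_deduplicated(stored, incoming)
```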

read_data
read_data(symbol, provider, start_time=None, end_time=None)

Read data for a symbol with optional time filtering.

Parameters:

Name Type Description Default
symbol str

Symbol identifier

required
provider str

Data provider name

required
start_time datetime | None

Optional start time filter

None
end_time datetime | None

Optional end time filter

None

Returns:

Type Description
DataFrame

DataFrame with filtered data

update_metadata
update_metadata(
    symbol, provider, last_update, records_added, chunk_file
)

Update metadata after incremental update.

Parameters:

Name Type Description Default
symbol str

Symbol identifier

required
provider str

Data provider name

required
last_update datetime

Timestamp of this update

required
records_added int

Number of records added

required
chunk_file str

Name of the chunk file saved

required

FlatStorage

Simple single-file-per-key storage. Suitable for smaller datasets or when partition pruning is not beneficial.

from ml4t.data.storage import FlatStorage, StorageConfig

config = StorageConfig(base_path="./data", strategy="flat")
storage = FlatStorage(config)

storage.write(df, "reference/spy")
lf = storage.read("reference/spy")

FlatStorage

FlatStorage(config)

Bases: StorageBackend

Flat file storage without partitioning.

This implementation provides:

  • Simple single-file storage per key
  • Atomic writes with temp file pattern
  • Metadata tracking in JSON manifests
  • File locking for concurrent access safety
  • Polars lazy evaluation throughout

Initialize flat storage backend.

Parameters:

Name Type Description Default
config StorageConfig

Storage configuration

required
write
write(data, key, metadata=None)

Write data as a single file.

Parameters:

Name Type Description Default
data LazyFrame | DataFrame

Data to write

required
key str

Storage key (e.g., "BTC-USD")

required
metadata dict[str, Any] | None

Optional metadata

None

Returns:

Type Description
Path

Path to written file

read
read(key, start_date=None, end_date=None, columns=None)

Read data from flat file.

Parameters:

Name Type Description Default
key str

Storage key

required
start_date datetime | None

Optional start date filter

None
end_date datetime | None

Optional end date filter

None
columns list[str] | None

Optional columns to select

None

Returns:

Type Description
LazyFrame

LazyFrame with requested data

list_keys
list_keys()

List all keys in storage.

Returns:

Type Description
list[str]

List of storage keys

exists
exists(key)

Check if key exists.

Parameters:

Name Type Description Default
key str

Storage key

required

Returns:

Type Description
bool

True if key exists

delete
delete(key)

Delete data for a key.

Parameters:

Name Type Description Default
key str

Storage key

required

Returns:

Type Description
bool

True if successful

create_storage

Factory function for creating storage backends from a strategy name.

from ml4t.data.storage import create_storage

storage = create_storage("./data", strategy="hive", partition_granularity="day")

create_storage

create_storage(base_path, strategy='hive', **kwargs)

Create a storage backend with the specified strategy.

Parameters:

Name Type Description Default
base_path str | Path

Base directory for storage

required
strategy str

Storage strategy ("hive" or "flat")

'hive'
**kwargs

Additional configuration options

{}

Returns:

Type Description
StorageBackend

Configured storage backend

Example

storage = create_storage("/data", strategy="hive")
storage.write(df.lazy(), "BTC-USD")


Providers

BaseProvider

Abstract base class for all data providers. Composes rate-limiting, circuit-breaker, validation, and HTTP session mixins into a single base.

Concrete providers implement either:

  • _fetch_and_transform_data() for a single-step workflow, or
  • _fetch_raw_data() + _transform_data() for a two-step workflow.

from ml4t.data.providers.base import BaseProvider
import polars as pl

class MyProvider(BaseProvider):
    @property
    def name(self) -> str:
        return "my_provider"

    def _fetch_and_transform_data(self, symbol, start, end, frequency):
        # Fetch from API and return canonical OHLCV DataFrame
        ...

BaseProvider

BaseProvider(
    rate_limit=None,
    session_config=None,
    circuit_breaker_config=None,
)

Bases: RateLimitMixin, CircuitBreakerMixin, ValidationMixin, SessionMixin, ABC

Enhanced base provider composing all mixins.

All providers must return OHLCV data in the canonical schema with columns in standard order: [timestamp, symbol, open, high, low, close, volume].

Each provider must implement either:

  • _fetch_and_transform_data() for single-step implementation
  • _fetch_raw_data() + _transform_data() for two-step implementation

Class Variables

  • DEFAULT_RATE_LIMIT: Default (calls, period_seconds) for rate limiting
  • FREQUENCY_MAP: Mapping of frequency names to provider-specific values
  • CIRCUIT_BREAKER_CONFIG: Circuit breaker failure threshold and reset timeout

Key Contracts
  • Columns always in order: timestamp, symbol, open, high, low, close, volume
  • Timestamps are Datetime type
  • OHLCV values are Float64
  • Symbol is uppercase String
  • Data sorted by timestamp ascending
  • No duplicate timestamps
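The contracts above lend themselves to a simple checker. The sketch below works on plain dictionaries rather than a Polars DataFrame to stay dependency-free, and contract_violations is a hypothetical helper; it does not reproduce the library's ValidationMixin.

```python
from datetime import datetime

CANONICAL_COLUMNS = ["timestamp", "symbol", "open", "high", "low", "close", "volume"]


def contract_violations(rows: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the rows conform."""
    problems = []
    for i, row in enumerate(rows):
        if list(row) != CANONICAL_COLUMNS:
            problems.append(f"row {i}: columns differ from canonical order")
        elif row["symbol"] != row["symbol"].upper():
            problems.append(f"row {i}: symbol is not uppercase")
    timestamps = [row["timestamp"] for row in rows if "timestamp" in row]
    if timestamps != sorted(timestamps):
        problems.append("timestamps are not sorted ascending")
    if len(set(timestamps)) != len(timestamps):
        problems.append("duplicate timestamps")
    return problems


def _bar(ts: datetime, symbol: str = "AAPL") -> dict:
    # Build one bar with keys in canonical order.
    return {"timestamp": ts, "symbol": symbol, "open": 1.0, "high": 2.0,
            "low": 0.5, "close": 1.5, "volume": 100.0}


good = [_bar(datetime(2024, 1, 2)), _bar(datetime(2024, 1, 3))]
bad = [_bar(datetime(2024, 1, 3), "aapl"), _bar(datetime(2024, 1, 2)), _bar(datetime(2024, 1, 2))]
```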

Initialize base provider with common infrastructure.

Parameters:

Name Type Description Default
rate_limit tuple[int, float] | None

Tuple of (calls, period_seconds) for rate limiting

None
session_config dict[str, Any] | None

HTTP session configuration

None
circuit_breaker_config dict[str, Any] | None

Circuit breaker configuration

None
name abstractmethod property
name

Return the provider name.

fetch_ohlcv
fetch_ohlcv(symbol, start, end, frequency='daily')

Template method for fetching OHLCV data.

This method implements the common workflow:

  1. Validate inputs
  2. Apply rate limiting
  3. Fetch and transform data (provider-specific)
  4. Validate and normalize data

Providers can implement either:

  • _fetch_and_transform_data() for single-step implementation
  • _fetch_raw_data() + _transform_data() for two-step implementation
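The four-step workflow is an instance of the template method pattern. The sketch below illustrates the idea with a stand-in class. The fallback from the single-step hook to the two-step hooks via NotImplementedError, and the signature of _transform_data, are assumptions made for illustration; the reference does not show how BaseProvider dispatches between them.

```python
class TemplateSketch:
    """Stand-in for the workflow described above; not the library's BaseProvider."""

    def fetch_ohlcv(self, symbol, start, end, frequency="daily"):
        # 1. Validate inputs (ISO dates compare correctly as strings).
        if not symbol or start > end:
            raise ValueError("invalid symbol or date range")
        # 2. Apply rate limiting (a no-op in this sketch).
        self._acquire_rate_limit()
        # 3. Provider-specific step: try single-step, fall back to two-step.
        try:
            rows = self._fetch_and_transform_data(symbol, start, end, frequency)
        except NotImplementedError:
            raw = self._fetch_raw_data(symbol, start, end, frequency)
            rows = self._transform_data(raw, symbol)
        # 4. Normalize: sort ascending by timestamp.
        return sorted(rows, key=lambda row: row["timestamp"])

    def _acquire_rate_limit(self):
        pass

    def _fetch_and_transform_data(self, symbol, start, end, frequency):
        raise NotImplementedError

    def _fetch_raw_data(self, symbol, start, end, frequency):
        raise NotImplementedError

    def _transform_data(self, raw, symbol):
        raise NotImplementedError


class TwoStepProvider(TemplateSketch):
    def _fetch_raw_data(self, symbol, start, end, frequency):
        # Hard-coded payload standing in for an API response.
        return [("2024-01-03", 2.0), ("2024-01-02", 1.0)]

    def _transform_data(self, raw, symbol):
        return [{"timestamp": ts, "symbol": symbol.upper(), "close": close}
                for ts, close in raw]


rows = TwoStepProvider().fetch_ohlcv("aapl", "2024-01-01", "2024-01-31")
```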

Parameters:

Name Type Description Default
symbol str

The symbol to fetch data for

required
start str

Start date in YYYY-MM-DD format (inclusive)

required
end str

End date in YYYY-MM-DD format (see note below)

required
frequency str

Data frequency (daily, minute, etc.)

'daily'

Returns:

Type Description
DataFrame

DataFrame with OHLCV data in canonical schema: [timestamp, symbol, open, high, low, close, volume]

Note

Date range semantics vary by provider:

  • Most providers: Both start and end are INCLUSIVE
  • Yahoo Finance: end is EXCLUSIVE (internally adds 1 day)
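When an upstream API treats the end date as exclusive, an inclusive end can be converted by shifting it one day forward. The helper below is a generic sketch of that conversion (to_exclusive_end is a hypothetical name); whether the Yahoo provider performs exactly this step internally is not shown in this reference.

```python
from datetime import date, timedelta


def to_exclusive_end(end: str) -> str:
    """Convert an inclusive YYYY-MM-DD end date to the exclusive bound one day later."""
    return (date.fromisoformat(end) + timedelta(days=1)).isoformat()
```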

fetch_ohlcv_async async
fetch_ohlcv_async(symbol, start, end, frequency='daily')

Async wrapper around fetch_ohlcv using a thread pool.

Providers with native async support should override this method.
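Wrapping a blocking call in a thread pool can be sketched with the standard library alone. The example uses asyncio.to_thread; blocking_fetch is a hypothetical stand-in for a synchronous fetch, and the library's actual wrapper may use a different executor.

```python
import asyncio
import time


def blocking_fetch(symbol: str) -> dict:
    """Stand-in for a synchronous network call."""
    time.sleep(0.01)
    return {"symbol": symbol, "rows": 1}


async def fetch_async(symbol: str) -> dict:
    # asyncio.to_thread runs the blocking function in the default thread pool.
    return await asyncio.to_thread(blocking_fetch, symbol)


async def fetch_many(symbols: list[str]) -> list[dict]:
    # gather preserves the order of the awaitables it was given.
    return await asyncio.gather(*(fetch_async(s) for s in symbols))


results = asyncio.run(fetch_many(["AAPL", "MSFT"]))
```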

capabilities
capabilities()

Return provider capabilities (default implementation).

Override in subclasses to provide accurate capabilities.

close
close()

Clean up resources.

ProviderCapabilities

Frozen dataclass describing what a provider supports (intraday, crypto, forex, futures, authentication requirements, rate limits).

from ml4t.data.providers.protocols import ProviderCapabilities

caps = ProviderCapabilities(
    supports_intraday=True,
    supports_crypto=True,
    requires_api_key=True,
    rate_limit=(120, 60.0),  # 120 calls per 60 seconds
)

ProviderCapabilities dataclass

ProviderCapabilities(
    supports_intraday=False,
    supports_crypto=False,
    supports_forex=False,
    supports_futures=False,
    requires_api_key=False,
    max_history_days=None,
    rate_limit=(60, 60.0),
)

Describes what a provider can do.

Attributes:

Name Type Description
supports_intraday bool

Can fetch minute/hourly data

supports_crypto bool

Handles cryptocurrency symbols

supports_forex bool

Handles forex pairs

supports_futures bool

Handles futures contracts

requires_api_key bool

Needs authentication

max_history_days int | None

Maximum historical data available

rate_limit tuple[int, float]

(calls, period_seconds) tuple
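A (calls, period_seconds) tuple describes a budget of calls per time window. One common way to honor such a budget is a sliding-window limiter, sketched below. SlidingWindowLimiter is a hypothetical class, and this is not necessarily the algorithm used by the library's RateLimitMixin.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `calls` recorded calls within any `period_seconds` window."""

    def __init__(self, calls: int, period_seconds: float, clock=time.monotonic):
        self.calls = calls
        self.period = period_seconds
        self.clock = clock  # injectable so tests can control time
        self._stamps = deque()

    def wait_time(self) -> float:
        """Seconds until another call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.calls:
            return 0.0
        return self.period - (now - self._stamps[0])

    def record(self) -> None:
        """Register that a call was just made."""
        self._stamps.append(self.clock())
```

A caller would sleep for wait_time() seconds when it is positive, then call record() immediately before issuing the request.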

OHLCVProvider (Protocol)

Structural typing protocol for OHLCV providers. Any class implementing name, fetch_ohlcv(), and capabilities() satisfies this protocol without inheriting from BaseProvider.

OHLCVProvider

Bases: Protocol

Protocol for OHLCV data providers.

Any class implementing these methods is considered an OHLCVProvider, regardless of inheritance. This enables duck typing with type safety.

Example

>>> class MyCustomProvider:
...     @property
...     def name(self) -> str:
...         return "custom"
...
...     def fetch_ohlcv(self, symbol, start, end, frequency="daily"):
...         # Custom implementation
...         pass
...
...     def capabilities(self) -> ProviderCapabilities:
...         return ProviderCapabilities()
...
>>> isinstance(MyCustomProvider(), OHLCVProvider)  # True
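Structural isinstance checks in Python require the protocol to be decorated with typing.runtime_checkable. The sketch below demonstrates the mechanism with a local stand-in protocol named SupportsOHLCV, which is hypothetical and not the library's OHLCVProvider.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsOHLCV(Protocol):
    """Local stand-in protocol with the three documented members."""

    @property
    def name(self) -> str: ...

    def fetch_ohlcv(self, symbol: str, start: str, end: str, frequency: str = "daily"): ...

    def capabilities(self): ...


class Custom:
    """Satisfies the protocol without inheriting from it."""

    @property
    def name(self) -> str:
        return "custom"

    def fetch_ohlcv(self, symbol, start, end, frequency="daily"):
        return []

    def capabilities(self):
        return {}


class Incomplete:
    """Has a name but lacks the two methods."""

    name = "partial"


is_provider = isinstance(Custom(), SupportsOHLCV)
is_incomplete_provider = isinstance(Incomplete(), SupportsOHLCV)
```

Runtime checks of this kind only verify that the members exist; they do not compare signatures or return types, so a static type checker remains the stronger guarantee.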

name property
name

Return the provider name (e.g., 'yahoo', 'binance').

fetch_ohlcv
fetch_ohlcv(symbol, start, end, frequency='daily')

Fetch OHLCV data for a symbol.

Parameters:

Name Type Description Default
symbol str

Symbol to fetch (e.g., 'AAPL', 'BTCUSDT')

required
start str

Start date in YYYY-MM-DD format

required
end str

End date in YYYY-MM-DD format

required
frequency str

Data frequency ('daily', 'hourly', 'minute', etc.)

'daily'

Returns:

Type Description
DataFrame

DataFrame with columns: [timestamp, symbol, open, high, low, close, volume]

capabilities
capabilities()

Return provider capabilities.


Configuration

Config

Pydantic model for top-level library configuration. Reads defaults from environment variables (QLDM_DATA_ROOT, QLDM_LOG_LEVEL).

from ml4t.data import Config

# Use defaults
config = Config()

# Override data root
config = Config(data_root="/mnt/fast/market_data", log_level="DEBUG")
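Reading a default from an environment variable typically looks like the sketch below. Only the variable name QLDM_DATA_ROOT comes from this reference; the fallback path and the function name data_root_from_env are assumptions, and the library's own resolve_data_root may behave differently.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def data_root_from_env(env: Mapping[str, str] | None = None,
                       fallback: str = "./data") -> Path:
    """Return QLDM_DATA_ROOT if set and non-empty, else the (assumed) fallback."""
    env = os.environ if env is None else env
    raw = env.get("QLDM_DATA_ROOT") or fallback
    return Path(raw).expanduser()
```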

Config

Config(**data)

Bases: BaseModel

Main configuration for QLDM.

Initialize config with environment variables.

data_root class-attribute instance-attribute
data_root = Field(default_factory=resolve_data_root)
log_level class-attribute instance-attribute
log_level = 'INFO'
storage class-attribute instance-attribute
storage = Field(default_factory=StorageConfig)
retry class-attribute instance-attribute
retry = Field(default_factory=RetryConfig)
cache class-attribute instance-attribute
cache = Field(default_factory=CacheConfig)
validation class-attribute instance-attribute
validation = Field(
    default_factory=lambda: {
        "enabled": True,
        "strict": False,
    }
)
base_dir property
base_dir

Alias for data_root for backward compatibility.

RetryConfig

Configuration for automatic retry with exponential backoff.

RetryConfig

Bases: BaseModel

Retry configuration.
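The fields of RetryConfig are not listed in this reference, so the following sketch of retry with exponential backoff uses hypothetical parameter names (max_attempts, base_delay, factor, max_delay) purely for illustration.

```python
import time


def backoff_delays(max_attempts: int, base_delay: float = 1.0,
                   factor: float = 2.0, max_delay: float = 30.0) -> list[float]:
    """Delays between attempts: base, base*factor, ... capped at max_delay."""
    return [min(base_delay * factor ** i, max_delay) for i in range(max_attempts - 1)]


def call_with_retry(func, max_attempts: int = 3, sleep=time.sleep):
    """Call func, retrying on ConnectionError with exponentially growing waits."""
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
```

Passing sleep as a parameter keeps the function testable without real waiting.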

CacheConfig

Configuration for in-memory caching.

CacheConfig

Bases: BaseModel

Cache configuration.


Exceptions

All exceptions inherit from ML4TDataError, which carries an optional details dictionary for structured error context.

ML4TDataError
├── ProviderError
│   ├── NetworkError
│   │   └── RateLimitError
│   ├── AuthenticationError
│   ├── DataValidationError
│   ├── SymbolNotFoundError
│   └── DataNotAvailableError
├── StorageError
│   └── LockError
├── ConfigurationError
└── CircuitBreakerOpenError
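Because the hierarchy is nested, except clauses should be ordered from most to least specific. The sketch below defines local stand-in classes with the documented constructor signatures so that it runs without the library. The stored attribute names (such as retry_after) and the default rate-limit message are assumptions.

```python
class ML4TDataError(Exception):
    def __init__(self, message, details=None):
        super().__init__(message)
        self.details = details or {}


class ProviderError(ML4TDataError):
    def __init__(self, provider, message, details=None):
        super().__init__(message, details)
        self.provider = provider


class NetworkError(ProviderError):
    def __init__(self, provider, message="Network error occurred",
                 details=None, retry_after=None):
        super().__init__(provider, message, details)
        self.retry_after = retry_after


class RateLimitError(NetworkError):
    def __init__(self, provider, retry_after=None, remaining=None, limit=None):
        super().__init__(provider, "Rate limit exceeded", retry_after=retry_after)
        self.remaining = remaining
        self.limit = limit


def classify(exc: Exception) -> str:
    """Map an error to an action, checking the most specific class first."""
    try:
        raise exc
    except RateLimitError as err:
        return f"wait {err.retry_after}s"
    except NetworkError:
        return "retry"
    except ProviderError:
        return "skip symbol"
    except ML4TDataError:
        return "abort"
```

Reversing the order would route every RateLimitError into the broader NetworkError or ProviderError branch, since a subclass instance matches its parents' handlers.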

ML4TDataError

ML4TDataError

ML4TDataError(message, details=None)

Bases: Exception

Base exception for all ml4t-data errors.

Initialize ml4t-data error.

Parameters:

Name Type Description Default
message str

Error message

required
details dict[str, Any] | None

Optional dictionary with error details

None

ProviderError

ProviderError

ProviderError(provider, message, details=None)

Bases: ML4TDataError

Base exception for provider-related errors.

Initialize provider error.

Parameters:

Name Type Description Default
provider str

Provider name

required
message str

Error message

required
details dict[str, Any] | None

Optional error details

None

NetworkError

NetworkError

NetworkError(
    provider,
    message="Network error occurred",
    details=None,
    retry_after=None,
)

Bases: ProviderError

Network-related errors (connection, timeout, etc.).

Initialize network error.

Parameters:

Name Type Description Default
provider str

Provider name

required
message str

Error message

'Network error occurred'
details dict[str, Any] | None

Optional error details

None
retry_after float | None

Seconds to wait before retry

None

RateLimitError

RateLimitError

RateLimitError(
    provider, retry_after=None, remaining=None, limit=None
)

Bases: NetworkError

Rate limit exceeded error.

Initialize rate limit error.

Parameters:

Name Type Description Default
provider str

Provider name

required
retry_after float | None

Seconds to wait before retry

None
remaining int | None

Remaining API calls

None
limit int | None

API call limit

None

AuthenticationError

AuthenticationError

AuthenticationError(
    provider, message="Authentication failed", details=None
)

Bases: ProviderError

Authentication/authorization errors.

Initialize authentication error.

DataValidationError

DataValidationError

DataValidationError(
    provider, message, field=None, value=None, details=None
)

Bases: ProviderError

Data validation errors.

Initialize data validation error.

Parameters:

Name Type Description Default
provider str

Provider name

required
message str

Error message

required
field str | None

Field that failed validation

None
value Any | None

Invalid value

None
details dict[str, Any] | None

Optional error details

None

SymbolNotFoundError

SymbolNotFoundError

SymbolNotFoundError(provider, symbol, details=None)

Bases: ProviderError

Symbol not found or invalid.

Initialize symbol not found error.

Parameters:

Name Type Description Default
provider str

Provider name

required
symbol str

The symbol that was not found

required
details dict[str, Any] | None

Optional error details

None

DataNotAvailableError

DataNotAvailableError

DataNotAvailableError(
    provider,
    symbol,
    start=None,
    end=None,
    frequency=None,
    details=None,
)

Bases: ProviderError

Data not available for the requested period.

Initialize data not available error.

Parameters:

Name Type Description Default
provider str

Provider name

required
symbol str

Symbol requested

required
start str | None

Start date

None
end str | None

End date

None
frequency str | None

Data frequency

None
details dict[str, Any] | None

Optional error details

None

StorageError

StorageError

StorageError(message, key=None, details=None)

Bases: ML4TDataError

Storage-related errors.

Initialize storage error.

Parameters:

Name Type Description Default
message str

Error message

required
key str | None

Storage key involved

None
details dict[str, Any] | None

Optional error details

None

LockError

LockError

LockError(key, timeout, details=None)

Bases: StorageError

File locking errors.

Initialize lock error.

Parameters:

Name Type Description Default
key str

Storage key

required
timeout float

Lock timeout that was exceeded

required
details dict[str, Any] | None

Optional error details

None

ConfigurationError

ConfigurationError

ConfigurationError(message, parameter=None, details=None)

Bases: ML4TDataError

Configuration-related errors.

Initialize configuration error.

Parameters:

Name Type Description Default
message str

Error message

required
parameter str | None

Configuration parameter involved

None
details dict[str, Any] | None

Optional error details

None

CircuitBreakerOpenError

CircuitBreakerOpenError

CircuitBreakerOpenError(
    message="Circuit breaker is open",
    failure_count=None,
    details=None,
)

Bases: ML4TDataError

Circuit breaker is open and preventing calls.

Initialize circuit breaker open error.

Parameters:

Name Type Description Default
message str

Error message

'Circuit breaker is open'
failure_count int | None

Number of failures that caused circuit to open

None
details dict[str, Any] | None

Optional error details

None
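For context on when such an error arises, here is a generic circuit breaker sketch built around a failure threshold and a reset timeout. CircuitBreakerSketch and CircuitOpen are hypothetical names, and the library's CircuitBreakerMixin may differ in its details.

```python
import time


class CircuitOpen(Exception):
    """Stand-in for CircuitBreakerOpenError."""


class CircuitBreakerSketch:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.failure_count = 0
        self.opened_at = None

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpen(f"open after {self.failure_count} failures")
            self.opened_at = None  # timeout elapsed: allow one trial call
        try:
            result = func()
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        self.failure_count = 0  # any success closes the circuit
        return result
```

While the circuit is open, calls fail fast without touching the upstream service, which gives a struggling provider time to recover.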