Feature Discovery
ML4T Engineer provides two complementary discovery APIs: the Feature Registry for programmatic metadata access, and the Feature Catalog for interactive exploration with filtering and search.
If you are arriving from Ch7 10_ml4t_library_ecosystem.py, the Book Guide shows where discovery fits relative to feature computation, labeling, and dataset preparation.
Use this page when you are choosing features, validating metadata, or building registry-driven workflows instead of hardcoding indicator names.
Feature Registry
The registry is the metadata backbone: every feature registers its name, category, parameters, input requirements, and validation status.
List Features
# All 120 features (sorted alphabetically)
all_features = registry.list_all()
# By category
momentum = registry.list_by_category("momentum") # 31 features
# By property
normalized = registry.list_normalized() # 37 ML-ready features
ta_lib = registry.list_ta_lib_compatible() # 59 validated features
Inspect Metadata
meta = registry.get("rsi")
meta.name # "rsi"
meta.category # "momentum"
meta.description # "Relative Strength Index"
meta.formula # "RSI = 100 - 100/(1 + RS), RS = AvgGain/AvgLoss"
meta.parameters # {"period": 14}
meta.input_type # "OHLCV"
meta.output_type # "indicator"
meta.normalized # True
meta.value_range # (0, 100)
meta.ta_lib_compatible # True
meta.dependencies # []
meta.references # ["Wilder, 1978"]
meta.tags # ["oscillator", "overbought", "oversold"]
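One use of these fields is a quick consistency check before features enter a pipeline. The sketch below works on plain dicts shaped like the metadata above rather than on real registry objects. `metadata_problems` and the two rules it checks are illustrative assumptions, not library behavior.

```python
def metadata_problems(meta):
    """Return a list of human-readable problems found in one metadata record.

    Hypothetical helper: both rules below are assumptions made for illustration.
    """
    problems = []
    value_range = meta.get("value_range")
    # Assumed rule 1: a feature flagged as normalized should declare its range.
    if meta.get("normalized") and value_range is None:
        problems.append("normalized but value_range is missing")
    # Assumed rule 2: a declared range must be (low, high) with low < high.
    if value_range is not None and not value_range[0] < value_range[1]:
        problems.append("value_range is not (low, high) with low < high")
    return problems


# The first record copies the rsi values shown above; the second is deliberately broken.
rsi = {"name": "rsi", "normalized": True, "value_range": (0, 100)}
broken = {"name": "example_feature", "normalized": True, "value_range": None}

print(metadata_problems(rsi))     # []
print(metadata_problems(broken))  # ['normalized but value_range is missing']
```

Returning a list of problems instead of raising on the first one makes it easy to report every inconsistency across many features in a single pass.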
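Get Dependencies
Some features depend on others (e.g., stochrsi depends on rsi), and each feature lists its prerequisites in the dependencies metadata field.
compute_features resolves these automatically via topological sort, so prerequisites are computed before the features that need them.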
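To make the ordering idea concrete, here is a minimal sketch of dependency resolution using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It illustrates the technique only and is not how `compute_features` is implemented. The `dependencies` map and the `resolve_order` helper are hypothetical, and only the stochrsi-to-rsi edge comes from the text.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: feature name -> names it depends on.
# In practice each list would be read from the `dependencies` metadata field.
dependencies = {
    "rsi": [],
    "stochrsi": ["rsi"],
}


def resolve_order(requested, dependencies):
    """Return the requested features plus their prerequisites, prerequisites first."""
    graph = {}
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name in graph:
            continue  # already visited
        deps = dependencies.get(name, [])
        graph[name] = set(deps)  # node -> set of predecessors
        stack.extend(deps)       # walk transitive prerequisites too
    # static_order() yields each node only after all of its predecessors.
    return list(TopologicalSorter(graph).static_order())


order = resolve_order(["stochrsi"], dependencies)
print(order)  # ['rsi', 'stochrsi']
```

Requesting only `stochrsi` still yields `rsi` first, which is the behavior a caller relies on when it does not list prerequisites explicitly. `TopologicalSorter` raises `graphlib.CycleError` if the map contains a cycle.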
Feature Catalog
The catalog wraps the registry with higher-level filtering and full-text search:
Filtered Listing
# Single filter
feature_catalog.list(category="volatility")
# Multiple filters (AND logic)
feature_catalog.list(
category="momentum",
normalized=True,
ta_lib_compatible=True,
)
# All available filters
feature_catalog.list(
category=None, # str: filter by category
normalized=None, # bool: ML-ready features only
ta_lib_compatible=None, # bool: TA-Lib validated only
tags=None, # list[str]: features matching any tag
input_type=None, # str: "OHLCV", "close", "returns", etc.
output_type=None, # str: "indicator", "bands", etc.
has_dependencies=None, # bool: features with/without dependencies
limit=None, # int: max results
)
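The filter semantics described in the comments above (filters combine with AND, `None` means no constraint, `tags` matches any listed tag) can be sketched over plain dicts. This illustrates the semantics under those assumptions and is not the catalog's implementation. `matches` and `select` are hypothetical helpers, and the two records reuse values shown elsewhere on this page.

```python
def matches(meta, *, category=None, normalized=None, tags=None):
    """AND across the supplied filters; a filter left as None is skipped."""
    if category is not None and meta["category"] != category:
        return False
    if normalized is not None and meta["normalized"] != normalized:
        return False
    # "Any tag" semantics: at least one requested tag must be present.
    if tags is not None and not set(tags) & set(meta["tags"]):
        return False
    return True


def select(records, *, limit=None, **filters):
    """Return names of matching records, optionally truncated to `limit`."""
    names = [m["name"] for m in records if matches(m, **filters)]
    return names if limit is None else names[:limit]


# Stand-in records built from the rsi and yang_zhang_volatility examples on this page.
records = [
    {"name": "rsi", "category": "momentum", "normalized": True,
     "tags": ["oscillator", "overbought", "oversold"]},
    {"name": "yang_zhang_volatility", "category": "volatility",
     "normalized": False, "tags": []},
]

print(select(records, category="momentum", normalized=True))  # ['rsi']
print(select(records, tags=["spread", "oversold"]))           # ['rsi']
print(select(records))                                        # ['rsi', 'yang_zhang_volatility']
```

Comparing against `None` explicitly, rather than testing truthiness, matters for the boolean filters: `normalized=False` is a real constraint and must not be treated as "no filter".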
Full-Text Search
Search across feature names, descriptions, tags, and formulas:
results = feature_catalog.search("volatility estimator")
# [("parkinson_volatility", 0.65), ("garman_klass_volatility", 0.45), ("rogers_satchell_volatility", 0.45), ...]
results = feature_catalog.search("trend strength")
# [("trend_intensity_index", 0.92), ("adx", 0.78), ...]
results = feature_catalog.search("spread")
# [("realized_spread", 1.30), ("roll_spread_estimator", 1.30)]
search returns a list of (feature_name, relevance_score) tuples, sorted from most to least relevant.
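Because the result is a plain list of tuples, post-processing needs no special API. The sketch below keeps only names above a relevance threshold. `top_matches` and its default threshold are hypothetical, and the `results` list copies the three entries shown for the "volatility estimator" query instead of calling the catalog.

```python
# Results copied from the "volatility estimator" example above.
results = [
    ("parkinson_volatility", 0.65),
    ("garman_klass_volatility", 0.45),
    ("rogers_satchell_volatility", 0.45),
]


def top_matches(results, min_score=0.5, limit=None):
    """Keep feature names scoring at least min_score, preserving result order."""
    kept = [name for name, score in results if score >= min_score]
    return kept if limit is None else kept[:limit]


print(top_matches(results))                            # ['parkinson_volatility']
print(top_matches(results, min_score=0.4, limit=2))    # ['parkinson_volatility', 'garman_klass_volatility']
```

The scores are not confined to the range 0 to 1 (the "spread" query above returns 1.30), so a fixed threshold may need tuning per query.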
Describe
Get a dict summary of any feature:
info = feature_catalog.describe("yang_zhang_volatility")
# {
# "name": "yang_zhang_volatility",
# "category": "volatility",
# "description": "Yang-Zhang Volatility - combines overnight and intraday volatility",
# "parameters": {},
# "normalized": False,
# "ta_lib_compatible": False,
# "input_type": "close",
# "value_range": None,
# "dependencies": [],
# "tags": [],
# }
Browse Categories and Tags
feature_catalog.categories()
# ['cross_asset', 'math', 'microstructure', 'ml', 'momentum',
# 'price_transform', 'regime', 'risk', 'statistics', 'trend',
# 'volatility', 'volume']
feature_catalog.tags()
# ['efficient', 'illiquidity', 'ma', 'microstructure', 'normalized',
# 'ohlc', 'oscillator', 'overbought', 'oversold', 'spread', ...]
Metadata Fields Reference
| Field | Type | Description |
|---|---|---|
| name | str | Unique feature identifier |
| category | str | Feature category |
| description | str | One-line description |
| formula | str | Mathematical formula |
| parameters | dict[str, Any] | Default parameters |
| input_type | str | Required input columns ("OHLCV", "close", etc.) |
| output_type | str | Output type ("indicator", "bands", etc.) |
| normalized | bool \| None | Whether output is bounded |
| value_range | tuple[float, float] \| None | Output range if normalized |
| ta_lib_compatible | bool | Validated against TA-Lib at 1e-6 |
| dependencies | list[str] | Other features this depends on |
| references | list[str] | Academic references |
| tags | list[str] | Searchable tags |
| lookback | Callable | Function returning minimum lookback period |
See It In The Book
- Ch7 10_ml4t_library_ecosystem.py for registry inspection and catalog search
- Book Guide for how discovery connects to feature computation, labeling, and dataset preparation
Next Steps
- Read Features to turn chosen metadata into actual feature pipelines.
- Read ML Readiness if feature selection depends on normalization.
- Use the API Reference for exact object locations.