ML4T Glossary

Key terms and definitions for machine learning, trading, and quantitative finance.

1 Machine Learning

Overfitting
When a model learns noise in the training data rather than the underlying pattern, resulting in poor generalization to new data.
Cross-Validation
A technique for estimating how well a model generalizes by repeatedly partitioning the data into training and validation sets and averaging performance across the validation folds.
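A minimal sketch of plain k-fold splitting by index, assuming contiguous folds and no shuffling. The function name kfold_indices is hypothetical; scikit-learn's sklearn.model_selection.KFold provides a full implementation.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) lists for k contiguous, non-overlapping folds."""
    # The first n_samples % k folds get one extra sample so every sample is used once.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        # Training set = everything outside the validation fold (before AND after it).
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in kfold_indices(10, 3):
    print(len(train_idx), val_idx)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```

Note that the training set for the first fold contains later observations, which is why time-ordered data usually calls for Walk-Forward Validation instead.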
Walk-Forward Validation
Time-series cross-validation that respects temporal order, training on past data and validating on future data.
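A minimal sketch of walk-forward splits with an expanding training window, assuming equally spaced observations indexed 0..n-1. The function and its parameters are hypothetical; scikit-learn's TimeSeriesSplit implements a similar scheme, and a fixed-length rolling window is a common alternative.

```python
def walk_forward_splits(n_samples, n_splits, min_train):
    """Yield (train_idx, test_idx) pairs using an expanding training window.

    Each test block starts right after its training block ends, so the model
    is never fit on observations later than the ones it is evaluated on.
    Trailing samples that do not fill a whole test block are left unused.
    """
    test_size = (n_samples - min_train) // n_splits
    if test_size < 1:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        train_end = min_train + i * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))


for train_idx, test_idx in walk_forward_splits(n_samples=12, n_splits=3, min_train=6):
    print(f"train 0-{train_idx[-1]}  test {test_idx[0]}-{test_idx[-1]}")
# train 0-5  test 6-7
# train 0-7  test 8-9
# train 0-9  test 10-11
```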
Feature Importance
A measure of how much each input variable contributes to model predictions.
Regularization
Techniques that constrain model complexity to reduce overfitting, e.g., L1 (lasso) and L2 (ridge) penalties on model coefficients, or dropout in neural networks.
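To illustrate how an L2 penalty constrains a model, the sketch below fits a single-feature ridge regression with no intercept (a simplifying assumption) in closed form. Minimizing sum((y - b*x)^2) + lam * b^2 gives b = sum(x*y) / (sum(x^2) + lam), so larger lam shrinks the coefficient toward zero. The data are made up.

```python
def ridge_slope(x, y, lam):
    """Closed-form L2-penalized slope for y ~ b * x (no intercept).

    Minimizes sum((y - b*x)**2) + lam * b**2.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
for lam in (0.0, 10.0, 100.0):
    # The fitted slope shrinks toward 0 as the penalty strength lam grows.
    print(lam, ridge_slope(x, y, lam))
```

With lam = 0 this reduces to ordinary least squares; an L1 penalty has no such simple closed form in general and can set coefficients exactly to zero.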

2 Trading & Finance

Alpha
Return in excess of a benchmark, commonly measured after adjusting for the strategy's exposure to that benchmark (beta), and often attributed to the skill of the strategy or manager.
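One common way to estimate alpha (assumed here; multi-factor models are also used) is the intercept of an ordinary least squares regression of strategy excess returns on benchmark excess returns. The function name and the toy data below are hypothetical.

```python
import statistics


def alpha_beta(strategy_excess, benchmark_excess):
    """OLS fit of strategy_excess = alpha + beta * benchmark_excess + noise.

    Inputs are per-period returns in excess of the risk-free rate;
    the returned alpha is per period (not annualized).
    """
    mb = statistics.fmean(benchmark_excess)
    ms = statistics.fmean(strategy_excess)
    cov = sum((b - mb) * (s - ms) for b, s in zip(benchmark_excess, strategy_excess))
    var = sum((b - mb) ** 2 for b in benchmark_excess)
    beta = cov / var
    alpha = ms - beta * mb
    return alpha, beta


# Toy data: a strategy constructed as 1.5x the benchmark plus 0.1% per period.
bench = [0.01, -0.02, 0.03, 0.0, 0.015]
strat = [0.001 + 1.5 * b for b in bench]
print(alpha_beta(strat, bench))  # approximately (0.001, 1.5)
```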
Sharpe Ratio
Risk-adjusted return metric: (mean return - risk-free rate) / standard deviation of excess returns, usually annualized.
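A minimal sketch of the annualized Sharpe ratio from daily returns. It assumes 252 trading periods per year and a constant per-period risk-free rate, and annualizes by sqrt(periods_per_year), which is only appropriate if returns are roughly independent over time. The sample returns are made up.

```python
import math
import statistics


def annualized_sharpe(returns, rf_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from periodic simple returns.

    Uses the sample standard deviation of excess returns and scales the
    per-period ratio by sqrt(periods_per_year).
    """
    excess = [r - rf_per_period for r in returns]
    per_period = statistics.fmean(excess) / statistics.stdev(excess)
    return per_period * math.sqrt(periods_per_year)


daily = [0.004, -0.002, 0.001, 0.003, -0.001]
print(annualized_sharpe(daily))
```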
Drawdown
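Peak-to-trough decline in portfolio value, usually expressed as a percentage of the prior peak; the maximum drawdown (the largest such decline) is a common measure of downside risk.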
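Maximum drawdown can be computed in a single pass by tracking the running peak. This sketch assumes a strictly positive equity curve (e.g., portfolio value over time); the function name and the numbers are illustrative.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a negative fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                     # highest value seen so far
        worst = min(worst, value / peak - 1.0)      # decline from that peak
    return worst


print(max_drawdown([100, 120, 90, 110, 130, 104]))  # -0.25 (from 120 down to 90)
```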
Slippage
The difference between the expected price of a trade and the price at which it is actually executed, caused by factors such as the bid-ask spread, market impact, and execution delay.
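A minimal sketch that measures slippage in basis points. The sign convention (side = +1 for buys, -1 for sells, positive result = cost) and the choice of "expected price" (e.g., the decision price or arrival mid-quote) are assumptions for illustration.

```python
def slippage_bps(expected_price, fill_price, side):
    """Slippage in basis points, signed so that a positive value is a cost.

    side is +1 for a buy and -1 for a sell (a convention chosen for this sketch).
    """
    return side * (fill_price - expected_price) / expected_price * 10_000


print(round(slippage_bps(50.00, 50.05, side=+1), 2))  # 10.0: bought 5 cents higher
print(round(slippage_bps(50.00, 49.95, side=-1), 2))  # 10.0: sold 5 cents lower
```

A negative value would indicate price improvement, i.e., a fill better than expected.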
Point-in-Time
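Data recorded as it was known on each historical date (e.g., originally reported values rather than later revisions), so a backtest uses only information that was actually available at the time, preventing lookahead bias.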
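A minimal sketch of a point-in-time "as-of" lookup: each value is keyed by its public release date rather than the period it describes. The EPS figures are hypothetical, and the sketch assumes a value counts as known on its release date and ignores later revisions. For DataFrames, pandas.merge_asof performs the same kind of backward-looking join.

```python
import bisect
from datetime import date

# Hypothetical data: (release_date, value), sorted by the date it became public.
eps_releases = [
    (date(2023, 2, 10), 1.20),  # hypothetical Q4 2022 EPS, published in February
    (date(2023, 5, 5), 1.35),   # hypothetical Q1 2023 EPS
    (date(2023, 8, 4), 1.28),   # hypothetical Q2 2023 EPS
]


def as_of(releases, when):
    """Return the latest value released on or before `when`, or None."""
    dates = [d for d, _ in releases]
    i = bisect.bisect_right(dates, when)  # count of releases dated <= when
    return releases[i - 1][1] if i > 0 else None


print(as_of(eps_releases, date(2023, 6, 30)))  # 1.35: Q2 figure not yet public
print(as_of(eps_releases, date(2023, 1, 15)))  # None: nothing released yet
```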

3 Statistical Testing

p-value
Probability of observing results at least as extreme as those in the sample, assuming the null hypothesis is true.
Multiple Testing
The problem of inflated false positive rates when testing many hypotheses simultaneously.
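The arithmetic behind the problem: with n independent tests whose null hypotheses are all true, the chance of at least one false positive is 1 - (1 - alpha)^n. The sketch also applies the Bonferroni correction, one simple and conservative remedy (Holm and Benjamini-Hochberg are common less conservative alternatives). Function names and p-values are illustrative.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across n independent true-null tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_reject(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is <= alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


print(round(familywise_error(0.05, 20), 3))                   # 0.642
print(bonferroni_reject([0.001, 0.01, 0.03, 0.20], alpha=0.05))  # [True, True, False, False]
```

Testing 20 strategy variants at the 5% level therefore yields at least one spurious "discovery" about 64% of the time under these assumptions.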
Deflated Sharpe Ratio
Sharpe ratio adjusted for selection bias arising from the number of strategies tested; the standard formulation also accounts for non-normal returns and the length of the track record.
Information Coefficient (IC)
Correlation, typically rank (Spearman) correlation, between predicted returns and subsequently realized returns, used to measure signal quality.
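A minimal sketch of a rank (Spearman) IC for a single cross-section, computed as the Pearson correlation of average ranks so that ties are handled. In practice the IC is often computed per date and then averaged over time; the helper names and sample data here are hypothetical.

```python
import math
import statistics


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def information_coefficient(predicted, realized):
    """Spearman rank IC: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(realized))


# One date's cross-section: forecasts vs. the next period's realized returns.
predicted = [0.02, -0.01, 0.005, 0.03]
realized = [0.015, -0.02, 0.01, 0.01]
print(round(information_coefficient(predicted, realized), 3))  # 0.632
```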

4 Data & Features

Lookahead Bias
Using information in model training or backtesting that would not have been available at the time the trading decision was made.
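A common source of lookahead bias in backtests is trading on a signal in the same bar it was computed from. This toy sketch (hypothetical function name and data) holds the position signal[t - lag] over period t; with lag=0 and a signal derived from each bar's own return, every period looks profitable.

```python
def strategy_returns(signal, asset_returns, lag=1):
    """Return earned in period t by holding the position signal[t - lag].

    With lag=1 the position is set from information available before period t.
    With lag=0 the signal and the return come from the same bar, which is
    lookahead bias whenever the signal uses that bar's data.
    """
    return [signal[t - lag] * asset_returns[t] for t in range(lag, len(asset_returns))]


rets = [0.01, -0.02, 0.03, -0.01]
# Toy signal derived from each period's own return: +1 if up, -1 if down.
sig = [1 if r > 0 else -1 for r in rets]
print(sum(strategy_returns(sig, rets, lag=0)))  # biased: every period "wins"
print(sum(strategy_returns(sig, rets, lag=1)))  # tradable version
```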
Survivorship Bias
Bias from excluding failed or delisted securities from historical analysis, which typically inflates backtested performance.
Feature Engineering
Creating predictive inputs from raw data, critical for ML model performance.
Data Leakage
When information from outside the training set improperly influences model building.
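A frequent example of data leakage is fitting preprocessing on the full dataset. This minimal sketch (hypothetical function name, made-up data) z-scores a feature using the mean and standard deviation of the training set only and then applies those same statistics to the test set.

```python
import statistics


def standardize_train_test(train, test):
    """Z-score both sets using mean and std estimated on the training set only.

    Computing these statistics on train + test would let the test data
    influence the preprocessing, a common form of data leakage.
    """
    mu = statistics.fmean(train)
    sigma = statistics.stdev(train)
    return [(x - mu) / sigma for x in train], [(x - mu) / sigma for x in test]


z_train, z_test = standardize_train_test([1.0, 2.0, 3.0], [10.0])
print(z_train, z_test)  # [-1.0, 0.0, 1.0] [8.0]
```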