ML4T Backtest
ML4T Backtest Documentation
Event-driven backtesting with realistic execution
Skip to content

Results & Analysis

Engine.run() returns a BacktestResult containing trades, equity curve, fills, portfolio state, and computed metrics. Everything is accessible as Python objects, Polars DataFrames, or Parquet files.

This applies to both classic OHLCV backtests and quote-aware backtests. When you run with bid/ask-aware execution or marking, the result surface preserves the quote context used to produce fills and trade summaries.

For reproducibility, BacktestResult also exposes:

  • result.config.to_dict() for the fully resolved replayable config payload
  • result.to_spec_dict() for a richer runtime snapshot including library version and realized window
  • result.to_predictions_dataframe() for the raw prediction/input surface passed into the backtest, when available
  • result.to_parquet(...), which writes config.yaml, spec.yaml, and predictions.parquet when available

Metrics

result = engine.run()
m = result.metrics

# Returns
print(f"Total Return:  {m['total_return_pct']:.1f}%")
print(f"CAGR:          {m['cagr']:.2%}")
print(f"Volatility:    {m['volatility']:.2%}")

# Risk
print(f"Max Drawdown:  {m['max_drawdown_pct']:.1f}%")
print(f"Sharpe:        {m['sharpe']:.2f}")
print(f"Sortino:       {m['sortino']:.2f}")
print(f"Calmar:        {m['calmar']:.2f}")

# Trades
print(f"Trades:        {m['num_trades']}")
print(f"Fills:         {m['num_fills']}")
print(f"Rebalances:    {m['num_rebalance_events']}")
print(f"Symbols:       {m['unique_symbols_traded']}")
print(f"Win Rate:      {m['win_rate']:.1%}")
print(f"Profit Factor: {m['profit_factor']:.2f}")
print(f"Expectancy:    ${m['expectancy']:.2f}")
print(f"Avg Win:       ${m['avg_win']:.2f}")
print(f"Avg Loss:      ${m['avg_loss']:.2f}")

# Per-trade returns (percentage-based, direction-aware)
print(f"Avg Trade:     {m['avg_trade']:.2%}")
print(f"Avg Win:       {m['avg_win']:.2%}")
print(f"Avg Loss:      {m['avg_loss']:.2%}")
print(f"Best Trade:    {m['largest_win']:.2%}")
print(f"Worst Trade:   {m['largest_loss']:.2%}")
print(f"Payoff Ratio:  {m['payoff_ratio']:.2f}")

# Costs
print(f"Commission:    ${m['total_commission']:.2f}")
print(f"Slippage:      ${m['total_slippage']:.2f}")
print(f"Total Costs:   ${m['total_costs']:.2f}")
print(f"Avg Cost Drag: {m['avg_cost_drag']:.4%}")
print(f"Filled Notional: ${m['total_filled_notional']:,.2f}")
print(f"Avg Turnover:    {m['avg_turnover']:.2%}")
print(f"Max Turnover:    {m['max_turnover']:.2%}")
print(f"Avg Open Pos:    {m['avg_open_positions']:.2f}")
print(f"Max Open Pos:    {m['max_open_positions']}")

# Gross vs Net
print(f"Gross P&L:     ${m['total_gross_pnl']:.2f}")
print(f"Gross PF:      {m['gross_profit_factor']:.2f}")
print(f"Net PF:        {m['profit_factor']:.2f}")

Available Metrics

Metric Description
initial_cash Starting cash
final_value Final portfolio value
total_return Total return as decimal
total_return_pct Total return as percentage
cagr Compound annual growth rate
volatility Annualized volatility
max_drawdown Maximum drawdown (positive number)
max_drawdown_pct Maximum drawdown as percentage
sharpe Sharpe ratio
sortino Sortino ratio
calmar Calmar ratio
num_trades Total completed trades
num_fills Total execution events
num_rebalance_events Unique timestamps with at least one fill
unique_symbols_traded Number of symbols with at least one fill
winning_trades Number of winning trades
losing_trades Number of losing trades
win_rate Win rate (0 to 1)
profit_factor Net profit factor (winning P&L / losing P&L)
expectancy Expected return per trade (decimal)
avg_trade Average trade return (decimal)
avg_win Average winning trade return (decimal)
avg_loss Average losing trade return (decimal, negative)
largest_win Best single trade return (decimal)
largest_loss Worst single trade return (decimal, negative)
payoff_ratio avg_win / |avg_loss| (size-normalized reward-to-risk)
total_commission Total commission paid
total_slippage Total slippage cost in dollars (entry + exit)
total_filled_notional Sum of absolute filled notional across all fills
avg_turnover Mean per-bar one-way turnover from fills
max_turnover Maximum per-bar one-way turnover from fills
avg_open_positions Mean number of open positions across bars
max_open_positions Maximum number of open positions across bars
total_gross_pnl Total P&L from price moves only (before costs)
total_costs Total transaction costs (commission + slippage)
avg_cost_drag Average cost as fraction of trade notional
gross_profit_factor Profit factor from raw price moves (isolates edge from costs)
skipped_bars Bars skipped by calendar filter

Reporting Model

BacktestResult now exposes three distinct raw reporting surfaces:

  • trades: flat-to-flat lifecycle summaries
  • fills: execution blotter rows
  • portfolio_state: end-of-bar portfolio snapshots

For rebalancing strategies, num_trades is not the right proxy for trading activity. Use num_fills, num_rebalance_events, and total_filled_notional instead.

Turnover uses a one-way execution-based definition:

turnover_t = filled_notional_at_timestamp / equity_at_timestamp

Bars with no fills contribute 0. This means:

  • buying a fully cash portfolio into a fully invested book is turnover 1.0
  • selling a fully invested book back to cash is turnover 1.0
  • fully rotating one full book into another is turnover 2.0

Cost Decomposition

Every trade carries a full cost breakdown, letting you separate strategy edge from execution costs:

for trade in result.trades:
    print(f"{trade.symbol}: gross={trade.gross_pnl:+.2f}, "
          f"net={trade.pnl:+.2f}, drag={trade.cost_drag:.4%}")
Property Description
trade.gross_pnl Price-move P&L: (exit - entry) * qty * multiplier
trade.pnl Net P&L after all costs
trade.gross_return Direction-aware gross return (same as pnl_percent)
trade.net_return Direction-aware net return including fees
trade.total_slippage_cost Entry + exit slippage in dollars
trade.cost_drag Total cost as fraction of notional
trade.fees Total commission (entry + exit)
trade.entry_slippage Per-unit slippage on entry
trade.exit_slippage Per-unit slippage on exit
trade.multiplier Contract multiplier (1.0 for equities, 50.0 for ES futures)

Trade records also summarize nullable quote context for the entry and exit:

Property Description
trade.entry_quote_mid_price / trade.exit_quote_mid_price Quote midpoint at fill time
trade.entry_bid_price / trade.exit_bid_price Best bid at fill time
trade.entry_ask_price / trade.exit_ask_price Best ask at fill time
trade.entry_spread / trade.exit_spread Bid/ask spread at fill time
trade.entry_available_size / trade.exit_available_size Side-aware available quote size

pnl_percent is direction-aware: positive means profitable for both long and short trades.

Quote-aware backtests therefore leave an explicit audit trail:

  • fills record the reference price, bid, ask, midpoint, spread, and available size
  • trades summarize entry/exit quote context
  • portfolio state reflects the configured mark_price

Trade Analyzer

result.trade_analyzer provides aggregate statistics on closed trades:

ta = result.trade_analyzer

# Standard metrics
print(f"Win Rate:           {ta.win_rate:.1%}")
print(f"Profit Factor:      {ta.profit_factor:.2f}")
print(f"Avg MFE:            {ta.avg_mfe:.4f}")
print(f"MFE Capture:        {ta.mfe_capture_ratio:.2f}")

# Cost decomposition
print(f"Gross P&L:          ${ta.total_gross_pnl:.2f}")
print(f"Net Profit:         ${ta.net_profit:.2f}")
print(f"Total Costs:        ${ta.total_costs:.2f}")
print(f"Avg Cost Drag:      {ta.avg_cost_drag:.4%}")
print(f"Gross Profit Factor:{ta.gross_profit_factor:.2f}")

# Filter by side
long_stats = ta.by_side("long")
short_stats = ta.by_side("short")
print(f"Long win rate: {long_stats.win_rate:.1%}")
print(f"Short win rate: {short_stats.win_rate:.1%}")

# Export all stats
stats_dict = ta.to_dict()

Trades DataFrame

trades_df = result.to_trades_dataframe()
print(trades_df.head())

Returns a Polars DataFrame with columns:

Column Type Description
symbol String Asset identifier
entry_time Datetime Entry timestamp
exit_time Datetime Exit timestamp
entry_price Float Entry fill price
exit_price Float Exit fill price
quantity Float Position size (negative for shorts)
direction String "long" or "short"
pnl Float Net P&L after costs
pnl_percent Float Direction-aware percentage return
bars_held Int Holding period
fees Float Total commission
exit_slippage Float Exit slippage
mfe Float Maximum favorable excursion
mae Float Maximum adverse excursion
entry_slippage Float Per-unit slippage on entry
multiplier Float Contract multiplier (futures)
entry_quote_mid_price Float Entry quote midpoint
entry_bid_price Float Entry best bid
entry_ask_price Float Entry best ask
entry_spread Float Entry spread
entry_available_size Float Entry-side available size
exit_quote_mid_price Float Exit quote midpoint
exit_bid_price Float Exit best bid
exit_ask_price Float Exit best ask
exit_spread Float Exit spread
exit_available_size Float Exit-side available size
gross_pnl Float Price-move P&L before fees
net_return Float Direction-aware net return including fees
total_slippage_cost Float Entry + exit slippage in dollars
cost_drag Float Total cost as fraction of notional
exit_reason String Why the trade exited
status String "closed" or "open"

Open positions at the end of the backtest are included with status="open" and mark-to-market values.

Equity DataFrame

equity_df = result.to_equity_dataframe()
print(equity_df.head())

Returns a Polars DataFrame with columns:

Column Type Description
timestamp Datetime Bar timestamp
equity Float Portfolio value
return Float Bar-to-bar return
cumulative_return Float Cumulative return from start
drawdown Float Current drawdown from HWM
high_water_mark Float Running maximum equity

Portfolio State DataFrame

portfolio_df = result.to_portfolio_state_dataframe()
print(portfolio_df.head())

Returns a Polars DataFrame with columns:

Column Type Description
timestamp Datetime Bar timestamp
equity Float Portfolio equity after bar processing
cash Float Cash balance
gross_exposure Float Sum of absolute marked position values
net_exposure Float Signed sum of marked position values
open_positions Int Number of open positions

This is the right primitive for downstream diagnostics such as:

  • average invested fraction
  • gross and net exposure analysis
  • time in market
  • occupancy and utilization metrics

Predictions DataFrame

If you passed signals_df into DataFeed or run_backtest(...), the raw input is available on the result:

predictions_df = result.to_predictions_dataframe()
print(predictions_df.head())

This returns the original Polars DataFrame without reinterpretation. It is meant for downstream diagnostics and attribution, for example:

  • joining predictions to fills or trades
  • evaluating score distributions for winning vs losing trades
  • comparing model inputs to realized execution and PnL

The important convention is that this surface is treated as the raw model/input surface, not as a guaranteed post-mapping trading signal surface.

Fills

Access every individual order fill:

for fill in result.fills:
    print(f"{fill.asset}: {fill.quantity} @ ${fill.price:.2f}")
    print(f"  Type: {fill.order_type}")
    print(f"  Commission: ${fill.commission:.2f}")
    print(f"  Slippage: ${fill.slippage:.4f}")

Or convert them directly to a Polars DataFrame:

fills_df = result.to_fills_dataframe()
print(fills_df.select(["asset", "side", "price", "price_source", "bid_price", "ask_price"]))

Fill objects carry order-type metadata for audit:

Field Description
fill.order_type "market", "limit", or "stop"
fill.limit_price Limit price (for limit orders)
fill.stop_price Stop price (for stop orders)
fill.price Actual fill price
fill.commission Commission charged
fill.slippage Slippage applied
fill.price_source Configured source used for the fill
fill.reference_price Feed reference price (bar["price"])
fill.quote_mid_price Quote midpoint at fill time
fill.bid_price / fill.ask_price Best bid / ask
fill.spread Bid-ask spread
fill.bid_size / fill.ask_size Quote sizes
fill.available_size Side-aware size used for the fill context

For quote-aware backtests, fills.parquet is the first place to look when you want to verify whether a result difference came from:

  • quote-side execution
  • synthetic slippage
  • commission
  • the configured mark source

Dictionary Output

For backward compatibility:

result_dict = result.to_dict()
# Same structure as result.metrics

Parquet Export

Save results for later analysis or integration with ml4t-diagnostic:

# Export all result components to Parquet / JSON / YAML
result.to_parquet("./results/my_backtest")
# Creates:
#   trades.parquet
#   fills.parquet
#   predictions.parquet  # if raw prediction inputs were supplied
#   equity.parquet
#   portfolio_state.parquet
#   daily_pnl.parquet
#   metrics.json
#   config.yaml  # when config is attached

# Reload later
from ml4t.backtest.result import BacktestResult
result = BacktestResult.from_parquet("./results/my_backtest")

Integration with ml4t-diagnostic

The simplest way to bridge backtest results into ml4t-diagnostic is portfolio_analysis_from_result():

from ml4t.backtest import Engine
from ml4t.diagnostic.integration import portfolio_analysis_from_result

result = engine.run()

# One-liner bridge to ml4t-diagnostic
analysis = portfolio_analysis_from_result(result, calendar="NYSE")

# Now use PortfolioAnalysis methods
print(f"Sharpe: {analysis.sharpe_ratio():.2f}")
print(f"Max DD: {analysis.max_drawdown():.2%}")
monthly = analysis.compute_monthly_returns()

The helper extracts daily returns via to_daily_pnl() and sets periods_per_year from the calendar (252 for NYSE, 365 for crypto, etc.). If no calendar is passed, it uses the config's calendar.

For richer diagnostics, pass these alongside returns:

  • result.to_trades_dataframe()
  • result.to_fills_dataframe()
  • result.to_portfolio_state_dataframe()
# Crypto backtest
analysis = portfolio_analysis_from_result(result, calendar="crypto")

# With benchmark
analysis = portfolio_analysis_from_result(
    result,
    calendar="NYSE",
    benchmark=spy_returns,  # numpy array or Polars Series
)

# Gross vs net comparison
analysis_gross = portfolio_analysis_from_result(results_gross, calendar="crypto")
analysis_net = portfolio_analysis_from_result(results_net, calendar="crypto")

Requires ml4t-diagnostic

Install with pip install ml4t-diagnostic. The import is deferred so ml4t-backtest works standalone without ml4t-diagnostic installed.

Trade Records

Convert trades to TradeRecord format for the diagnostic library:

# Bridge to ml4t-diagnostic
trade_records = result.to_trade_records()

# Or use the bridge function directly
from ml4t.backtest.analytics.bridge import to_trade_records
records = to_trade_records(result.trades)

The bridge exports all cost decomposition fields (gross_pnl, net_return, total_slippage_cost, cost_drag) for diagnostic analysis.

Full Tearsheet

Pass all result data for the richest tearsheet (up to 24 sections):

from ml4t.diagnostic.visualization.backtest import generate_backtest_tearsheet

html = generate_backtest_tearsheet(
    trades=result.to_trades_dataframe(),
    returns=analysis.returns,
    equity_curve=result.to_equity_dataframe(),
    metrics=result.metrics,
    template="full",
    title="My Strategy — Full Report",
    output_path="tearsheet.html",
)

Metrics Keys That Enable Tearsheet Sections

The metrics dict controls which tearsheet sections render. Sections gracefully degrade when keys are missing.

Section Required Metrics Keys
Executive Summary sharpe_ratio, max_drawdown, win_rate, profit_factor, n_trades, cagr, volatility, expectancy
Cost Attribution gross_pnl, commission, slippage
Statistical Validity (DSR) dsr_probability, dsr_significant, min_trl, current_trl, trl_sufficient
RAS Adjustment ras_adjusted_ic, ras_significant, original_ic, rademacher_complexity
Confidence Intervals sharpe_ratio, sharpe_ratio_lower_95, sharpe_ratio_upper_95 (similarly for other metrics)
Haircut Sharpe sharpe, n_periods or n_observations
Expected Max Sharpe expected_max_sharpe

Sections that depend only on trades (trade analysis, MFE/MAE, exit reasons) or returns (drawdown, monthly heatmap, rolling Sharpe) require no special metrics keys.

Config Preservation

The config used for the backtest is preserved in the result:

print(result.config.describe())
print(result.config.preset_name)

See It in Action

The Machine Learning for Trading book uses BacktestResult in every case study:

  • Ch16 / NB05 (performance_reporting) — portfolio_analysis_from_result(), MFE/MAE analysis, gross vs net comparison, full 24-section tearsheet
  • Ch16 case studies — all cases save trade artifacts via to_parquet() and pass trades/metrics/equity to tearsheet generation
  • Ch16 / NB06 (sharpe_ratio_inference) — statistical inference on backtest results

Next Steps

  • Book Guide -- notebook and case-study map for reporting workflows
  • Quickstart -- end-to-end examples
  • Profiles -- compare results across framework profiles