Results & Analysis¶
Engine.run() returns a BacktestResult containing trades, equity curve, fills,
portfolio state, and computed metrics. Everything is accessible as Python objects,
Polars DataFrames, or Parquet files.
This applies to both classic OHLCV backtests and quote-aware backtests. When you run with bid/ask-aware execution or marking, the result surface preserves the quote context used to produce fills and trade summaries.
For reproducibility, BacktestResult also exposes:
result.config.to_dict()for the fully resolved replayable config payloadresult.to_spec_dict()for a richer runtime snapshot including library version and realized windowresult.to_predictions_dataframe()for the raw prediction/input surface passed into the backtest, when availableresult.to_parquet(...), which writesconfig.yaml,spec.yaml, andpredictions.parquetwhen available
Metrics¶
result = engine.run()
m = result.metrics
# Returns
print(f"Total Return: {m['total_return_pct']:.1f}%")
print(f"CAGR: {m['cagr']:.2%}")
print(f"Volatility: {m['volatility']:.2%}")
# Risk
print(f"Max Drawdown: {m['max_drawdown_pct']:.1f}%")
print(f"Sharpe: {m['sharpe']:.2f}")
print(f"Sortino: {m['sortino']:.2f}")
print(f"Calmar: {m['calmar']:.2f}")
# Trades
print(f"Trades: {m['num_trades']}")
print(f"Fills: {m['num_fills']}")
print(f"Rebalances: {m['num_rebalance_events']}")
print(f"Symbols: {m['unique_symbols_traded']}")
print(f"Win Rate: {m['win_rate']:.1%}")
print(f"Profit Factor: {m['profit_factor']:.2f}")
print(f"Expectancy: ${m['expectancy']:.2f}")
print(f"Avg Win: ${m['avg_win']:.2f}")
print(f"Avg Loss: ${m['avg_loss']:.2f}")
# Per-trade returns (percentage-based, direction-aware)
print(f"Avg Trade: {m['avg_trade']:.2%}")
print(f"Avg Win: {m['avg_win']:.2%}")
print(f"Avg Loss: {m['avg_loss']:.2%}")
print(f"Best Trade: {m['largest_win']:.2%}")
print(f"Worst Trade: {m['largest_loss']:.2%}")
print(f"Payoff Ratio: {m['payoff_ratio']:.2f}")
# Costs
print(f"Commission: ${m['total_commission']:.2f}")
print(f"Slippage: ${m['total_slippage']:.2f}")
print(f"Total Costs: ${m['total_costs']:.2f}")
print(f"Avg Cost Drag: {m['avg_cost_drag']:.4%}")
print(f"Filled Notional: ${m['total_filled_notional']:,.2f}")
print(f"Avg Turnover: {m['avg_turnover']:.2%}")
print(f"Max Turnover: {m['max_turnover']:.2%}")
print(f"Avg Open Pos: {m['avg_open_positions']:.2f}")
print(f"Max Open Pos: {m['max_open_positions']}")
# Gross vs Net
print(f"Gross P&L: ${m['total_gross_pnl']:.2f}")
print(f"Gross PF: {m['gross_profit_factor']:.2f}")
print(f"Net PF: {m['profit_factor']:.2f}")
Available Metrics¶
| Metric | Description |
|---|---|
initial_cash |
Starting cash |
final_value |
Final portfolio value |
total_return |
Total return as decimal |
total_return_pct |
Total return as percentage |
cagr |
Compound annual growth rate |
volatility |
Annualized volatility |
max_drawdown |
Maximum drawdown (positive number) |
max_drawdown_pct |
Maximum drawdown as percentage |
sharpe |
Sharpe ratio |
sortino |
Sortino ratio |
calmar |
Calmar ratio |
num_trades |
Total completed trades |
num_fills |
Total execution events |
num_rebalance_events |
Unique timestamps with at least one fill |
unique_symbols_traded |
Number of symbols with at least one fill |
winning_trades |
Number of winning trades |
losing_trades |
Number of losing trades |
win_rate |
Win rate (0 to 1) |
profit_factor |
Net profit factor (winning P&L / losing P&L) |
expectancy |
Expected return per trade (decimal) |
avg_trade |
Average trade return (decimal) |
avg_win |
Average winning trade return (decimal) |
avg_loss |
Average losing trade return (decimal, negative) |
largest_win |
Best single trade return (decimal) |
largest_loss |
Worst single trade return (decimal, negative) |
payoff_ratio |
avg_win / |avg_loss| (size-normalized reward-to-risk) |
total_commission |
Total commission paid |
total_slippage |
Total slippage cost in dollars (entry + exit) |
total_filled_notional |
Sum of absolute filled notional across all fills |
avg_turnover |
Mean per-bar one-way turnover from fills |
max_turnover |
Maximum per-bar one-way turnover from fills |
avg_open_positions |
Mean number of open positions across bars |
max_open_positions |
Maximum number of open positions across bars |
total_gross_pnl |
Total P&L from price moves only (before costs) |
total_costs |
Total transaction costs (commission + slippage) |
avg_cost_drag |
Average cost as fraction of trade notional |
gross_profit_factor |
Profit factor from raw price moves (isolates edge from costs) |
skipped_bars |
Bars skipped by calendar filter |
Reporting Model¶
BacktestResult now exposes three distinct raw reporting surfaces:
trades: flat-to-flat lifecycle summariesfills: execution blotter rowsportfolio_state: end-of-bar portfolio snapshots
For rebalancing strategies, num_trades is not the right proxy for trading activity.
Use num_fills, num_rebalance_events, and total_filled_notional instead.
Turnover uses a one-way execution-based definition:
Bars with no fills contribute 0. This means:
- buying a fully cash portfolio into a fully invested book is turnover
1.0 - selling a fully invested book back to cash is turnover
1.0 - fully rotating one full book into another is turnover
2.0
Cost Decomposition¶
Every trade carries a full cost breakdown, letting you separate strategy edge from execution costs:
for trade in result.trades:
print(f"{trade.symbol}: gross={trade.gross_pnl:+.2f}, "
f"net={trade.pnl:+.2f}, drag={trade.cost_drag:.4%}")
| Property | Description |
|---|---|
trade.gross_pnl |
Price-move P&L: (exit - entry) * qty * multiplier |
trade.pnl |
Net P&L after all costs |
trade.gross_return |
Direction-aware gross return (same as pnl_percent) |
trade.net_return |
Direction-aware net return including fees |
trade.total_slippage_cost |
Entry + exit slippage in dollars |
trade.cost_drag |
Total cost as fraction of notional |
trade.fees |
Total commission (entry + exit) |
trade.entry_slippage |
Per-unit slippage on entry |
trade.exit_slippage |
Per-unit slippage on exit |
trade.multiplier |
Contract multiplier (1.0 for equities, 50.0 for ES futures) |
Trade records also summarize nullable quote context for the entry and exit:
| Property | Description |
|---|---|
trade.entry_quote_mid_price / trade.exit_quote_mid_price |
Quote midpoint at fill time |
trade.entry_bid_price / trade.exit_bid_price |
Best bid at fill time |
trade.entry_ask_price / trade.exit_ask_price |
Best ask at fill time |
trade.entry_spread / trade.exit_spread |
Bid/ask spread at fill time |
trade.entry_available_size / trade.exit_available_size |
Side-aware available quote size |
pnl_percent is direction-aware: positive means profitable for both long and short trades.
Quote-aware backtests therefore leave an explicit audit trail:
- fills record the reference price, bid, ask, midpoint, spread, and available size
- trades summarize entry/exit quote context
- portfolio state reflects the configured
mark_price
Trade Analyzer¶
result.trade_analyzer provides aggregate statistics on closed trades:
ta = result.trade_analyzer
# Standard metrics
print(f"Win Rate: {ta.win_rate:.1%}")
print(f"Profit Factor: {ta.profit_factor:.2f}")
print(f"Avg MFE: {ta.avg_mfe:.4f}")
print(f"MFE Capture: {ta.mfe_capture_ratio:.2f}")
# Cost decomposition
print(f"Gross P&L: ${ta.total_gross_pnl:.2f}")
print(f"Net Profit: ${ta.net_profit:.2f}")
print(f"Total Costs: ${ta.total_costs:.2f}")
print(f"Avg Cost Drag: {ta.avg_cost_drag:.4%}")
print(f"Gross Profit Factor:{ta.gross_profit_factor:.2f}")
# Filter by side
long_stats = ta.by_side("long")
short_stats = ta.by_side("short")
print(f"Long win rate: {long_stats.win_rate:.1%}")
print(f"Short win rate: {short_stats.win_rate:.1%}")
# Export all stats
stats_dict = ta.to_dict()
Trades DataFrame¶
Returns a Polars DataFrame with columns:
| Column | Type | Description |
|---|---|---|
symbol |
String | Asset identifier |
entry_time |
Datetime | Entry timestamp |
exit_time |
Datetime | Exit timestamp |
entry_price |
Float | Entry fill price |
exit_price |
Float | Exit fill price |
quantity |
Float | Position size (negative for shorts) |
direction |
String | "long" or "short" |
pnl |
Float | Net P&L after costs |
pnl_percent |
Float | Direction-aware percentage return |
bars_held |
Int | Holding period |
fees |
Float | Total commission |
exit_slippage |
Float | Exit slippage |
mfe |
Float | Maximum favorable excursion |
mae |
Float | Maximum adverse excursion |
entry_slippage |
Float | Per-unit slippage on entry |
multiplier |
Float | Contract multiplier (futures) |
entry_quote_mid_price |
Float | Entry quote midpoint |
entry_bid_price |
Float | Entry best bid |
entry_ask_price |
Float | Entry best ask |
entry_spread |
Float | Entry spread |
entry_available_size |
Float | Entry-side available size |
exit_quote_mid_price |
Float | Exit quote midpoint |
exit_bid_price |
Float | Exit best bid |
exit_ask_price |
Float | Exit best ask |
exit_spread |
Float | Exit spread |
exit_available_size |
Float | Exit-side available size |
gross_pnl |
Float | Price-move P&L before fees |
net_return |
Float | Direction-aware net return including fees |
total_slippage_cost |
Float | Entry + exit slippage in dollars |
cost_drag |
Float | Total cost as fraction of notional |
exit_reason |
String | Why the trade exited |
status |
String | "closed" or "open" |
Open positions at the end of the backtest are included with status="open" and mark-to-market values.
Equity DataFrame¶
Returns a Polars DataFrame with columns:
| Column | Type | Description |
|---|---|---|
timestamp |
Datetime | Bar timestamp |
equity |
Float | Portfolio value |
return |
Float | Bar-to-bar return |
cumulative_return |
Float | Cumulative return from start |
drawdown |
Float | Current drawdown from HWM |
high_water_mark |
Float | Running maximum equity |
Portfolio State DataFrame¶
Returns a Polars DataFrame with columns:
| Column | Type | Description |
|---|---|---|
timestamp |
Datetime | Bar timestamp |
equity |
Float | Portfolio equity after bar processing |
cash |
Float | Cash balance |
gross_exposure |
Float | Sum of absolute marked position values |
net_exposure |
Float | Signed sum of marked position values |
open_positions |
Int | Number of open positions |
This is the right primitive for downstream diagnostics such as:
- average invested fraction
- gross and net exposure analysis
- time in market
- occupancy and utilization metrics
Predictions DataFrame¶
If you passed signals_df into DataFeed or run_backtest(...), the raw input
is available on the result:
This returns the original Polars DataFrame without reinterpretation. It is meant for downstream diagnostics and attribution, for example:
- joining predictions to fills or trades
- evaluating score distributions for winning vs losing trades
- comparing model inputs to realized execution and PnL
The important convention is that this surface is treated as the raw model/input surface, not as a guaranteed post-mapping trading signal surface.
Fills¶
Access every individual order fill:
for fill in result.fills:
print(f"{fill.asset}: {fill.quantity} @ ${fill.price:.2f}")
print(f" Type: {fill.order_type}")
print(f" Commission: ${fill.commission:.2f}")
print(f" Slippage: ${fill.slippage:.4f}")
Or convert them directly to a Polars DataFrame:
fills_df = result.to_fills_dataframe()
print(fills_df.select(["asset", "side", "price", "price_source", "bid_price", "ask_price"]))
Fill objects carry order-type metadata for audit:
| Field | Description |
|---|---|
fill.order_type |
"market", "limit", or "stop" |
fill.limit_price |
Limit price (for limit orders) |
fill.stop_price |
Stop price (for stop orders) |
fill.price |
Actual fill price |
fill.commission |
Commission charged |
fill.slippage |
Slippage applied |
fill.price_source |
Configured source used for the fill |
fill.reference_price |
Feed reference price (bar["price"]) |
fill.quote_mid_price |
Quote midpoint at fill time |
fill.bid_price / fill.ask_price |
Best bid / ask |
fill.spread |
Bid-ask spread |
fill.bid_size / fill.ask_size |
Quote sizes |
fill.available_size |
Side-aware size used for the fill context |
For quote-aware backtests, fills.parquet is the first place to look when you want
to verify whether a result difference came from:
- quote-side execution
- synthetic slippage
- commission
- the configured mark source
Dictionary Output¶
For backward compatibility:
Parquet Export¶
Save results for later analysis or integration with ml4t-diagnostic:
# Export all result components to Parquet / JSON / YAML
result.to_parquet("./results/my_backtest")
# Creates:
# trades.parquet
# fills.parquet
# predictions.parquet # if raw prediction inputs were supplied
# equity.parquet
# portfolio_state.parquet
# daily_pnl.parquet
# metrics.json
# config.yaml # when config is attached
# Reload later
from ml4t.backtest.result import BacktestResult
result = BacktestResult.from_parquet("./results/my_backtest")
Integration with ml4t-diagnostic¶
Portfolio Analysis (Recommended)¶
The simplest way to bridge backtest results into ml4t-diagnostic is
portfolio_analysis_from_result():
from ml4t.backtest import Engine
from ml4t.diagnostic.integration import portfolio_analysis_from_result
result = engine.run()
# One-liner bridge to ml4t-diagnostic
analysis = portfolio_analysis_from_result(result, calendar="NYSE")
# Now use PortfolioAnalysis methods
print(f"Sharpe: {analysis.sharpe_ratio():.2f}")
print(f"Max DD: {analysis.max_drawdown():.2%}")
monthly = analysis.compute_monthly_returns()
The helper extracts daily returns via to_daily_pnl() and sets
periods_per_year from the calendar (252 for NYSE, 365 for crypto, etc.).
If no calendar is passed, it uses the config's calendar.
For richer diagnostics, pass these alongside returns:
result.to_trades_dataframe()result.to_fills_dataframe()result.to_portfolio_state_dataframe()
# Crypto backtest
analysis = portfolio_analysis_from_result(result, calendar="crypto")
# With benchmark
analysis = portfolio_analysis_from_result(
result,
calendar="NYSE",
benchmark=spy_returns, # numpy array or Polars Series
)
# Gross vs net comparison
analysis_gross = portfolio_analysis_from_result(results_gross, calendar="crypto")
analysis_net = portfolio_analysis_from_result(results_net, calendar="crypto")
Requires ml4t-diagnostic
Install with pip install ml4t-diagnostic. The import is deferred so ml4t-backtest
works standalone without ml4t-diagnostic installed.
Trade Records¶
Convert trades to TradeRecord format for the diagnostic library:
# Bridge to ml4t-diagnostic
trade_records = result.to_trade_records()
# Or use the bridge function directly
from ml4t.backtest.analytics.bridge import to_trade_records
records = to_trade_records(result.trades)
The bridge exports all cost decomposition fields (gross_pnl, net_return, total_slippage_cost, cost_drag) for diagnostic analysis.
Full Tearsheet¶
Pass all result data for the richest tearsheet (up to 24 sections):
from ml4t.diagnostic.visualization.backtest import generate_backtest_tearsheet
html = generate_backtest_tearsheet(
trades=result.to_trades_dataframe(),
returns=analysis.returns,
equity_curve=result.to_equity_dataframe(),
metrics=result.metrics,
template="full",
title="My Strategy — Full Report",
output_path="tearsheet.html",
)
Metrics Keys That Enable Tearsheet Sections¶
The metrics dict controls which tearsheet sections render. Sections gracefully degrade when keys are missing.
| Section | Required Metrics Keys |
|---|---|
| Executive Summary | sharpe_ratio, max_drawdown, win_rate, profit_factor, n_trades, cagr, volatility, expectancy |
| Cost Attribution | gross_pnl, commission, slippage |
| Statistical Validity (DSR) | dsr_probability, dsr_significant, min_trl, current_trl, trl_sufficient |
| RAS Adjustment | ras_adjusted_ic, ras_significant, original_ic, rademacher_complexity |
| Confidence Intervals | sharpe_ratio, sharpe_ratio_lower_95, sharpe_ratio_upper_95 (similarly for other metrics) |
| Haircut Sharpe | sharpe, n_periods or n_observations |
| Expected Max Sharpe | expected_max_sharpe |
Sections that depend only on trades (trade analysis, MFE/MAE, exit reasons) or returns (drawdown, monthly heatmap, rolling Sharpe) require no special metrics keys.
Config Preservation¶
The config used for the backtest is preserved in the result:
See It in Action¶
The Machine Learning for Trading book uses BacktestResult in every case study:
- Ch16 / NB05 (
performance_reporting) —portfolio_analysis_from_result(), MFE/MAE analysis, gross vs net comparison, full 24-section tearsheet - Ch16 case studies — all cases save trade artifacts via
to_parquet()and pass trades/metrics/equity to tearsheet generation - Ch16 / NB06 (
sharpe_ratio_inference) — statistical inference on backtest results
Next Steps¶
- Book Guide -- notebook and case-study map for reporting workflows
- Quickstart -- end-to-end examples
- Profiles -- compare results across framework profiles