Chapter 9: Model-Based Feature Extraction

Uncertainty as a Feature: Stochastic Volatility, Forecast Intervals, and Forecast Uncertainty

In trading, two models with the same point forecast are not equivalent if one is much less certain than the other.

The Intuition

Model-based feature pipelines should treat uncertainty outputs as first-class feature objects rather than decorative error bars.

Suppose two models both forecast the same next-day volatility target \(y_{t+1}\) at 1.8%. If model A has a narrow prediction interval and model B has a wide one, they are telling you different things:

  • model A says the future looks concentrated around 1.8%
  • model B says 1.8% is only the center of a much wider set of plausible outcomes

For downstream decisions, that difference matters. A forecast can be directionally unchanged while the uncertainty attached to that forecast object changes materially.

That is why useful uncertainty-derived features include:

  • forecast standard error
  • prediction-interval width
  • posterior variance of a latent state
  • relative uncertainty such as interval width divided by the point forecast

These are usually conditioning inputs, not standalone signals.
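As a sketch, assuming a forecast already reduced to a point estimate and interval endpoints on the same scale, three of these can be packed into flat feature columns (`uncertainty_features` and its arguments are hypothetical names, not a library API):

```python
# Hypothetical helper: turn one model's probabilistic forecast into
# flat feature columns. `point`, `lo`, `hi` must describe the same
# target on the same scale.
def uncertainty_features(point, lo, hi, eps=1e-8):
    width = hi - lo
    return {
        "forecast": point,                               # point forecast
        "interval_width": width,                         # prediction-interval width
        "relative_uncertainty": width / (abs(point) + eps),  # width / level
    }

feats = uncertainty_features(point=1.8, lo=1.6, hi=2.0)
```

The `eps` guard anticipates the division-by-near-zero issue discussed later for relative uncertainty.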

Point Forecasts Versus Predictive Distributions

A point forecast is one summary:

$$ \hat{y}_{t+h|t} = \mathbb{E}[y_{t+h} \mid \mathcal{F}_t]. $$

But a forecasting model really implies a full predictive distribution:

$$ y_{t+h} \mid \mathcal{F}_t \sim p(y_{t+h} \mid \mathcal{F}_t). $$

Prediction intervals are not confidence intervals. They describe uncertainty about future outcomes, not uncertainty about a fixed parameter estimate.

From that distribution you can extract:

  • the mean or median forecast
  • the forecast variance
  • quantiles and prediction intervals
  • tail probabilities such as \(\mathbb{P}(y_{t+h} > c \mid \mathcal{F}_t)\)

For feature engineering, the important lesson is simple:

the forecasted level and the forecasted uncertainty answer different questions.

The first says what is most likely. The second says how concentrated or fragile that answer is.
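Given draws from the predictive distribution (here a hypothetical Gaussian stand-in for a real model's output), each of the extracted summaries above is a one-liner:

```python
import random
import statistics

random.seed(0)
# Hypothetical predictive draws of y_{t+1}; a real model would supply these.
draws = [random.gauss(1.8, 0.3) for _ in range(10_000)]

mean = statistics.fmean(draws)          # point forecast (mean)
var = statistics.pvariance(draws)       # forecast variance
qs = statistics.quantiles(draws, n=20)  # 19 cut points: 5%, 10%, ..., 95%
q05, q95 = qs[0], qs[-1]                # 90% central prediction interval
tail_prob = sum(d > 2.5 for d in draws) / len(draws)  # P(y_{t+1} > c | F_t)
```

The same recipe applies whether the draws come from a closed-form distribution, a simulation-based filter, or a bootstrap.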

A Compact Volatility Example

Let \(h_t\) denote latent log variance:

$$ h_t = \mu + \phi(h_{t-1} - \mu) + \eta_t, \qquad r_t = \exp(h_t/2)\varepsilon_t, $$

where \(\eta_t \sim N(0,\sigma_\eta^2)\) and \(\varepsilon_t \sim N(0,1)\) are independent shocks.

This is the basic logic behind stochastic-volatility modeling:

  • returns are observed
  • volatility is latent
  • the model estimates both the current latent state and the uncertainty about that state

The model can then emit features such as:

  • sv_level: filtered estimate of \(h_t\) or a transformed volatility estimate such as \(\exp(\mathbb{E}[h_t \mid r_{1:t}]/2)\), which is a transformed latent-state summary rather than the conditional expectation of volatility itself
  • sv_var: posterior variance of the latent state, \(\mathrm{Var}(h_t \mid r_{1:t})\)
  • sv_interval_width: width of a predictive interval for a future observable target such as next-day realized volatility \(y_{t+1}\)
  • sv_relative_uncertainty: that predictive width divided by the forecast level on the same target

Those four objects are not redundant.

Filtered Versus Smoothed Uncertainty

This distinction is load-bearing.

A filtered estimate uses only information available up to time \(t\):

$$ p(h_t \mid r_{1:t}). $$

A smoothed estimate uses the full sample:

$$ p(h_t \mid r_{1:T}). $$

Smoothing is fine for retrospective analysis and dangerous for live features. Future observations can sharpen yesterday's volatility estimate in ways no trader could have known at the time.

The same logic applies to uncertainty:

  • filtered posterior variance is live-safe
  • smoothed posterior variance is ex post

This is why uncertainty features should be built from filtered objects inside each walk-forward training window.
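The filtered-versus-smoothed distinction can be made concrete with a simpler model that has the same structure: a local-level Gaussian state-space model, where both the Kalman filter and the RTS smoother are a few lines of stdlib Python. All parameter values are illustrative:

```python
import random

random.seed(2)
q, r = 0.05, 0.5  # state and observation noise variances (assumed)

# Simulate a local-level (random walk + noise) series
x, ys = 0.0, []
for _ in range(200):
    x += random.gauss(0, q ** 0.5)
    ys.append(x + random.gauss(0, r ** 0.5))

# Kalman filter: p(x_t | y_{1:t}) -- the live-safe object
m, P = 0.0, 1.0
filt_m, filt_P, pred_m, pred_P = [], [], [], []
for y in ys:
    mp, Pp = m, P + q                        # predict
    K = Pp / (Pp + r)                        # Kalman gain
    m, P = mp + K * (y - mp), (1 - K) * Pp   # update
    pred_m.append(mp); pred_P.append(Pp)
    filt_m.append(m); filt_P.append(P)

# RTS smoother: p(x_t | y_{1:T}) -- ex post, not live-safe
sm_m, sm_P = filt_m[:], filt_P[:]
for t in range(len(ys) - 2, -1, -1):
    C = filt_P[t] / pred_P[t + 1]
    sm_m[t] = filt_m[t] + C * (sm_m[t + 1] - pred_m[t + 1])
    sm_P[t] = filt_P[t] + C ** 2 * (sm_P[t + 1] - pred_P[t + 1])
# Smoothed posterior variance never exceeds filtered posterior variance:
# future data can only sharpen the estimate of a past state.
```

The smoother's tighter variances are exactly the look-ahead that makes smoothed uncertainty unsafe as a live feature.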

Three Different Uncertainties

These objects are related but not interchangeable:

  1. Latent-state uncertainty: how uncertain the model is about the current hidden state, such as \(h_t\).
  2. Predictive uncertainty: how dispersed future outcomes are, conditional on what is known now.
  3. Parameter uncertainty: how uncertain the fitted parameters are.

The same model can be sharp about the latent state and still broad about future outcomes, or vice versa. Observation noise belongs inside predictive uncertainty here, rather than as a separate pillar, because for feature-engineering purposes the downstream consumer usually cares about the total dispersion of the future observable it will actually act on.

A Worked Scenario

Imagine two models producing the same next-day volatility target forecast:

  Model     Point Forecast   90% Interval    Width   Relative Width (Width / Forecast)
  Model A   1.8%             [1.6%, 2.0%]    0.4%    0.22
  Model B   1.8%             [1.0%, 2.6%]    1.6%    0.89
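The width and relative-width columns follow directly from the other two:

```python
# Reproduce the derived columns of the scenario table from the
# point forecast and interval endpoints.
rows = {}
for name, point, lo, hi in [
    ("Model A", 1.8, 1.6, 2.0),
    ("Model B", 1.8, 1.0, 2.6),
]:
    width = hi - lo
    rows[name] = (round(width, 2), round(width / point, 2))
# rows == {"Model A": (0.4, 0.22), "Model B": (1.6, 0.89)}
```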

The point forecast is identical. The uncertainty features are not.

Interpretation:

  • Model A implies stable recent dynamics and a concentrated forecast
  • Model B implies the same center with much weaker conviction

That difference can affect:

  • position sizing
  • leverage caps
  • whether to trust a spread or carry signal conditioned on volatility
  • whether to hand control to a more conservative policy

The point forecast alone would hide all of that.

Relative Uncertainty Often Matters More Than Absolute Uncertainty

An interval width of 1% means different things when the forecast is 2% versus 20%.

So a more portable feature is often:

$$ \text{relative uncertainty} = \frac{\text{interval width}}{\lvert \hat{y}_{t+h|t} \rvert + \delta}. $$

Here \(\delta\) is a small stabilization constant used to avoid division by values near zero, for example 1% of the training-window median forecast level, with a fixed floor if needed for very low-volatility assets. This ratio is most natural for strictly positive targets such as volatility, and only when numerator and denominator refer to the same target on the same scale.
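A minimal sketch of that stabilized ratio; `relative_uncertainty`, its argument names, and the 1%-of-median rule are the assumptions stated above, not a standard API:

```python
def relative_uncertainty(width, forecast, median_level, floor=1e-4):
    # delta = 1% of the training-window median forecast level,
    # with a fixed floor for very low-volatility assets
    delta = max(0.01 * median_level, floor)
    return width / (abs(forecast) + delta)

# Model A from the worked scenario, with median level taken as 1.8
rel_a = relative_uncertainty(width=0.4, forecast=1.8, median_level=1.8)
```

Note that `median_level` should come from the training window only, so the feature stays live-safe under walk-forward fitting.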

This is especially useful when comparing:

  • assets with very different base volatility
  • calm and stressed regimes
  • multiple horizons

Raw uncertainty is still useful, but relative uncertainty is often easier for downstream models to interpret consistently.

Calibration Before Conditioning

An uncertainty feature is only worth conditioning on if the probabilistic object is at least roughly well calibrated.

Useful checks:

  • empirical interval coverage versus nominal coverage
  • interval score or CRPS when you have full predictive distributions
  • log score when the model emits a density rather than only an interval

A wide interval can still be informative even if it is imperfectly calibrated, but you should know whether the interval behaves like a real probabilistic object before treating it as a serious risk input.
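The first check, empirical versus nominal coverage, can be sketched with a toy model that is well specified by construction, so the two should agree up to sampling noise:

```python
import random

random.seed(3)
nominal = 0.90
lo, hi = -1.645, 1.645  # a correctly calibrated 90% Gaussian interval

n, hits = 5000, 0
for _ in range(n):
    y = random.gauss(0, 1)      # realized outcome
    hits += lo <= y <= hi       # did the interval cover it?
coverage = hits / n
# A large gap between `coverage` and `nominal` flags miscalibration;
# in live work, run this per horizon and per regime, not just in aggregate.
```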

Which Tools Capture Which Layers

Different model classes propagate different uncertainty layers.

  • linear Gaussian state-space filters naturally separate observation noise and state uncertainty
  • stochastic-volatility models usually require approximate filtering or simulation to obtain analogous uncertainty summaries
  • particle-MCMC or related Bayesian workflows can also propagate parameter uncertainty
  • model misspecification remains a residual caveat rather than something standard intervals solve

Computationally, these are not equivalent. Kalman-style state-space summaries are usually cheap enough for broad live pipelines; particle filters and particle-MCMC are much more expensive and are often reserved for slower or narrower workflows.

That is why interval widths from two libraries or papers are not automatically comparable unless they refer to the same target and the same scale.

For feature design, the question is practical rather than philosophical:

does the uncertainty object reliably tell you when the point forecast should be trusted less?

In Practice

Useful uncertainty-aware features include:

  • forecast standard error from ARIMA or state-space models
  • posterior variance of latent volatility
  • width of central prediction intervals
  • entropy or dispersion of regime probabilities
  • disagreement across ensembles
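The last of these, ensemble disagreement, is just cross-model dispersion on a common target; the forecasts below are hypothetical:

```python
import statistics

# Hypothetical next-day forecasts for one asset from five models
ensemble = [1.7, 1.8, 1.8, 1.9, 2.4]

disagreement = statistics.pstdev(ensemble)  # cross-model dispersion feature
consensus = statistics.median(ensemble)     # robust point forecast
```

A single dissenting model (the 2.4 above) widens the disagreement feature without dragging the median, which is one reason to pair the two.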

Good operational habits:

  • compute features from filtered rather than smoothed objects
  • keep horizon labels explicit: h=1 uncertainty is not h=10 uncertainty
  • use relative and absolute uncertainty together
  • check whether wide intervals are actually calibrated enough to predict downstream fragility
  • benchmark whether uncertainty actually conditions downstream performance

Different uncertainty objects matter to different downstream consumers:

  • a model stack may use predictive dispersion as an input feature
  • a sizing rule may care most about relative interval width
  • a risk overlay may care about tail probabilities or coverage failures

If a feature says "high expected alpha" but its companion uncertainty signal is extreme, the system should often respond with smaller size, wider risk limits, or lower trust.

Common Mistakes

  • Treating prediction intervals as presentation aids rather than model outputs.
  • Using smoothed posterior uncertainty in a backtest and calling it live-safe.
  • Comparing interval widths across assets or horizons without normalization.
  • Confusing parameter uncertainty with predictive uncertainty.
  • Assuming wide intervals are useless when they may be the key conditioning variable.

Connections

  • Book chapters: Ch09 Model-Based Feature Extraction; Ch21 Portfolio Optimization and Risk Management
  • Related primers: state-space-models-and-kalman-filtering.md, garch-family-models.md
  • Why it matters next: uncertainty features connect directly to Bayesian time-series models, stochastic-volatility modeling, regime probabilities, conformal prediction, and portfolio-level risk controls where forecast uncertainty should affect sizing and governance
