Bayesian Inference and MCMC for Time Series
A Bayesian time-series model produces a posterior distribution, not just a fitted line, which is why posterior uncertainty can itself become a feature.
The Intuition
Many time-series problems in trading are latent-state problems. Current volatility is not directly observed. Regime persistence is uncertain. Small samples make parameter estimates fragile. In those settings, a single fitted value hides something important: how uncertain the model still is.
Bayesian inference keeps that uncertainty explicit. Before seeing the data, you encode a range of plausible values for the unknown quantity. After seeing the data, you update that belief. The updated belief is the posterior.
For a simple drift estimate, the picture is intuitive. With little data, many drift values remain plausible, so the posterior stays wide. With more informative data, it tightens. For latent volatility the same logic applies, except now the unknown object is not only a parameter vector. It is also an evolving hidden state.
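The drift example can be made concrete with the one case where the update is available in closed form: a normal prior on the drift combined with normally distributed returns of known variance. This is a minimal sketch, not part of the chapter's model; the function name and the prior/noise values are illustrative assumptions.

```python
import numpy as np

def posterior_drift(returns, prior_mean=0.0, prior_var=1.0, obs_var=0.04):
    """Conjugate normal update for a constant drift with known noise variance.

    Posterior precision is prior precision plus n observations' worth of
    data precision, so the posterior can only tighten as data arrives.
    """
    n = len(returns)
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mean = post_var * (prior_mean / prior_var + np.sum(returns) / obs_var)
    return post_mean, np.sqrt(post_var)

rng = np.random.default_rng(0)
data = rng.normal(0.01, 0.2, size=500)  # simulated returns, sd matching obs_var

# With 20 observations the posterior stays wide; with 500 it tightens.
_, sd_small = posterior_drift(data[:20])
_, sd_large = posterior_drift(data)
```

The same narrowing happens in the latent-volatility case, but there no conjugate shortcut exists, which is what forces the move to sampling below.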
That is why Chapter 9 cares about posterior width. Two dates can have the same posterior mean for volatility and very different posterior uncertainty. Those are not the same state for a trading system. One says "risk is around 20%." The other says "20% is the center of a much wider band of possibilities."
The Math
Bayesian updating is
\[ p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta), \]
where \(p(\theta)\) is the prior, \(p(y \mid \theta)\) the likelihood, and \(p(\theta \mid y)\) the posterior.
In a stochastic-volatility model, a stylized specification is
\[ r_t = \exp(h_t / 2)\,\varepsilon_t,\qquad \varepsilon_t \sim \mathcal{N}(0,1), \]
\[ h_t = \mu + \phi(h_{t-1}-\mu) + \eta_t,\qquad \eta_t \sim \mathcal{N}(0,\sigma_h^2), \]
where \(r_t\) is the observed return and \(h_t\) is latent log-volatility. The unknown object is now high-dimensional: parameters \((\mu,\phi,\sigma_h)\) plus the whole latent path \((h_1,\dots,h_T)\).
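The generative side of this specification is easy to write down even though the posterior is not. The sketch below simulates returns from exactly the two equations above; the parameter values are illustrative assumptions, not estimates.

```python
import numpy as np

def simulate_sv(T, mu=-1.0, phi=0.95, sigma_h=0.2, seed=0):
    """Simulate r_t = exp(h_t/2) * eps_t with AR(1) log-volatility h_t."""
    rng = np.random.default_rng(seed)
    h = np.empty(T)
    # Initialize from the stationary distribution of the AR(1) log-vol.
    h[0] = rng.normal(mu, sigma_h / np.sqrt(1.0 - phi**2))
    for t in range(1, T):
        h[t] = mu + phi * (h[t - 1] - mu) + rng.normal(0.0, sigma_h)
    r = np.exp(h / 2.0) * rng.normal(0.0, 1.0, size=T)
    return r, h

r, h = simulate_sv(1000)  # one return path and its hidden log-vol path
```

Inference runs this generator in reverse: given only `r`, recover a distribution over `(mu, phi, sigma_h)` and the entire path `h`, which is a `T + 3`-dimensional object.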
That is why closed-form posterior summaries are often unavailable. We know the posterior up to a normalizing constant, but not in a form that yields analytic means, quantiles, or credible bands.
Why MCMC appears
Markov chain Monte Carlo constructs a dependent sequence of draws whose long-run distribution is the posterior. Once the chain mixes well, posterior means, medians, interval widths, and tail probabilities can be estimated from those draws.
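The core mechanism is simpler than the production samplers suggest. A random-walk Metropolis sampler, shown below on a toy one-dimensional target, needs only the log-posterior up to a constant; this is a pedagogical sketch, not the HMC/NUTS machinery the text goes on to describe.

```python
import numpy as np

def metropolis(log_post, init, n_draws=5000, step=0.5, seed=0):
    """Random-walk Metropolis: propose a jitter, accept with prob min(1, ratio)."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    x, lp = init, log_post(init)
    for i in range(n_draws):
        prop = x + step * rng.normal()
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject step
            x, lp = prop, lp_prop
        draws[i] = x  # on rejection the current value is recorded again
    return draws

# Toy target: a standard normal posterior, known only up to a constant.
draws = metropolis(lambda x: -0.5 * x**2, init=0.0)
post_mean = draws[1000:].mean()  # discard burn-in before summarizing
```

The draws are dependent, which is why the effective sample size discussed under diagnostics is smaller than the raw draw count.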
Hamiltonian Monte Carlo improves on random-walk proposals by using gradient information to move more efficiently through the posterior. NUTS, the No-U-Turn Sampler used in PyMC, adapts the HMC trajectory length automatically. That is why modern Bayesian workflows can fit models that would be painful to tune manually.
Variational inference plays a different game. Instead of sampling the true posterior, it fits a simpler approximating distribution. It is often much faster, but it can understate uncertainty. That distinction matters when uncertainty itself is one of the features you want to use.
A Feature-Level Example
Suppose a stochastic-volatility model is fit at the end of two different months. In both months, the posterior mean for annualized volatility is 20%. But the posterior summaries differ:
| Date | Posterior mean | 90% credible interval | Posterior std. dev. | Interpretation |
|---|---|---|---|---|
| Month A | 20% | 19% to 21% | low | state is well identified |
| Month B | 20% | 14% to 27% | high | the same center is much less reliable |
If you keep only the posterior mean, the two months look identical. If you keep posterior standard deviation or interval width, they do not. That is the practical payoff: posterior dispersion is not decorative output. It can condition sizing, thresholding, and confidence in the signal.
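Turning draws into features is mechanical once the sampler has run. The sketch below builds the table's summaries from two hypothetical sets of posterior draws with the same center and different dispersion; the draw distributions and the 30% stress threshold are illustrative assumptions.

```python
import numpy as np

def posterior_features(vol_draws, stress_level=0.30):
    """Summarize posterior draws of annualized vol into candidate features."""
    lo, hi = np.percentile(vol_draws, [5, 95])
    return {
        "post_mean": vol_draws.mean(),
        "post_sd": vol_draws.std(),
        "ci90_width": hi - lo,
        "p_stress": (vol_draws > stress_level).mean(),  # P(vol exceeds threshold)
    }

rng = np.random.default_rng(1)
# Hypothetical draws standing in for Months A and B: same 20% center,
# tight dispersion in A, wide dispersion in B.
month_a = posterior_features(rng.normal(0.20, 0.006, 20000))
month_b = posterior_features(rng.normal(0.20, 0.040, 20000))
```

Only `post_mean` is shared between the two months; every other entry separates them, which is exactly the information a mean-only feature throws away.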
This is also where filtered versus smoothed summaries matter. A filtered posterior uses only information available up to time \(t\). A smoothed posterior borrows information from the future. Smoothed states are useful for retrospective analysis, but they are not valid live features.
In Practice
The practical workflow is narrower than a full Bayesian statistics course.
First, choose a model whose posterior objects match the features you want. In Chapter 9 those might be posterior mean volatility, posterior standard deviation, credible-interval width, or the probability that volatility exceeds a stress threshold.
Second, fit the model only on data available at decision time. This is the Bayesian analogue of any other leakage rule: a full-sample posterior can be statistically coherent and still be unusable for causal feature extraction.
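Operationally, that rule means refitting on an expanding window so each feature row sees only its own past. The sketch below shows the loop structure only; `refit_fn` is a placeholder for whatever posterior-fitting routine is in use, and the names and window length are illustrative assumptions.

```python
import numpy as np

def point_in_time_features(returns, refit_fn, min_obs=250):
    """Refit at each date using only data available then.

    refit_fn is a stand-in: it should map a window of past returns to
    posterior summaries (e.g. posterior mean and sd of volatility).
    """
    feats = []
    for t in range(min_obs, len(returns)):
        feats.append(refit_fn(returns[:t]))  # slice ends at t: no future data
    return np.array(feats)

rng = np.random.default_rng(2)
r = rng.normal(0.0, 0.01, 300)
# Crude stand-in posterior summary: sample mean and sample sd of the window.
f = point_in_time_features(r, refit_fn=lambda x: (x.mean(), x.std()))
```

A full-sample fit followed by reading off per-date states skips this loop, and that shortcut is precisely the smoothed-versus-filtered leakage described above.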
Third, diagnose the computation before trusting the outputs.
- trace plots should not show chains stuck in different regions
- effective sample size should not be tiny
- \(\hat R\) should be close to 1
- posterior predictive checks should not show obvious mismatch
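Two of those checks can be computed directly from the raw chains. The sketch below implements the basic (non-split) Gelman-Rubin \(\hat R\) for an array of chains; libraries such as ArviZ provide more refined split versions, and the synthetic "stuck chain" is an illustrative construction.

```python
import numpy as np

def r_hat(chains):
    """Gelman-Rubin R-hat for an (m, n) array: m chains of n draws each.

    Compares between-chain and within-chain variance; values near 1 mean
    the chains agree on where the posterior mass is.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    b = n * chain_means.var(ddof=1)           # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * w + b / n         # pooled variance estimate
    return np.sqrt(var_hat / w)

rng = np.random.default_rng(3)
good = rng.normal(0.0, 1.0, (4, 1000))                    # chains that agree
bad = good + np.array([[0.0], [0.0], [0.0], [3.0]])       # one chain stuck elsewhere
```

On the well-mixed chains \(\hat R\) sits near 1; the offset chain pushes it well above the usual acceptance threshold of roughly 1.01 to 1.05.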
Those checks are not optional when posterior width is itself a feature. A badly mixed chain can manufacture false certainty or false ambiguity.
Common Mistakes
- Treating the posterior mean as the whole model output.
- Reading smoothed full-sample latent states as point-in-time features.
- Assuming MCMC output is trustworthy because the sampler finished.
- Treating variational inference and MCMC as interchangeable when uncertainty quantification is the main objective.
Figure Specification
Use a two-panel figure:
- prior, likelihood, and posterior over one scalar parameter
- two dates with the same posterior mean volatility but different credible-band width
The visual point is that Chapter 9 uses posterior distributions because uncertainty itself can be a useful state variable.
Connections
- Book chapter: Ch09 Model-Based Feature Extraction
- Related primers: State-Space Models and Kalman Filtering for Feature Engineering; Volatility Models as Feature Extractors: GARCH, EGARCH, and HAR
Register to Read