Path Signatures and Log-Signatures for Financial Sequences
Path signatures encode the ordered geometry of multivariate sequences through iterated integrals. This primer covers the algebra, Chen-style composition, and the embedding choices that decide whether the construction carries real information in finance.
The Intuition
Many standard features summarize a sequence by level, return, volatility, or a short lag stack. Those summaries are often enough. Sometimes they are not.
Two price paths can share:
- the same start and end points
- the same realized variance
- similar rolling moments
and still differ meaningfully in their ordering. One path may trend smoothly upward; the other may whipsaw and recover. For certain tasks, that ordered geometry matters.
Path signatures were designed to encode that geometry through iterated integrals. In practice, the right way to think about them is:
signatures summarize how a path moves through time, not just where it starts and ends.
First Principles: Ordered Integrals
For a multivariate path \(X_t \in \mathbb{R}^d\), the signature collects integrals of increasing depth.
Depth 1 gives the coordinate increments:
$$ S^{(i)}(X) = \int dX^{(i)} = X_T^{(i)} - X_0^{(i)}. $$
Depth 2 gives ordered interaction terms:
$$ S^{(i,j)}(X) = \int_0^T \left(\int_0^v dX_u^{(i)}\right) dX_v^{(j)}. $$
That ordering is the key idea. The \((i,j)\) term is not the same as the \((j,i)\) term. Signatures therefore retain information about sequence order that plain endpoint summaries erase.
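For discrete observations joined by straight lines, these iterated integrals have a closed form segment by segment, which makes a direct implementation straightforward. A minimal numpy sketch (the function name and the piecewise-linear assumption are illustrative choices, not a fixed convention):

```python
import numpy as np

def sig_depth2(path):
    """Depth-1 and depth-2 signature terms of a piecewise-linear path.

    path: array of shape (n_points, d). Over a straight segment k,
    int (X^i - X_0^i) dX^j = (X_k^i - X_0^i) * dX_k^j + 0.5 * dX_k^i * dX_k^j,
    so the depth-2 tensor is a sum of segment contributions.
    """
    path = np.asarray(path, dtype=float)
    inc = np.diff(path, axis=0)          # segment increments dX_k
    level1 = path[-1] - path[0]          # S^(i): total increment per channel
    start = path[:-1] - path[0]          # path value at each segment start
    # S^(i,j) = sum_k start_k^i * inc_k^j + 0.5 * inc_k^i * inc_k^j
    level2 = start.T @ inc + 0.5 * (inc.T @ inc)
    return level1, level2
```

On the L-shaped path (0,0) -> (1,0) -> (1,1), this returns level-1 increments (1, 1) and a level-2 matrix whose off-diagonal entries are the ordered cross terms: entry (0, 1) is 1 and entry (1, 0) is 0, matching the hand computation later in this primer.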
For bounded-variation paths, linear functionals of signatures are dense in continuous path functionals on compact sets, subject to the usual equivalence caveats. In plain language: given enough depth, signatures can approximate very rich functions of path shape. That universality result is the theoretical reason signatures appear in ML at all, not a guarantee of finite-sample predictive gain on noisy financial data.
Why Depth Matters
The full signature is infinite. In practice you truncate at low depth:
- depth 1: increments
- depth 2: pairwise ordered interactions
- depth 3+: higher-order path geometry
Higher depth means more expressiveness and much faster dimensionality growth.
For a \(d\)-dimensional path truncated at depth \(m\), the raw signature size is:
$$ \sum_{k=0}^{m} d^k. $$
The \(k=0\) term is the scalar constant term, conventionally equal to 1. Log-signatures are smaller because they remove algebraic redundancy from the full tensor signature: the full signature lives in the tensor algebra and satisfies identities that make many coordinates dependent, while the log-signature keeps only the independent generators. For example, with \(d=3\) and depth \(m=3\), the full signature has 40 terms while the log-signature has 14, broken out as 3 at depth 1, 3 at depth 2, and 8 at depth 3 (from the dimension of the free Lie algebra at that grade).
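The counts above can be reproduced directly: the full signature size is the geometric sum, and the log-signature size follows from Witt's formula for the graded dimensions of the free Lie algebra. A small self-contained sketch (function names are mine):

```python
def full_sig_size(d, m):
    """Number of terms in the full signature of a d-dim path, depth m."""
    return sum(d**k for k in range(m + 1))

def _mobius(n):
    """Moebius function via trial-division factorization."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def free_lie_dim(d, k):
    """Dimension of grade k of the free Lie algebra on d generators
    (Witt's formula: (1/k) * sum over divisors q of k of mu(q) * d^(k/q))."""
    return sum(_mobius(q) * d**(k // q) for q in range(1, k + 1) if k % q == 0) // k

def logsig_size(d, m):
    """Number of log-signature coordinates up to depth m."""
    return sum(free_lie_dim(d, k) for k in range(1, m + 1))
```

With d = 3 and m = 3 this gives 40 full-signature terms and 14 log-signature coordinates (3 + 3 + 8), matching the breakdown above.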
That is why low-depth signatures or log-signatures are usually the practical starting point.
A Worked Path Example
Take two two-dimensional piecewise-linear paths with the same start and end:
- Path A: move right, then up
- Path B: move up, then right
Using coordinates \((x_t, y_t)\), this means:
- Path A: (0,0) -> (1,0) -> (1,1)
- Path B: (0,0) -> (0,1) -> (1,1)
Both end at (1,1), so their depth-1 signature terms are identical:
$$ S^{(x)} = 1, \qquad S^{(y)} = 1. $$
Depth 2 distinguishes them. Since both paths start at the origin, the inner integral simplifies to the coordinate value itself, so the ordered cross terms reduce to
$$ S^{(x,y)} = \int_0^T x_t \, dy_t, \qquad S^{(y,x)} = \int_0^T y_t \, dx_t. $$
For Path A, the horizontal segment comes first, so \(y_t = 0\) while \(dx_t \neq 0\), and the vertical segment comes second, so \(x_t = 1\) while \(dy_t \neq 0\). Therefore
$$ S^{(x,y)} = \int 1 \, dy = 1, \qquad S^{(y,x)} = \int 0 \, dx = 0. $$
For Path B, the ordering reverses:
$$ S^{(x,y)} = \int 0 \, dy = 0, \qquad S^{(y,x)} = \int 1 \, dx = 1. $$
So the depth-2 pair becomes (1, 0) for Path A and (0, 1) for Path B. Same endpoint, different ordered geometry.
Interpretation:
- Path A says the system changed along the first channel before the second
- Path B says the reverse
That is the core signature idea: depth-2 terms can distinguish ordered cross-channel interactions that depth-1 increments erase. In one dimension, without time or other augmentation, the signature is usually too close to "net change plus higher-order powers of that same increment" to be useful for separating reorderings in practice.
Discrete Data Need a Path Construction
Financial data arrive as discrete observations, not continuous curves. So before you compute a signature you must choose how to embed the sample as a path, and that choice is part of the model.
Common choices include:
- piecewise linear interpolation
- lead-lag augmentation, which pairs each observation with a lagged copy so the path preserves more local ordering and variation information
- time augmentation plus normalization
Those choices matter. Two researchers can start from the same observations and produce different signature features if they construct the underlying path differently. The embedding is part of the model, not a preprocessing footnote.
Why Time Augmentation Is Often Crucial in Finance
Financial sequences are observed in order, but raw coordinates do not always encode that order explicitly enough.
Time augmentation appends time as an extra channel:
$$ \tilde{X}_t = (t, X_t). $$
This does two things:
- it prevents some reparameterization ambiguities
- it makes the signature sensitive to when movements occurred, not only to the geometric trace
The plain signature is invariant to time reparameterization of the same geometric path. That is often elegant mathematically and inconvenient financially, because a burst early in the window is not the same as a burst just before decision time. More generally, signatures identify paths only up to the relevant equivalence class, so practical distinguishability depends heavily on the augmentation and embedding choices above. Time augmentation is therefore often crucial in practice, especially for one-dimensional financial paths.
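The invariance claim can be checked numerically. A small numpy sketch using the closed-form depth-2 signature of a piecewise-linear path; the sample points and the two clocks are illustrative assumptions:

```python
import numpy as np

def sig2(path):
    """Depth-2 signature matrix of a piecewise-linear path (closed form)."""
    path = np.asarray(path, dtype=float)
    inc = np.diff(path, axis=0)
    start = path[:-1] - path[0]
    return start.T @ inc + 0.5 * (inc.T @ inc)

# Same geometric trace, sampled at different "speeds": inserting a
# collinear point is a reparameterization and leaves the signature alone.
trace   = [(0, 0), (1, 0), (1, 1)]
refined = [(0, 0), (0.5, 0), (1, 0), (1, 1)]
assert np.allclose(sig2(trace), sig2(refined))

# With a time channel prepended, the two clocks become different paths,
# so e.g. the (t, x) cross term now distinguishes early from late moves.
fast_start = [(0.0, 0, 0), (0.2, 1, 0), (1.0, 1, 1)]  # burst happens early
slow_start = [(0.0, 0, 0), (0.8, 1, 0), (1.0, 1, 1)]  # burst happens late
assert not np.allclose(sig2(fast_start), sig2(slow_start))
```

The first assertion illustrates the reparameterization invariance of the plain signature; the second shows that time augmentation restores sensitivity to when movements occurred.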
Full Signatures Versus Log-Signatures
The full signature is expressive but redundant. Many terms are linked algebraically. The log-signature removes that redundancy by working in a compressed basis.
Practical consequences:
- full signatures are conceptually direct
- log-signatures are usually smaller and better behaved numerically
- low-depth log-signatures are often the right first benchmark
That is why dimensionality control matters. The problem is rarely "can we compute one more depth?" The problem is whether the additional terms actually beat simpler lagged or filtered baselines.
Chen's Identity, Log-Signatures, and What Lags Miss
The algebraic reason signatures matter is that they linearize path concatenation. If a path \(X\) is the concatenation of two pieces \(X^{(1)} * X^{(2)}\), then Chen's identity says
$$ S\left(X^{(1)} * X^{(2)}\right) = S\left(X^{(1)}\right) \otimes S\left(X^{(2)}\right), $$
where \(\otimes\) is tensor multiplication. The log-signature turns that multiplicative composition into a Lie-series representation that removes algebraic redundancy. That is the deeper justification for using log-signatures rather than treating them as an arbitrary compressed variant.
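Chen's identity is easy to verify at depth 2, where the truncated tensor product reduces to "level-1 terms add; level-2 terms add plus a cross term". A small numpy check on the L-shaped path from the worked example (function names are mine):

```python
import numpy as np

def sig2(path):
    """(level-1, level-2) signature of a piecewise-linear path."""
    path = np.asarray(path, dtype=float)
    inc = np.diff(path, axis=0)
    level1 = path[-1] - path[0]
    start = path[:-1] - path[0]
    level2 = start.T @ inc + 0.5 * (inc.T @ inc)
    return level1, level2

def chen(sig_a, sig_b):
    """Depth-2 truncation of the tensor product in Chen's identity:
    level-1 terms add; level-2 terms add plus the cross term a1 (x) b1."""
    a1, a2 = sig_a
    b1, b2 = sig_b
    return a1 + b1, a2 + b2 + np.outer(a1, b1)

first  = [(0, 0), (1, 0)]          # horizontal segment
second = [(1, 0), (1, 1)]          # vertical segment
whole  = [(0, 0), (1, 0), (1, 1)]  # their concatenation

l1, l2 = chen(sig2(first), sig2(second))
w1, w2 = sig2(whole)
assert np.allclose(l1, w1) and np.allclose(l2, w2)
```

Note where the ordered cross term \(S^{(x,y)} = 1\) comes from: each straight segment alone has zero cross terms, and the entire value is contributed by the \(\text{outer}(a_1, b_1)\) piece of the composition.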
Lag features answer questions like:
- what happened at a few recent times?
- what is the current level?
- what are recent slopes or rolling moments?
Signatures answer a different question:
- what ordered path geometry generated those observations?
That matters when order-flow bursts and reversals have distinct geometry, when two volatility paths share the same average scale but different sequencing, or when multivariate channels interact in an ordered way. Rich lag interactions can also encode ordering, but signatures give a principled path algebra rather than a manually enumerated feature soup. They are not automatically smaller or better; their value is that the geometry is explicit.
In Practice
Path signatures are specialized. They earn their keep only when path shape plausibly matters enough to justify the added complexity.
Practical workflow:
- normalize the path sensibly
- augment with time
- start with depth 2 or low-depth log-signatures
- compare against strong lagged and filtered baselines
- prune aggressively if the geometric terms do not add out-of-sample value
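One way to sketch that workflow end to end, under illustrative choices (range-based per-channel scaling, a uniform time grid, and depth-2 log-signature coordinates taken as increments plus Lévy areas; none of these choices is canonical):

```python
import numpy as np

def depth2_logsig_features(x):
    """Depth-2 log-signature features of a normalized, time-augmented window.

    Coordinates: the level-1 increments plus the Levy areas
    A_ij = 0.5 * (S^(i,j) - S^(j,i)). At depth 2 the symmetric part of the
    level-2 tensor is redundant (it equals half the outer product of the
    increments), so only the antisymmetric part is kept.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # crude per-channel scaling so no channel dominates by raw magnitude
    x = (x - x[0]) / (np.ptp(x, axis=0) + 1e-12)
    t = np.linspace(0.0, 1.0, len(x))
    path = np.column_stack([t, x])           # time augmentation
    inc = np.diff(path, axis=0)
    start = path[:-1] - path[0]
    s2 = start.T @ inc + 0.5 * (inc.T @ inc) # level-2 signature tensor
    level1 = path[-1] - path[0]
    areas = 0.5 * (s2 - s2.T)                # Levy areas (antisymmetric part)
    iu = np.triu_indices(path.shape[1], k=1)
    return np.concatenate([level1, areas[iu]])
```

For d observed channels plus time, this yields (d + 1) increments and (d + 1)d/2 areas, e.g. 6 features for a 2-channel window. Those are then benchmarked against the lagged and filtered baselines discussed below.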
Good candidate inputs:
- intraday price, spread, and imbalance trajectories where ordered joint movement is the hypothesis
- multivariate microstructure states
- volatility and order-flow channels observed jointly
Weak candidate inputs:
- settings where simple rolling features already dominate
- short samples where the signature dimension overwhelms the data
Use signatures only when you can state the geometry hypothesis in one sentence, for example: "joint price, spread, and order-flow shape over the next 30 minutes matters beyond what lagged returns and volatility already capture."
A Fair Benchmarking Rule
This topic invites overengineering. So the benchmark discipline matters:
- compare against lag stacks
- compare against filtered state summaries
- compare against simple interaction features
- compare against modern sequence encoders when those are already in scope
- hold the walk-forward design fixed
If a depth-2 log-signature cannot beat a well-built lag baseline, the problem probably does not need signature machinery at all. Use signatures only when you have a plausible hypothesis that ordered geometry matters beyond lags, rolling features, and filtered states.
Common Mistakes
- Forgetting time augmentation and then wondering why ordering information seems weak.
- Pushing depth too high relative to sample size.
- Comparing signatures to weak lag baselines and calling the gain profound.
- Ignoring normalization, which makes channels with large scale dominate the representation.
- Treating this as rough-path theory when the practical problem is feature engineering.
Connections
- Book chapters: Ch09 Model-Based Feature Extraction; Ch13 Deep Learning for Time Series
- Related primers: state-space-models-and-kalman-filtering.md
- Why it matters next: signatures connect directly to state-space features, deep sequence models, and the broader question of when structured representations of path shape add value beyond classical lagged features