Path Signatures and Log-Signatures for Financial Sequences
Path signatures encode the ordered geometry of multivariate sequences through iterated integrals. This primer covers the algebra, Chen-style composition, and the embedding choices that decide whether the construction carries real information in finance.
The Intuition
Many standard features summarize a sequence by level, return, volatility, or a short lag stack. Those summaries are often enough. Sometimes they are not.
Two price paths can share:
- the same start and end points
- the same realized variance
- similar rolling moments
and still differ meaningfully in their ordering. One path may trend smoothly upward; the other may whipsaw and recover. For certain tasks, that ordered geometry matters.
Path signatures were designed to encode that geometry through iterated integrals. In practice, the right way to think about them is:
signatures summarize how a path moves through time, not just where it starts and ends.
First Principles: Ordered Integrals
For a multivariate path \(X_t \in \mathbb{R}^d\), the signature collects integrals of increasing depth.
Depth 1 gives the coordinate increments:
$$ S^{(i)}(X) = \int dX^{(i)} = X_T^{(i)} - X_0^{(i)}. $$
Depth 2 gives ordered interaction terms:
$$ S^{(i,j)}(X) = \int_0^T \left(\int_0^v dX_u^{(i)}\right) dX_v^{(j)}. $$
That ordering is the key idea. The \((i,j)\) term is not the same as the \((j,i)\) term. Signatures therefore retain information about sequence order that plain endpoint summaries erase.
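For discrete observations joined by straight lines, these iterated integrals have a closed form segment by segment, which makes a direct implementation straightforward. A minimal numpy sketch (the function name and the piecewise-linear assumption are illustrative choices, not a fixed convention):

```python
import numpy as np

def sig_depth2(path):
    """Depth-1 and depth-2 signature terms of a piecewise-linear path.

    path: array of shape (n_points, d). Over a straight segment k,
    int (X^i - X_0^i) dX^j = (X_k^i - X_0^i) * dX_k^j + 0.5 * dX_k^i * dX_k^j,
    so the depth-2 tensor is a sum of segment contributions.
    """
    path = np.asarray(path, dtype=float)
    inc = np.diff(path, axis=0)          # segment increments dX_k
    level1 = path[-1] - path[0]          # S^(i): total increment per channel
    start = path[:-1] - path[0]          # path value at each segment start
    # S^(i,j) = sum_k start_k^i * inc_k^j + 0.5 * inc_k^i * inc_k^j
    level2 = start.T @ inc + 0.5 * (inc.T @ inc)
    return level1, level2
```

On the L-shaped path (0,0) -> (1,0) -> (1,1), this returns level-1 increments (1, 1) and a level-2 matrix whose off-diagonal entries are the ordered cross terms: entry (0, 1) is 1 and entry (1, 0) is 0, matching the hand computation later in this primer.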
For bounded-variation paths, linear functionals of signatures are dense in continuous path functionals on compact sets, subject to the usual equivalence caveats. In plain language: given enough depth, signatures can approximate very rich functions of path shape. That universality result is the theoretical reason signatures appear in ML at all, not a guarantee of finite-sample predictive gain on noisy financial data.
Why Depth Matters
The full signature is infinite. In practice you truncate at low depth:
- depth 1: increments
- depth 2: pairwise ordered interactions
- depth 3+: higher-order path geometry
Higher depth means more expressiveness and much faster dimensionality growth.
For a \(d\)-dimensional path truncated at depth \(m\), the raw signature size is:
$$ \sum_{k=0}^{m} d^k. $$
The \(k=0\) term is the scalar constant term, conventionally equal to 1. Log-signatures are smaller because they remove algebraic redundancy from the full tensor signature: the full signature lives in the tensor algebra and satisfies identities that make many coordinates dependent, while the log-signature keeps only the independent generators. For example, with \(d=3\) and depth \(m=3\), the full signature has 40 terms while the log-signature has 14, broken out as 3 at depth 1, 3 at depth 2, and 8 at depth 3 (from the dimension of the free Lie algebra at that grade).
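The counts above can be reproduced directly: the full signature size is the geometric sum, and the log-signature size follows from Witt's formula for the graded dimensions of the free Lie algebra. A small self-contained sketch (function names are mine):

```python
def full_sig_size(d, m):
    """Number of terms in the full signature of a d-dim path, depth m."""
    return sum(d**k for k in range(m + 1))

def _mobius(n):
    """Moebius function via trial-division factorization."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def free_lie_dim(d, k):
    """Dimension of grade k of the free Lie algebra on d generators
    (Witt's formula: (1/k) * sum over divisors q of k of mu(q) * d^(k/q))."""
    return sum(_mobius(q) * d**(k // q) for q in range(1, k + 1) if k % q == 0) // k

def logsig_size(d, m):
    """Number of log-signature coordinates up to depth m."""
    return sum(free_lie_dim(d, k) for k in range(1, m + 1))
```

With d = 3 and m = 3 this gives 40 full-signature terms and 14 log-signature coordinates (3 + 3 + 8), matching the breakdown above.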
That is why low-depth signatures or log-signatures are usually the practical starting point.
A Worked Path Example
Take two two-dimensional piecewise-linear paths with the same start and end:
- Path A: move right, then up
- Path B: move up, then right
Using coordinates \((x_t, y_t)\), this means:
- Path A: (0,0) -> (1,0) -> (1,1)
- Path B: (0,0) -> (0,1) -> (1,1)
Both end at (1,1), so their depth-1 signature terms are identical:
$$ S^{(x)} = 1, \qquad S^{(y)} = 1. $$
Depth 2 distinguishes them. Since both paths start at the origin, the inner integral simplifies to the coordinate value itself, so the ordered cross terms reduce to
$$ S^{(x,y)} = \int_0^T x_t \, dy_t, \qquad S^{(y,x)} = \int_0^T y_t \, dx_t. $$
For Path A, the horizontal segment comes first, so \(y_t = 0\) while \(dx_t \neq 0\), and the vertical segment comes second, so \(x_t = 1\) while \(dy_t \neq 0\). Therefore
$$ S^{(x,y)} = \int 1 \, dy = 1, \qquad S^{(y,x)} = \int 0 \, dx = 0. $$
For Path B, the ordering reverses:
$$ S^{(x,y)} = \int 0 \, dy = 0, \qquad S^{(y,x)} = \int 1 \, dx = 1. $$
So the depth-2 pair becomes (1, 0) for Path A and (0, 1) for Path B. Same endpoint, different ordered geometry.
Interpretation:
- Path A says the system changed along the first channel before the second
- Path B says the reverse
That is the core signature idea: depth-2 terms can distinguish ordered cross-channel interactions that depth-1 increments erase. In one dimension, without time or other augmentation, the signature is usually too close to "net change plus higher-order powers of that same increment" to be useful for separating reorderings in practice.
Discrete Data Need a Path Construction
Financial data arrive as discrete observations, not continuous curves. So before you compute a signature you must choose how to embed the sample as a path, and that choice is part of the model.
Common choices include:
- piecewise linear interpolation
- lead-lag augmentation, which pairs each observation with a lagged copy so the path preserves more local ordering and variation information
- time augmentation plus normalization
Those choices matter. Two researchers can start from the same observations and produce different signature features if they construct the underlying path differently. The embedding is part of the model, not a preprocessing footnote.
Why Time Augmentation Is Often Crucial in Finance
Financial sequences are observed in order, but raw coordinates do not always encode that order explicitly enough.
Time augmentation appends time as an extra channel:
$$ \tilde{X}_t = (t, X_t). $$
This does two things:
- it prevents some reparameterization ambiguities
- it makes the signature sensitive to when movements occurred, not only to the geometric trace
The plain signature is invariant to time reparameterization of the same geometric path. That is often elegant mathematically and inconvenient financially, because a burst early in the window is not the same as a burst just before decision time. More generally, signatures identify paths only up to the relevant equivalence class, so practical distinguishability depends heavily on the augmentation and embedding choices above. Time augmentation is therefore often crucial in practice, especially for one-dimensional financial paths.
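The invariance claim can be checked numerically. A small numpy sketch using the closed-form depth-2 signature of a piecewise-linear path; the sample points and the two clocks are illustrative assumptions:

```python
import numpy as np

def sig2(path):
    """Depth-2 signature matrix of a piecewise-linear path (closed form)."""
    path = np.asarray(path, dtype=float)
    inc = np.diff(path, axis=0)
    start = path[:-1] - path[0]
    return start.T @ inc + 0.5 * (inc.T @ inc)

# Same geometric trace, sampled at different "speeds": inserting a
# collinear point is a reparameterization and leaves the signature alone.
trace   = [(0, 0), (1, 0), (1, 1)]
refined = [(0, 0), (0.5, 0), (1, 0), (1, 1)]
assert np.allclose(sig2(trace), sig2(refined))

# With a time channel prepended, the two clocks become different paths,
# so e.g. the (t, x) cross term now distinguishes early from late moves.
fast_start = [(0.0, 0, 0), (0.2, 1, 0), (1.0, 1, 1)]  # burst happens early
slow_start = [(0.0, 0, 0), (0.8, 1, 0), (1.0, 1, 1)]  # burst happens late
assert not np.allclose(sig2(fast_start), sig2(slow_start))
```

The first assertion illustrates the reparameterization invariance of the plain signature; the second shows that time augmentation restores sensitivity to when movements occurred.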
Full Signatures Versus Log-Signatures
The full signature is expressive but redundant. Many terms are linked algebraically. The log-signature removes that redundancy by working in a compressed basis.
Practical consequences:
- full signatures are conceptually direct
- log-signatures are usually smaller and better behaved numerically
- low-depth log-signatures are often the right first benchmark
That is why dimensionality control matters. The problem is rarely "can we compute one more depth?" The problem is whether the additional terms actually beat simpler lagged or filtered baselines.
Chen's Identity, Log-Signatures, and What Lags Miss
The algebraic reason signatures matter is that they linearize path concatenation. If a path \(X\) is the concatenation of two pieces \(X^{(1)} * X^{(2)}\), then Chen's identity says
$$ S\left(X^{(1)} * X^{(2)}\right) = S\left(X^{(1)}\right) \otimes S\left(X^{(2)}\right), $$
where \(\otimes\) is tensor multiplication. The log-signature turns that multiplicative composition into a Lie-series representation that removes algebraic redundancy. That is the deeper justification for using log-signatures rather than treating them as an arbitrary compressed variant.
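Chen's identity is easy to verify at depth 2, where the truncated tensor product reduces to "level-1 terms add; level-2 terms add plus a cross term". A small numpy check on the L-shaped path from the worked example (function names are mine):

```python
import numpy as np

def sig2(path):
    """(level-1, level-2) signature of a piecewise-linear path."""
    path = np.asarray(path, dtype=float)
    inc = np.diff(path, axis=0)
    level1 = path[-1] - path[0]
    start = path[:-1] - path[0]
    level2 = start.T @ inc + 0.5 * (inc.T @ inc)
    return level1, level2

def chen(sig_a, sig_b):
    """Depth-2 truncation of the tensor product in Chen's identity:
    level-1 terms add; level-2 terms add plus the cross term a1 (x) b1."""
    a1, a2 = sig_a
    b1, b2 = sig_b
    return a1 + b1, a2 + b2 + np.outer(a1, b1)

first  = [(0, 0), (1, 0)]          # horizontal segment
second = [(1, 0), (1, 1)]          # vertical segment
whole  = [(0, 0), (1, 0), (1, 1)]  # their concatenation

l1, l2 = chen(sig2(first), sig2(second))
w1, w2 = sig2(whole)
assert np.allclose(l1, w1) and np.allclose(l2, w2)
```

Note where the ordered cross term \(S^{(x,y)} = 1\) comes from: each straight segment alone has zero cross terms, and the entire value is contributed by the \(\text{outer}(a_1, b_1)\) piece of the composition.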
Lag features answer questions like:
- what happened at a few recent times?
- what is the current level?
- what are recent slopes or rolling moments?
Signatures answer a different question:
- what ordered path geometry generated those observations?
That matters when order-flow bursts and reversals have distinct geometry, when two volatility paths share the same average scale but different sequencing, or when multivariate channels interact in an ordered way. Rich lag interactions can also encode ordering, but signatures give a principled path algebra rather than a manually enumerated feature soup. They are not automatically smaller or better; their value is that the geometry is explicit.
In Practice
Path signatures are specialized. They earn their keep only when path shape plausibly matters enough to justify the added complexity.
Practical workflow:
- normalize the path sensibly
- augment with time
- start with depth 2 or low-depth log-signatures
- compare against strong lagged and filtered baselines
- prune aggressively if the geometric terms do not add out-of-sample value
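One way to sketch that workflow end to end, under illustrative choices (range-based per-channel scaling, a uniform time grid, and depth-2 log-signature coordinates taken as increments plus Lévy areas; none of these choices is canonical):

```python
import numpy as np

def depth2_logsig_features(x):
    """Depth-2 log-signature features of a normalized, time-augmented window.

    Coordinates: the level-1 increments plus the Levy areas
    A_ij = 0.5 * (S^(i,j) - S^(j,i)). At depth 2 the symmetric part of the
    level-2 tensor is redundant (it equals half the outer product of the
    increments), so only the antisymmetric part is kept.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # crude per-channel scaling so no channel dominates by raw magnitude
    x = (x - x[0]) / (np.ptp(x, axis=0) + 1e-12)
    t = np.linspace(0.0, 1.0, len(x))
    path = np.column_stack([t, x])           # time augmentation
    inc = np.diff(path, axis=0)
    start = path[:-1] - path[0]
    s2 = start.T @ inc + 0.5 * (inc.T @ inc) # level-2 signature tensor
    level1 = path[-1] - path[0]
    areas = 0.5 * (s2 - s2.T)                # Levy areas (antisymmetric part)
    iu = np.triu_indices(path.shape[1], k=1)
    return np.concatenate([level1, areas[iu]])
```

For d observed channels plus time, this yields (d + 1) increments and (d + 1)d/2 areas, e.g. 6 features for a 2-channel window. Those are then benchmarked against the lagged and filtered baselines discussed below.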
Good candidate inputs:
- intraday price, spread, and imbalance trajectories where ordered joint movement is the hypothesis
- multivariate microstructure states
- volatility and order-flow channels observed jointly
Weak candidate inputs:
- settings where simple rolling features already dominate
- short samples where the signature dimension overwhelms the data
Use signatures only when you can state the geometry hypothesis in one sentence, for example: "joint price, spread, and order-flow shape over the next 30 minutes matters beyond what lagged returns and volatility already capture."
A Fair Benchmarking Rule
This topic invites overengineering. So the benchmark discipline matters:
- compare against lag stacks
- compare against filtered state summaries
- compare against simple interaction features
- compare against modern sequence encoders when those are already in scope
- hold the walk-forward design fixed
If a depth-2 log-signature cannot beat a well-built lag baseline, the problem probably does not need signature machinery at all. Use signatures only when you have a plausible hypothesis that ordered geometry matters beyond lags, rolling features, and filtered states.
Common Mistakes
- Forgetting time augmentation and then wondering why ordering information seems weak.
- Pushing depth too high relative to sample size.
- Comparing signatures to weak lag baselines and calling the gain profound.
- Ignoring normalization, which makes channels with large scale dominate the representation.
- Treating this as rough-path theory when the practical problem is feature engineering.
Connections
- Book chapters: Ch09 Model-Based Feature Extraction; Ch13 Deep Learning for Time Series
- Related primers: state-space-models-and-kalman-filtering.md
- Why it matters next: signatures connect directly to state-space features, deep sequence models, and the broader question of when structured representations of path shape add value beyond classical lagged features