Chapter 4: Fundamental and Alternative Data

Time-Valid Security Masters and Identifier Histories

An identifier match is only useful if it resolves the right object at the right time.

The Intuition

Financial data does not live on one identity layer.

A single real-world firm can correspond to:

a legal entity
an issuer
several listed securities
ADRs in another market
futures or options contracts tied to the same underlying

That is why "join on ticker" is not a harmless shortcut. The ticker may have changed, been reused, or point to the wrong share class. Even a syntactically valid identifier can map to the wrong economic object.

A security master exists to solve exactly this problem. It stores crosswalks between identity layers and, crucially, makes those crosswalks time-valid.

The Identity Layers

A useful working model has four levels:

Layer	What it identifies	Typical examples
entity	legal or economic organization	parent company, subsidiary
issuer	issuing firm in capital markets	listed company filing with the SEC
security	tradable instrument	common stock, ADR, bond
contract	dated derivative claim	futures contract, option series

This is a useful decomposition, not the only canonical ontology. Real vendor graphs and reference-data systems often slice these layers differently.

Many alternative datasets arrive at the entity or issuer level. Returns, however, are measured at the security or contract level. A correct pipeline therefore needs an explicit issuer-to-security mapping rather than an implicit belief that the names "obviously" line up.

Why Time Validity Matters

Mappings change over time:

firms rebrand
tickers are reused
share classes merge or split
ADR programs begin or end
corporate actions replace one security with another

A static crosswalk is therefore a leakage risk. The right question is not:

which security matches this issuer today?

It is:

which security mapping was valid on the decision date?

Formally, if G(i, t) returns the set of securities linked to issuer i at time t, then any selection rule must be evaluated under an effective-date filter:

$$ s \in G(i, t) \quad \text{only if} \quad \text{effective\_date} \le t < \text{end\_date}. $$

Without that time filter, historical joins silently drift into the future. If the downstream task requires one tradable object, you still need an explicit rule for choosing the primary security from that valid set.
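
The filter and the selection rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Mapping` record, the `is_primary` flag as the selection rule, the convention that `end_date=None` means "still open", and all ids are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    issuer_id: str
    security_id: str
    effective_date: date
    end_date: Optional[date]  # assumption: None means the mapping is still open
    is_primary: bool = False


def valid_securities(mappings, issuer_id, t):
    """Return G(i, t): mappings with effective_date <= t < end_date."""
    return [
        m for m in mappings
        if m.issuer_id == issuer_id
        and m.effective_date <= t
        and (m.end_date is None or t < m.end_date)
    ]


def primary_security(mappings, issuer_id, t):
    """Choose one tradable object from the valid set; fail loudly on ambiguity."""
    primaries = [m for m in valid_securities(mappings, issuer_id, t) if m.is_primary]
    if len(primaries) != 1:
        raise LookupError(
            f"expected exactly one primary for {issuer_id} on {t}, found {len(primaries)}"
        )
    return primaries[0].security_id


# Hypothetical history: the primary security is replaced on 2020-06-01.
MAPPINGS = [
    Mapping("ISS1", "SEC_OLD", date(2015, 1, 1), date(2020, 6, 1), True),
    Mapping("ISS1", "SEC_NEW", date(2020, 6, 1), None, True),
    Mapping("ISS1", "SEC_ADR", date(2018, 1, 1), None, False),
]
```

Because the interval is half-open, a decision dated exactly on the changeover day resolves to the new security, and a date before the first mapping raises an error instead of silently borrowing a later one.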

A Worked Example

Suppose you have issuer-level web traffic for a company and want to join it to traded returns.

Bad join

You match the issuer name to the currently active ticker and use that security for the full backtest.

That can fail in several ways:

the ticker was reused by a different firm earlier in the sample
the current security is an ADR, while the signal belongs to the primary listing
the source refers to the parent entity while the return series belongs to a carve-out subsidiary
the issuer has multiple active share classes, such as Alphabet's GOOG and GOOGL, and your signal is joined to whichever one your vendor happens to return first

Better join

You store:

issuer id
security id
identifier type and value
effective date
end date

Then you join the signal only to mappings valid on the decision date.

This is not bureaucracy. It changes the sample and can change the sign of the measured effect.
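
As a rough sketch of the better join, the snippet below attaches each issuer-level observation to the security whose mapping was valid on the decision date and sets aside anything that does not resolve to exactly one security. The tuple layouts, the far-future sentinel for open-ended mappings, and all ids and values are hypothetical.

```python
from datetime import date

OPEN = date(9999, 12, 31)  # assumption: sentinel end date for mappings still in force

# (issuer_id, security_id, identifier_type, identifier_value, effective_date, end_date)
CROSSWALK = [
    ("ISS_A", "SEC_1", "ticker", "ABC", date(2012, 1, 1), date(2017, 1, 1)),
    ("ISS_A", "SEC_1", "ticker", "ABCD", date(2017, 1, 1), OPEN),
    ("ISS_B", "SEC_2", "ticker", "ABC", date(2018, 1, 1), OPEN),  # ticker reused
]

# (issuer_id, decision_date, signal_value)
SIGNALS = [
    ("ISS_A", date(2015, 5, 1), 1.2),
    ("ISS_A", date(2019, 5, 1), 0.7),
    ("ISS_B", date(2016, 5, 1), 0.3),  # ISS_B has no valid mapping yet
]


def point_in_time_join(signals, crosswalk):
    """Join each signal to the security mapped to its issuer on the decision date."""
    joined, unmatched = [], []
    for issuer_id, decision_date, value in signals:
        hits = {
            sec_id
            for (iss, sec_id, _typ, _val, eff, end) in crosswalk
            if iss == issuer_id and eff <= decision_date < end
        }
        if len(hits) == 1:
            joined.append((decision_date, hits.pop(), value))
        else:
            # Zero or several candidates: keep for review, never guess.
            unmatched.append((issuer_id, decision_date, len(hits)))
    return joined, unmatched


joined, unmatched = point_in_time_join(SIGNALS, CROSSWALK)
```

Returning the unmatched rows alongside the joined ones keeps the change in sample visible. A join on the current ticker "ABC" would instead have sent ISS_A's 2015 observation to ISS_B's security.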

Why Corporate Actions Complicate "Simple" Joins

Corporate actions are where many silent failures originate.

Share-class problems

Two share classes of the same issuer may have:

different tickers
different liquidity
different voting rights
different prices and return behavior

Joining issuer-level data to the wrong class may preserve the narrative but destroy the tradable object.

ADR problems

An ADR and its ordinary share are economically related but not interchangeable:

they trade in different venues
they can have different market hours
liquidity and costs differ
the ADR ratio may not be one-to-one
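
To make the ratio point concrete, here is a small arithmetic sketch. It assumes the ADR is quoted in USD, that one ADR represents `ordinary_per_adr` ordinary shares, and that `usd_per_local` is the exchange rate; the function names and numbers are illustrative only.

```python
def adr_implied_price(ordinary_price_local, ordinary_per_adr, usd_per_local):
    """USD price of one ADR implied by the ordinary share price."""
    return ordinary_price_local * ordinary_per_adr * usd_per_local


def adr_premium(adr_price_usd, ordinary_price_local, ordinary_per_adr, usd_per_local):
    """Relative gap between the traded ADR price and its implied value."""
    implied = adr_implied_price(ordinary_price_local, ordinary_per_adr, usd_per_local)
    return adr_price_usd / implied - 1.0


# Hypothetical numbers: one ADR = 2 ordinary shares, ordinary at 50.0 local, 1.5 USD per local unit.
implied = adr_implied_price(50.0, 2, 1.5)       # 150.0 USD per ADR
premium = adr_premium(153.0, 50.0, 2, 1.5)      # about 2% above the implied value
```

Comparing ADR and ordinary prices without the ratio and the exchange rate would produce a spurious gap even when the two lines are priced consistently.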

Contract problems

For futures and options, the contract itself has a maturity. A root symbol is not enough. You need contract-level identity and roll logic.
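
One way to make "contract-level identity and roll logic" explicit is sketched below. The rule used here (hold the nearest contract until a fixed number of calendar days before expiry) is only one of many possible roll conventions and is an assumption of this example, as are the contract ids and dates.

```python
from datetime import date, timedelta

# (contract_id, root_symbol, expiry) -- hypothetical contracts
CONTRACTS = [
    ("XYZ_2024H", "XYZ", date(2024, 3, 15)),
    ("XYZ_2024M", "XYZ", date(2024, 6, 21)),
    ("XYZ_2024U", "XYZ", date(2024, 9, 20)),
]


def front_contract(contracts, root, t, roll_days=5):
    """Return the nearest contract for `root` expiring more than roll_days after t."""
    cutoff = t + timedelta(days=roll_days)
    live = [
        (expiry, contract_id)
        for contract_id, r, expiry in contracts
        if r == root and expiry > cutoff
    ]
    if not live:
        raise LookupError(f"no live contract for {root} on {t}")
    return min(live)[1]  # earliest qualifying expiry
```

The root symbol "XYZ" alone resolves to nothing. Only the pair (root, date) plus a stated roll rule identifies the contract whose returns are being measured.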

Identifier Families Are Not Interchangeable

Different identifiers solve different problems:

CIK identifies the SEC filer
LEI identifies a legal entity
FIGI, CUSIP, ISIN, and SEDOL identify securities
exchange tickers are human-friendly but operationally fragile

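Some of these identifiers also change through time. A CUSIP, for example, can change after a corporate action such as a name change or reorganization, which strengthens the case for keying historical joins on stable internal identifiers rather than on any single external code.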

An identifier match succeeds only if the identifier family matches the layer you are trying to join.

This is why "identifier available" is not the same as "join solved."
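
A simple guard can enforce the layer check before any matching is attempted. The lookup table below follows the list above, with the simplifying assumptions that a CIK is treated as an issuer-level key and a ticker as a (fragile) security-level key; the function name is hypothetical.

```python
# Assumed mapping from identifier family to the identity layer it resolves.
IDENTIFIER_LAYER = {
    "LEI": "entity",
    "CIK": "issuer",
    "FIGI": "security",
    "CUSIP": "security",
    "ISIN": "security",
    "SEDOL": "security",
    "ticker": "security",
}


def check_join_layer(identifier_type, target_layer):
    """Return True only if the identifier family resolves the layer being joined."""
    resolved = IDENTIFIER_LAYER.get(identifier_type)
    if resolved is None:
        raise ValueError(f"unknown identifier family: {identifier_type}")
    return resolved == target_layer
```

With this in place, an attempt to join return data on a CIK is rejected up front, since `check_join_layer("CIK", "security")` is false. The join then has to go through an explicit issuer-to-security mapping.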

A Minimal Security-Master Schema

A practical security master needs at least:

Field	Role
`master_id`	stable internal key
`layer_type`	entity / issuer / security / contract
`identifier_type`	ticker, CUSIP, FIGI, CIK, etc.
`identifier_value`	the raw external identifier
`effective_date`	when the mapping became valid
`end_date`	when the mapping ceased to be valid
`parent_master_id`	optional upward link across layers
`source`	where the mapping came from
`venue`	trading venue or market when relevant
`is_primary`	whether this is the default common or primary listing
`security_role`	ADR, ordinary share, bond, option, future, etc.

You can picture this as a time-valid graph rather than one flat lookup table.
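
One possible rendering of a single row of that graph as a Python record is sketched below. The field names come from the table above; the types, the defaults, the use of `None` for an open-ended `end_date`, and the validation rules are assumptions of this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

LAYERS = {"entity", "issuer", "security", "contract"}


@dataclass(frozen=True)
class MasterRow:
    master_id: str                           # stable internal key
    layer_type: str                          # one of LAYERS
    identifier_type: str                     # ticker, CUSIP, FIGI, CIK, ...
    identifier_value: str                    # raw external identifier
    effective_date: date                     # first date the mapping is valid
    end_date: Optional[date] = None          # assumption: None means still valid
    parent_master_id: Optional[str] = None   # upward link, e.g. security -> issuer
    source: str = ""
    venue: Optional[str] = None
    is_primary: bool = False
    security_role: Optional[str] = None      # ADR, ordinary share, bond, ...

    def __post_init__(self):
        if self.layer_type not in LAYERS:
            raise ValueError(f"unknown layer_type: {self.layer_type}")
        if self.end_date is not None and self.end_date <= self.effective_date:
            raise ValueError("end_date must be after effective_date")

    def is_valid_on(self, t: date) -> bool:
        """Half-open validity window: effective_date <= t < end_date."""
        return self.effective_date <= t and (self.end_date is None or t < self.end_date)


# Hypothetical row: a ticker mapping for one security, linked up to its issuer.
row = MasterRow(
    master_id="SEC_1", layer_type="security",
    identifier_type="ticker", identifier_value="ABC",
    effective_date=date(2012, 1, 1), end_date=date(2017, 1, 1),
    parent_master_id="ISS_A", source="vendor_x", is_primary=True,
    security_role="ordinary share",
)
```

Each external identifier gets its own row and its own validity window, so one `master_id` can carry several tickers or CUSIPs over its life without losing continuity.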

Diagnostics That Catch Silent Misjoins

Good validation checks are simple and brutal:

one issuer should not map to two active primary common stocks on the same date without explanation
a security should not point to two unrelated issuers over overlapping dates
a ticker reuse event should create separate master rows, not one continuous history
ADR mappings should be explicit, not inferred from name similarity
the same external identifier should not have overlapping validity windows for unrelated objects

The point is not perfection. It is to catch impossible mappings before they become features.
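
As an illustration of the last check in the list, the sketch below flags any external identifier whose validity windows overlap across different internal keys. The row layout, the far-future stand-in for open-ended windows, and the sample data are assumptions of the example.

```python
from datetime import date
from itertools import combinations

FAR_FUTURE = date(9999, 12, 31)  # assumption: stand-in for an open end_date


def windows_overlap(a_start, a_end, b_start, b_end):
    """Half-open windows [start, end) overlap iff each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def overlapping_identifier_conflicts(rows):
    """rows: (master_id, identifier_type, identifier_value, effective_date, end_date)."""
    by_identifier = {}
    for row in rows:
        by_identifier.setdefault((row[1], row[2]), []).append(row)

    conflicts = []
    for key, group in by_identifier.items():
        for a, b in combinations(group, 2):
            different_object = a[0] != b[0]
            if different_object and windows_overlap(
                a[3], a[4] or FAR_FUTURE, b[3], b[4] or FAR_FUTURE
            ):
                conflicts.append((key, a[0], b[0]))
    return conflicts


ROWS = [
    ("SEC_1", "ticker", "ABC", date(2012, 1, 1), date(2017, 1, 1)),
    ("SEC_2", "ticker", "ABC", date(2016, 6, 1), None),   # overlaps SEC_1: flagged
    ("SEC_3", "ticker", "XYZ", date(2010, 1, 1), date(2015, 1, 1)),
    ("SEC_4", "ticker", "XYZ", date(2015, 1, 1), None),   # clean reuse: not flagged
]

conflicts = overlapping_identifier_conflicts(ROWS)
```

A clean ticker reuse, where one window ends exactly when the next begins, passes the check. Only genuinely simultaneous claims on the same identifier are reported.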

In Practice

Use these rules:

choose the identity layer before matching
apply effective-date logic on every crosswalk
prefer stable internal keys over raw vendor identifiers
treat ticker joins as suspect until validated against time-valid mappings
audit a sample of joins with concrete security histories, not just aggregate counts

Common Mistakes

Treating the issuer and the security as the same object.
Using a static crosswalk for a historical backtest.
Joining issuer-level data to the wrong share class or ADR.
Assuming a ticker uniquely identifies one firm through time.
Treating identifier success as proof of economic correctness.

Connections

This primer supports Chapter 4's entity-resolution and time-consistent data-pipeline logic. It connects directly to bitemporal data, alternative-data integration, corporate-action handling, and later chapters that depend on historically correct joins rather than today's reference data.
