Chapter 4: Fundamental and Alternative Data

Time-Valid Security Masters and Identifier Histories

An identifier match is only useful if it resolves the right object at the right time.

The Intuition

Financial data does not live on one identity layer.

A single real-world firm can correspond to:

  • a legal entity
  • an issuer
  • several listed securities
  • ADRs in another market
  • futures or options contracts tied to the same underlying

That is why "join on ticker" is not a harmless shortcut. The ticker may have changed, been reused, or point to the wrong share class. Even if the identifier is valid syntactically, the mapping can still be wrong economically.

A security master exists to solve exactly this problem. It stores crosswalks between identity layers and, crucially, makes those crosswalks time-valid.

The Identity Layers

A useful working model has four levels:

| Layer | What it identifies | Typical examples |
| --- | --- | --- |
| entity | legal or economic organization | parent company, subsidiary |
| issuer | issuing firm in capital markets | listed company filing with the SEC |
| security | tradable instrument | common stock, ADR, bond |
| contract | dated derivative claim | futures contract, option series |

This is a useful decomposition, not the only canonical ontology. Real vendor graphs and reference-data systems often slice these layers differently.

Many alternative datasets arrive at the entity or issuer level. Returns, however, are measured at the security or contract level. A correct pipeline therefore needs an explicit issuer-to-security mapping rather than an implicit belief that the names "obviously" line up.

Why Time Validity Matters

Mappings change over time:

  • firms rebrand
  • tickers are reused
  • share classes merge or split
  • ADR programs begin or end
  • corporate actions replace one security with another

A static crosswalk is therefore a leakage risk. The right question is not:

which security matches this issuer today?

It is:

which security mapping was valid on the decision date?

Formally, if G(i, t) returns the set of securities linked to issuer i at time t, then any selection rule must be evaluated under an effective-date filter:

$$ s \in G(i, t) \quad \text{only if} \quad \text{effective\_date} \le t < \text{end\_date}. $$

Without that time filter, historical joins silently drift into the future. If the downstream task requires one tradable object, you still need an explicit rule for choosing the primary security from that valid set.
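The filter above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `Mapping` class, the `valid_securities` function, and the sample rows are all hypothetical, and an open-ended mapping is assumed to be stored with `end_date=None`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    issuer_id: str
    security_id: str
    effective_date: date
    end_date: Optional[date]  # assumption: None means the mapping is still open


def valid_securities(mappings, issuer_id, t):
    """Return G(i, t): securities with effective_date <= t < end_date."""
    return {
        m.security_id
        for m in mappings
        if m.issuer_id == issuer_id
        and m.effective_date <= t
        and (m.end_date is None or t < m.end_date)
    }


# Hypothetical history: one security replaced by another on 2015-06-01.
MAPPINGS = [
    Mapping("ISS1", "SEC_A", date(2010, 1, 1), date(2015, 6, 1)),
    Mapping("ISS1", "SEC_B", date(2015, 6, 1), None),
]

print(valid_securities(MAPPINGS, "ISS1", date(2012, 3, 1)))  # {'SEC_A'}
print(valid_securities(MAPPINGS, "ISS1", date(2015, 6, 1)))  # {'SEC_B'}
```

The half-open interval means the handover date belongs to the new mapping only, so two consecutive rows never both claim the same day.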

A Worked Example

Suppose you have issuer-level web traffic for a company and want to join it to traded returns.

Bad join

You match the issuer name to the currently active ticker and use that security for the full backtest.

That can fail in several ways:

  • the ticker was reused by a different firm earlier in the sample
  • the current security is an ADR, while the signal belongs to the primary listing
  • the source refers to the parent entity while the return series belongs to a carve-out subsidiary
  • the issuer has multiple active share classes, such as Alphabet's GOOG and GOOGL, and your signal is joined to whichever one your vendor happens to return first

Better join

You store:

  • issuer id
  • security id
  • identifier type and value
  • effective date
  • end date

Then you join the signal only to mappings valid on the decision date.

This is not bureaucracy. It changes the sample and can change the sign of the measured effect.
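To make the contrast concrete, here is a small sketch of both joins on the same data. The crosswalk rows, the signal values, and the function names `point_in_time_join` and `static_join` are invented for illustration; a real pipeline would also need a rule for choosing among several valid securities.

```python
from datetime import date

# Hypothetical crosswalk rows:
# (issuer_id, security_id, effective_date, end_date); end_date None = open.
CROSSWALK = [
    ("ISS1", "SEC_OLD", date(2012, 1, 1), date(2018, 1, 1)),
    ("ISS1", "SEC_NEW", date(2018, 1, 1), None),
]

# Hypothetical issuer-level signal: (issuer_id, decision_date, value).
SIGNAL = [
    ("ISS1", date(2011, 3, 31), 0.5),  # before any mapping existed
    ("ISS1", date(2015, 3, 31), 0.8),
    ("ISS1", date(2019, 3, 31), 1.1),
]


def point_in_time_join(signal, crosswalk):
    """Join each signal row only to mappings valid on its decision date."""
    joined = []
    for issuer_id, decision_date, value in signal:
        for iss, sec, start, end in crosswalk:
            if (
                iss == issuer_id
                and start <= decision_date
                and (end is None or decision_date < end)
            ):
                joined.append((sec, decision_date, value))
    return joined


def static_join(signal, crosswalk):
    """The 'bad join': apply today's open mapping to every historical date."""
    current = {iss: sec for iss, sec, _, end in crosswalk if end is None}
    return [(current[i], d, v) for i, d, v in signal if i in current]


pit = point_in_time_join(SIGNAL, CROSSWALK)
static = static_join(SIGNAL, CROSSWALK)
print(len(pit), len(static))  # 2 3
```

The static join keeps the 2011 observation and assigns all three rows to `SEC_NEW`, while the point-in-time join drops the unmapped row and sends the 2015 row to `SEC_OLD`. The two samples differ in both size and composition.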

Why Corporate Actions Complicate "Simple" Joins

Corporate actions are where many silent failures originate.

Share-class problems

Two share classes of the same issuer may have:

  • different tickers
  • different liquidity
  • different voting rights
  • different prices and return behavior

Joining issuer-level data to the wrong class may preserve the narrative but destroy the tradable object.

ADR problems

An ADR and its ordinary share are economically related but not interchangeable:

  • they trade in different venues
  • they can have different market hours
  • liquidity and costs differ
  • the ADR ratio may not be one-to-one
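The ratio point is easy to get wrong in code, so a small arithmetic sketch may help. The function name, the parameter names, and the numbers are hypothetical; the sketch assumes the ratio is stored as ordinary shares per ADR and the exchange rate as local currency units per USD, and it ignores fees and timing differences between the two markets.

```python
def implied_ordinary_price(adr_price_usd, ordinary_per_adr, local_per_usd):
    """Convert an ADR price into an implied local-currency ordinary-share price.

    ordinary_per_adr: number of ordinary shares represented by one ADR
    local_per_usd:    exchange rate, local currency units per 1 USD
    """
    if ordinary_per_adr <= 0:
        raise ValueError("ADR ratio must be positive")
    usd_per_ordinary = adr_price_usd / ordinary_per_adr
    return usd_per_ordinary * local_per_usd


# Hypothetical example: one ADR represents 4 ordinary shares.
print(implied_ordinary_price(30.0, 4, 2.0))  # 15.0
```

Treating the ratio as one-to-one here would have produced 60.0 instead of 15.0, which is why the ratio belongs in the security master next to the ADR mapping itself.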

Contract problems

For futures and options, the contract itself has a maturity. A root symbol is not enough. You need contract-level identity and roll logic.
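As one possible illustration of roll logic, the sketch below picks the front contract for a root symbol on a given date. The contract identifiers, the dates, and the fixed calendar-day roll rule are assumptions made for the example; real roll rules often depend on volume, open interest, or exchange calendars.

```python
from datetime import date, timedelta

# Hypothetical contracts for one root symbol: (contract_id, last_trade_date).
CONTRACTS = [
    ("XX_2024H", date(2024, 3, 15)),
    ("XX_2024M", date(2024, 6, 21)),
    ("XX_2024U", date(2024, 9, 20)),
]


def front_contract(contracts, t, roll_days=5):
    """Return the nearest contract whose roll date is still after t.

    Assumed rule: roll out of a contract `roll_days` calendar days
    before its last trade date.
    """
    eligible = [
        (last_trade, contract_id)
        for contract_id, last_trade in contracts
        if t < last_trade - timedelta(days=roll_days)
    ]
    if not eligible:
        return None  # no listed contract covers this date
    return min(eligible)[1]


print(front_contract(CONTRACTS, date(2024, 3, 1)))   # XX_2024H
print(front_contract(CONTRACTS, date(2024, 3, 12)))  # XX_2024M
```

The root symbol is the same on both dates, but the tradable object is not, which is the reason contract-level identity has to be stored explicitly.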

Identifier Families Are Not Interchangeable

Different identifiers solve different problems:

  • CIK identifies the SEC filer
  • LEI identifies a legal entity
  • FIGI, CUSIP, ISIN, and SEDOL identify securities
  • exchange tickers are human-friendly but operationally fragile

Some of these identifiers also change through time. A CUSIP, for example, can be replaced after a corporate action such as a name change or reorganization, which strengthens the case for stable internal keys rather than historical joins built from one external code.

An identifier match succeeds only if the identifier family matches the layer you are trying to join.

This is why "identifier available" is not the same as "join solved."
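A pipeline can enforce the layer check before any matching happens. The sketch below is a simplified assumption: the `IDENTIFIER_LAYER` table and the `check_join_layer` function are hypothetical, CIK is placed at the issuer layer to follow the four-layer model used here, and real identifier schemes have more nuance than a single layer per family.

```python
# Simplified, assumed mapping from identifier family to identity layer.
IDENTIFIER_LAYER = {
    "LEI": "entity",
    "CIK": "issuer",
    "FIGI": "security",
    "CUSIP": "security",
    "ISIN": "security",
    "SEDOL": "security",
    "ticker": "security",  # security-level, but operationally fragile
}


def check_join_layer(identifier_type, target_layer):
    """Return True if this identifier family identifies the target layer."""
    layer = IDENTIFIER_LAYER.get(identifier_type)
    if layer is None:
        raise ValueError(f"unknown identifier type: {identifier_type!r}")
    return layer == target_layer


print(check_join_layer("ISIN", "security"))  # True
print(check_join_layer("CIK", "security"))   # False
```

A `False` result does not mean the join is impossible. It means the join needs an explicit crosswalk between layers rather than a direct match on the identifier.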

A Minimal Security-Master Schema

A practical security master needs at least:

| Field | Role |
| --- | --- |
| master_id | stable internal key |
| layer_type | entity / issuer / security / contract |
| identifier_type | ticker, CUSIP, FIGI, CIK, etc. |
| identifier_value | the raw external identifier |
| effective_date | when the mapping became valid |
| end_date | when the mapping ceased to be valid |
| parent_master_id | optional upward link across layers |
| source | where the mapping came from |
| venue | trading venue or market when relevant |
| is_primary | whether this is the default common or primary listing |
| security_role | ADR, ordinary share, bond, option, future, etc. |

You can picture this as a time-valid graph rather than one flat lookup table.
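One way to hold such rows in code is a small record type plus a date-aware lookup. This is a sketch under assumptions: the `MasterRow` class and `resolve` function are hypothetical, the sample rows and the `vendor_x` source are invented, and `end_date=None` is assumed to mean the mapping is still open.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class MasterRow:
    master_id: str
    layer_type: str                      # entity / issuer / security / contract
    identifier_type: str                 # ticker, CUSIP, FIGI, CIK, ...
    identifier_value: str
    effective_date: date
    end_date: Optional[date] = None      # assumption: None = still valid
    parent_master_id: Optional[str] = None
    source: str = ""
    venue: Optional[str] = None
    is_primary: bool = False
    security_role: Optional[str] = None


def resolve(rows, identifier_type, identifier_value, t):
    """Return master_ids whose mapping for this identifier is valid on t."""
    return [
        r.master_id
        for r in rows
        if r.identifier_type == identifier_type
        and r.identifier_value == identifier_value
        and r.effective_date <= t
        and (r.end_date is None or t < r.end_date)
    ]


# Hypothetical ticker reuse: the same ticker, two unrelated securities.
ROWS = [
    MasterRow("SEC_0001", "security", "ticker", "ABC",
              date(2005, 1, 3), date(2012, 7, 2),
              parent_master_id="ISS_0001", source="vendor_x",
              is_primary=True, security_role="ordinary share"),
    MasterRow("SEC_0002", "security", "ticker", "ABC",
              date(2014, 3, 10), None,
              parent_master_id="ISS_0002", source="vendor_x",
              is_primary=True, security_role="ordinary share"),
]

print(resolve(ROWS, "ticker", "ABC", date(2010, 1, 4)))  # ['SEC_0001']
print(resolve(ROWS, "ticker", "ABC", date(2020, 1, 2)))  # ['SEC_0002']
```

Because downstream tables key on `master_id` rather than on the ticker, the reuse of `ABC` never merges the two histories.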

Diagnostics That Catch Silent Misjoins

Good validation checks are simple and brutal:

  • one issuer should not map to two active primary common stocks on the same date without explanation
  • a security should not point to two unrelated issuers over overlapping dates
  • a ticker reuse event should create separate master rows, not one continuous history
  • ADR mappings should be explicit, not inferred from name similarity
  • the same external identifier should not have overlapping validity windows for unrelated objects

The point is not perfection. It is to catch impossible mappings before they become features.
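The overlapping-window check, for instance, fits in a short function. The row layout, the `find_overlaps` name, and the sample data below are assumptions for illustration; validity windows are treated as half-open, and an open-ended `end_date` is stored as `None`.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def find_overlaps(rows):
    """Flag identifiers mapped to different master_ids over overlapping dates.

    rows: (identifier_type, identifier_value, master_id,
           effective_date, end_date), with end_date None = open-ended.
    """
    by_identifier = defaultdict(list)
    for id_type, id_value, master_id, start, end in rows:
        by_identifier[(id_type, id_value)].append(
            (master_id, start, end or date.max)
        )

    problems = []
    for key, windows in by_identifier.items():
        for (m1, s1, e1), (m2, s2, e2) in combinations(windows, 2):
            # Half-open windows [start, end) overlap iff each starts
            # before the other ends.
            if m1 != m2 and s1 < e2 and s2 < e1:
                problems.append((key, m1, m2))
    return problems


# Hypothetical rows: "ABC" is a clean ticker reuse, "XYZ" is a conflict.
ROWS = [
    ("ticker", "ABC", "M1", date(2005, 1, 1), date(2012, 1, 1)),
    ("ticker", "ABC", "M2", date(2014, 1, 1), None),
    ("ticker", "XYZ", "M3", date(2010, 1, 1), date(2016, 1, 1)),
    ("ticker", "XYZ", "M4", date(2015, 1, 1), None),
]

print(find_overlaps(ROWS))  # [(('ticker', 'XYZ'), 'M3', 'M4')]
```

The pairwise comparison is quadratic in the number of rows per identifier, which is usually acceptable because each identifier has only a handful of historical mappings.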

In Practice

Use these rules:

  • choose the identity layer before matching
  • apply effective-date logic on every crosswalk
  • prefer stable internal keys over raw vendor identifiers
  • treat ticker joins as suspect until validated against time-valid mappings
  • audit a sample of joins with concrete security histories, not just aggregate counts

Common Mistakes

  • Treating the issuer and the security as the same object.
  • Using a static crosswalk for a historical backtest.
  • Joining issuer-level data to the wrong share class or ADR.
  • Assuming a ticker uniquely identifies one firm through time.
  • Treating identifier success as proof of economic correctness.

Connections

This primer supports Chapter 4's entity-resolution and time-consistent data-pipeline logic. It connects directly to bitemporal data, alternative-data integration, corporate-action handling, and later chapters that depend on historically correct joins rather than today's reference data.
