Chapter 4: Fundamental and Alternative Data

XBRL Fundamentals in Practice

XBRL is not just tagged accounting data. It is the grammar that determines what a filing fact means, when it applies, and whether it can be compared across firms and time.

The Intuition

Chapter 4 relies on filing-level fundamentals, but a reader who has only used vendor panels can miss what XBRL is actually doing.

An XBRL fact is more than:

  • a number
  • a company
  • a date

It also has:

  • a concept
  • a reporting period
  • a unit
  • an optional dimensional context
  • filing metadata that determines when the fact became public

That extra structure is why XBRL is powerful. It is also why naive "latest value" extraction often breaks point-in-time correctness.

One practical note matters immediately: most modern SEC filings are Inline XBRL (iXBRL), where the tags are embedded in the human-readable HTML filing rather than delivered only as a separate XML instance document. That changes parsing workflow, but not the conceptual structure of the facts.
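To make the parsing difference concrete, here is a minimal sketch that pulls numeric facts out of an Inline XBRL fragment using only the Python standard library. The sample HTML, the class name InlineFactParser, and the context and unit ids are made up for illustration. The sketch assumes comma thousands separators and ignores the format attribute, which a production parser or a dedicated XBRL library would need to honor.

```python
from decimal import Decimal
from html.parser import HTMLParser

# Hypothetical iXBRL fragment; real filings are far larger and more varied.
SAMPLE = (
    '<p>Revenue was $<ix:nonFraction name="us-gaap:Revenues" '
    'contextRef="c-1" unitRef="usd" scale="6" decimals="-6">'
    '1,234</ix:nonFraction> million.</p>'
)


class InlineFactParser(HTMLParser):
    """Collect ix:nonFraction facts. HTMLParser lowercases tag and attribute names."""

    def __init__(self):
        super().__init__()
        self.facts = []
        self._open = None  # attributes of the fact currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "ix:nonfraction":
            self._open = dict(attrs)
            self._open["text"] = ""

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if tag == "ix:nonfraction" and self._open is not None:
            raw = self._open
            # Displayed text is scaled: "1,234" with scale=6 means 1,234 million.
            value = Decimal(raw["text"].replace(",", "").strip())
            value *= Decimal(10) ** int(raw.get("scale", "0"))
            if raw.get("sign") == "-":
                value = -value
            self.facts.append({
                "concept": raw["name"],
                "context_ref": raw["contextref"],  # resolves to entity, period, dimensions
                "unit_ref": raw.get("unitref"),
                "value": value,
            })
            self._open = None


parser = InlineFactParser()
parser.feed(SAMPLE)
parser.close()
print(parser.facts)
```

The extracted record still points at a context and a unit by id, so the conceptual structure is unchanged: the period, entity, and dimensions live in the referenced context, not in the visible number.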

The Anatomy of a Fact

At a practical level, an XBRL fact has the form:

$$ (\text{entity}, \text{concept}, \text{period}, \text{unit}, \text{dimensions}, \text{value}). $$

For example:

  • entity: a specific filer
  • concept: RevenueFromContractWithCustomerExcludingAssessedTax
  • period: quarter ending 2025-03-31
  • unit: USD
  • dimensions: optionally a business segment or geography; none for the consolidated total
  • value: reported amount

The filing's acceptance timestamp is not part of the accounting fact itself, but it is essential for historical research because it determines when that fact became knowable.
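One way to hold this structure in code is a small immutable record. The sketch below is illustrative only: the class name Fact, the field names, and every value in the example (including the placeholder CIK and accession number) are assumptions, not a schema prescribed by the SEC or by Chapter 4.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from decimal import Decimal
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fact:
    entity: str                              # filer identifier, e.g. a CIK
    concept: str
    period_start: Optional[date]             # None for instant facts
    period_end: date
    unit: str
    dimensions: Tuple[Tuple[str, str], ...]  # (axis, member) pairs; empty = consolidated
    value: Decimal
    # Filing metadata: not part of the accounting fact, but needed for PIT work.
    accession: str
    accepted_at: datetime


example = Fact(
    entity="0000000000",  # placeholder CIK
    concept="us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
    period_start=date(2025, 1, 1),
    period_end=date(2025, 3, 31),
    unit="USD",
    dimensions=(),
    value=Decimal("1250000000"),           # made-up amount
    accession="0000000000-25-000001",      # placeholder accession number
    accepted_at=datetime(2025, 5, 1, 20, 5, tzinfo=timezone.utc),
)
print(example.concept, example.period_end, example.value)
```

Keeping the accession number and acceptance timestamp on the same record as the accounting fields makes it hard to lose the knowledge time accidentally in later joins.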

Three Times Matter

For PIT-safe research, keep these distinct:

  • valid time: the period the fact describes
  • filing acceptance time: when the SEC accepted the filing
  • research availability time: when your pipeline can legitimately use that accepted filing

In many pipelines the last two are close, but ingestion lag, vendor delivery delays, or batch schedules can separate them, so they should not be conflated by default.
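A minimal sketch of keeping the three times separate follows. The names FactTimes and with_ingest_lag, the timestamps, and the 15-minute lag are hypothetical; the right lag depends on how a given pipeline actually receives filings.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone


@dataclass(frozen=True)
class FactTimes:
    valid_end: date         # end of the period the fact describes
    accepted_at: datetime   # when the SEC accepted the filing
    available_at: datetime  # when this pipeline could legitimately use it

    def usable_at(self, decision_time: datetime) -> bool:
        # Eligibility is driven by availability, not by the period end.
        return self.available_at <= decision_time


def with_ingest_lag(valid_end: date, accepted_at: datetime, lag: timedelta) -> FactTimes:
    # Assumed rule: availability = acceptance time plus a fixed processing lag.
    return FactTimes(valid_end, accepted_at, accepted_at + lag)


accepted = datetime(2025, 5, 1, 20, 5, tzinfo=timezone.utc)
times = with_ingest_lag(date(2025, 3, 31), accepted, timedelta(minutes=15))
print(times.usable_at(accepted + timedelta(minutes=5)))   # False
print(times.usable_at(accepted + timedelta(minutes=30)))  # True
```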

Concepts, Units, and Periods

Three distinctions cause most beginner errors.

Concept

The concept tells you what is being reported. A revenue concept and a cash concept may both be currency numbers but represent different economic objects.

Unit

The unit tells you how it is measured: dollars, shares, percentages, per-share amounts, and so on. Comparing facts with mismatched units is a silent data error.

Period

The period tells you over what horizon the concept applies.

  • balance-sheet facts are usually instant or point-in-time facts
  • income-statement and cash-flow facts are usually duration facts over an interval

This is why a quarterly revenue value and an end-of-quarter cash balance cannot be handled identically.
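The unit and period distinctions can be enforced with a small guard before any aggregation. This is a sketch under simple assumptions: PeriodFact and combine are hypothetical names, duration facts are assumed to cover non-overlapping periods, and the amounts are invented.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class PeriodFact(NamedTuple):
    start: Optional[date]  # None marks an instant fact
    end: date
    unit: str
    value: int


def period_type(fact: PeriodFact) -> str:
    return "instant" if fact.start is None else "duration"


def combine(facts: List[PeriodFact]) -> int:
    """Sum duration facts; for instant facts return the latest balance."""
    if not facts:
        raise ValueError("no facts to combine")
    if len({f.unit for f in facts}) != 1:
        raise ValueError("mismatched units")
    kinds = {period_type(f) for f in facts}
    if len(kinds) != 1:
        raise ValueError("cannot mix instant and duration facts")
    if kinds == {"duration"}:
        return sum(f.value for f in facts)       # flows add up over time
    return max(facts, key=lambda f: f.end).value  # balances do not


revenue = [
    PeriodFact(date(2025, 1, 1), date(2025, 3, 31), "USD", 100),
    PeriodFact(date(2025, 4, 1), date(2025, 6, 30), "USD", 110),
]
cash = [
    PeriodFact(None, date(2025, 3, 31), "USD", 50),
    PeriodFact(None, date(2025, 6, 30), "USD", 60),
]
print(combine(revenue))  # 210
print(combine(cash))     # 60
```

Summing the two cash balances would produce 110, a number with no economic meaning, which is the kind of silent error the guard is meant to surface.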

Dimensions and Why They Matter

Dimensions encode breakdowns such as:

  • business segment
  • geography
  • product line

That flexibility is valuable, but it creates comparability problems. The same firm may report a total fact and also several dimensional slices. If your pipeline does not distinguish the base fact from the dimensional context, you can double count or mix incomparable records.

In practice, this means the question is not just "what tag did I pull?" It is also:

Was this the consolidated fact or one dimensional slice of it?
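The double-counting risk is easy to reproduce. In the sketch below, the concept label, the axis and member names, and the amounts are illustrative; the only convention assumed is that a fact with no dimensions is the consolidated total.

```python
facts = [
    {"concept": "Revenues", "dimensions": {}, "value": 1000},
    {"concept": "Revenues", "dimensions": {"GeographyAxis": "US"}, "value": 600},
    {"concept": "Revenues", "dimensions": {"GeographyAxis": "NonUS"}, "value": 400},
]

# Naive: every row tagged "Revenues" is treated as comparable.
naive_total = sum(f["value"] for f in facts)

# Safer: keep only the fact with an empty dimensional context.
consolidated = [f for f in facts if not f["dimensions"]]
slices = [f for f in facts if f["dimensions"]]

print(naive_total)               # 2000, the total counted twice
print(consolidated[0]["value"])  # 1000
print(sum(f["value"] for f in slices))  # 1000, slices reconcile to the total here
```

In real filings the slices do not always reconcile to the consolidated fact, so the reconciliation is better treated as a diagnostic than as an invariant.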

Standard Tags, Extensions, and Taxonomy Drift

The U.S. GAAP taxonomy provides standardized concepts, but firms can also file custom extensions.

That creates three recurring issues:

  • two firms may represent the same economics under different tags
  • one firm may change tags across years as taxonomy practice evolves
  • a vendor's standardized panel may silently map away those differences

Taxonomy evolution is not a bug. Reporting standards change. But it means a long history often requires concept mapping across taxonomy versions rather than a blind one-tag extraction rule. In practice, large filing universes contain many extensions for economically similar concepts, so this is the default headache, not an edge case.
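A concept map can start as an ordered list of candidate tags per economic item. The sketch below is an assumption-laden illustration: the three us-gaap names are tags that have been used for revenue, the abc: tag is a hypothetical company extension, and the priority order is a choice made for this example, not an authoritative mapping.

```python
from typing import Dict, Optional, Tuple

# Candidate tags for one canonical item, in assumed priority order.
REVENUE_PRIORITY = [
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
    "us-gaap:Revenues",
    "us-gaap:SalesRevenueNet",
    "abc:NetProductRevenue",  # hypothetical company extension
]


def canonical_revenue(reported: Dict[str, int]) -> Optional[Tuple[str, int]]:
    """Return (source tag, value) for the first candidate tag present in a filing."""
    for tag in REVENUE_PRIORITY:
        if tag in reported:
            return tag, reported[tag]
    return None  # unmapped: flag for review instead of guessing


# Made-up filings from the same firm in different years.
filing_2016 = {"us-gaap:SalesRevenueNet": 900, "us-gaap:Assets": 5000}
filing_2024 = {"us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax": 1400}

print(canonical_revenue(filing_2016))
print(canonical_revenue(filing_2024))
```

Returning the source tag alongside the value keeps the mapping auditable, which matters when a long history stitches together several tags.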

A Worked Example

Suppose you want a point-in-time revenue history.

Bad workflow

You query a convenience endpoint that returns the best current match for quarterly revenue and treat it as historical truth.

That can fail because:

  • an amended filing may have replaced the earlier value
  • the concept used in 2016 may not be the same tag used in 2024
  • the endpoint may return a "last filed" snapshot rather than the value available on a historical decision date

Better workflow

You reconstruct the history from filing-level facts:

  1. pull filing metadata and acceptance timestamps
  2. parse the tagged fact with its concept, period, unit, and dimensions
  3. store each fact by filing, not just by latest period label
  4. select the value eligible on the decision date

That is the point-in-time-safe version of "quarterly revenue."
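Step 4 of the better workflow can be sketched as an as-of selection over filing-level rows. The accession labels, timestamps, and values are invented, the function name value_as_of is hypothetical, and acceptance time stands in for research availability time to keep the example short.

```python
from datetime import date, datetime, timezone
from typing import List, Optional

# One row per (filing, fact); an amendment adds a row instead of overwriting.
rows = [
    {"accession": "A-1", "form": "10-Q", "period_end": date(2025, 3, 31),
     "accepted_at": datetime(2025, 5, 1, 20, 5, tzinfo=timezone.utc), "value": 1000},
    {"accession": "A-2", "form": "10-Q/A", "period_end": date(2025, 3, 31),
     "accepted_at": datetime(2025, 7, 15, 21, 0, tzinfo=timezone.utc), "value": 980},
]


def value_as_of(rows: List[dict], period_end: date, decision_time: datetime) -> Optional[dict]:
    """Return the most recently accepted row for the period that was knowable by decision_time."""
    eligible = [
        r for r in rows
        if r["period_end"] == period_end and r["accepted_at"] <= decision_time
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r["accepted_at"])


q1 = date(2025, 3, 31)
before = value_as_of(rows, q1, datetime(2025, 4, 30, tzinfo=timezone.utc))
original = value_as_of(rows, q1, datetime(2025, 6, 1, tzinfo=timezone.utc))
amended = value_as_of(rows, q1, datetime(2025, 8, 1, tzinfo=timezone.utc))
print(before)             # None: nothing filed yet
print(original["value"])  # 1000
print(amended["value"])   # 980
```

A backtest dated June 2025 sees 1000 and one dated August 2025 sees 980, whereas a latest-value lookup would report 980 for both.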

Why Filing Metadata Is Part of the Data Model

For research, a usable row is not just the accounting fact. It combines fact metadata, filing metadata, and a research-availability timestamp. The filing acceptance time gives the knowledge time. The period gives the valid time. Chapter 4 depends on that separation.

This is also why the SEC Frames API is useful but insufficient on its own. It is convenient for cross-sectional snapshots, but it returns one fact per filer for each period, the one last filed, and that is not the same as a fully reconstructible filing history.

An amended filing therefore does not erase the original fact from history. A PIT pipeline stores both and selects whichever filing was knowable on the decision date.
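The Frames limitation noted above is visible in the request itself. The sketch below builds a Frames URL following the pattern in the SEC's published examples as understood at the time of writing; treat the pattern as an assumption to verify against current SEC documentation. The helper name frame_url is hypothetical, and no network call is made.

```python
from typing import Optional


def frame_url(taxonomy: str, concept: str, unit: str, year: int,
              quarter: Optional[int] = None, instant: bool = False) -> str:
    """Build a Frames URL; the period code is CY<year>[Q<n>][I]."""
    period = f"CY{year}"
    if quarter is not None:
        period += f"Q{quarter}"
    if instant:
        period += "I"  # trailing I marks an instant (balance-sheet) frame
    return (
        "https://data.sec.gov/api/xbrl/frames/"
        f"{taxonomy}/{concept}/{unit}/{period}.json"
    )


print(frame_url("us-gaap", "AccountsPayableCurrent", "USD", 2019, 1, instant=True))
```

The URL identifies a cross-section by concept, unit, and calendar period only. Nothing in it expresses a decision date, so an as-of history still has to be rebuilt from filing-level facts.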

Common Practical Failure Modes

Watch for these:

  • mixing duration and instant facts
  • comparing facts across mismatched units
  • treating extension tags as junk instead of mapping them carefully
  • dropping dimensions without checking whether the retained fact is the consolidated one
  • using standardized vendor panels without knowing how amendments and taxonomy changes were handled

These are not exotic edge cases. They are ordinary XBRL pipeline errors.

In Practice

Use these rules:

  • think in terms of fact plus context, not just tag plus value
  • keep filing acceptance timestamps with every extracted fact
  • distinguish instant and duration concepts explicitly
  • preserve dimensions until you know which view is economically relevant
  • treat taxonomy mapping as a first-class data-engineering problem, not a one-off tag-cleaning task

Common Mistakes

  • Treating XBRL as one clean table of fundamentals.
  • Ignoring units and period type.
  • Using only the latest standardized concept without filing history.
  • Collapsing dimensional and consolidated facts together.
  • Assuming taxonomy drift does not matter for long samples.

Connections

This primer supports Chapter 4's EDGAR and PIT-fundamentals workflow. It connects directly to bitemporal querying, filing-text extraction, vintage-safe macro and fundamental data, and any later chapter that uses accounting facts as historical features.
