Classical Statistical Tests as Linear Models: OLS, t-Tests, ANOVA, and Correlation
Many "different" statistical tests are the same linear-model object wearing different notation. Once you see the shared design-matrix view, the jump from classical inference to predictive regularization is much less mysterious.
The Intuition
In applied work, the shift from econometric inference to predictive linear models is easy to misread as a change of subject: first come t-tests and regression tables, then suddenly Ridge, LASSO, and walk-forward prediction. The real continuity is simpler:
- OLS estimates coefficients in a linear model
- the classical pooled two-sample t-test can be written as a coefficient test in a very small linear model
- one-way ANOVA is a coefficient test in a dummy-coded linear model
- Pearson correlation is a slope in standardized regression
The object doing the work is always
$$ y = X\beta + \varepsilon. $$
The variation lies in the design matrix \(X\), the null restriction on \(\beta\), and the assumptions under which the test statistic has a reference distribution.
This organizing view matters for trading because it tells you what survives when you leave the classical setting:
- the design matrix survives
- the coefficient language survives
- the null-hypothesis-as-restriction idea survives
- the exact small-sample reference distributions usually do not
That is the conceptual bridge from "Is the coefficient zero?" to "Should I shrink this coefficient for prediction?"
The Core OLS Object
For a response vector \(y \in \mathbb{R}^n\) and feature matrix \(X \in \mathbb{R}^{n \times p}\), the OLS estimator solves
$$ \hat{\beta} = \arg\min_\beta \|y - X\beta\|_2^2. $$
When \(X^\top X\) is invertible,
$$ \hat{\beta} = (X^\top X)^{-1} X^\top y. $$
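As a minimal numerical sketch of this estimator (using simulated data, not anything from the chapter), the closed form can be computed directly and cross-checked against a least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
# Design matrix: intercept column plus two random features
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta_true = np.array([0.5, 1.0, -2.0])
y = X @ beta_true + rng.standard_normal(n)

# OLS via the normal equations (fine when X^T X is well conditioned)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against numpy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```

In practice `lstsq` (or a QR factorization) is preferred over explicitly forming \(X^\top X\), which squares the condition number; the normal-equations form is shown here only because it matches the formula above.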
Many of the classical procedures that follow are restrictions or re-expressions of this object.
| Procedure | Design matrix idea | Quantity tested |
|---|---|---|
| One-sample t-test | intercept-only model | is mean equal to benchmark? |
| Two-sample t-test | intercept + group dummy | is group coefficient zero? |
| Correlation | simple regression on standardized variables | is slope zero? |
| One-way ANOVA | intercept + \(K-1\) dummies | are all group coefficients zero? |
The inferential question is always a restriction of the form
$$ H_0: R\beta = r, $$
where \(R\) selects the coefficients of interest.
A Two-Group Return Comparison Three Ways
Suppose you compare next-month returns for two sets of stocks: firms in the top profitability quintile and firms in the bottom quintile. Let \(g_i = 1\) for top-quintile firms and \(g_i = 0\) otherwise.
1. Difference in Means
The classical two-sample t-test asks whether
$$ H_0: \mu_1 - \mu_0 = 0. $$
2. Dummy-Variable Regression
Write
$$ r_i = \beta_0 + \beta_1 g_i + \varepsilon_i. $$
Then
- \(\beta_0\) is the mean return for the reference group
- \(\beta_1\) is the difference in means
Testing \(H_0: \beta_1 = 0\) is the same question as the classical pooled two-sample t-test. The Welch version, which adjusts for unequal variances, is not the same OLS coefficient test.
3. Coefficient Restriction
If you fit the regression by OLS and compute the usual coefficient t-statistic
$$ t = \frac{\hat{\beta}_1}{\operatorname{se}(\hat{\beta}_1)}, $$
you recover the same inferential object.
That equivalence is not a trick. The pooled two-sample t-test is a regression with a specific design matrix.
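The equivalence can be verified directly on simulated returns (the group means and variances below are illustrative assumptions, not data from the text): the pooled two-sample t-statistic and the dummy-regression coefficient t-statistic agree to machine precision.

```python
import numpy as np

rng = np.random.default_rng(1)
r0 = rng.normal(0.01, 0.05, size=120)  # simulated bottom-quintile monthly returns
r1 = rng.normal(0.02, 0.05, size=140)  # simulated top-quintile monthly returns

# --- Classical pooled two-sample t-statistic ---
n0, n1 = len(r0), len(r1)
sp2 = ((n0 - 1) * r0.var(ddof=1) + (n1 - 1) * r1.var(ddof=1)) / (n0 + n1 - 2)
t_pooled = (r1.mean() - r0.mean()) / np.sqrt(sp2 * (1 / n0 + 1 / n1))

# --- Same test as an OLS coefficient t-statistic ---
y = np.concatenate([r0, r1])
g = np.concatenate([np.zeros(n0), np.ones(n1)])  # group dummy
X = np.column_stack([np.ones_like(y), g])        # intercept + dummy
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
s2 = resid @ resid / (len(y) - 2)                # residual variance
cov = s2 * np.linalg.inv(X.T @ X)
t_reg = beta[1] / np.sqrt(cov[1, 1])

assert np.isclose(t_pooled, t_reg)
```

Note that the match holds for the pooled test specifically; a Welch-corrected t-statistic would differ, consistent with the caveat above.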
Correlation Is Standardized Regression
Take two variables \(x\) and \(y\). Standardize them:
$$ \tilde{x}_i = \frac{x_i - \bar{x}}{s_x}, \qquad \tilde{y}_i = \frac{y_i - \bar{y}}{s_y}. $$
Now regress \(\tilde{y}\) on \(\tilde{x}\):
$$ \tilde{y}_i = \beta \tilde{x}_i + \varepsilon_i. $$
Because both variables have been centered, the intercept is zero mechanically, so the no-intercept specification is exact here. The OLS slope is the sample Pearson correlation:
$$ \hat{\beta} = \operatorname{Corr}_{\text{Pearson}}(x, y). $$
So the usual correlation test is again a slope test. This is useful because it separates two questions that are often conflated:
- correlation is a standardized linear association
- correlation is not a general dependence measure
When a modeling pipeline uses a linear baseline, this standardized slope is the scale on which many "simple" relationships are implicitly judged.
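The slope-equals-correlation identity is easy to check numerically. The sketch below (simulated data, standardizing with `ddof=1` so the slope matches the sample correlation exactly) regresses the standardized response on the standardized predictor with no intercept:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(500)
y = 0.3 * x + rng.standard_normal(500)

# Standardize with ddof=1 to match the sample Pearson correlation
xt = (x - x.mean()) / x.std(ddof=1)
yt = (y - y.mean()) / y.std(ddof=1)

# No-intercept OLS slope of yt on xt: (x'y) / (x'x)
slope = (xt @ yt) / (xt @ xt)

r = np.corrcoef(x, y)[0, 1]
assert np.isclose(slope, r)
```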
ANOVA Is Dummy-Coded Regression
Suppose you sort firms into three industries and ask whether average returns differ across groups. ANOVA is usually introduced as a decomposition of sums of squares:
$$ \text{TSS} = \text{between-group SS} + \text{within-group SS}. $$
That is true, but the same procedure is just regression with dummy variables:
$$ r_i = \beta_0 + \beta_1 D_{i,1} + \beta_2 D_{i,2} + \varepsilon_i, $$
with one industry omitted as the reference group.
The ANOVA null is
$$ H_0: \beta_1 = \beta_2 = 0, $$
which is a joint restriction. The resulting F-statistic compares the restricted and unrestricted models:
$$ F = \frac{(\text{RSS}_R - \text{RSS}_U)/q}{\text{RSS}_U/(n-p)}, $$
where \(q\) is the number of restrictions.
This is the same logic that later appears in nested-model comparisons and joint feature tests: compare fit lost under a restriction to residual noise left in the unrestricted model.
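A small simulation (three fabricated industry groups, means chosen arbitrarily) shows that the restricted-vs-unrestricted F-statistic from the dummy regression reproduces the textbook one-way ANOVA F:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(3)
groups = [rng.normal(mu, 1.0, size=80) for mu in (0.0, 0.2, 0.5)]
y = np.concatenate(groups)
labels = np.repeat([0, 1, 2], 80)
n = len(y)

# Unrestricted model: intercept + two industry dummies
X = np.column_stack([np.ones(n), labels == 1, labels == 2]).astype(float)
beta = np.linalg.solve(X.T @ X, X.T @ y)
rss_u = np.sum((y - X @ beta) ** 2)

# Restricted model: intercept only (all group coefficients zero)
rss_r = np.sum((y - y.mean()) ** 2)

q, p = 2, 3  # number of restrictions, number of parameters
F = ((rss_r - rss_u) / q) / (rss_u / (n - p))

# Same number as scipy's one-way ANOVA
assert np.isclose(F, f_oneway(*groups).statistic)
```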
What Carries into Predictive Modeling
Once you have the linear-model view, regularized prediction is no longer conceptually alien.
Ridge replaces
$$ \min_\beta \|y-X\beta\|_2^2 $$
with
$$ \min_\beta \|y-X\beta\|_2^2 + \lambda \|\beta\|_2^2. $$
LASSO uses an \(L_1\) penalty instead. The design matrix, the response, and the coefficient language are unchanged. What changes is the objective:
- inference asks whether coefficients are distinguishable from zero under a sampling model
- prediction asks whether shrunken coefficients generalize better out of sample
That is why the move from OLS to Ridge does not change the underlying modeling language. The machinery is continuous even when the goal changes, but the classical inferential apparatus does not survive intact: once a penalty is added, the familiar OLS standard errors, p-values, and confidence intervals no longer carry over automatically.
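The continuity is visible in code: the ridge estimator has a closed form that differs from OLS only by the penalty term in the matrix being inverted, so setting \(\lambda = 0\) recovers OLS exactly. A minimal sketch on simulated data (the dimensions and penalty value below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.0, -1.0]  # only two coefficients carry signal
y = X @ beta_true + rng.standard_normal(n)

def ridge(X, y, lam):
    """Closed-form ridge: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)    # lambda = 0 recovers plain OLS
beta_ridge = ridge(X, y, 10.0)

# The penalty shrinks the coefficient vector toward zero
assert np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols)
```

Note that no standard errors or p-values appear here: the classical inferential apparatus attached to `beta_ols` does not transfer to `beta_ridge` without additional theory.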
Where the Classical Tests Break
The unifying linear-model view is powerful, but it is not a license to ignore assumptions.
The most important caveats in trading data are:
- Dependence. Returns are often cross-sectionally or temporally dependent.
- Heteroskedasticity. Error variance changes across assets, horizons, or regimes.
- High dimensionality. When \(p\) is large relative to \(n\), classical OLS inference becomes unstable long before prediction methods become unusable.
- Selection after search. A t-statistic reported after trying many signals is not a clean t-statistic anymore.
So "everything is a linear model" is an organizing principle, not a proof that every textbook reference distribution remains valid.
In Practice
Use this map:
- If you compare two portfolio means, think "dummy-variable regression."
- If you compare many group means, think "joint restriction in a dummy-coded regression."
- If you report a correlation, remember it is a slope after standardization.
- If you move to Ridge, LASSO, or logistic regression, keep the design-matrix view and change the objective, not the conceptual language.
The practical payoff is that you stop memorizing isolated tests and start asking better questions:
- what is \(X\)?
- what restriction am I testing?
- what assumptions justify the reported uncertainty?
- what survives when I switch from inference to prediction?
Common Mistakes
- Treating t-tests, correlation tests, and ANOVA as unrelated procedures.
- Forgetting that ANOVA is testing coefficient restrictions, not performing a separate kind of mathematics.
- Reading a statistically significant coefficient as evidence of predictive usefulness.
- Carrying classical p-values into a model-selection pipeline that involved heavy search.
- Forgetting that standardized regression gives correlation only for linear association.
Connections
This primer supports Chapter 11's move from inference to regularized prediction. It connects directly to regularization geometry, logistic regression, multiple testing in the factor zoo, and the broader question of when statistical significance does or does not translate into tradable value.