Classical Statistical Tests as Linear Models: OLS, t-Tests, ANOVA, and Correlation
Many "different" statistical tests are the same linear-model object wearing different notation. Once you see the shared design-matrix view, the jump from classical inference to predictive regularization is much less mysterious.
The Intuition
In applied work, the shift from econometric inference to predictive linear models is easy to misread as a change of subject: first come t-tests and regression tables, then suddenly Ridge, LASSO, and walk-forward prediction. The real continuity is simpler:
- OLS estimates coefficients in a linear model
- the classical pooled two-sample t-test can be written as a coefficient test in a very small linear model
- one-way ANOVA is a coefficient test in a dummy-coded linear model
- Pearson correlation is a slope in standardized regression
The object doing the work is always
$$ y = X\beta + \varepsilon. $$
The variation lies in the design matrix \(X\), the null restriction on \(\beta\), and the assumptions under which the test statistic has a reference distribution.
This organizing view matters for trading because it tells you what survives when you leave the classical setting:
- the design matrix survives
- the coefficient language survives
- the null-hypothesis-as-restriction idea survives
- the exact small-sample reference distributions usually do not
That is the conceptual bridge from "Is the coefficient zero?" to "Should I shrink this coefficient for prediction?"
The Core OLS Object
For a response vector \(y \in \mathbb{R}^n\) and feature matrix \(X \in \mathbb{R}^{n \times p}\), the OLS estimator solves
$$ \hat{\beta} = \arg\min_\beta \|y - X\beta\|_2^2. $$
When \(X^\top X\) is invertible,
$$ \hat{\beta} = (X^\top X)^{-1} X^\top y. $$
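As a minimal numerical sketch of this estimator (using simulated data, not anything from the chapter), the closed form can be computed directly and cross-checked against a least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
# Design matrix: intercept column plus two random features
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta_true = np.array([0.5, 1.0, -2.0])
y = X @ beta_true + rng.standard_normal(n)

# OLS via the normal equations (fine when X^T X is well conditioned)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against numpy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```

In practice `lstsq` (or a QR factorization) is preferred over explicitly forming \(X^\top X\), which squares the condition number; the normal-equations form is shown here only because it matches the formula above.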
Many of the classical procedures that follow are restrictions or re-expressions of this object.
| Procedure | Design matrix idea | Quantity tested |
|---|---|---|
| One-sample t-test | intercept-only model | is mean equal to benchmark? |
| Two-sample t-test | intercept + group dummy | is group coefficient zero? |
| Correlation | simple regression on standardized variables | is slope zero? |
| One-way ANOVA | intercept + \(K-1\) dummies | are all group coefficients zero? |
The inferential question is always a restriction of the form
$$ H_0: R\beta = r, $$
where \(R\) selects the coefficients of interest.
A Two-Group Return Comparison Three Ways
Suppose you compare next-month returns for two sets of stocks: firms in the top profitability quintile and firms in the bottom quintile. Let \(g_i = 1\) for top-quintile firms and \(g_i = 0\) otherwise.
1. Difference in Means
The classical two-sample t-test asks whether
$$ H_0: \mu_1 - \mu_0 = 0. $$
2. Dummy-Variable Regression
Write
$$ r_i = \beta_0 + \beta_1 g_i + \varepsilon_i. $$
Then
- \(\beta_0\) is the mean return for the reference group
- \(\beta_1\) is the difference in means
Testing \(H_0: \beta_1 = 0\) is the same question as the classical pooled two-sample t-test. The Welch version, which adjusts for unequal variances, is not the same OLS coefficient test.
3. Coefficient Restriction
If you fit the regression by OLS and compute the usual coefficient t-statistic
$$ t = \frac{\hat{\beta}_1}{\operatorname{se}(\hat{\beta}_1)}, $$
you recover the same inferential object.
That equivalence is not a trick. The pooled two-sample t-test is a regression with a specific design matrix.
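The equivalence can be verified directly on simulated returns (the group means and variances below are illustrative assumptions, not data from the text): the pooled two-sample t-statistic and the dummy-regression coefficient t-statistic agree to machine precision.

```python
import numpy as np

rng = np.random.default_rng(1)
r0 = rng.normal(0.01, 0.05, size=120)  # simulated bottom-quintile monthly returns
r1 = rng.normal(0.02, 0.05, size=140)  # simulated top-quintile monthly returns

# --- Classical pooled two-sample t-statistic ---
n0, n1 = len(r0), len(r1)
sp2 = ((n0 - 1) * r0.var(ddof=1) + (n1 - 1) * r1.var(ddof=1)) / (n0 + n1 - 2)
t_pooled = (r1.mean() - r0.mean()) / np.sqrt(sp2 * (1 / n0 + 1 / n1))

# --- Same test as an OLS coefficient t-statistic ---
y = np.concatenate([r0, r1])
g = np.concatenate([np.zeros(n0), np.ones(n1)])  # group dummy
X = np.column_stack([np.ones_like(y), g])        # intercept + dummy
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
s2 = resid @ resid / (len(y) - 2)                # residual variance
cov = s2 * np.linalg.inv(X.T @ X)
t_reg = beta[1] / np.sqrt(cov[1, 1])

assert np.isclose(t_pooled, t_reg)
```

Note that the match holds for the pooled test specifically; a Welch-corrected t-statistic would differ, consistent with the caveat above.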
Correlation Is Standardized Regression
Take two variables \(x\) and \(y\). Standardize them:
$$ \tilde{x}_i = \frac{x_i - \bar{x}}{s_x}, \qquad \tilde{y}_i = \frac{y_i - \bar{y}}{s_y}. $$
Now regress \(\tilde{y}\) on \(\tilde{x}\):
$$ \tilde{y}_i = \beta \tilde{x}_i + \varepsilon_i. $$
Because both variables have been centered, the intercept is zero mechanically, so the no-intercept specification is exact here. The OLS slope is the sample Pearson correlation:
$$ \hat{\beta} = \operatorname{Corr}_{\text{Pearson}}(x, y). $$
So the usual correlation test is again a slope test. This is useful because it separates two questions that are often conflated:
- correlation is a standardized linear association
- correlation is not a general dependence measure
When a modeling pipeline uses a linear baseline, this standardized slope is the scale on which many "simple" relationships are implicitly judged.
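The slope-equals-correlation identity is easy to check numerically. The sketch below (simulated data, standardizing with `ddof=1` so the slope matches the sample correlation exactly) regresses the standardized response on the standardized predictor with no intercept:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(500)
y = 0.3 * x + rng.standard_normal(500)

# Standardize with ddof=1 to match the sample Pearson correlation
xt = (x - x.mean()) / x.std(ddof=1)
yt = (y - y.mean()) / y.std(ddof=1)

# No-intercept OLS slope of yt on xt: (x'y) / (x'x)
slope = (xt @ yt) / (xt @ xt)

r = np.corrcoef(x, y)[0, 1]
assert np.isclose(slope, r)
```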
ANOVA Is Dummy-Coded Regression
Suppose you sort firms into three industries and ask whether average returns differ across groups. ANOVA is usually introduced as a decomposition of sums of squares:
$$ \text{TSS} = \text{between-group SS} + \text{within-group SS}. $$
That is true, but the same procedure is just regression with dummy variables:
$$ r_i = \beta_0 + \beta_1 D_{i,1} + \beta_2 D_{i,2} + \varepsilon_i, $$
with one industry omitted as the reference group.
The ANOVA null is
$$ H_0: \beta_1 = \beta_2 = 0, $$
which is a joint restriction. The resulting F-statistic compares the restricted and unrestricted models:
$$ F = \frac{(\text{RSS}_R - \text{RSS}_U)/q}{\text{RSS}_U/(n-p)}, $$
where \(q\) is the number of restrictions.
This is the same logic that later appears in nested-model comparisons and joint feature tests: compare fit lost under a restriction to residual noise left in the unrestricted model.
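A small simulation (three fabricated industry groups, means chosen arbitrarily) shows that the restricted-vs-unrestricted F-statistic from the dummy regression reproduces the textbook one-way ANOVA F:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(3)
groups = [rng.normal(mu, 1.0, size=80) for mu in (0.0, 0.2, 0.5)]
y = np.concatenate(groups)
labels = np.repeat([0, 1, 2], 80)
n = len(y)

# Unrestricted model: intercept + two industry dummies
X = np.column_stack([np.ones(n), labels == 1, labels == 2]).astype(float)
beta = np.linalg.solve(X.T @ X, X.T @ y)
rss_u = np.sum((y - X @ beta) ** 2)

# Restricted model: intercept only (all group coefficients zero)
rss_r = np.sum((y - y.mean()) ** 2)

q, p = 2, 3  # number of restrictions, number of parameters
F = ((rss_r - rss_u) / q) / (rss_u / (n - p))

# Same number as scipy's one-way ANOVA
assert np.isclose(F, f_oneway(*groups).statistic)
```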
What Carries into Predictive Modeling
Once you have the linear-model view, regularized prediction is no longer conceptually alien.
Ridge replaces
$$ \min_\beta \|y-X\beta\|_2^2 $$
with
$$ \min_\beta \|y-X\beta\|_2^2 + \lambda \|\beta\|_2^2. $$
LASSO uses an \(L_1\) penalty instead. The design matrix, the response, and the coefficient language are unchanged. What changes is the objective:
- inference asks whether coefficients are distinguishable from zero under a sampling model
- prediction asks whether shrunken coefficients generalize better out of sample
That is why the move from OLS to Ridge does not change the underlying modeling language. The machinery is continuous even when the goal changes, but the classical inferential apparatus does not survive intact: once a penalty is added, the familiar OLS standard errors, p-values, and confidence intervals no longer carry over automatically.
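The continuity is visible in code: the ridge estimator has a closed form that differs from OLS only by the penalty term in the matrix being inverted, so setting \(\lambda = 0\) recovers OLS exactly. A minimal sketch on simulated data (the dimensions and penalty value below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.0, -1.0]  # only two coefficients carry signal
y = X @ beta_true + rng.standard_normal(n)

def ridge(X, y, lam):
    """Closed-form ridge: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)    # lambda = 0 recovers plain OLS
beta_ridge = ridge(X, y, 10.0)

# The penalty shrinks the coefficient vector toward zero
assert np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols)
```

Note that no standard errors or p-values appear here: the classical inferential apparatus attached to `beta_ols` does not transfer to `beta_ridge` without additional theory.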
Where the Classical Tests Break
The unifying linear-model view is powerful, but it is not a license to ignore assumptions.
The most important caveats in trading data are:
- Dependence. Returns are often cross-sectionally or temporally dependent.
- Heteroskedasticity. Error variance changes across assets, horizons, or regimes.
- High dimensionality. When \(p\) is large relative to \(n\), classical OLS inference becomes unstable long before prediction methods become unusable.
- Selection after search. A t-statistic reported after trying many signals is not a clean t-statistic anymore.
So "everything is a linear model" is an organizing principle, not a proof that every textbook reference distribution remains valid.
In Practice
Use this map:
- If you compare two portfolio means, think "dummy-variable regression."
- If you compare many group means, think "joint restriction in a dummy-coded regression."
- If you report a correlation, remember it is a slope after standardization.
- If you move to Ridge, LASSO, or logistic regression, keep the design-matrix view and change the objective, not the conceptual language.
The practical payoff is that you stop memorizing isolated tests and start asking better questions:
- what is \(X\)?
- what restriction am I testing?
- what assumptions justify the reported uncertainty?
- what survives when I switch from inference to prediction?
Common Mistakes
- Treating t-tests, correlation tests, and ANOVA as unrelated procedures.
- Forgetting that ANOVA is testing coefficient restrictions, not performing a separate kind of mathematics.
- Reading a statistically significant coefficient as evidence of predictive usefulness.
- Carrying classical p-values into a model-selection pipeline that involved heavy search.
- Forgetting that standardized regression gives correlation only for linear association.
Connections
This primer supports Chapter 11's move from inference to regularized prediction. It connects directly to regularization geometry, logistic regression, multiple testing in the factor zoo, and the broader question of when statistical significance does or does not translate into tradable value.