The t-Distribution and Small Sample Inference

In real investigations the population standard deviation $\sigma$ is almost never known — it must be estimated from the data by the sample standard deviation $s$ . When the sample is large this barely matters and the normal distribution still serves; but when the sample is small, replacing $\sigma$ by the noisy estimate $s$ injects extra uncertainty that the normal model ignores. The t-distribution is the correct reference distribution in exactly this situation. This lesson derives why, characterises the t-distribution, and applies it through the one-sample, paired and two-sample t-tests.

Where this sits in AQA 7367

This is Paper 3 Statistics (7367/3S) content (Paper 3: 2 h, 100 marks, AO1 40% / AO2 25% / AO3 35%). It is the first genuinely inferential topic of the option: it takes the sampling-distribution result $\bar X \sim N(\mu, \sigma^2/n)$ and adapts it to the realistic case of unknown $\sigma$ . The work is rich in AO2 (choosing the correct reference distribution and stating assumptions) and AO3 (multi-stage tests on real-looking data), with AO1 carrying the arithmetic of the test statistic. It builds on the previous lesson's sampling distribution and on A-Level Maths Statistics hypothesis testing.

Core theory: why the t-distribution?

Start from the standardised sample mean. When $\sigma$ is known, the previous lesson gives

$Z = \frac{\bar X - \mu}{\sigma / \sqrt n} \sim N(0, 1),$

an exact standard normal for a normal population. When $\sigma$ is unknown, the natural move is to substitute the sample standard deviation $s$, which for the same normal population gives

$T = \frac{\bar X - \mu}{s / \sqrt n} \sim t_{n-1}.$

Crucially this is no longer normal. The numerator $\bar X - \mu$ is still random, but now the denominator is random too — $s$ varies from sample to sample. Dividing one random quantity by another, rather than by the fixed constant $\sigma$ , spreads the statistic out, especially for small $n$ where $s$ is an unreliable estimate of $\sigma$ . To see why this matters, imagine the unlucky case where a small sample happens to give an $s$ much smaller than the true $\sigma$ : the denominator shrinks and $T$ is inflated, producing an extreme value far more often than a fixed denominator ever would. It is precisely these occasional inflated values that thicken the tails of the t relative to the normal. The result is the t-distribution with $n - 1$ degrees of freedom. The " $n-1$ " is the same divisor used in the unbiased sample variance

$s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2,$

and reflects that one degree of freedom is "spent" estimating $\mu$ by $\bar x$ before the spread can be measured.
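To see the effect of the $n-1$ divisor numerically, here is a small Python simulation sketch. The seed, the sample size $n = 5$ and the number of repetitions are arbitrary choices for illustration. It draws many samples from $N(0,1)$, so $\sigma^2 = 1$, and averages the sum of squared deviations divided by $n-1$ and by $n$. The first average should land close to $1$ and the second close to $(n-1)/n = 0.8$. It also shows that Python's `statistics.variance` uses the $n-1$ divisor, while `statistics.pvariance` divides by $n$.

```python
import random
import statistics

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

rng = random.Random(1)      # fixed seed so the run is reproducible
n, reps = 5, 20000          # small samples from N(0, 1), so sigma^2 = 1
avg_n_minus_1 = avg_n = 0.0
for _ in range(reps):
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ss = sum_sq_dev(xs)
    avg_n_minus_1 += ss / (n - 1) / reps   # unbiased estimator s^2
    avg_n += ss / n / reps                 # divisor n: biased low
print(round(avg_n_minus_1, 3), round(avg_n, 3))   # near 1.0 and near 0.8

# The standard library's statistics.variance already uses the n - 1 divisor.
yields = [20, 23, 19, 24, 21, 22, 25]
print(statistics.variance(yields), statistics.pvariance(yields))  # 28/6 vs 28/7
```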

This is not a minor technicality but the central correction the topic makes. The A-Level Maths approach treats $\sigma$ as a given constant and uses the normal throughout; that is a convenient fiction, because in any genuine investigation $\sigma$ is just as unknown as $\mu$ . Replacing $\sigma$ by $s$ and pretending nothing has changed would systematically understate the uncertainty — your intervals would be too narrow and your tests would reject true null hypotheses too often. The t-distribution is the exact, honest accounting for that extra uncertainty when the data are normal, and it is the default reference distribution for inference about a mean in real statistical practice.
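The claim that using $s$ with normal critical values rejects true null hypotheses too often can be checked by simulation. This Python sketch assumes a normal population with $H_0$ true ($\mu = 0$), samples of size $n = 5$ and an arbitrary seed. It counts how often $|T|$ exceeds $1.96$ and how often it exceeds the tabulated $t_{4,\,0.025} = 2.776$. With the normal value, the false-rejection rate should come out well above the nominal $5\%$ (roughly $12\%$). The t critical value should keep it near $5\%$.

```python
import math
import random

def t_statistic(xs, mu0):
    """One-sample T = (xbar - mu0) / (s / sqrt(n)) with s using divisor n - 1."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return (m - mu0) / (s / math.sqrt(n))

rng = random.Random(2024)   # arbitrary fixed seed
n, reps = 5, 20000
reject_z = reject_t = 0
for _ in range(reps):
    xs = [rng.gauss(0, 1) for _ in range(n)]   # H0 is true: mu = 0
    T = t_statistic(xs, 0)
    reject_z += abs(T) > 1.96    # wrongly using the normal critical value
    reject_t += abs(T) > 2.776   # t_{4, 0.025}, the correct critical value
print(reject_z / reps, reject_t / reps)
```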

Properties of the t-distribution

Property	Detail
Shape	symmetric, bell-shaped, centred on $0$
Tails	heavier than the normal (more probability far from $0$ )
Parameter	degrees of freedom $\nu = n - 1$
Limit	as $\nu \to \infty$ , $t_\nu \to N(0,1)$
Peak	slightly lower and flatter than $N(0,1)$

The heavier tails are the whole point: because we are less certain about the spread, extreme standardised values are more likely than the normal would predict, so the critical values are larger (you need stronger evidence to reject). As $\nu$ grows, $s$ becomes a reliable estimate of $\sigma$ and the t-distribution tightens onto the normal. The table makes this concrete (two-tailed $5\%$ critical values):

$\nu$	$t_{\nu,\,0.025}$	$z_{0.025}$ (normal, for comparison)
5	2.571	1.960
10	2.228	1.960
20	2.086	1.960
30	2.042	1.960
120	1.980	1.960

By $\nu = 120$ the t-value $1.980$ exceeds the normal $1.960$ by only about $1\%$, which is why large-sample inference can safely use $z$. The practical takeaway is a clean decision rule. If $\sigma$ is known (and the population is normal), use $z$ at any sample size. If $\sigma$ is unknown, use $t_{n-1}$; this matters most for small $n$ and fades into $z$ as $n$ grows. A frequent exam scenario deliberately gives a small sample with $\sigma$ unknown precisely to test whether you reach for the t rather than the normal. Reaching for $z = 1.96$ there understates the critical value and leads you to reject $H_0$ too readily.
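The tabulated critical values can be reproduced without statistical tables. The sketch below does this using only Python's standard library: it writes down the t density, integrates it with Simpson's rule and inverts the tail probability by bisection. The helper names (`t_pdf`, `t_upper_tail`, `t_crit`) are hypothetical. The fixed step count and search bracket are assumptions that are adequate for the moderate tail probabilities used here; this is not a general-purpose implementation.

```python
import math
from statistics import NormalDist

def t_pdf(x, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_upper_tail(x, nu, steps=1000):
    """P(T > x) for x >= 0: 0.5 minus the area from 0 to x (Simpson's rule)."""
    h = x / steps
    area = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - area * h / 3

def t_crit(nu, tail_prob):
    """Upper critical value c with P(T > c) = tail_prob, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, nu) > tail_prob:
            lo = mid   # too much probability beyond mid: c lies further out
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)   # two-tailed 5% normal critical value
for nu in (5, 10, 20, 30, 120):
    print(nu, round(t_crit(nu, 0.025), 3), round(z, 3))
```

Running it should print critical values agreeing with the table above to three decimal places. The $z$ column comes from `statistics.NormalDist`.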

It is also worth being precise about what the degrees of freedom count. Starting with $n$ independent observations, fitting the sample mean $\bar x$ imposes one linear constraint (the deviations $x_i - \bar x$ must sum to zero), so only $n - 1$ of them are free to vary. That lost degree of freedom is exactly the $n-1$ in both $s^2$ and the t-distribution — a single, consistent idea rather than two coincidences. In the two-sample case two means are fitted, costing two degrees of freedom and giving $n_1 + n_2 - 2$ .
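The constraint can be seen directly on the fertiliser yields used in a later worked example. The deviations from $\bar x$ sum to zero, so once any $n-1$ of them are known the last one is forced. A minimal Python sketch:

```python
yields = [20, 23, 19, 24, 21, 22, 25]
xbar = sum(yields) / len(yields)          # 22.0
devs = [x - xbar for x in yields]
print(sum(devs))                          # 0.0: deviations must sum to zero

# Knowing any n - 1 deviations fixes the last one, so only n - 1 are free.
last_from_others = -sum(devs[:-1])
print(last_from_others, devs[-1])         # both 3.0
```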

The one-sample t-test

Hypotheses. $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$ (two-tailed), or $H_1: \mu > \mu_0$ / $H_1: \mu < \mu_0$ (one-tailed).

Test statistic and decision.

$T = \frac{\bar x - \mu_0}{s / \sqrt n}, \qquad \text{compared with critical values from } t_{n-1}.$

Two-tailed: reject $H_0$ if $|T| > t_{n-1,\,\alpha/2}$. One-tailed: reject $H_0$ if $T > t_{n-1,\,\alpha}$ (for $H_1: \mu > \mu_0$) or if $T < -t_{n-1,\,\alpha}$ (for $H_1: \mu < \mu_0$). The sign of $T$ must agree with the direction of $H_1$.
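The statistic and the tail-aware decision rule can be written as two small Python functions. This is a sketch: the names `one_sample_t` and `reject_h0` are hypothetical. The critical value is passed in from tables as a positive number rather than computed. The usage lines apply it to the bulb example that follows.

```python
import math

def one_sample_t(xbar, s, n, mu0):
    """T = (xbar - mu0) / (s / sqrt(n)), to be referred to t with n - 1 df."""
    return (xbar - mu0) / (s / math.sqrt(n))

def reject_h0(t_stat, crit, tail):
    """Decision using a positive tabulated critical value `crit`.

    tail="two":   crit = t_{n-1, alpha/2}, reject if |T| > crit
    tail="upper": crit = t_{n-1, alpha},   reject if T > crit
    tail="lower": crit = t_{n-1, alpha},   reject if T < -crit
    """
    if tail == "two":
        return abs(t_stat) > crit
    if tail == "upper":
        return t_stat > crit
    if tail == "lower":
        return t_stat < -crit
    raise ValueError("tail must be 'two', 'upper' or 'lower'")

# Bulb example: H1: mu < 1000, tabulated t_{11, 0.05} = 1.796.
t_stat = one_sample_t(985, 30, 12, 1000)
print(round(t_stat, 3), reject_h0(t_stat, 1.796, "lower"))
```

Keeping the critical value positive and letting `tail` supply the sign mirrors how printed t-tables are laid out.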

Worked example — one-sample t-test

A factory claims its bulbs last $1000$ hours on average. A random sample of $12$ bulbs gives $\bar x = 985$ and $s = 30$ . Test at the $5\%$ level whether the mean lifetime is less than $1000$ hours.

$H_0:\ \mu = 1000, \qquad H_1:\ \mu < 1000 \ \text{(one-tailed)}. \quad (\text{B1 hypotheses})$

$T = \frac{985 - 1000}{30/\sqrt{12}} = \frac{-15}{8.6603} = -1.732. \quad (\text{M1 statistic; A1})$

Critical value: $-t_{11,\,0.05} = -1.796$ (one-tailed, lower tail).

$-1.732 > -1.796 \;\Rightarrow\; T \text{ is not in the critical region.} \quad (\text{M1 compare})$

Do not reject $H_0$ : there is insufficient evidence at the $5\%$ level that the mean lifetime is less than $1000$ hours. (B1 hypotheses; M1/A1 test statistic; M1 comparison; A1 contextual conclusion. Note how close it is — $1.732$ vs $1.796$ — so the conclusion is marginal.)

Worked example — a t-test computed from raw data

A new fertiliser is trialled on $7$ plots; the yields (kg) are $20, 23, 19, 24, 21, 22, 25$ . The old fertiliser averaged $20$ kg. Test at the $5\%$ level whether the new mean exceeds $20$ .

$\bar x = \frac{20+23+19+24+21+22+25}{7} = \frac{154}{7} = 22. \quad (\text{M1 mean})$

The deviations $x_i - 22$ are $-2, 1, -3, 2, -1, 0, 3$ , so

$\sum (x_i - \bar x)^2 = 4 + 1 + 9 + 4 + 1 + 0 + 9 = 28, \qquad s^2 = \frac{28}{6} = 4.667, \quad s = 2.160. \quad (\text{M1 } s)$

$H_0:\mu = 20,\ H_1:\mu > 20; \qquad T = \frac{22 - 20}{2.160/\sqrt 7} = \frac{2}{0.8165} = 2.449. \quad (\text{M1; A1})$

Critical value $t_{6,\,0.05} = 1.943$ (one-tailed). Since $2.449 > 1.943$ , reject $H_0$ : there is evidence at the $5\%$ level that the new fertiliser increases mean yield. (M1 $\bar x$ ; M1 $s$ with divisor $n-1 = 6$ ; M1/A1 statistic; A1 conclusion. The whole calculation hinges on dividing by $6$ , not $7$ , in $s^2$ .)
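The raw-data calculation maps directly onto Python's `statistics` module, whose `stdev` uses the $n-1$ divisor. The following minimal sketch reproduces the fertiliser example. It hard-codes the tabulated one-tailed critical value $t_{6,\,0.05} = 1.943$ rather than computing it.

```python
import math
import statistics

yields = [20, 23, 19, 24, 21, 22, 25]
mu0 = 20                                   # old fertiliser mean under H0
n = len(yields)
xbar = statistics.mean(yields)             # 22
s = statistics.stdev(yields)               # sample sd with divisor n - 1 = 6
t_stat = (xbar - mu0) / (s / math.sqrt(n))
crit = 1.943                               # t_{6, 0.05}, one-tailed, from tables
print(xbar, round(s, 3), round(t_stat, 3), t_stat > crit)
```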

Assumptions for the t-test

The data are a random sample from the population.
The population is normally distributed (or approximately so).
The observations are independent.
The population variance $\sigma^2$ is unknown (otherwise use a $z$ -test).

For small $n$ the normality assumption genuinely matters. For larger $n$ ( $\ge 30$ ) the t-test is robust to moderate non-normality, thanks to the CLT acting on $\bar X$ .

The paired t-test

When data come in natural pairs — before/after on the same subject, or two measurements on one item — the two columns are not independent, so a two-sample test is invalid. Instead reduce each pair to a single difference $d_i = x_i - y_i$ and run a one-sample t-test on the differences with $\mu_0 = 0$ :

$H_0:\ \mu_d = 0, \qquad T = \frac{\bar d}{s_d / \sqrt n} \sim t_{n-1}.$

Worked example — paired t-test

Eight patients have their systolic blood pressure measured before and after a treatment:

Patient	Before	After	$d = \text{B} - \text{A}$
1	148	140	8
2	152	147	5
3	145	142	3
4	160	148	12
5	138	136	2
6	155	150	5
7	142	138	4
8	150	143	7

$\bar d = \frac{8+5+3+12+2+5+4+7}{8} = \frac{46}{8} = 5.75. \quad (\text{M1 mean difference})$

The deviations $d_i - \bar d$ are $2.25, -0.75, -2.75, 6.25, -3.75, -0.75, -1.75, 1.25$ ; squaring and summing,

$\sum (d_i - \bar d)^2 = 5.0625 + 0.5625 + 7.5625 + 39.0625 + 14.0625 + 0.5625 + 3.0625 + 1.5625 = 71.5,$

$s_d = \sqrt{\frac{71.5}{7}} = \sqrt{10.214} = 3.196. \quad (\text{M1 } s_d)$

$T = \frac{5.75}{3.196/\sqrt 8} = \frac{5.75}{1.1300} = 5.088. \quad (\text{M1; A1})$

With $H_1: \mu_d \ne 0$, the critical value (two-tailed $5\%$, $\nu = 7$) is $t_{7,\,0.025} = 2.365$. Since $5.088 > 2.365$, reject $H_0$: there is strong evidence that the treatment changes (here, reduces) blood pressure. (M1 each for $\bar d$, $s_d$, $T$; A1 value; A1 contextual conclusion. Working with the differences is what makes the pairing valid.)
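The paired analysis is literally a one-sample t-test on the differences, and a short Python sketch makes this explicit. It uses the before/after readings from the table and hard-codes the tabulated $t_{7,\,0.025} = 2.365$. Computing $T$ without intermediate rounding gives a value that agrees with the hand calculation to about two decimal places.

```python
import math
import statistics

before = [148, 152, 145, 160, 138, 155, 142, 150]
after = [140, 147, 142, 148, 136, 150, 138, 143]

d = [b - a for b, a in zip(before, after)]   # d = B - A for each patient
n = len(d)
d_bar = statistics.mean(d)                   # 5.75
s_d = statistics.stdev(d)                    # divisor n - 1 = 7
t_stat = d_bar / (s_d / math.sqrt(n))        # one-sample t on d with mu_0 = 0
crit = 2.365                                 # t_{7, 0.025}, two-tailed 5%, from tables
print(d, round(t_stat, 2), abs(t_stat) > crit)
```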

The two-sample t-test (equal variances)

To compare the means of two independent samples from normal populations with unknown but equal variances, pool the two sample variances into a single estimate:

$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}, \qquad T = \frac{\bar x_1 - \bar x_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}.$

The degrees of freedom $n_1 + n_2 - 2$ reflect the two means estimated. The pooled $s_p^2$ is a weighted average of $s_1^2$ and $s_2^2$, weighted by their degrees of freedom, so the larger sample has more influence on the combined spread estimate. Pooling is justified only when the two populations genuinely share a common variance; that is why the equal-variance assumption must be stated. If the variances are clearly unequal (or the question does not permit the assumption), the pooled test is invalid and a different procedure is needed, such as Welch's unequal-variance t-test. Examiners reward candidates for noticing this distinction. The three standing assumptions for the pooled test are therefore: both populations approximately normal, equal population variances, and independent samples. Independence rules out paired data, which go to the one-sample test on differences.
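The pooled statistic can be sketched as a Python function working from summary statistics. The function name `pooled_t` is hypothetical. It assumes what the text requires: independent samples from approximately normal populations with a common variance. It returns $T$ together with the degrees of freedom $n_1 + n_2 - 2$, so the value can be compared with a tabulated $t_{n_1+n_2-2}$ critical value.

```python
import math

def pooled_t(x1bar, s1, n1, x2bar, s2, n2):
    """Pooled two-sample t statistic, assuming equal population variances.

    Returns (T, degrees of freedom).
    """
    df = n1 + n2 - 2
    # df-weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return (x1bar - x2bar) / se, df
```

When $n_1 = n_2$ the degree-of-freedom weights are equal, so $s_p^2$ reduces to the plain average of $s_1^2$ and $s_2^2$.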
