In real investigations the population standard deviation σ is almost never known — it must be estimated from the data by the sample standard deviation s. When the sample is large this barely matters and the normal distribution still serves; but when the sample is small, replacing σ by the noisy estimate s injects extra uncertainty that the normal model ignores. The t-distribution is the correct reference distribution in exactly this situation. This lesson derives why, characterises the t-distribution, and applies it through the one-sample, paired and two-sample t-tests.
This is Paper 3 Statistics (7367/3S) content (Paper 3: 2 h, 100 marks, AO1 40% / AO2 25% / AO3 35%). It is the first genuinely inferential topic of the option: it takes the sampling-distribution result $\bar{X} \sim N(\mu, \sigma^2/n)$ and adapts it to the realistic case of unknown σ. The work is rich in AO2 (choosing the correct reference distribution and stating assumptions) and AO3 (multi-stage tests on real-looking data), with AO1 carrying the arithmetic of the test statistic. It builds on the previous lesson's sampling distribution and on A-Level Maths Statistics hypothesis testing.
Start from the standardised sample mean. When σ is known, the previous lesson gives
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1),$$
an exact standard normal for a normal population. When σ is unknown, the natural move is to substitute the sample standard deviation s:
$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}.$$
Crucially this is no longer normal. The numerator $\bar{X} - \mu$ is still random, but now the denominator is random too, because s varies from sample to sample. Dividing one random quantity by another, rather than by the fixed constant σ, spreads the statistic out, especially for small n where s is an unreliable estimate of σ. To see why this matters, imagine the unlucky case where a small sample happens to give an s much smaller than the true σ: the denominator shrinks and T is inflated, producing an extreme value far more often than a fixed denominator ever would. It is precisely these occasional inflated values that thicken the tails of the t relative to the normal. The result is the t-distribution with $n-1$ degrees of freedom. The "$n-1$" is the same divisor used in the unbiased sample variance
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2,$$
and reflects that one degree of freedom is "spent" estimating μ by $\bar{x}$ before the spread can be measured.
This is not a minor technicality but the central correction the topic makes. The A-Level Maths approach treats σ as a given constant and uses the normal throughout; that is a convenient fiction, because in any genuine investigation σ is just as unknown as μ. Replacing σ by s and pretending nothing has changed would systematically understate the uncertainty — your intervals would be too narrow and your tests would reject true null hypotheses too often. The t-distribution is the exact, honest accounting for that extra uncertainty when the data are normal, and it is the default reference distribution for inference about a mean in real statistical practice.
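The effect of dividing by s instead of σ can be checked with a small simulation. The sketch below is illustrative only: the function name `exceed_rates`, the sample size n = 5, the number of trials and the seed are all assumptions chosen for the demonstration, not part of the lesson. It draws many small normal samples and counts how often the standardised mean falls beyond ±1.96 under each denominator.

```python
import math
import random
import statistics


def exceed_rates(n=5, trials=20000, mu=0.0, sigma=1.0, cutoff=1.96, seed=1):
    """Estimate P(|statistic| > cutoff) when dividing by sigma versus by s.

    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    z_hits = 0
    t_hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
        z = (xbar - mu) / (sigma / math.sqrt(n))  # fixed denominator
        t = (xbar - mu) / (s / math.sqrt(n))      # random denominator
        z_hits += abs(z) > cutoff
        t_hits += abs(t) > cutoff
    return z_hits / trials, t_hits / trials


z_rate, t_rate = exceed_rates()
print(f"known sigma: {z_rate:.3f}   estimated s: {t_rate:.3f}")
```

With σ known the exceedance rate should sit near 5%, while with s in the denominator it should be noticeably higher. That gap is the extra tail probability the normal model ignores.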
| Property | Detail |
|---|---|
| Shape | symmetric, bell-shaped, centred on 0 |
| Tails | heavier than the normal (more probability far from 0) |
| Parameter | degrees of freedom ν=n−1 |
| Limit | as $\nu \to \infty$, $t_\nu \to N(0,1)$ |
| Peak | slightly lower and flatter than N(0,1) |
The heavier tails are the whole point: because we are less certain about the spread, extreme standardised values are more likely than the normal would predict, so the critical values are larger (you need stronger evidence to reject). As ν grows, s becomes a reliable estimate of σ and the t-distribution tightens onto the normal. The table makes this concrete (two-tailed 5% critical values):
| ν | $t_{\nu,\,0.025}$ | $z_{0.025}$ (for comparison) |
|---|---|---|
| 5 | 2.571 | 1.960 |
| 10 | 2.228 | 1.960 |
| 20 | 2.086 | 1.960 |
| 30 | 2.042 | 1.960 |
| 120 | 1.980 | 1.960 |
By ν=120 the t-value 1.980 is within about 1% of the normal 1.960, which is why large-sample inference can safely use z. The practical takeaway is a clean decision rule: if σ is known, use z at any sample size; if σ is unknown, use $t_{n-1}$, which matters most for small n and fades into z as n grows. A frequent exam scenario deliberately gives a small sample with σ unknown precisely to test whether you reach for the t rather than the normal; reaching for z=1.96 there understates the critical value and leads you to reject $H_0$ too readily.
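To put numbers on how quickly t closes in on z, the following sketch takes the critical values from the table above and expresses each as a percentage excess over the normal value. The t-values are copied from the table rather than computed; the variable names are illustrative, and `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist

# Two-tailed 5% critical values copied from the table above (not computed here).
t_crit = {5: 2.571, 10: 2.228, 20: 2.086, 30: 2.042, 120: 1.980}

# Upper 2.5% point of the standard normal, about 1.960.
z_crit = NormalDist().inv_cdf(0.975)

# Percentage by which each t critical value exceeds the normal one.
excess = {nu: 100 * (t / z_crit - 1) for nu, t in t_crit.items()}

for nu, pct in excess.items():
    print(f"nu = {nu:>3}: t critical value is {pct:4.1f}% above z")
```

The excess is large for small ν and shrinks steadily, which is the numerical content of the decision rule: the choice between t and z matters most when the sample is small.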
It is also worth being precise about what the degrees of freedom count. Starting with n independent observations, fitting the sample mean $\bar{x}$ imposes one linear constraint (the deviations $x_i - \bar{x}$ must sum to zero), so only $n-1$ of them are free to vary. That lost degree of freedom is exactly the $n-1$ in both $s^2$ and the t-distribution: a single, consistent idea rather than two coincidences. In the two-sample case two means are fitted, costing two degrees of freedom and giving $n_1 + n_2 - 2$.
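The constraint can be seen directly in a few lines. This is a minimal sketch using made-up observations (the list `data` is hypothetical and chosen only for illustration): once the mean is fitted, the last deviation is forced by the others.

```python
import statistics

# Hypothetical observations, chosen only for illustration.
data = [4.0, 7.0, 5.0, 8.0, 6.0]

xbar = statistics.fmean(data)
deviations = [x - xbar for x in data]

# The constraint: deviations from the sample mean always sum to zero.
total = sum(deviations)

# So the last deviation is fixed once the first n - 1 are known.
recovered_last = -sum(deviations[:-1])

# Number of deviations that are genuinely free to vary.
free_count = len(data) - 1

print(deviations, total, recovered_last, free_count)
```

Only `free_count` of the deviations carry independent information about spread, which is why the sum of squares is divided by that number.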
Hypotheses. $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$ (two-tailed), or $H_1: \mu > \mu_0$ / $H_1: \mu < \mu_0$ (one-tailed).
Test statistic and decision.
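$$T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad \text{compared with the critical value from } t_{n-1}.$$
For a two-tailed test, reject $H_0$ if $|T|$ exceeds the critical value with α/2 in each tail. For a one-tailed test, use the critical value with α in the single tail and reject $H_0$ only if T lies beyond it in the direction specified by $H_1$.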
A factory claims its bulbs last 1000 hours on average. A random sample of 12 bulbs gives $\bar{x} = 985$ and $s = 30$. Test at the 5% level whether the mean lifetime is less than 1000 hours.
$H_0: \mu = 1000$, $H_1: \mu < 1000$ (one-tailed). (B1 hypotheses)

$$T = \frac{985 - 1000}{30/\sqrt{12}} = \frac{-15}{8.6603} = -1.732. \quad \text{(M1 statistic; A1)}$$

Critical value: $-t_{11,\,0.05} = -1.796$ (one-tailed, lower).

$-1.732 > -1.796 \Rightarrow T$ is not in the critical region. (M1 compare)
Do not reject H0: there is insufficient evidence at the 5% level that the mean lifetime is less than 1000 hours. (B1 hypotheses; M1/A1 test statistic; M1 comparison; A1 contextual conclusion. Note how close it is — 1.732 vs 1.796 — so the conclusion is marginal.)
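The arithmetic of the bulb example can be reproduced as follows. This is a minimal sketch: the variable names are illustrative, and the critical value is entered by hand from tables, as in the worked solution, rather than computed.

```python
import math

# Summary statistics from the bulb example.
n, xbar, s, mu0 = 12, 985.0, 30.0, 1000.0

# One-sample t statistic: (sample mean - hypothesised mean) / standard error.
t_stat = (xbar - mu0) / (s / math.sqrt(n))

# Lower-tail 5% critical value for 11 degrees of freedom, taken from tables.
critical = -1.796

# One-tailed (lower) test: reject only if T falls below the critical value.
reject = t_stat < critical

print(f"T = {t_stat:.3f}, reject H0: {reject}")
```

Because the test is one-tailed in the lower direction, the comparison is made on the signed statistic rather than on its absolute value.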
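A new fertiliser is trialled on 7 plots; the yields (kg) are 20, 23, 19, 24, 21, 22, 25. The old fertiliser averaged 20 kg. Test at the 5% level whether the new mean exceeds 20.

$$\bar{x} = \frac{20+23+19+24+21+22+25}{7} = \frac{154}{7} = 22. \quad \text{(M1 mean)}$$

The deviations $x_i - 22$ are −2, 1, −3, 2, −1, 0, 3, so

$$\sum (x_i - \bar{x})^2 = 4+1+9+4+1+0+9 = 28, \qquad s^2 = \frac{28}{6} = 4.667, \qquad s = 2.160. \quad \text{(M1 } s\text{)}$$

$H_0: \mu = 20$, $H_1: \mu > 20$;

$$T = \frac{22 - 20}{2.160/\sqrt{7}} = \frac{2}{0.8165} = 2.449. \quad \text{(M1; A1)}$$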
Critical value $t_{6,\,0.05} = 1.943$ (one-tailed). Since 2.449 > 1.943, reject $H_0$: there is evidence at the 5% level that the new fertiliser increases mean yield. (M1 $\bar{x}$; M1 $s$ with divisor $n-1 = 6$; M1/A1 statistic; A1 conclusion. The whole calculation hinges on dividing by 6, not 7, in $s^2$.)
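The fertiliser calculation can be checked from the raw yields. The sketch below uses only Python's standard library; `statistics.stdev` uses the divisor n − 1, while `statistics.pstdev` uses n and is included purely to show what the wrong divisor does. Variable names are illustrative, and the critical value is typed in from tables.

```python
import math
import statistics

yields = [20, 23, 19, 24, 21, 22, 25]  # kg, from the example
mu0 = 20
n = len(yields)

xbar = statistics.mean(yields)
s = statistics.stdev(yields)  # divisor n - 1 = 6 (correct for the t-test)
t_stat = (xbar - mu0) / (s / math.sqrt(n))

# For contrast only: the divisor-n standard deviation inflates the statistic.
s_wrong = statistics.pstdev(yields)
t_wrong = (xbar - mu0) / (s_wrong / math.sqrt(n))

critical = 1.943  # upper-tail 5% point for 6 degrees of freedom, from tables
reject = t_stat > critical

print(f"s = {s:.3f}, T = {t_stat:.3f}, reject H0: {reject}")
print(f"with divisor n: s = {s_wrong:.3f}, T = {t_wrong:.3f}")
```

Dividing by n makes s smaller and T larger, so the wrong divisor overstates the evidence against the null hypothesis.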
For small n the normality assumption genuinely matters. For larger n (≥30) the t-test is robust to moderate non-normality, thanks to the CLT acting on $\bar{X}$.
When data come in natural pairs (before/after on the same subject, or two measurements on one item) the two columns are not independent, so a two-sample test is invalid. Instead reduce each pair to a single difference $d_i = x_i - y_i$ and run a one-sample t-test on the differences with $\mu_0 = 0$:

$$H_0:\ \mu_d = 0, \qquad T = \frac{\bar{d}}{s_d/\sqrt{n}} \sim t_{n-1}.$$
Eight patients have their systolic blood pressure measured before and after a treatment:
| Patient | Before | After | d=B−A |
|---|---|---|---|
| 1 | 148 | 140 | 8 |
| 2 | 152 | 147 | 5 |
| 3 | 145 | 142 | 3 |
| 4 | 160 | 148 | 12 |
| 5 | 138 | 136 | 2 |
| 6 | 155 | 150 | 5 |
| 7 | 142 | 138 | 4 |
| 8 | 150 | 143 | 7 |
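$$\bar{d} = \frac{8+5+3+12+2+5+4+7}{8} = \frac{46}{8} = 5.75. \quad \text{(M1 mean difference)}$$

The deviations $d_i - \bar{d}$ are 2.25, −0.75, −2.75, 6.25, −3.75, −0.75, −1.75, 1.25; squaring and summing,

$$\sum (d_i - \bar{d})^2 = 5.0625+0.5625+7.5625+39.0625+14.0625+0.5625+3.0625+1.5625 = 71.5,$$

$$s_d = \sqrt{\frac{71.5}{7}} = \sqrt{10.214} = 3.196. \quad \text{(M1 } s_d\text{)}$$

$$T = \frac{5.75}{3.196/\sqrt{8}} = \frac{5.75}{1.1300} = 5.088. \quad \text{(M1; A1)}$$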
Critical value (two-tailed 5%, ν=7): $t_{7,\,0.025} = 2.365$. Since 5.088 > 2.365, reject $H_0$: there is strong evidence that the treatment changes (here, reduces) blood pressure. (M1 each for $\bar{d}$, $s_d$, T; A1 value; A1 contextual conclusion. Working with the differences is what makes the pairing valid.)
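The paired calculation reduces to a one-sample test on the differences, which the sketch below mirrors step by step. The before/after readings are those in the table; the variable names are illustrative and the critical value is entered from tables.

```python
import math
import statistics

before = [148, 152, 145, 160, 138, 155, 142, 150]
after = [140, 147, 142, 148, 136, 150, 138, 143]

# Reduce each pair to a single difference d = before - after.
d = [b - a for b, a in zip(before, after)]
n = len(d)

d_bar = statistics.mean(d)
s_d = statistics.stdev(d)  # divisor n - 1 = 7

# One-sample t statistic on the differences, with hypothesised mean 0.
t_stat = d_bar / (s_d / math.sqrt(n))

critical = 2.365  # two-tailed 5% point for 7 degrees of freedom, from tables
reject = abs(t_stat) > critical

print(f"d_bar = {d_bar}, s_d = {s_d:.3f}, T = {t_stat:.3f}, reject H0: {reject}")
```

Carrying full precision gives a statistic of about 5.09, which agrees with the hand calculation to the accuracy of its rounded intermediate values.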
To compare the means of two independent samples from normal populations with unknown but equal variances, pool the two sample variances into a single estimate:
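$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad T = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}.$$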
The degrees of freedom $n_1 + n_2 - 2$ reflect the two means estimated. The pooled $s_p^2$ is a weighted average of $s_1^2$ and $s_2^2$, weighted by their degrees of freedom, so the larger sample has more influence on the combined spread estimate. Pooling is justified only when the two populations genuinely share a common variance; that is why the equal-variance assumption must be stated. If the variances are clearly unequal (or the question does not permit the assumption), the pooled test is invalid and a different procedure is needed, a distinction examiners reward candidates for noticing. The three standing assumptions for the pooled test are therefore: both populations approximately normal, equal population variances, and independent samples (which rules out paired data; those go to the one-sample test on differences).
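The pooled statistic can be sketched as a small function working from summary statistics. Everything named here is an assumption for illustration: the function `pooled_t` is hypothetical, and the sample sizes, means and standard deviations in the usage lines are made-up values, not data from the lesson.

```python
import math


def pooled_t(n1, xbar1, s1, n2, xbar2, s2):
    """Return (T, degrees of freedom, pooled variance) for the pooled t-test.

    Assumes independent samples from normal populations with equal variances.
    """
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    # Standard error of the difference between the two sample means.
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return (xbar1 - xbar2) / se, df, sp2


# Hypothetical summary statistics, chosen only for illustration.
t_stat, df, sp2 = pooled_t(n1=10, xbar1=52.0, s1=4.0, n2=8, xbar2=48.0, s2=5.0)
print(f"T = {t_stat:.3f} on {df} degrees of freedom (pooled variance {sp2:.4f})")
```

The statistic would then be compared with the critical value of the t-distribution on `df` degrees of freedom, read from tables, in the same way as in the one-sample tests.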