Chi-Squared Goodness of Fit Tests

The chi-squared ( $\chi^2$ ) goodness-of-fit test answers a question that runs through all of applied statistics: does my data actually follow the model I assumed? You have a set of observed frequencies and a candidate distribution — uniform, binomial, Poisson, normal — that predicts expected frequencies. The test measures the total discrepancy with a single statistic and compares it against the $\chi^2$ distribution. This lesson covers the full method with correct degrees-of-freedom bookkeeping, the all-important class-pooling rule, and three fully-worked tests with mark schemes.

1. Where this sits in AQA 7367

This is Paper 3 Statistics option (7367/3S) content (per-paper weighting AO1 40% / AO2 25% / AO3 35%) — the first of two hypothesis-testing lessons (Lesson 10 covers contingency tables). Computing expected frequencies and the test statistic is AO1; the reasoning — choosing the degrees of freedom, deciding when to pool classes, and writing a conclusion in context — is heavily AO2/AO3. The prerequisites are the Poisson and binomial distributions (Lessons 2–3) for generating expected frequencies, and the general logic of hypothesis testing (null/alternative, significance level, critical value) from A-Level Mathematics.

2. Core theory

The idea and the statistic

If the model is correct, observed frequencies $O_i$ should sit close to the expected frequencies $E_i$ ; large gaps are evidence against the model. The test statistic measures the total relative gap:

$X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}.$

Squaring makes every term non-negative (so over- and under-estimates do not cancel); dividing by $E_i$ weights each gap relative to what was expected (a gap of 5 matters more where 10 was expected than where 100 was). Under $H_0$ , $X^2$ is approximately distributed as chi-squared with $\nu$ degrees of freedom, written $\chi^2_\nu$ . The test is one-tailed (upper): only a large $X^2$ discredits the model.
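To see the weighting at work, here is a minimal Python sketch that computes the individual contributions $(O_i - E_i)^2/E_i$ . The function name `chi_squared_terms` and the two counts are made up for illustration; they compare two terms only and do not form a complete test.

```python
def chi_squared_terms(observed, expected):
    """Return the contributions (O - E)^2 / E, one per class."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


# Illustrative counts only: the same gap of 5 against E = 10 and against E = 100.
# (The totals do not match, so this is not a full goodness-of-fit table.)
terms = chi_squared_terms([15, 105], [10, 100])
print(terms)  # [2.5, 0.25]
```

The gap of 5 contributes ten times as much where only 10 was expected.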

It is worth pausing on why the statistic takes this particular form. Each term $\tfrac{(O_i - E_i)^2}{E_i}$ is, to a good approximation, a squared standardised residual: if the count in class $i$ behaves like a Poisson (or binomial) variable with mean $E_i$ , its standard deviation is roughly $\sqrt{E_i}$ , so $\tfrac{O_i - E_i}{\sqrt{E_i}}$ is a standardised quantity and its square is $\tfrac{(O_i - E_i)^2}{E_i}$ . Summing $k$ such squared standardised residuals gives, under $H_0$ , something close to a sum of squared standard normals — and a sum of squared independent standard normals is exactly a chi-squared variable. This is the intuition behind the reference distribution, and it also explains the $E_i \ge 5$ rule: the normal approximation to each count is poor when the expected frequency is small, so the chi-squared approximation to the sum breaks down unless the expected frequencies are reasonably large. Understanding this makes the otherwise mysterious pooling rule feel inevitable rather than arbitrary.
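In the simplest case the link to a squared standardised variable is exact rather than approximate. A short derivation, assuming just two classes with probabilities $p$ and $1 - p$ (so $E_1 = np$ , $E_2 = n(1-p)$ and $O_2 = n - O_1$ ):

$O_2 - E_2 = (n - O_1) - n(1 - p) = -(O_1 - np),$

$X^2 = \frac{(O_1 - np)^2}{np} + \frac{(O_1 - np)^2}{n(1-p)} = \frac{(O_1 - np)^2\,[(1 - p) + p]}{np(1-p)} = \left(\frac{O_1 - np}{\sqrt{np(1-p)}}\right)^2.$

The bracket is the standardised binomial count, so with two classes $X^2$ is the square of one approximately standard normal variable, not two. The counts are not independent, because they must add up to $n$ , and that constraint removes one degree of freedom.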

Degrees of freedom

$\boxed{\,\nu = (\text{number of classes used}) - 1 - (\text{number of parameters estimated from the data})\,}$

The " $-1$ " is the constraint that $\sum E_i = \sum O_i = n$ (the totals are forced to match). Each parameter you estimate from the sample (rather than being told) costs a further degree of freedom.

Model being tested	Parameters estimated	$\nu$ (with $k$ classes)
Uniform / given distribution	0	$k - 1$
Poisson, $\lambda$ given	0	$k - 1$
Poisson, $\lambda$ estimated by $\bar x$	1	$k - 2$
Binomial, $p$ given	0	$k - 1$
Binomial, $p$ estimated	1	$k - 2$
Normal, $\mu$ and $\sigma^2$ estimated	2	$k - 3$

⚠️ Count $k$ after any pooling (next), and remember an estimated parameter only counts if you estimated it from this data set.
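The bookkeeping in the table can be written as a one-line helper. This is a sketch only: the name `degrees_of_freedom` and the seven-class normal example are assumptions chosen for illustration.

```python
def degrees_of_freedom(classes_after_pooling, params_estimated=0):
    """nu = (classes used) - 1 - (parameters estimated from the data)."""
    nu = classes_after_pooling - 1 - params_estimated
    if nu < 1:
        raise ValueError("too few classes: need at least one degree of freedom")
    return nu


print(degrees_of_freedom(6))     # uniform model, 6 classes -> 5
print(degrees_of_freedom(4, 1))  # Poisson, lambda estimated, 4 classes -> 2
print(degrees_of_freedom(7, 2))  # normal, mu and sigma^2 estimated, 7 classes -> 4
```

Passing the class count after pooling is left to the caller, which mirrors the warning above.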

The pooling rule

The chi-squared approximation is unreliable when an expected frequency is small. The standard A-Level rule is: every $E_i$ must be at least 5; if any $E_i < 5$ , combine it with an adjacent class (and add the corresponding observed frequencies) until all pooled expected frequencies reach 5. Then recount $k$ for the degrees of freedom.
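One way to mechanise the rule is sketched below. It assumes that small expected frequencies occur only at the two ends of the table (the usual situation for Poisson and binomial tails) and merges inward from each end; the function name `pool_classes` is hypothetical, and the figures are the rounded Poisson frequencies used in Example 2 of this lesson.

```python
def pool_classes(observed, expected, minimum=5.0):
    """Pool end classes into their neighbours until each end has E >= minimum.

    Assumption: only the first or last classes can have small expected
    frequencies; a small class in the middle of the table is not handled.
    """
    obs, exps = list(observed), list(expected)
    # Upper end: fold the last class into the one before it.
    while len(exps) > 1 and exps[-1] < minimum:
        last_e, last_o = exps.pop(), obs.pop()
        exps[-1] += last_e
        obs[-1] += last_o
    # Lower end: fold the first class into the one after it.
    while len(exps) > 1 and exps[0] < minimum:
        first_e, first_o = exps.pop(0), obs.pop(0)
        exps[0] += first_e
        obs[0] += first_o
    return obs, exps


# Classes 0, 1, 2, 3 and ">= 4"; the last expected frequency is below 5.
obs, exps = pool_classes([30, 36, 20, 8, 6], [28.37, 35.75, 22.52, 9.46, 3.90])
print(obs)                          # [30, 36, 20, 14]
print([round(e, 2) for e in exps])  # [28.37, 35.75, 22.52, 13.36]
```

The number of classes to use for the degrees of freedom is then `len(exps)`, here 4.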

The procedure

State $H_0$ (data follow the distribution) and $H_1$ (they do not).
Compute expected frequencies $E_i = n\,P(\text{class }i)$ from the model.
Pool any classes with $E_i < 5$ ; recount $k$ .
Compute $X^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$ .
Find $\nu = k - 1 - (\text{params estimated})$ .
Read the critical value $\chi^2_\nu$ at the stated significance level (upper tail).
If $X^2 >$ critical value, reject $H_0$ ; otherwise do not reject. Conclude in context.
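The last four steps of the procedure can be sketched in a few lines of Python. The function `goodness_of_fit` and the small lookup table are assumptions made for illustration: the table holds only the 5% critical values quoted in this lesson, and the function expects classes that have already been pooled.

```python
# 5% upper-tail critical values, limited to those quoted in this lesson.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 4: 9.488, 5: 11.07}


def goodness_of_fit(observed, expected, params_estimated=0):
    """Compute X^2 and nu, then compare X^2 with the 5% critical value."""
    if any(e < 5 for e in expected):
        raise ValueError("pool classes first: an expected frequency is below 5")
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    nu = len(expected) - 1 - params_estimated
    critical = CRITICAL_5_PERCENT[nu]  # KeyError if nu is not in the table
    return {"X2": x2, "nu": nu, "critical": critical, "reject_H0": x2 > critical}


# A die rolled 120 times, tested against a fair-die model (E = 20 per face).
result = goodness_of_fit([15, 23, 18, 25, 17, 22], [20] * 6)
print(round(result["X2"], 2), result["nu"], result["reject_H0"])  # 3.8 5 False
```

The code only returns a decision; the conclusion in context still has to be written in words.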

3. Worked examples with M1/A1 mark scheme

Example 1 — testing a uniform distribution (a fair die)

A die is rolled 120 times:

Face	1	2	3	4	5	6
Observed $O$	15	23	18	25	17	22

$H_0$ : the die is fair (each face has probability $\tfrac16$ ). $H_1$ : the die is not fair. Expected $E = \tfrac{120}{6} = 20$ per face (B1; all $E \ge 5$ , no pooling).

Face	$O$	$E$	$(O-E)^2/E$
1	15	20	$25/20 = 1.25$
2	23	20	$9/20 = 0.45$
3	18	20	$4/20 = 0.20$
4	25	20	$25/20 = 1.25$
5	17	20	$9/20 = 0.45$
6	22	20	$4/20 = 0.20$

$X^2 = 1.25 + 0.45 + 0.20 + 1.25 + 0.45 + 0.20 = 3.80. \quad (\textbf{M1}\ \sum (O-E)^2/E;\ \textbf{A1}\ 3.80)$

Degrees of freedom $\nu = 6 - 1 = 5$ (B1). At the 5% level the critical value is $\chi^2_5 = 11.07$ . Since $3.80 < 11.07$ we do not reject $H_0$ (M1 compare; A1 conclusion): there is no significant evidence at the 5% level that the die is unfair.

Example 2 — testing a Poisson distribution (parameter estimated)

Accidents per day at a junction over 100 days:

Accidents	0	1	2	3	4	5+
Days $O$	30	36	20	8	4	2

$H_0$ : the data follow a Poisson distribution; $H_1$ : they do not.

Estimate $\lambda$ by the sample mean:

$\bar x = \frac{0(30)+1(36)+2(20)+3(8)+4(4)+5(2)}{100} = \frac{0+36+40+24+16+10}{100} = \frac{126}{100} = 1.26. \quad (\textbf{B1})$

Expected frequencies from $\text{Po}(1.26)$ (using $P(r) = \tfrac{\lambda}{r}P(r-1)$ ):

$r$	$P(X=r)$	$E = 100P$
0	$e^{-1.26} = 0.2837$	28.37
1	$1.26\times 0.2837 = 0.3575$	35.75
2	$\tfrac{1.26}{2}\times 0.3575 = 0.2252$	22.52
3	$\tfrac{1.26}{3}\times 0.2252 = 0.0946$	9.46
$\ge 4$	$1 - 0.9610 = 0.0390$	3.90
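The recurrence is easy to run at full precision. The sketch below uses only the standard library; the function name `poisson_expected` and its `top` parameter (the value at which the open-ended final class starts) are assumptions for illustration.

```python
import math


def poisson_expected(n, lam, top):
    """Expected frequencies for r = 0..top-1, plus a final 'r >= top' class."""
    probs = [math.exp(-lam)]               # P(0) = e^(-lambda)
    for r in range(1, top):
        probs.append(probs[-1] * lam / r)  # P(r) = (lambda / r) * P(r - 1)
    probs.append(1.0 - sum(probs))         # upper tail by complement
    return [n * p for p in probs]


expected = poisson_expected(n=100, lam=1.26, top=4)
print([round(e, 2) for e in expected])
```

The values agree with the table to within about 0.02. The small differences arise because the table carries probabilities rounded to four decimal places from one row to the next.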

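The expected table already treats the observed classes 4 and 5+ as a single class $\ge 4$ , whose probability is found as the complement $1 - P(X \le 3)$ . This class has $E = 3.90 < 5$ , so pool it with class 3: pooled $E = 9.46 + 3.90 = 13.36$ , pooled $O = 8 + (4+2) = 14$ (M1 pooling shown). This leaves $k = 4$ classes: $\{0, 1, 2, \ge 3\}$ .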

Class	$O$	$E$	$(O-E)^2/E$
0	30	28.37	$2.6569/28.37 = 0.0936$
1	36	35.75	$0.0625/35.75 = 0.0017$
2	20	22.52	$6.3504/22.52 = 0.2820$
$\ge 3$	14	13.36	$0.4096/13.36 = 0.0307$

$X^2 = 0.0936 + 0.0017 + 0.2820 + 0.0307 = 0.408. \quad (\textbf{M1 A1})$

Degrees of freedom: $k = 4$ classes, 1 parameter estimated ( $\lambda$ ), so $\nu = 4 - 1 - 1 = 2$ (B1). At the 5% level the critical value is $\chi^2_2 = 5.991$ . Since $0.408 < 5.991$ , do not reject $H_0$ (A1): there is no significant evidence at the 5% level that the number of accidents per day does not follow a Poisson distribution.

Exam Tip: Pool first, then count classes for $\nu$ . Here pooling reduced six raw classes to four, and estimating $\lambda$ removed one more degree of freedom — both steps are common mark-losers if skipped.
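For $\nu = 2$ specifically, the upper tail of the chi-squared distribution has a closed form, $P(\chi^2_2 > x) = e^{-x/2}$ , so the tabulated critical value can be checked without tables. The sketch below is an illustration only: the function name is made up, and the shortcut does not apply to other values of $\nu$ .

```python
import math


def chi2_upper_tail_2df(x):
    """P(chi-squared with 2 degrees of freedom exceeds x); valid for nu = 2 only."""
    return math.exp(-x / 2)


critical_5_percent = -2 * math.log(0.05)  # solves exp(-x / 2) = 0.05
print(round(critical_5_percent, 3))       # 5.991
print(round(chi2_upper_tail_2df(0.408), 2))
```

The tail probability beyond $X^2 = 0.408$ comes out at a little over 0.8, far above 0.05, which is consistent with not rejecting $H_0$ .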

Example 3 — a two-class test with Yates' correction

A coin is tossed 100 times, giving 60 heads and 40 tails. Test at 5% whether the coin is fair.

$H_0$ : $p(\text{head}) = 0.5$ ; $H_1$ : $p(\text{head}) \ne 0.5$ . Expected: 50 heads, 50 tails. Here $k = 2$ and $\nu = 2 - 1 = 1$ , so apply Yates' continuity correction, which subtracts 0.5 from each $|O - E|$ before squaring:

$X^2 = \sum \frac{(|O - E| - 0.5)^2}{E} = \frac{(10 - 0.5)^2}{50} + \frac{(10 - 0.5)^2}{50} = \frac{90.25}{50} + \frac{90.25}{50} = 1.805 + 1.805 = 3.61. \quad (\textbf{M1 A1})$

At 5%, $\chi^2_1 = 3.841$ . Since $3.61 < 3.841$ , do not reject $H_0$ (A1): no significant evidence the coin is biased. (Without Yates' correction $X^2 = 4.00 > 3.841$ would reject — showing why the correction matters when $\nu = 1$ .)
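The effect of the correction is easy to reproduce. The following is a minimal sketch; the function name `chi_squared` and its `yates` flag are illustrative choices, and the code assumes every $|O - E|$ is at least 0.5.

```python
def chi_squared(observed, expected, yates=False):
    """Sum of (|O - E| - c)^2 / E, with c = 0.5 under Yates' correction, else 0."""
    c = 0.5 if yates else 0.0
    return sum((abs(o - e) - c) ** 2 / e for o, e in zip(observed, expected))


coin_observed, coin_expected = [60, 40], [50, 50]
print(round(chi_squared(coin_observed, coin_expected, yates=True), 2))  # 3.61
print(round(chi_squared(coin_observed, coin_expected), 2))              # 4.0
```

The two values fall on opposite sides of the critical value 3.841, so here the correction changes the decision.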

4. Specimen-style exam question

(Specimen-style — not from any real paper.)

A biologist counts the number of seeds germinating in each of 80 trays of 4 seeds, and proposes a $B(4, 0.5)$ model.

Germinating	0	1	2	3	4
Trays $O$	6	18	28	20	8

Test at the 5% level whether $B(4, 0.5)$ fits.

Solution. $H_0$ : the data follow $B(4,0.5)$ ; $H_1$ : they do not. The $B(4,0.5)$ probabilities are $\tfrac{1}{16}, \tfrac{4}{16}, \tfrac{6}{16}, \tfrac{4}{16}, \tfrac{1}{16}$ , so with $n = 80$ :

$r$	$P$	$E = 80P$	$O$	$(O-E)^2/E$
0	1/16	5	6	$1/5 = 0.20$
1	4/16	20	18	$4/20 = 0.20$
2	6/16	30	28	$4/30 = 0.1333$
3	4/16	20	20	$0/20 = 0$
4	1/16	5	8	$9/5 = 1.80$

All $E \ge 5$ , so no pooling. $X^2 = 0.20 + 0.20 + 0.1333 + 0 + 1.80 = 2.333$ . Here $p$ was given (not estimated), so $\nu = 5 - 1 = 4$ ; at 5%, $\chi^2_4 = 9.488$ . Since $2.333 < 9.488$ , do not reject $H_0$ : $B(4, 0.5)$ is an adequate model at the 5% level.
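The expected frequencies and the statistic can be checked with a short Python sketch. The helper name `binomial_expected` is an assumption for illustration, and `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_expected(trials, p, total):
    """Expected frequencies total * P(X = r) for r = 0..trials, X ~ B(trials, p)."""
    return [total * math.comb(trials, r) * p**r * (1 - p) ** (trials - r)
            for r in range(trials + 1)]


observed = [6, 18, 28, 20, 8]
expected = binomial_expected(trials=4, p=0.5, total=80)
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(expected)      # [5.0, 20.0, 30.0, 20.0, 5.0]
print(round(x2, 3))  # 2.333
```

Because $p$ is supplied by the model rather than estimated from the trays, no further degree of freedom is deducted.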

5. Synoptic links

Lessons 2–3 (Poisson/binomial): these supply the expected frequencies; the recurrence $P(r) = \tfrac{\lambda}{r}P(r-1)$ makes the Poisson table quick.
Lesson 10 (contingency tables): the same statistic $\sum (O-E)^2/E$ with a different degrees-of-freedom rule $(r-1)(c-1)$ and a different $E_{ij} = R_iC_j/N$ .
A-Level Maths hypothesis testing: identical logic of $H_0/H_1$ , significance level and critical value — but a goodness-of-fit test is always upper-tailed on $X^2$ , because only a large discrepancy between observed and expected counts is evidence against the model.
Estimation: using $\bar x$ for $\lambda$ , or the sample proportion for $p$ , connects to point estimation and is exactly what costs a degree of freedom; the more parameters you fit to the data, the fewer independent comparisons remain.
