The Chi-Squared Test

The chi-squared (χ²) test is the standard frequentist tool for comparing categorical biological data against a theoretical expectation. It tells us whether the deviation between what we observed and what a hypothesis predicts is large enough to discount chance as a sufficient explanation. In the context of populations and evolution, the test is most often deployed at A-Level to evaluate Mendelian genetic crosses, to test goodness-of-fit to Hardy-Weinberg expectations, and — through the species-distribution required practical — to assess whether organisms are clumped or randomly distributed across a sampled habitat.

Spec mapping: This lesson sits in AQA 7402 Section 3.7.2 (genetic diversity and adaptation — statistical analysis of inheritance data), with an explicit cross-link to the Required Practical 11 on investigating distribution of species using quadrats and transects (anchored in the ecosystems course, Section 3.7.4). Refer to the official AQA specification document for exact wording. The skills here are also applied in the rest of this course: testing Hardy-Weinberg expectations (lesson 1), interpreting selection experiments (lesson 2), and analysing speciation studies (lesson 4).

Connects to: Mendelian inheritance and dihybrid ratios (Section 3.7.1, course 4 DNA, Genes and Inheritance); Hardy-Weinberg equilibrium (Section 3.7.2, lesson 1 of this course); investigation of species distribution (Section 3.7.4, course 9 Ecosystems).

Why Frequentist Statistics in Biology?

Biological data are messy. When a cross between two heterozygous pea plants is expected to give a 3:1 ratio, an actual cohort of 100 offspring almost never returns exactly 75 tall and 25 dwarf. Suppose we record 82 tall and 18 dwarf. Is that close enough to 3:1 to be consistent with simple Mendelian inheritance, or is the departure too large to be explained by sampling chance alone — implying something more interesting is happening (linkage, epistasis, viability selection, segregation distortion)?

The eye is a poor judge of "close enough". Humans systematically over-interpret small deviations as meaningful and under-interpret large deviations from rare events. The chi-squared test replaces eyeballing with a quantitative decision rule:

State a null hypothesis about how the data should be distributed if no biological effect is at work.
Calculate a test statistic that measures the discrepancy between observed and expected counts.
Compare that statistic to a critical value read from tables of the χ² distribution.
Either reject or fail to reject the null hypothesis at a stated significance level.

Crucially, the test does not prove anything biologically. It tells you only whether the data are compatible with the null hypothesis. Rejecting the null says "something other than chance is at work"; it does not say what that something is — that is for the biologist to argue from context.

The Null Hypothesis

Every chi-squared test begins with a null hypothesis (H₀) and an alternative hypothesis (H₁):

H₀: There is no significant difference between the observed and expected frequencies. Any deviation is due to chance.
H₁: There is a significant difference; the deviation is too large to be explained by chance alone.

If the test statistic is small (data close to expectation), we fail to reject H₀. Note the careful wording: we never "prove" the null, we simply have no evidence against it. If the test statistic is large, we reject H₀ in favour of H₁ and look for a biological explanation.

Key Point — what "significance" means: A result is "significant at p = 0.05" if the probability of observing a deviation at least this large under the null hypothesis is less than 5%. It does not mean there is a 95% chance H₁ is true. The distinction matters at A-Level and is examined in 9-mark evaluation questions.

The Chi-Squared Formula

The standard form is:

χ² = Σ [(O − E)² / E]

where:

O is the observed count in each category;
E is the expected count in each category (calculated from the theoretical ratio applied to the total sample size);
Σ denotes summation across all categories.

The numerator (O − E)² penalises deviations symmetrically — over-counts and under-counts are equally bad. Dividing by E scales each contribution by the size of the expected count, so a deviation of 10 against an expected 100 contributes less than a deviation of 10 against an expected 20. Squaring before summing prevents positive and negative deviations from cancelling.
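As a rough illustration of the formula, the sketch below computes χ² in Python from paired lists of observed and expected counts. The function name `chi_squared` and the counts of 110 and 30 are assumptions made for this example only; they simply reproduce a deviation of 10 against expected values of 100 and 20.

```python
def chi_squared(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# The same deviation of 10 weighs less against a larger expected count.
small_contribution = (110 - 100) ** 2 / 100   # deviation of 10 against E = 100
large_contribution = (30 - 20) ** 2 / 20      # deviation of 10 against E = 20
print(small_contribution, large_contribution)  # 1.0 5.0
```

The second contribution is five times the first, although the raw deviation is identical.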

The seven-step procedure

1. State the null hypothesis explicitly, e.g. "There is no significant difference between the observed numbers and those expected from a 3:1 monohybrid ratio."
2. Calculate expected values: multiply each expected proportion by the total sample size.
3. Calculate (O − E) for each category.
4. Calculate (O − E)² / E for each category.
5. Sum to obtain χ².
6. Determine degrees of freedom: df = number of categories − 1 (for goodness-of-fit tests). Note this is not the number of organisms; misreading this is a common A-Level error.
7. Compare χ² to the critical value at the chosen significance level (almost always p = 0.05 in biology) and state a conclusion in clear biological language.
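The procedure can be sketched in a few lines of Python. This is a minimal illustration, not a substitute for showing the working by hand in an exam. The function name `goodness_of_fit` and the dictionary `CRITICAL_0_05` are hypothetical; the critical values are the standard p = 0.05 table entries for df = 1 to 6, and the counts are the 82 tall and 18 dwarf plants from the 3:1 cross introduced earlier.

```python
# Critical values at p = 0.05, copied from the standard chi-squared table.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def goodness_of_fit(observed, ratio):
    """Run steps 2 to 7 for a goodness-of-fit test against a stated ratio."""
    total = sum(observed)
    # Step 2: expected counts = expected proportion x total sample size.
    expected = [total * part / sum(ratio) for part in ratio]
    # Steps 3 to 5: sum (O - E)^2 / E over every category.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 6: degrees of freedom = number of categories - 1.
    df = len(observed) - 1
    # Step 7: compare with the critical value at p = 0.05.
    critical = CRITICAL_0_05[df]
    return {"expected": expected, "chi2": chi2, "df": df,
            "critical": critical, "reject_h0": chi2 > critical}


result = goodness_of_fit([82, 18], [3, 1])
print(round(result["chi2"], 3), result["df"], result["reject_h0"])  # 2.613 1 False
```

Step 1 (stating H₀) and the biological conclusion in step 7 remain the job of the person running the test; the code only handles the arithmetic and the comparison.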

Worked Example 1 — Monohybrid Cross

A cross between two heterozygous pea plants (Tt × Tt) yields 100 offspring: 82 tall, 18 dwarf. The expected Mendelian ratio is 3:1, so expected counts are 75:25.

Category	Observed (O)	Expected (E)	O − E	(O − E)²	(O − E)² / E
Tall	82	75	7	49	0.653
Dwarf	18	25	−7	49	1.960
Total	100	100			χ² = 2.613

df = 2 − 1 = 1.
Critical value at p = 0.05, df = 1: 3.841.
2.613 < 3.841, so we fail to reject H₀. The data are consistent with a 3:1 ratio; the deviation is plausibly due to chance alone.

Note that 82 tall plants out of 100 looks like a meaningful excess over 75. The chi-squared test formalises the intuition that with a sample of only 100, this size of deviation arises by chance roughly 11% of the time — too often to call it significant.
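The "roughly 11%" figure can be checked numerically. The sketch below uses only Python's standard `math` module and relies on the fact that, for df = 1 only, the right-tail probability of χ² equals erfc(√(χ²/2)). The function name `p_value_df1` is made up for this illustration.

```python
import math


def p_value_df1(chi2):
    """Right-tail probability of the chi-squared distribution with df = 1."""
    # For df = 1, chi-squared is the square of a standard normal variable,
    # so the right tail equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


chi2 = (82 - 75) ** 2 / 75 + (18 - 25) ** 2 / 25
p = p_value_df1(chi2)
print(round(chi2, 3), round(p, 3))  # 2.613 0.106
```

A probability of about 0.106 is above 0.05, which matches the table-based decision not to reject H₀.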

Worked Example 2 — Dihybrid Cross

A dihybrid cross (RrYy × RrYy) produces 640 offspring. The expected 9:3:3:1 ratio gives expected counts of 360, 120, 120, 40.

Category	Observed (O)	Expected (E)	O − E	(O − E)²	(O − E)² / E
Round yellow	370	360	10	100	0.278
Round green	124	120	4	16	0.133
Wrinkled yellow	110	120	−10	100	0.833
Wrinkled green	36	40	−4	16	0.400
Total	640	640			χ² = 1.644

df = 4 − 1 = 3.
Critical value at p = 0.05, df = 3: 7.815.
1.644 is far below 7.815, so we fail to reject H₀. The data are consistent with independent assortment of the two genes.

Worked Example 3 — Rejecting H₀ (Evidence for Linkage)

A test cross is expected to yield a 1:1:1:1 phenotypic ratio (parental and recombinant in equal proportions). From 200 offspring we observe 90, 16, 14, 80 — with the two parental phenotype classes wildly over-represented and the two recombinant classes correspondingly depleted.

Category	O	E	O − E	(O − E)²	(O − E)² / E
Parental A	90	50	40	1600	32.000
Recombinant B	16	50	−34	1156	23.120
Recombinant C	14	50	−36	1296	25.920
Parental D	80	50	30	900	18.000
Total	200	200			χ² = 99.040

df = 3. Critical value at p = 0.05: 7.815. Critical value at p = 0.001: 16.266.
99.040 vastly exceeds even the p = 0.001 threshold, so we reject H₀ with very high confidence.

The biological interpretation is that the two loci are linked on the same chromosome — recombination between them happens only rarely, so parental combinations dominate the offspring. Recombination frequency from these data is approximately (16 + 14) / 200 = 0.15, suggesting a map distance of 15 cM between the loci.
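The arithmetic for this example, including the recombination frequency, can be reproduced with a short Python sketch. The dictionary keys simply mirror the table labels; nothing here goes beyond the numbers already given.

```python
observed = {"Parental A": 90, "Recombinant B": 16,
            "Recombinant C": 14, "Parental D": 80}
total = sum(observed.values())
expected = total / len(observed)  # 1:1:1:1 ratio gives 50 per class

chi2 = sum((o - expected) ** 2 / expected for o in observed.values())

# Recombination frequency = recombinant offspring / total offspring.
recombinants = observed["Recombinant B"] + observed["Recombinant C"]
recombination_frequency = recombinants / total

print(round(chi2, 2), recombination_frequency)  # 99.04 0.15
```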

Degrees of Freedom — The Most-Lost Mark

df = number of phenotypic categories − 1 (for goodness-of-fit to a stated ratio).

Number of categories	df	Example ratio
2	1	Monohybrid 3:1
3	2	Epistasis 9:3:4 or 12:3:1
4	3	Dihybrid 9:3:3:1, or test-cross 1:1:1:1

The intuition is that with a fixed total sample size, once you know the counts in all but one category, the remaining count is forced. Only n − 1 of the counts are free to vary.

A subtlety: in contingency-table tests (testing for association between two categorical variables, e.g. handedness against sex), df = (rows − 1) × (columns − 1). This is beyond AQA 7402 but appears in undergraduate statistics.

A-Level misconception watch: Students routinely write "df = number of organisms − 1" or "df = total minus 1" — both wrong. Marks are lost reliably here on examiner-style mark schemes.
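The two df rules above are simple enough to write as one-line functions. The sketch below is illustrative only; the function names are hypothetical, and the optional `estimated_parameters` argument anticipates the Hardy-Weinberg case, where one further degree of freedom is subtracted because an allele frequency is estimated from the data.

```python
def df_goodness_of_fit(categories, estimated_parameters=0):
    """df for goodness-of-fit: categories - 1 - parameters estimated from the data."""
    return categories - 1 - estimated_parameters


def df_contingency(rows, columns):
    """df for a contingency-table test of association."""
    return (rows - 1) * (columns - 1)


print(df_goodness_of_fit(2))   # 1  (monohybrid 3:1)
print(df_goodness_of_fit(4))   # 3  (dihybrid 9:3:3:1)
print(df_contingency(2, 2))    # 1  (e.g. handedness against sex)
```

Note that neither function takes the number of organisms as an input: sample size never enters the df calculation.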

The Chi-Squared Distribution and the Critical-Value Table

The χ² statistic, under H₀, follows a continuous probability distribution whose shape depends on df. As df increases, the distribution becomes broader and shifts to the right; small χ² values become less likely and larger ones less alarming. The right-hand tail of the distribution carries the p-values: the area to the right of an observed χ² is the probability of observing that value or larger if H₀ were true.

The published table records, for each df, the χ² value beyond which the right tail captures a specified probability (typically 0.10, 0.05, 0.02, 0.01, 0.001):

df	p = 0.10	p = 0.05	p = 0.02	p = 0.01	p = 0.001
1	2.706	3.841	5.412	6.635	10.828
2	4.605	5.991	7.824	9.210	13.816
3	6.251	7.815	9.837	11.345	16.266
4	7.779	9.488	11.668	13.277	18.467
5	9.236	11.070	13.388	15.086	20.515
6	10.645	12.592	15.033	16.812	22.458
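To see where the table entries come from, the right-tail area can be computed directly. The sketch below is one possible approach using only the standard `math` module: it starts from the exact tail for df = 1 or df = 2 and steps up two degrees of freedom at a time with the standard recurrence for the regularised upper incomplete gamma function. The name `chi2_right_tail` is hypothetical; in practice a statistics library would normally be used.

```python
import math


def chi2_right_tail(x, df):
    """Right-tail probability P(chi-squared >= x) for a whole-number df >= 1."""
    half = x / 2
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1   # exact result for df = 1
    else:
        tail, k = math.exp(-half), 2              # exact result for df = 2
    # Step up two degrees of freedom at a time.
    while k < df:
        tail += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return tail


# Each p = 0.05 critical value should leave a right tail of about 0.05.
for df, critical in [(1, 3.841), (2, 5.991), (3, 7.815), (4, 9.488)]:
    print(df, round(chi2_right_tail(critical, df), 3))
```

Feeding in a p = 0.05 critical value returns a tail area very close to 0.05, which is exactly what the table encodes.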

Interpreting a result

flowchart TD
  A["Compute chi-squared"] --> B["Find df = categories - 1"]
  B --> C["Look up critical value at p = 0.05"]
  C --> D{"Chi-squared > critical?"}
  D -- "Yes" --> E["Reject H0: deviation significant"]
  D -- "No" --> F["Fail to reject H0: deviation explainable by chance"]
  E --> G["Seek biological explanation"]
  F --> H["Data consistent with expected ratio"]

The standard convention in school biology is p = 0.05. p = 0.01 represents a more stringent threshold; p = 0.001 is used in research where many tests are being run and false positives must be guarded against (multiple-testing correction).

Required Practical 11 — Distribution of a Species

The chi-squared technique is also used outside genetics. Required Practical 11 (anchored in AQA 7402 Section 3.7.4 ecosystems — refer to the official AQA specification document for exact wording) involves investigating the distribution of a species using quadrats and transects. The data are categorical (number of organisms in each quadrat, classed into "high density / low density" or by habitat type) and chi-squared tests whether the observed distribution differs significantly from a random or uniform expectation.

The connection to evolution is direct. Patchy distributions create variation in local population sizes, set the stage for founder effects (lesson 3 of this course), and may indicate ongoing selection driven by local microhabitats — themes developed throughout the rest of the course.

Note on RP coverage: Across AQA 7402, there is no dedicated required practical specifically for evolution and speciation. The 12 required practicals are anchored in cells, exchange surfaces, organisms-respond, photosynthesis/respiration, and ecosystems. The chi-squared skills taught in this lesson, however, are exam-relevant for any practical generating categorical count data.

Limitations and Common Errors

Use raw counts, not percentages or proportions. A common error is to convert observed data to percentages before plugging into the formula; this strips out the sample size and breaks the calculation.
Expected values should be ≥ 5 in every category. If any E falls below 5, either combine adjacent categories or increase the sample size. This is because the chi-squared distribution is a continuous approximation to a discrete count distribution; the approximation breaks down for very small expected counts.
The test detects deviation; it does not explain it. A significant χ² shows the data do not fit the expected ratio, but biological reasoning is needed to suggest why (linkage, epistasis, viability selection, gametic incompatibility).
Larger samples are more powerful. With a small sample, the test may fail to detect real deviations (low statistical power) — a Type II error.
Beware of fishing expeditions. Running many chi-squared tests on the same data set inflates the chance of a false positive. A research-quality analysis applies a multiple-testing correction (Bonferroni, Holm); this is beyond AQA 7402 but explicit in undergraduate biostatistics.
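Two of the points above, the raw-counts rule and the E ≥ 5 rule, can be demonstrated in a few lines. This is a sketch under stated assumptions: the 820:180 sample of 1000 and the 32-offspring dihybrid cross are hypothetical figures invented for the illustration, and the helper names are not from any library.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def too_small(expected, minimum=5):
    """Return the expected counts that fall below the usual minimum of 5."""
    return [e for e in expected if e < minimum]


# Same 82:18 split against a 3:1 expectation, at two sample sizes.
n100 = chi_squared([82, 18], [75, 25])        # 100 offspring
n1000 = chi_squared([820, 180], [750, 250])   # hypothetical 1000 offspring
print(round(n100, 3), round(n1000, 3))        # 2.613 26.133

# A 9:3:3:1 expectation with only 32 offspring leaves one class below 5.
expected_small = [32 * part / 16 for part in (9, 3, 3, 1)]
print(too_small(expected_small))              # [2.0]
```

Entering percentages (82 and 18) for the sample of 1000 would give 2.613 instead of 26.133, turning a clearly significant result into a non-significant one.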

Exam Tip: In specimen questions, show every step: state H₀, tabulate O and E, compute each (O − E)² / E, sum, state df, compare to the critical value, and write a one-sentence biological conclusion. Marks are awarded per step, not for the final number alone.

A-Level Misconceptions

Treating "fail to reject H₀" as "H₀ is proved true". The test never proves a null; it only fails to detect departures from it.
Reading the p-value as the probability that the null hypothesis is true. p is the probability of the data (or more extreme) given the null, not the other way round.
Confusing degrees of freedom with sample size.
Quoting the χ² statistic to more decimal places than the critical-value table supports (the table is typically tabulated to 3 d.p.).
Forgetting that chi-squared is one-tailed — only large χ² values are evidence against H₀. A near-zero χ² is not "significant evidence for H₀"; it just means the data are unusually close to expectation, possibly because of unrecorded experimenter bias (a real concern in Mendel's own data, famously analysed by R. A. Fisher).

Specimen Question — Modelled on the AQA paper format

Question (6 marks): A geneticist crosses two heterozygous Drosophila flies (Gg × Gg) carrying a gene for grey vs ebony body. From the cross, 240 grey-bodied and 60 ebony-bodied flies are recovered. Use the chi-squared test to determine whether these data are consistent with a 3:1 Mendelian ratio. State a clear conclusion in biological terms. (AO1: 1 mark — recall of formula and df; AO2: 3 marks — calculation of expected values, (O − E)² / E for each class, summing to χ²; AO3: 2 marks — comparison to critical value and biological interpretation.)

Mid-band response (~150 words):

Expected ratio 3:1 means 225 grey and 75 ebony from 300 flies. χ² = (240 − 225)²/225 + (60 − 75)²/75 = 225/225 + 225/75 = 1 + 3 = 4. df = 1. Critical value at p = 0.05 is 3.841. χ² = 4 is just above the critical value, so we reject the null hypothesis. The data do not fit a 3:1 ratio.

Examiner-style commentary: This response earns the formula mark (M1), the expected-value mark (M1), the calculation mark (M1) and the df mark (M1) for clear arithmetic. It earns one comparison mark for citing the critical value, but loses the final mark for not stating H₀ explicitly, not commenting on how close χ² is to the threshold, and giving no biological interpretation of why the deviation might exist. The candidate also presents a borderline rejection (4.000 against 3.841) as clear-cut; it deserves more nuanced commentary.

Top-band response (~280 words):

H₀: There is no significant difference between the observed numbers of grey and ebony flies and those expected from a 3:1 Mendelian ratio.

Expected: 300 × 3/4 = 225 grey; 300 × 1/4 = 75 ebony.

Category	O	E	(O − E)² / E
Grey	240	225	225/225 = 1.000
Ebony	60	75	225/75 = 3.000
Total			χ² = 4.000

df = 2 − 1 = 1. Critical value at p = 0.05 is 3.841; at p = 0.02 it is 5.412.

Since 4.000 > 3.841 (just), we reject H₀ at the 5% level — but the result sits in the narrow window between p = 0.05 and p = 0.02, indicating only marginal significance. A repeat experiment with a larger sample would strengthen the conclusion either way.

If the deviation is real, the excess of grey relative to ebony might reflect viability selection: ebony homozygotes have known fitness reductions in field studies of Drosophila. A follow-up experiment counting embryonic vs adult ratios would test this.

Examiner-style commentary: This earns all six marks. It states H₀ explicitly (M1), constructs a clean working table (M1, M1), states df and reasons correctly about it (M1), and crucially shows mark-scheme literacy by comparing to two critical values to assess strength of evidence (M1). The synoptic move — proposing a viability-selection explanation drawn from lesson 2 — is the A*-band evaluative move that the mid-band answer omits (M1). The candidate correctly resists overstating a borderline result.
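The borderline nature of this result can be checked with a short sketch. It uses the same df = 1 shortcut (right tail = erfc(√(χ²/2))), which is valid for one degree of freedom only; the variable names are illustrative.

```python
import math

observed = [240, 60]                        # grey, ebony
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]   # 225 grey, 75 ebony

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p = math.erfc(math.sqrt(chi2 / 2))          # right-tail probability, df = 1 only

print(round(chi2, 3), round(p, 3))  # 4.0 0.046
```

A tail probability of about 0.046 sits between 0.02 and 0.05, consistent with the two critical values quoted in the top-band answer.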

Extended Worked Example — Chi-Squared on Quadrat Data (RP11 cross-link)

Required Practical 11 generates categorical count data from quadrats placed along a transect or randomly across a sampled habitat. A worked example clarifies how the chi-squared machinery transfers from genetics to ecology.

A student samples a salt-marsh transect and records the presence of Salicornia europaea (glasswort) in quadrats placed in four position bands along the transect: 0–5 m, 5–10 m, 10–15 m and 15–20 m from the high-tide mark. In total, 80 quadrats contain Salicornia; the observed counts of occupied quadrats in the four bands are 28, 24, 16 and 12.

Null hypothesis (H₀): Salicornia is uniformly distributed across the transect — i.e. there is no significant variation with distance from the high-tide mark.

Under H₀, the 80 total occupied quadrats would be distributed equally across the four position bands: expected count = 80 / 4 = 20 per band.

Position band	O	E	O − E	(O − E)²	(O − E)² / E
0–5 m	28	20	8	64	3.20
5–10 m	24	20	4	16	0.80
10–15 m	16	20	−4	16	0.80
15–20 m	12	20	−8	64	3.20
Total	80	80			χ² = 8.00

df = 4 − 1 = 3.
Critical value at p = 0.05, df = 3: 7.815.
8.00 > 7.815, so we reject H₀ at the 5% level.

The biological interpretation is that Salicornia shows a non-uniform distribution along the transect, with higher abundance closer to the high-tide mark. The mechanism — almost certainly salinity tolerance — is for the student to argue from physiological context, not from the chi-squared test alone. This worked example illustrates the RP11 cross-link explicitly: identical statistical machinery, completely different biological data.

Common error in ecology applications: students sometimes base expected counts on the total number of quadrats sampled rather than on the total number of quadrats occupied (the 80 used here, giving 80 / 4 = 20 per band). Both can be valid analyses, but they test slightly different null hypotheses. The first asks "is the species more likely to be found near the tide?", the second asks "given that the species is present somewhere, is it equally likely in each band?". Be explicit about which question is being tested. An equal expected count per band also assumes that the same number of quadrats was sampled in each band.
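The quadrat calculation uses exactly the same arithmetic as the genetics examples, as this minimal Python sketch shows. The uniform expectation is taken from the worked example (occupied quadrats shared equally across four bands), and the critical value is the standard table entry for p = 0.05 and df = 3.

```python
observed = [28, 24, 16, 12]                # occupied quadrats per band
expected = sum(observed) / len(observed)   # uniform expectation: 20 per band

chi2 = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1
critical = 7.815                           # p = 0.05, df = 3

print(round(chi2, 2), df, chi2 > critical)  # 8.0 3 True
```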

Reporting Conventions and Examination Style

A clean chi-squared answer in an examination follows a stereotyped structure that is worth memorising and reproducing line-for-line:

1. State H₀ in one sentence using biological language ("There is no significant difference between observed and expected ...").
2. Tabulate O and E for each category, with explicit (O − E) and (O − E)² / E columns.
3. Quote χ² to three decimal places (the precision of the standard tables).
4. State df and the formula used (categories − 1).
5. Quote the critical value at p = 0.05 with df explicit.
6. State whether χ² exceeds the critical value.
7. State the conclusion in biological language, not just statistical language.

Mark schemes typically allocate one mark per numbered step. A candidate who omits the H₀ or the conclusion can lose two of six marks even with perfectly correct arithmetic. Practise the structure until it becomes automatic.

Examination phrasing — accept and reject correctly: "Reject H₀" is the technically correct phrase when χ² > critical. Many students write "accept H₁" or "the data are significant", which loses the precision mark. "Fail to reject H₀" (when χ² < critical) is preferred to "accept H₀" — see the misconception list above. Use the precise terminology.

Going Further

Undergraduate reading: Whitlock & Schluter, The Analysis of Biological Data (chapters on chi-squared and contingency tables); Sokal & Rohlf, Biometry — the standard reference for biostatistics; Crawley, The R Book for computational implementation.
History of statistics: R. A. Fisher's 1936 re-analysis of Mendel's pea data showed χ² values suspiciously low across many crosses, suggesting Mendel (or an assistant) may have selectively reported data that fitted the theory too well. The episode is a standard ethics case study.
Oxbridge interview prompts:
- "If your chi-squared test is just below the critical value, are you allowed to add a few more data points and re-test? Why or why not?"
- "A geneticist runs 20 separate chi-squared tests on the same dataset and finds one with p < 0.05. Has she discovered something real? Frame your answer in terms of family-wise error rates."
- "Mendel's data are too good to be true on chi-squared grounds. Does that invalidate his laws of segregation and independent assortment? What would invalidate them?"
Beyond chi-squared: for continuous data, t-tests compare means; for ordinal data, or where normality cannot be assumed, the Mann-Whitney U test (also known as the Wilcoxon rank-sum test) is used. ANOVA generalises t-tests to multiple groups. Chi-squared is the categorical-count analogue.

Specimen Question 2 — Hardy-Weinberg Goodness-of-Fit (synoptic with lesson 1)

Question (6 marks): In a population of 2,000 sweet pea plants surveyed for flower colour (a single locus, two alleles, codominant), 720 are red (CC), 960 are pink (Cc), and 320 are white (cc). Use the chi-squared test to determine whether this population is in Hardy-Weinberg equilibrium. (AO1: 1 mark — formulae; AO2: 3 marks — calculation; AO3: 2 marks — interpretation including statement on equilibrium.)

Mid-band response (~140 words):

Count alleles: C = 2(720) + 960 = 2400; c = 2(320) + 960 = 1600. Total = 4000. p(C) = 0.6, q(c) = 0.4. Expected counts: p² × 2000 = 720 CC; 2pq × 2000 = 960 Cc; q² × 2000 = 320 cc. Observed and expected match exactly. χ² = 0. The population is in Hardy-Weinberg equilibrium.

Examiner-style commentary: Reaches the correct conclusion with clean arithmetic. Earns AO1 (M1), AO2 (M1, M1, M1) and one AO3 mark for stating equilibrium (M1). Loses the second AO3 mark for not commenting on df, not stating H₀ explicitly, and not flagging that "χ² = 0 exactly" is unusual and warrants a comment on whether the result is statistically suspicious — even Hardy-Weinberg-conforming populations show some sampling variation.

Top-band response (~260 words):

H₀: This population is in Hardy-Weinberg equilibrium for the flower-colour locus — observed genotype frequencies do not differ significantly from those predicted by p² + 2pq + q² = 1.

Allele count: 2(720) + 960 = 2400 C; 2(320) + 960 = 1600 c. Total = 4000. p = 0.6; q = 0.4.

Expected counts under Hardy-Weinberg: CC = 0.36 × 2000 = 720; Cc = 0.48 × 2000 = 960; cc = 0.16 × 2000 = 320.

Class	O	E	(O − E)² / E
CC	720	720	0
Cc	960	960	0
cc	320	320	0
Total			χ² = 0.000

df = 3 − 1 − 1 = 1 (subtracting an additional df because p was estimated from the data — the Hardy-Weinberg-specific df rule). Critical value at p = 0.05: 3.841. χ² = 0 is far below the critical value, so we fail to reject H₀: the data are consistent with Hardy-Weinberg equilibrium.

Note that χ² = 0 is an exact match, which is unusual for biological data. Either the population is genuinely in undisturbed equilibrium (no measurable selection, drift, gene flow, mutation or non-random mating at this locus), or the dataset has been constructed/idealised for teaching purposes. A real-world replication would expect small non-zero χ² values due to sampling variation, even in a truly equilibrium population.

Examiner-style commentary: Earns all six marks. H₀ stated explicitly (M1), allele counting shown (M1), expected counts derived (M1), df explained with the Hardy-Weinberg subtraction (M1), conclusion stated (M1), and the A*-band move of flagging "χ² = 0 exactly" as suspiciously clean (M1) — a Fisher-style observation about excessive goodness of fit.
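The Hardy-Weinberg working can be sketched in Python as follows. This is an illustration of the allele-counting route used in the answers above, with hypothetical variable names; it assumes a single locus with two alleles.

```python
observed = {"CC": 720, "Cc": 960, "cc": 320}
n = sum(observed.values())

# Allele frequencies from genotype counts (each plant carries two alleles).
p = (2 * observed["CC"] + observed["Cc"]) / (2 * n)
q = 1 - p

# Expected genotype counts under Hardy-Weinberg: p^2, 2pq, q^2 times n.
expected = {"CC": p ** 2 * n, "Cc": 2 * p * q * n, "cc": q ** 2 * n}
chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

# One extra degree of freedom is lost because p was estimated from the data.
df = len(observed) - 1 - 1

print(round(p, 2), round(chi2, 3), df)
```

Because floating-point arithmetic is inexact, the computed χ² may be a vanishingly small positive number and not exactly zero; rounding to three decimal places gives 0.0, matching the hand calculation.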

Summary

The chi-squared test compares observed categorical counts against expected counts under a stated null hypothesis.
χ² = Σ [(O − E)² / E]; df = categories − 1 for goodness-of-fit tests; df = categories − 1 − parameters estimated for Hardy-Weinberg goodness-of-fit.
If χ² exceeds the tabulated critical value at p = 0.05, H₀ is rejected.
Used at A-Level for Mendelian ratios, Hardy-Weinberg goodness-of-fit, and species-distribution analyses (RP11 cross-link).
Reporting follows a stereotyped 7-step structure; mark schemes award marks per step, not per answer.
The test detects departures from expectation but does not, on its own, explain them; biological reasoning is needed to identify causes such as linkage, selection, or non-random mating.

Spec alignment: AQA 7402 Section 3.7.2 (genetic diversity and adaptation), with cross-reference to Section 3.7.4 ecosystems and Required Practical 11. Refer to the official AQA specification document for exact wording.
