The chi-squared (χ²) test is the standard frequentist tool for comparing categorical biological data against a theoretical expectation. It tells us whether the deviation between what we observed and what a hypothesis predicts is large enough to discount chance as a sufficient explanation. In the context of populations and evolution, the test is most often deployed at A-Level to evaluate Mendelian genetic crosses, to test goodness-of-fit to Hardy-Weinberg expectations, and — through the species-distribution required practical — to assess whether organisms are clumped or randomly distributed across a sampled habitat.
Spec mapping: This lesson sits in AQA 7402 Section 3.7.2 (genetic diversity and adaptation — statistical analysis of inheritance data), with an explicit cross-link to the Required Practical 11 on investigating distribution of species using quadrats and transects (anchored in the ecosystems course, Section 3.7.4). Refer to the official AQA specification document for exact wording. The skills here are also applied in the rest of this course: testing Hardy-Weinberg expectations (lesson 1), interpreting selection experiments (lesson 2), and analysing speciation studies (lesson 4).
Connects to: Mendelian inheritance and dihybrid ratios (Section 3.7.1, course 4 DNA, Genes and Inheritance); Hardy-Weinberg equilibrium (Section 3.7.2, lesson 1 of this course); investigation of species distribution (Section 3.7.4, course 9 Ecosystems).
Biological data are messy. When a cross between two heterozygous pea plants is expected to give a 3:1 ratio, an actual cohort of 100 offspring almost never returns exactly 75 tall and 25 dwarf. Suppose we record 82 tall and 18 dwarf. Is that close enough to 3:1 to be consistent with simple Mendelian inheritance, or is the departure too large to be explained by sampling chance alone — implying something more interesting is happening (linkage, epistasis, viability selection, segregation distortion)?
The eye is a poor judge of "close enough". Humans systematically over-interpret small deviations as meaningful and under-interpret large deviations from rare events. The chi-squared test replaces eyeballing with a quantitative decision rule:
Crucially, the test does not prove anything biologically. It tells you only whether the data are compatible with the null hypothesis. Rejecting the null says "something other than chance is at work"; it does not say what that something is — that is for the biologist to argue from context.
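Every chi-squared test begins with a null hypothesis (H₀) and an alternative hypothesis (H₁). H₀ states that there is no significant difference between the observed counts and those expected, so any deviation is due to chance; H₁ states that the difference is significant.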
If the test statistic is small (data close to expectation), we fail to reject H₀. Note the careful wording: we never "prove" the null, we simply have no evidence against it. If the test statistic is large, we reject H₀ in favour of H₁ and look for a biological explanation.
Key Point — what "significance" means: A result is "significant at p = 0.05" if the probability of observing a deviation at least this large under the null hypothesis is less than 5%. It does not mean there is a 95% chance H₁ is true. The distinction matters at A-Level and is examined in 9-mark evaluation questions.
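The definition of a p-value can be made concrete with the pea-plant data introduced earlier (82 tall out of 100 against an expected 75). The sketch below sums exact binomial probabilities for every outcome that deviates from 75 by at least as much as the observed count. The function name `tail_probability` is illustrative, and because counts are discrete the exact figure need not match the chi-squared approximation, which treats the statistic as continuous.

```python
from math import comb

def tail_probability(n, p_success, observed):
    """Probability, under H0, of a count at least as far from expectation as `observed`."""
    expected = n * p_success
    threshold = abs(observed - expected)
    total = 0.0
    for k in range(n + 1):
        # Include every outcome whose deviation is at least as large as the one seen.
        if abs(k - expected) >= threshold:
            total += comb(n, k) * p_success**k * (1 - p_success) ** (n - k)
    return total

# H0: each offspring is tall with probability 3/4 (a 3:1 ratio).
p_value = tail_probability(n=100, p_success=0.75, observed=82)
print(f"P(deviation of 7 or more | H0) = {p_value:.3f}")
```

The number printed is a statement about the data given H₀, not about the probability that H₀ or H₁ is true.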
The standard form is:
χ² = Σ [(O − E)² / E]
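where O is the observed count in each category, E is the count expected in that category under H₀, and Σ denotes the sum over all categories.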
The numerator (O − E)² penalises deviations symmetrically — over-counts and under-counts are equally bad. Dividing by E scales each contribution by the size of the expected count, so a deviation of 10 against an expected 100 contributes less than a deviation of 10 against an expected 20. Squaring before summing prevents positive and negative deviations from cancelling.
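A minimal sketch of the formula, with the scaling effect of dividing by E shown for the two cases mentioned above. The function names `contribution` and `chi_squared` are chosen here for illustration only.

```python
def contribution(o, e):
    """One category's term in the statistic: (O - E)^2 / E."""
    return (o - e) ** 2 / e

def chi_squared(observed, expected):
    """Sum the per-category contributions."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum(contribution(o, e) for o, e in zip(observed, expected))

# The same deviation of 10 weighs less against a large expectation.
against_100 = contribution(110, 100)  # 100 / 100
against_20 = contribution(30, 20)     # 100 / 20

# Over-counts and under-counts of equal size contribute equally.
over = contribution(110, 100)
under = contribution(90, 100)

print(against_100, against_20, over == under)
```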
A cross between two heterozygous pea plants (Tt × Tt) yields 100 offspring: 82 tall, 18 dwarf. The expected Mendelian ratio is 3:1, so expected counts are 75:25.
| Category | Observed (O) | Expected (E) | O − E | (O − E)² | (O − E)² / E |
|---|---|---|---|---|---|
| Tall | 82 | 75 | 7 | 49 | 0.653 |
| Dwarf | 18 | 25 | −7 | 49 | 1.960 |
| Total | 100 | 100 | | | χ² = 2.613 |
With two categories, df = 1 and the critical value at p = 0.05 is 3.841. Since 2.613 < 3.841, we fail to reject H₀. Note that 82 tall plants out of 100 looks like a meaningful excess over 75. The chi-squared test formalises the intuition that with a sample of only 100, a deviation of this size arises by chance roughly 11% of the time, which is too often to call it significant.
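The arithmetic for this cross can be checked in a few lines. The sketch below derives the expected counts from the 3:1 ratio and, as an extra step not required at A-Level, estimates the right-tail probability. The `erfc` shortcut used for the p-value is valid only for df = 1.

```python
import math

observed = [82, 18]   # tall, dwarf
ratio = [3, 1]        # Mendelian expectation for Tt x Tt

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]  # 75 tall, 25 dwarf

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For df = 1 only, the right-tail area has a closed form via the
# complementary error function.
p_value = math.erfc(math.sqrt(chi2 / 2))

critical_p05_df1 = 3.841
print(f"chi-squared = {chi2:.3f}, p = {p_value:.3f}")
print("reject H0" if chi2 > critical_p05_df1 else "fail to reject H0")
```

The p-value falls between 0.10 and 0.11, in line with the "roughly 11%" quoted above.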
A dihybrid cross (RrYy × RrYy) produces 640 offspring. The expected 9:3:3:1 ratio gives expected counts of 360, 120, 120, 40.
| Category | Observed (O) | Expected (E) | O − E | (O − E)² | (O − E)² / E |
|---|---|---|---|---|---|
| Round yellow | 370 | 360 | 10 | 100 | 0.278 |
| Round green | 124 | 120 | 4 | 16 | 0.133 |
| Wrinkled yellow | 110 | 120 | −10 | 100 | 0.833 |
| Wrinkled green | 36 | 40 | −4 | 16 | 0.400 |
| Total | 640 | 640 | | | χ² = 1.644 |
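The same calculation extends to four categories. This sketch rebuilds the expected counts from the 9:3:3:1 ratio and compares the result with the p = 0.05 critical value for df = 3 (7.815, from the standard table). The helper name `expected_from_ratio` is illustrative.

```python
def expected_from_ratio(ratio, total):
    """Split `total` into expected counts in proportion to `ratio`."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

phenotypes = ["Round yellow", "Round green", "Wrinkled yellow", "Wrinkled green"]
observed = [370, 124, 110, 36]
expected = expected_from_ratio([9, 3, 3, 1], sum(observed))

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(phenotypes) - 1
critical_p05_df3 = 7.815

print(f"chi-squared = {chi2:.3f}, df = {df}")
print("reject H0" if chi2 > critical_p05_df3 else "fail to reject H0")
```

Here χ² is well below the critical value, so the data are consistent with independent assortment in a 9:3:3:1 ratio.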
A test cross is expected to yield a 1:1:1:1 phenotypic ratio (parental and recombinant in equal proportions). From 200 offspring we observe 90, 16, 14, 80 — with the two parental phenotype classes wildly over-represented and the two recombinant classes correspondingly depleted.
| Category | O | E | O − E | (O − E)² | (O − E)² / E |
|---|---|---|---|---|---|
| Parental A | 90 | 50 | 40 | 1600 | 32.000 |
| Recombinant B | 16 | 50 | −34 | 1156 | 23.120 |
| Recombinant C | 14 | 50 | −36 | 1296 | 25.920 |
| Parental D | 80 | 50 | 30 | 900 | 18.000 |
| Total | 200 | 200 | | | χ² = 99.040 |
With four categories, df = 3 and the critical value at p = 0.05 is 7.815 (16.266 at p = 0.001). A χ² of 99.040 far exceeds both, so H₀ is rejected. The biological interpretation is that the two loci are linked on the same chromosome: recombination between them happens only rarely, so parental combinations dominate the offspring. Recombination frequency from these data is (16 + 14) / 200 = 0.15, suggesting a map distance of about 15 cM between the loci.
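A short sketch of both calculations for the test cross: the chi-squared statistic against a 1:1:1:1 expectation and the recombination frequency. The class labels follow the table; the variable names are illustrative.

```python
observed = {
    "Parental A": 90,
    "Recombinant B": 16,
    "Recombinant C": 14,
    "Parental D": 80,
}

total = sum(observed.values())
expected_each = total / len(observed)  # 1:1:1:1 gives 50 per class

chi2 = sum((o - expected_each) ** 2 / expected_each for o in observed.values())

# Recombination frequency = recombinant offspring / all offspring.
recombinants = sum(n for name, n in observed.items() if name.startswith("Recombinant"))
recombination_frequency = recombinants / total
map_distance_cM = recombination_frequency * 100

critical_p001_df3 = 16.266  # most stringent column of the standard table
print(f"chi-squared = {chi2:.2f} (critical at p = 0.001: {critical_p001_df3})")
print(f"recombination frequency = {recombination_frequency:.2f}, about {map_distance_cM:.0f} cM")
```

The statistic exceeds even the p = 0.001 critical value, which is why the deviation is attributed to linkage and not to sampling chance.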
df = number of phenotypic categories − 1 (for goodness-of-fit to a stated ratio).
| Number of categories | df | Example ratio |
|---|---|---|
| 2 | 1 | Monohybrid 3:1 |
| 3 | 2 | Epistasis 9:3:4 or 12:3:1 |
| 4 | 3 | Dihybrid 9:3:3:1, or test-cross 1:1:1:1 |
The intuition is that with a fixed total sample size, once you know the counts in all but one category, the remaining count is forced. Only n − 1 of the counts are free to vary.
A subtlety: in contingency-table tests (testing for association between two categorical variables, e.g. handedness against sex), df = (rows − 1) × (columns − 1). This is beyond AQA 7402 but appears in undergraduate statistics.
A-Level misconception watch: Students routinely write "df = number of organisms − 1" or "df = total minus 1" — both wrong. Marks are lost reliably here on examiner-style mark schemes.
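The df rules in this lesson can be written as two small functions. This is a sketch with illustrative names; the `n_estimated_params` argument covers cases such as Hardy-Weinberg tests, where an allele frequency estimated from the data costs one further degree of freedom.

```python
def df_goodness_of_fit(n_categories, n_estimated_params=0):
    """Categories minus one, minus one more per parameter estimated from the data."""
    return n_categories - 1 - n_estimated_params

def df_contingency(n_rows, n_cols):
    """Degrees of freedom for a test of association in a rows x columns table."""
    return (n_rows - 1) * (n_cols - 1)

print(df_goodness_of_fit(2))     # monohybrid 3:1
print(df_goodness_of_fit(4))     # dihybrid 9:3:3:1
print(df_goodness_of_fit(3, 1))  # three genotypes, allele frequency estimated
print(df_contingency(2, 2))      # e.g. handedness against sex
```

Note that the number of organisms never appears as an argument: df depends on the number of categories, not on the sample size.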
The χ² statistic, under H₀, follows a continuous probability distribution whose shape depends on df. As df increases, the distribution becomes broader and shifts to the right; small χ² values become less likely and larger ones less alarming. The right-hand tail of the distribution carries the p-values: the area to the right of an observed χ² is the probability of observing that value or larger if H₀ were true.
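The right-tail area can be computed without a statistics library. The sketch below uses a standard recurrence for the chi-squared survival function at integer df, starting from the closed forms for df = 1 and df = 2. The name `chi2_sf` is illustrative; if SciPy is installed, `scipy.stats.chi2.sf(x, df)` should give the same values.

```python
import math

def chi2_sf(x, df):
    """Area to the right of x under the chi-squared distribution with integer df."""
    if x < 0 or df < 1 or int(df) != df:
        raise ValueError("x must be >= 0 and df a positive integer")
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)              # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))   # exact tail for df = 1
    # Each step raises df by 2 and adds one term of the incomplete-gamma recurrence.
    while a < df / 2.0:
        q += t**a * math.exp(-t) / math.gamma(a + 1)
        a += 1.0
    return q

# The same chi-squared value is less surprising at higher df.
for df in (1, 3, 6):
    print(df, round(chi2_sf(6.0, df), 3))
```

Feeding in a tabulated critical value, for example `chi2_sf(3.841, 1)`, returns approximately the probability at the head of its column.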
The published table records, for each df, the χ² value beyond which the right tail captures a specified probability (typically 0.10, 0.05, 0.02, 0.01, 0.001):
| df | p = 0.10 | p = 0.05 | p = 0.02 | p = 0.01 | p = 0.001 |
|---|---|---|---|---|---|
| 1 | 2.706 | 3.841 | 5.412 | 6.635 | 10.828 |
| 2 | 4.605 | 5.991 | 7.824 | 9.210 | 13.816 |
| 3 | 6.251 | 7.815 | 9.837 | 11.345 | 16.266 |
| 4 | 7.779 | 9.488 | 11.668 | 13.277 | 18.467 |
| 5 | 9.236 | 11.070 | 13.388 | 15.086 | 20.515 |
| 6 | 10.645 | 12.592 | 15.033 | 16.812 | 22.458 |
flowchart TD
A["Compute chi-squared"] --> B["Find df = categories - 1"]
B --> C["Look up critical value at p = 0.05"]
C --> D{"Chi-squared > critical?"}
D -- "Yes" --> E["Reject H0: deviation significant"]
D -- "No" --> F["Fail to reject H0: deviation explainable by chance"]
E --> G["Seek biological explanation"]
F --> H["Data consistent with expected ratio"]
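The decision procedure in the flowchart can be sketched as a single function. The critical values are the p = 0.05 column of the table above; the function name and the returned dictionary layout are illustrative choices, and the df rule assumes a goodness-of-fit test against a stated ratio.

```python
# Critical values at p = 0.05 for df 1 to 6, copied from the table above.
CRITICAL_P05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}

def goodness_of_fit(observed, expected):
    """Compute chi-squared, look up the critical value, and apply the decision rule."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # categories - 1
    critical = CRITICAL_P05[df]
    return {
        "chi2": chi2,
        "df": df,
        "critical": critical,
        "reject_h0": chi2 > critical,
    }

monohybrid = goodness_of_fit([82, 18], [75, 25])
test_cross = goodness_of_fit([90, 16, 14, 80], [50, 50, 50, 50])

for name, result in (("monohybrid", monohybrid), ("test cross", test_cross)):
    verdict = "reject H0" if result["reject_h0"] else "fail to reject H0"
    print(f"{name}: chi-squared = {result['chi2']:.3f}, df = {result['df']}, {verdict}")
```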
The standard convention in school biology is p = 0.05. p = 0.01 represents a more stringent threshold; p = 0.001 is used in research where many tests are being run and false positives must be guarded against (multiple-testing correction).
The chi-squared technique is also used outside genetics. Required Practical 11 (anchored in AQA 7402 Section 3.7.4 ecosystems — refer to the official AQA specification document for exact wording) involves investigating the distribution of a species using quadrats and transects. The data are categorical (number of organisms in each quadrat, classed into "high density / low density" or by habitat type) and chi-squared tests whether the observed distribution differs significantly from a random or uniform expectation.
The connection to evolution is direct. Patchy distributions create variation in local population sizes, set the stage for founder effects (lesson 3 of this course), and may indicate ongoing selection driven by local microhabitats — themes developed throughout the rest of the course.
Note on RP coverage: Across AQA 7402, there is no dedicated required practical specifically for evolution and speciation. The 12 required practicals are anchored in cells, exchange surfaces, organisms-respond, photosynthesis/respiration, and ecosystems. The chi-squared skills taught in this lesson, however, are exam-relevant for any practical generating categorical count data.
Exam Tip: In specimen questions, show every step: state H₀, tabulate O and E, compute each (O − E)² / E, sum, state df, compare to the critical value, and write a one-sentence biological conclusion. Marks are awarded per step, not for the final number alone.
Question (6 marks): A geneticist crosses two heterozygous Drosophila flies (Gg × Gg) carrying a gene for grey vs ebony body. From the cross, 240 grey-bodied and 60 ebony-bodied flies are recovered. Use the chi-squared test to determine whether these data are consistent with a 3:1 Mendelian ratio. State a clear conclusion in biological terms. (AO1: 1 mark — recall of formula and df; AO2: 3 marks — calculation of expected values, (O − E)² / E for each class, summing to χ²; AO3: 2 marks — comparison to critical value and biological interpretation.)
Grade C response (~150 words):
Expected ratio 3:1 means 225 grey and 75 ebony from 300 flies. χ² = (240 − 225)²/225 + (60 − 75)²/75 = 225/225 + 225/75 = 1 + 3 = 4. df = 1. Critical value at p = 0.05 is 3.841. χ² = 4 is just above the critical value, so we reject the null hypothesis. The data do not fit a 3:1 ratio.
Examiner commentary: This response earns the formula mark (M1), the expected-value mark (M1), the calculation mark (M1) and the df mark (M1) for clear arithmetic. It earns one comparison mark for citing the critical value, but loses marks for not stating H₀ explicitly, not commenting on how close χ² is to the threshold, and giving no biological interpretation of why the deviation might exist. The candidate has also rounded loosely: 4.00 vs 3.841 is a borderline rejection that deserves more nuanced commentary.
Grade A* response (~280 words):
H₀: There is no significant difference between the observed numbers of grey and ebony flies and those expected from a 3:1 Mendelian ratio.
Expected: 300 × 3/4 = 225 grey; 300 × 1/4 = 75 ebony.
| Category | O | E | (O − E)² / E |
|---|---|---|---|
| Grey | 240 | 225 | 225/225 = 1.000 |
| Ebony | 60 | 75 | 225/75 = 3.000 |
| Total | | | χ² = 4.000 |
df = 2 − 1 = 1. Critical value at p = 0.05 is 3.841; at p = 0.02 it is 5.412.
Since 4.000 > 3.841 (just), we reject H₀ at the 5% level — but the result sits in the narrow window between p = 0.05 and p = 0.02, indicating only marginal significance. A repeat experiment with a larger sample would strengthen the conclusion either way.
If the deviation is real, the excess of grey relative to ebony might reflect viability selection: ebony homozygotes have known fitness reductions in field studies of Drosophila. A follow-up experiment counting embryonic vs adult ratios would test this.
Examiner commentary: This earns all six marks. It states H₀ explicitly (M1), constructs a clean working table (M1, M1), states df and reasons correctly about it (M1), and crucially shows mark-scheme literacy by comparing to two critical values to assess strength of evidence (M1). The synoptic move — proposing a viability-selection explanation drawn from lesson 2 — is the A*-band evaluative move that the Grade C answer omits (M1). The candidate correctly resists overstating a borderline result.
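Comparing against more than one critical value, as the stronger answer does, can be sketched by scanning the df = 1 row of the table. The dictionary below copies that row; the variable names are illustrative.

```python
# df = 1 row of the critical-value table: significance level -> critical value.
DF1_CRITICAL = {0.10: 2.706, 0.05: 3.841, 0.02: 5.412, 0.01: 6.635, 0.001: 10.828}

observed = [240, 60]  # grey, ebony
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]  # 3:1 ratio

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Every significance level whose critical value the statistic exceeds.
exceeded = [level for level, critical in DF1_CRITICAL.items() if chi2 > critical]
strictest = min(exceeded) if exceeded else None

print(f"chi-squared = {chi2:.3f}")
print(f"significant at: {exceeded}; strictest level reached: {strictest}")
```

The statistic clears the 0.05 threshold but not the 0.02 one, which is the basis for describing the result as only marginally significant.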
Required Practical 11 generates categorical count data from quadrats placed along a transect or randomly across a sampled habitat. A worked example clarifies how the chi-squared machinery transfers from genetics to ecology.
A student samples a salt-marsh transect in four position bands (0–5 m, 5–10 m, 10–15 m and 15–20 m from the high-tide mark) and records which quadrats contain Salicornia europaea (glasswort). In total, 80 quadrats contain the species. The observed counts of occupied quadrats in the four bands are: 28, 24, 16, 12.
Null hypothesis (H₀): Salicornia is uniformly distributed across the transect — i.e. there is no significant variation with distance from the high-tide mark.
Under H₀, the 80 total occupied quadrats would be distributed equally across the four position bands: expected count = 80 / 4 = 20 per band.
| Position band | O | E | O − E | (O − E)² | (O − E)² / E |
|---|---|---|---|---|---|
| 0–5 m | 28 | 20 | 8 | 64 | 3.20 |
| 5–10 m | 24 | 20 | 4 | 16 | 0.80 |
| 10–15 m | 16 | 20 | −4 | 16 | 0.80 |
| 15–20 m | 12 | 20 | −8 | 64 | 3.20 |
| Total | 80 | 80 | | | χ² = 8.00 |
With four position bands, df = 3 and the critical value at p = 0.05 is 7.815. Since 8.00 > 7.815 (narrowly), we reject H₀. The biological interpretation is that Salicornia shows a non-uniform distribution along the transect, with higher abundance closer to the high-tide mark. The mechanism (almost certainly salinity tolerance) is for the student to argue from physiological context, not from the chi-squared test alone. This worked example illustrates the RP11 cross-link explicitly: identical statistical machinery, completely different biological data.
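The transect calculation follows the same pattern as the genetics examples, with a uniform expectation in place of a Mendelian ratio. A minimal sketch, using the p = 0.05 critical value for df = 3 from the table (7.815):

```python
bands = ["0-5 m", "5-10 m", "10-15 m", "15-20 m"]
observed = [28, 24, 16, 12]  # occupied quadrats per band

# Uniform H0: the occupied quadrats are spread equally across the bands.
expected = [sum(observed) / len(bands)] * len(bands)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(bands) - 1
critical_p05_df3 = 7.815
reject_h0 = chi2 > critical_p05_df3

print(f"chi-squared = {chi2:.2f}, df = {df}, critical = {critical_p05_df3}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The statistic only just exceeds the critical value, so the rejection is a narrow one and would be worth confirming with further sampling.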
Common error in ecology applications: students sometimes base expected counts on the total number of quadrats sampled rather than on the total number of quadrats occupied (here 80 / 4 = 20 occupied quadrats per band). Both are valid analyses, but they test slightly different null hypotheses. The first asks "is the species more likely to be found near the tide?"; the second asks "given that the species is present somewhere, is it equally likely in each band?". Be explicit about which question is being tested.
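A clean chi-squared answer in an examination follows a stereotyped structure that is worth memorising and reproducing line-for-line:
1. State H₀.
2. Tabulate O and E for each category.
3. Compute (O − E)² / E for each category and sum to give χ².
4. State df.
5. Compare χ² with the critical value at p = 0.05.
6. Write a one-sentence biological conclusion.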
Mark schemes typically allocate one mark per numbered step. A candidate who omits the H₀ or the conclusion can lose two of six marks even with perfectly correct arithmetic. Practice the structure until it becomes automatic.
Examination phrasing — accept and reject correctly: "Reject H₀" is the technically correct phrase when χ² > critical. Many students write "accept H₁" or "the data are significant", which loses the precision mark. "Fail to reject H₀" (when χ² < critical) is preferred to "accept H₀" — see the misconception list above. Use the precise terminology.
Question (6 marks): In a population of 2,000 sweet pea plants surveyed for flower colour (a single locus, two alleles, codominant), 720 are red (CC), 960 are pink (Cc), and 320 are white (cc). Use the chi-squared test to determine whether this population is in Hardy-Weinberg equilibrium. (AO1: 1 mark — formulae; AO2: 3 marks — calculation; AO3: 2 marks — interpretation including statement on equilibrium.)
Grade C response (~140 words):
Count alleles: C = 2(720) + 960 = 2400; c = 2(320) + 960 = 1600. Total = 4000. p(C) = 0.6, q(c) = 0.4. Expected counts: p² × 2000 = 720 CC; 2pq × 2000 = 960 Cc; q² × 2000 = 320 cc. Observed and expected match exactly. χ² = 0. The population is in Hardy-Weinberg equilibrium.
Examiner commentary: Reaches the correct conclusion with clean arithmetic. Earns AO1 (M1), AO2 (M1, M1, M1) and one AO3 mark for stating equilibrium (M1). Loses the second AO3 mark for not commenting on df, not stating H₀ explicitly, and not flagging that "χ² = 0 exactly" is unusual and warrants a comment on whether the result is statistically suspicious — even Hardy-Weinberg-conforming populations show some sampling variation.
Grade A* response (~260 words):
H₀: This population is in Hardy-Weinberg equilibrium for the flower-colour locus — observed genotype frequencies do not differ significantly from those predicted by p² + 2pq + q² = 1.
Allele count: 2(720) + 960 = 2400 C; 2(320) + 960 = 1600 c. Total = 4000. p = 0.6; q = 0.4.
Expected counts under Hardy-Weinberg: CC = 0.36 × 2000 = 720; Cc = 0.48 × 2000 = 960; cc = 0.16 × 2000 = 320.
| Class | O | E | (O − E)² / E |
|---|---|---|---|
| CC | 720 | 720 | 0 |
| Cc | 960 | 960 | 0 |
| cc | 320 | 320 | 0 |
| Total | | | χ² = 0.000 |
df = 3 − 1 − 1 = 1 (subtracting an additional df because p was estimated from the data: the Hardy-Weinberg-specific df rule). Critical value at p = 0.05: 3.841. χ² = 0 is far below the critical value, so we fail to reject H₀: the data are consistent with Hardy-Weinberg equilibrium.
Note that χ² = 0 is an exact match, which is unusual for biological data. Either the population is genuinely in undisturbed equilibrium (no measurable selection, drift, gene flow, mutation or non-random mating at this locus), or the dataset has been constructed/idealised for teaching purposes. A real-world replication would expect small non-zero χ² values due to sampling variation, even in a truly equilibrium population.
Examiner commentary: Earns all six marks. H₀ stated explicitly (M1), allele counting shown (M1), expected counts derived (M1), df explained with the Hardy-Weinberg subtraction (M1), conclusion stated (M1), and the A*-band move of flagging "χ² = 0 exactly" as suspiciously clean (M1) — a Fisher-style observation about excessive goodness of fit.
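The Hardy-Weinberg version of the test adds one step before the usual arithmetic: estimating the allele frequencies from the genotype counts. The sketch below follows the working in the model answer; the function name and return layout are illustrative.

```python
def hardy_weinberg_test(n_hom_1, n_het, n_hom_2):
    """Chi-squared goodness-of-fit of genotype counts to Hardy-Weinberg proportions."""
    n = n_hom_1 + n_het + n_hom_2
    # Each individual carries two alleles, so there are 2n alleles in total.
    p = (2 * n_hom_1 + n_het) / (2 * n)
    q = 1 - p
    observed = [n_hom_1, n_het, n_hom_2]
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # One df is lost to the fixed total and one more because p was estimated from the data.
    df = len(observed) - 1 - 1
    return p, q, expected, chi2, df

p, q, expected, chi2, df = hardy_weinberg_test(720, 960, 320)  # CC, Cc, cc
critical_p05_df1 = 3.841

print(f"p = {p:.2f}, q = {q:.2f}, df = {df}")
print("expected:", [round(e) for e in expected])
print(f"chi-squared = {chi2:.3f}")
print("reject H0" if chi2 > critical_p05_df1 else "fail to reject H0")
```

With these counts the expected values reproduce the observed ones, so χ² is zero to within floating-point rounding.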
Spec alignment: AQA 7402 Section 3.7.2 (genetic diversity and adaptation), with cross-reference to Section 3.7.4 ecosystems and Required Practical 11. Refer to the official AQA specification document for exact wording.