The Hardy-Weinberg principle provides a mathematical model for predicting allele and genotype frequencies in a population that is not evolving. It serves as a null model — a baseline against which real populations can be compared. If a population deviates from Hardy-Weinberg equilibrium, we know that one or more evolutionary forces (natural selection, genetic drift, mutation, gene flow, or non-random mating) must be acting. The Edexcel specification requires you to use the Hardy-Weinberg equations to calculate allele and genotype frequencies from population data.
For a gene with two alleles, where p is the frequency of the dominant allele and q is the frequency of the recessive allele:
p + q = 1
This simply states that the frequencies of all alleles must add up to 1 (100%).
p² + 2pq + q² = 1
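Where p² is the frequency of the homozygous dominant genotype (AA), 2pq is the frequency of the heterozygous genotype (Aa), and q² is the frequency of the homozygous recessive genotype (aa).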
This equation is derived from the expansion of (p + q)², assuming random mating.
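The expansion can be checked numerically by pairing every possible gamete with every other, which is what random mating amounts to. The sketch below is illustrative only: the function name `genotype_frequencies` and the allele labels `A` and `a` are assumptions made for this example.

```python
from itertools import product


def genotype_frequencies(p):
    """Combine gametes at random and return the resulting genotype frequencies.

    p is the frequency of allele A; the frequency of allele a is q = 1 - p.
    """
    q = 1 - p
    gametes = {"A": p, "a": q}
    freqs = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    # Every ordered pair of gametes (egg, sperm) is one cell of a Punnett square.
    for (g1, f1), (g2, f2) in product(gametes.items(), repeat=2):
        genotype = "".join(sorted(g1 + g2))  # "aA" and "Aa" are the same genotype
        freqs[genotype] += f1 * f2
    return freqs


freqs = genotype_frequencies(0.7)
print(freqs)                 # AA = p^2, Aa = 2pq, aa = q^2
print(sum(freqs.values()))   # the three terms add up to 1
```

The heterozygote term carries the factor of 2 because Aa can arise in two ways: A from one parent and a from the other, or the reverse.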
The Hardy-Weinberg principle states that allele and genotype frequencies remain constant from generation to generation only if all of the following conditions are met:
```mermaid
flowchart TD
    A["Hardy-Weinberg Equilibrium Conditions"] --> B["No natural selection<br/>(all genotypes equally fit)"]
    A --> C["No mutation<br/>(no new alleles created)"]
    A --> D["No gene flow<br/>(no migration in or out)"]
    A --> E["No genetic drift<br/>(infinitely large population)"]
    A --> F["Random mating<br/>(no mate preference)"]
    G["If ANY condition is violated"] --> H["Allele frequencies WILL change<br/>= Evolution is occurring"]
```
| Condition | What it means | Real-world reality |
|---|---|---|
| No natural selection | All genotypes have equal fitness | Selection is ubiquitous |
| No mutation | No new alleles arise | Mutations occur constantly (low rate) |
| No gene flow | No immigration or emigration | Most populations exchange individuals |
| No genetic drift | Population is infinitely large | All real populations are finite |
| Random mating | No preference for particular genotypes | Assortative mating is common |
Key insight: No real population meets all five conditions. The Hardy-Weinberg model is an idealised baseline, like Newton's first law assuming no friction. Its value lies in comparison — when real populations deviate from the model, we can identify which evolutionary forces are responsible.
The most common exam approach is:
```mermaid
flowchart LR
    A["Know q² (recessive phenotype frequency)"] --> B["q = √q²"]
    B --> C["p = 1 − q"]
    C --> D["p² = frequency of AA"]
    C --> E["2pq = frequency of Aa (carriers)"]
```
In a European population, approximately 1 in 2,500 individuals is born with cystic fibrosis (an autosomal recessive condition).
Step 1: q² = 1/2500 = 0.0004
Step 2: q = √0.0004 = 0.02
Step 3: p = 1 − 0.02 = 0.98
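Step 4: Calculate genotype frequencies: p² = 0.98² = 0.9604 (homozygous dominant); 2pq = 2 × 0.98 × 0.02 = 0.0392 (carriers); q² = 0.0004 (affected).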
Key result: The carrier frequency (2pq) is approximately 1 in 25 (3.92%). This is a frequently examined calculation — the carrier frequency is far higher than the disease frequency because most recessive alleles are "hidden" in heterozygotes.
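The four steps can be sketched as a short function. This is a minimal illustration that assumes the population is in Hardy-Weinberg equilibrium; the function name `hardy_weinberg_from_incidence` is made up for this example.

```python
import math


def hardy_weinberg_from_incidence(affected_frequency):
    """Work outward from q^2 (the homozygous recessive frequency)."""
    q = math.sqrt(affected_frequency)  # Step 2: q = sqrt(q^2)
    p = 1 - q                          # Step 3: p = 1 - q
    return {
        "q": q,
        "p": p,
        "homozygous_dominant": p * p,  # p^2
        "carriers": 2 * p * q,         # 2pq
        "affected": q * q,             # q^2
    }


cf = hardy_weinberg_from_incidence(1 / 2500)  # Step 1: q^2 = 1/2500
print(f"q = {cf['q']:.2f}, p = {cf['p']:.2f}")
print(f"carrier frequency = {cf['carriers']:.4f}")
print(f"about 1 in {1 / cf['carriers']:.1f}")
```

Expressing the carrier frequency as "1 in X" is just the reciprocal of 2pq.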
PKU affects 1 in 10,000 newborns in a given population. Calculate the carrier frequency.
Step 1: q² = 1/10,000 = 0.0001
Step 2: q = √0.0001 = 0.01
Step 3: p = 1 − 0.01 = 0.99
Step 4: 2pq = 2 × 0.99 × 0.01 = 0.0198, so roughly 1 in 50 people is a carrier.
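For rare recessive conditions it can help to compare carriers with affected individuals directly. The sketch below uses the two incidences quoted in this lesson and assumes Hardy-Weinberg equilibrium; `carriers_per_affected` is a hypothetical helper name.

```python
import math


def carriers_per_affected(affected_frequency):
    """Return 2pq / q^2, the number of carriers for each affected individual."""
    q = math.sqrt(affected_frequency)
    p = 1 - q
    return (2 * p * q) / (q * q)  # algebraically the same as 2p / q


for condition, incidence in [("cystic fibrosis", 1 / 2500), ("PKU", 1 / 10000)]:
    ratio = carriers_per_affected(incidence)
    print(f"{condition}: about {ratio:.0f} carriers per affected individual")
```

Because the ratio simplifies to 2p/q, it grows as the allele gets rarer, which is why most copies of a rare recessive allele sit in heterozygotes.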
In a region of West Africa, 9% of the population has sickle cell anaemia (H^S H^S).
Step 1: q² = 0.09
Step 2: q = √0.09 = 0.30
Step 3: p = 1 − 0.30 = 0.70
Step 4: Genotype frequencies: p² = 0.70² = 0.49 (H^A H^A); 2pq = 2 × 0.70 × 0.30 = 0.42 (H^A H^S); q² = 0.09 (H^S H^S).
Analysis: The sickle cell allele frequency (q = 0.30) is remarkably high for an allele that severely reduces fitness in homozygotes. It is maintained by heterozygote advantage (a form of balancing selection): H^A H^S individuals are resistant to malaria. In Hardy-Weinberg terms, the population does not meet the equilibrium conditions for this gene, because natural selection is actively maintaining both alleles.
In a population, 18% of individuals are known to be carriers (heterozygous) for a recessive condition. Find the allele frequencies.
Known: 2pq = 0.18
Using p + q = 1, substitute p = 1 − q:
2(1 − q)(q) = 0.18
2q − 2q² = 0.18
2q² − 2q + 0.18 = 0
Using the quadratic formula: q = [2 ± √(4 − 1.44)] / 4 = [2 ± √2.56] / 4 = [2 ± 1.6] / 4
q = 0.1 or q = 0.9
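Both solutions are valid mathematically; they are simply p and q swapped, because 2pq is unchanged when the two allele frequencies are exchanged. The biological context determines which is correct: if the condition is rare, q = 0.1 (recessive allele frequency = 10%), giving p = 0.9.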
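The reverse calculation can be sketched in code by solving the same quadratic, 2q² − 2q + H = 0, where H is the carrier frequency. The function name `allele_frequencies_from_carriers` is an assumption for this example, and choosing between the two roots still needs the biological context.

```python
import math


def allele_frequencies_from_carriers(carrier_frequency):
    """Solve 2q^2 - 2q + H = 0 for q, where H = 2pq is the carrier frequency.

    Returns both roots as (smaller, larger); they always add up to 1.
    """
    discriminant = 4 - 8 * carrier_frequency  # b^2 - 4ac with a = 2, b = -2, c = H
    if discriminant < 0:
        # 2pq peaks at 0.5 (when p = q = 0.5), so H > 0.5 has no real solution.
        raise ValueError("carrier frequency cannot exceed 0.5 under Hardy-Weinberg")
    root = math.sqrt(discriminant)
    return (2 - root) / 4, (2 + root) / 4


q_rare, q_common = allele_frequencies_from_carriers(0.18)
print(round(q_rare, 3), round(q_common, 3))
```

For a rare condition the smaller root is taken as q, and the larger root is then p.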
| Observation | Likely explanation |
|---|---|
| Excess of homozygotes | Non-random mating (inbreeding) or selection against heterozygotes |
| Excess of heterozygotes | Heterozygote advantage (balancing selection) |
| Allele frequencies changing over time | Selection, drift, mutation, or gene flow |
| Different frequencies in subpopulations | Population subdivision, local adaptation, or founder effects |
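Whether an observed excess or deficit is large enough to matter is usually judged with a chi-squared goodness-of-fit test. The sketch below is an illustration, not a prescribed method: the genotype counts are invented for the example, and the function name is hypothetical. It uses 1 degree of freedom (three genotype classes, minus one, minus one more because p is estimated from the same data), for which the 5% critical value is 3.84.

```python
def hardy_weinberg_chi_squared(n_AA, n_Aa, n_aa):
    """Compare observed genotype counts with Hardy-Weinberg expectations."""
    total = n_AA + n_Aa + n_aa
    # Each AA individual carries two A alleles, each Aa individual carries one.
    p = (2 * n_AA + n_Aa) / (2 * total)
    q = 1 - p
    expected = (p * p * total, 2 * p * q * total, q * q * total)
    observed = (n_AA, n_Aa, n_aa)
    chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_squared, expected


CRITICAL_VALUE_DF1 = 3.84  # chi-squared critical value, 1 degree of freedom, p = 0.05

# Hypothetical sample of 100 individuals with too few heterozygotes.
chi_squared, expected = hardy_weinberg_chi_squared(50, 20, 30)
print(f"expected counts: {[round(e, 1) for e in expected]}")
print(f"chi-squared = {chi_squared:.2f}")
print("significant deviation" if chi_squared > CRITICAL_VALUE_DF1 else "consistent with H-W")
```

In this made-up sample the heterozygotes fall well short of the expected count, the pattern the table attributes to inbreeding or selection against heterozygotes. The test only flags a deviation; it does not say which force caused it.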
Hardy-Weinberg is used to estimate carrier frequencies for recessive genetic conditions when only the disease frequency is known. This is essential for genetic counsellors advising prospective parents.
Example problem: A couple from a population where cystic fibrosis affects 1 in 2,500 individuals wants to know their risk of having an affected child. Neither has a family history of CF.
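One way to work this problem is sketched below. It assumes Hardy-Weinberg equilibrium, treats each parent's chance of being a carrier as the population carrier frequency (2pq), and assumes the two parents' genotypes are independent. The function name `risk_affected_child` is hypothetical.

```python
import math


def risk_affected_child(affected_frequency):
    """P(child affected) when nothing is known about either parent's genotype."""
    q = math.sqrt(affected_frequency)
    p = 1 - q
    carrier = 2 * p * q          # chance that one parent is a carrier
    both_carriers = carrier ** 2  # independence assumed (random mating)
    return both_carriers * 0.25   # Aa x Aa gives aa with probability 1/4


risk = risk_affected_child(1 / 2500)
print(f"risk = {risk:.6f}, about 1 in {1 / risk:.0f}")
```

With the unrounded carrier frequency of 0.0392 the risk comes out at about 1 in 2,600. Rounding the carrier frequency to 1 in 25 gives 1/25 × 1/25 × 1/4 = 1 in 2,500.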
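Small, isolated populations are expected to deviate from Hardy-Weinberg equilibrium due to genetic drift. Conservation geneticists use Hardy-Weinberg tests to detect drift, inbreeding, and population subdivision, each of which can show up as an excess of homozygotes.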
| Error | How to avoid it |
|---|---|
| Confusing p and q | Always let q = recessive allele frequency. Start from q². |
| Forgetting to square root | q² is the recessive phenotype frequency; q is the allele frequency. |
| Using percentages instead of decimals | Convert: 4% = 0.04, not 4. |
| Calculating carrier frequency as q instead of 2pq | Carriers are heterozygotes = 2pq, not q. |
| Forgetting that p + q = 1 | If q = 0.3, then p = 0.7, not 0.3. |
| Not converting "1 in X" to a decimal | 1 in 2,500 = 1/2500 = 0.0004. |
| Symbol | Meaning | How to find it |
|---|---|---|
| p | Frequency of dominant allele | p = 1 − q |
| q | Frequency of recessive allele | q = √(q²) |
| p² | Frequency of homozygous dominant | p × p |
| 2pq | Frequency of heterozygous (carriers) | 2 × p × q |
| q² | Frequency of homozygous recessive | Count from phenotype data |
The Hardy-Weinberg principle (p + q = 1 and p² + 2pq + q² = 1) predicts allele and genotype frequencies in a non-evolving population. It requires no selection, no mutation, no gene flow, no genetic drift, and random mating. Deviations indicate evolutionary forces at work. The equations are used to calculate carrier frequencies for genetic conditions, analyse population genetics data, and detect evidence of selection or drift. Always start calculations from q² (the homozygous recessive frequency) and work outward.
The Hardy–Weinberg (H–W) principle is the null model of population genetics — the algebraic statement of what allele and genotype frequencies look like in a population that is not evolving. Its power lies in precisely this: by specifying what equilibrium looks like, it lets us detect evolution by deviation. Under five idealised conditions — no mutation, no migration, no selection, infinite (large) population so no drift, and random mating — genotype frequencies in the next generation are fixed by allele frequencies in the parental gene pool: p² : 2pq : q² for a two-allele autosomal locus, where p + q = 1. Equilibrium is reached in a single generation of random mating and is then maintained indefinitely. As the closing capstone of the origins-of-genetic-variation course, H–W ties every prior lesson together: mutation (lessons 1–2) introduces the alleles whose p and q are tracked; meiosis (lesson 3) shuffles them into gametes whose random fusion generates p² : 2pq : q²; Mendelian genetics (lessons 4–5) supplies the diploid genotype-to-phenotype map that lets us count q² from the recessive phenotype; codominance / multiple alleles (lesson 6) extends H–W to ABO-style three-allele systems; sex linkage (lesson 7) modifies the algebra to q in hemizygous males, q² in homozygous females; the chi-squared test (lesson 8) is the formal machine for testing observed-vs-expected deviations; and selection and drift (lesson 9) are the biological forces whose fingerprints are precisely those deviations. Mastery at A-Level requires fluency with the algebra (q² → q → 2pq under a minute), recognition that H–W is a null model, ability to recite the five assumptions, and synoptic insight to identify which violated assumption explains an observed deviation.
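The claim that equilibrium is reached in a single generation of random mating can be illustrated with a small deterministic sketch. It assumes all five conditions hold and that allele frequencies are the same in both sexes; the starting genotype frequencies are invented for the example.

```python
def next_generation(genotypes):
    """One round of random mating for frequencies given as (AA, Aa, aa)."""
    AA, Aa, aa = genotypes
    p = AA + Aa / 2  # every AA gamete is A; half of the Aa gametes are A
    q = aa + Aa / 2
    return (p * p, 2 * p * q, q * q)


# Hypothetical starting population with no heterozygotes at all.
start = (0.5, 0.0, 0.5)
generation_1 = next_generation(start)
generation_2 = next_generation(generation_1)

print(start)
print(generation_1)
print(generation_2)
```

The allele frequencies (p = q = 0.5) never change, but the genotype frequencies jump to p² : 2pq : q² after one round of random mating and stay there in every later generation.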
This material sits in Edexcel 9BI0 Topic 8 (Origins of Genetic Variation), within the population-genetics strand that closes the inheritance-and-evolution sequence. Required knowledge covers:
- the two H–W equations: p + q = 1 for the allele frequencies of a two-allele autosomal locus, and p² + 2pq + q² = 1 for the genotype frequencies under random mating, along with the algebraic derivation by binomial expansion of (p + q)²;
- the five conditions for H–W equilibrium: no mutation, no migration (gene flow), no selection, a large (effectively infinite) population so no genetic drift, and random mating, plus the recognition that no real population satisfies all five exactly, so H–W is an idealised baseline against which deviations are interpreted;
- the standard calculation chain: identify q² from the homozygous-recessive phenotype frequency, take q = √q², compute p = 1 − q, and derive p² (homozygous dominant) and 2pq (heterozygous "carrier") from there;
- the biological interpretation of carrier frequency: for rare recessive disease alleles, 2pq is much larger than q² and most disease alleles are hidden in heterozygotes;
- the reverse direction of the calculation, from carrier frequency back to allele frequency, which requires a quadratic in q;
- the applications: estimating carrier risk in genetic counselling (e.g. cystic fibrosis), inferring selection when a deleterious allele persists at an unexpectedly high frequency (e.g. sickle-cell heterozygote advantage in malaria zones), and detecting drift, inbreeding or population subdivision from excess-homozygote signatures;
- the interpretation of deviations as the diagnostic signature of evolutionary forces (selection, drift, gene flow, mutation or non-random, assortative mating), with synoptic links back to lesson 8 (χ² as the formal goodness-of-fit test) and lesson 9 (the biology of the forces that break H–W).
The H–W formulae generalise to multi-allele systems (ABO blood groups: p² + q² + r² + 2pq + 2pr + 2qr = 1) and to X-linked loci, where a recessive allele at frequency q is expressed in a fraction q of hemizygous males but only q² of homozygous females. If the allele frequency initially differs between the sexes, the male and female values converge over generations of random mating to a common value: the weighted mean of two-thirds of the female frequency and one-third of the male frequency. Refer to the official Pearson Edexcel 9BI0 specification document for exact wording.
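The three-allele expansion can be sketched for ABO blood groups as follows. The allele frequencies used here are invented purely for illustration, and the function name `multi_allele_genotypes` is hypothetical. The grouping of genotypes into blood groups relies on the standard ABO pattern: A and B are codominant and O is recessive to both.

```python
from itertools import combinations_with_replacement


def multi_allele_genotypes(allele_freqs):
    """Expected genotype frequencies under random mating for any number of alleles."""
    genotypes = {}
    for a1, a2 in combinations_with_replacement(sorted(allele_freqs), 2):
        product = allele_freqs[a1] * allele_freqs[a2]
        # Homozygotes contribute x^2; heterozygotes contribute 2xy.
        genotypes[a1 + a2] = product if a1 == a2 else 2 * product
    return genotypes


# Hypothetical allele frequencies: p (A), q (B), r (O); they must add up to 1.
alleles = {"A": 0.3, "B": 0.1, "O": 0.6}
g = multi_allele_genotypes(alleles)

blood_groups = {
    "A": g["AA"] + g["AO"],   # p^2 + 2pr
    "B": g["BB"] + g["BO"],   # q^2 + 2qr
    "AB": g["AB"],            # 2pq
    "O": g["OO"],             # r^2
}
print({group: round(freq, 2) for group, freq in blood_groups.items()})
print(round(sum(g.values()), 10))  # all six genotype terms add up to 1
```

As in the two-allele case, group O (the fully recessive phenotype) gives r² directly, so r = √(frequency of group O) is the natural starting point for working backwards.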
Question (8 marks):
(a) In a European population, 1 in 2,500 newborns is affected by cystic fibrosis (CF), an autosomal recessive condition. Assuming the population is in Hardy–Weinberg equilibrium for the CF locus, calculate (i) the frequency of the recessive allele (q), (ii) the frequency of the dominant allele (p), and (iii) the carrier frequency (2pq) expressed as "1 in X". Show your working at each step. (5)
(b) A genetic counsellor advises a couple — neither with a family history of CF — from this population. Using your answers in (a), calculate the probability that any child of this couple is homozygous affected with CF. (2)
(c) The CF allele frequency in northern European populations has remained roughly stable at q ≈ 0.02 for many generations despite homozygotes historically having reduced fitness. State one Hardy–Weinberg assumption that this observation challenges, and identify one biological hypothesis that could account for the persistence of the allele. (1)
Solution with mark scheme:
(a) M1 (AO1.2) — recessive phenotype frequency = q². The frequency of the homozygous-recessive (affected) phenotype is q² = 1/2500 = 0.0004. A1 (AO2.1) — recessive allele frequency. Take the positive square root: q = √0.0004 = 0.02 (allele frequency, dimensionless, between 0 and 1).
M1 (AO2.1) — dominant allele frequency. From p + q = 1: p = 1 − 0.02 = 0.98.
A1 (AO2.2) — carrier frequency. Apply the heterozygote term: 2pq = 2 × 0.98 × 0.02 = 0.0392. Convert to "1 in X": 1 / 0.0392 ≈ 25.5 ≈ 1 in 25. A1 (AO3.1) — biological framing. The carrier frequency (~4%, ~1 in 25) is far higher than the disease frequency (1 in 2,500) because most recessive alleles are hidden in heterozygotes; for a rare recessive allele with q small, 2pq ≈ 2q ≫ q².
(b) M1 (AO2.2): joint carrier probability. Without family-history information, each parent's prior probability of being a carrier equals the population carrier frequency: P(parent carrier) = 2pq ≈ 1/25. The joint probability that both parents are carriers (assuming independence, i.e. random mating) is (1/25) × (1/25) = 1/625. A1 (AO3.1): child homozygous given both parents carriers. If both parents are heterozygous Cc × Cc, the probability the child is cc (affected) is 1/4 (Mendelian). Overall risk: 1/625 × 1/4 = 1/2500, which matches the population disease frequency q². The match is exact only because 2pq was rounded to 1/25 (that is, to 2q); with the unrounded 2pq = 0.0392 the risk is about 1 in 2,600.
(c) M1 (AO3.2): assumption challenged + hypothesis. The persistence of a deleterious recessive allele at an appreciable frequency challenges the no-selection assumption: under simple selection against affected homozygotes the allele should slowly decline. One biological hypothesis to account for persistence is heterozygote advantage: CF carriers may have had historical resistance to specific gastrointestinal pathogens such as cholera or typhoid, so heterozygotes would have had higher fitness than either homozygote, maintaining both alleles under balancing selection. (Alternative accepted hypotheses: mutation–selection balance, in which recurrent mutation replaces the alleles removed by selection; founder effect followed by drift in small ancestral populations.)
Total: 8 marks (M4 A4).
Question (6 marks): In a region of West Africa, 9% of the population is affected by sickle-cell anaemia (homozygous H^S H^S). Assuming Hardy–Weinberg equilibrium across the three genotypes, calculate the frequencies of the three genotypes (H^A H^A, H^A H^S, H^S H^S), comment on the size of the heterozygote frequency, and explain — in terms of selection and the H–W assumptions — why the allele frequency for H^S persists at this level despite the homozygous condition reducing fitness.
Mark scheme decomposition by AO:
| Mark | AO | Earned by |
|---|---|---|
| 1 | AO2.1 | Identifying q² = 0.09 from the affected (homozygous H^S H^S) frequency, then taking q = √0.09 = 0.30 as the H^S allele frequency |
| 2 | AO2.1 | Computing p = 1 − q = 0.70 as the H^A allele frequency, and p² = 0.49 (49%) as the H^A H^A homozygous-normal frequency |
| 3 | AO2.2 | Calculating 2pq = 2 × 0.70 × 0.30 = 0.42 (42%) as the H^A H^S heterozygote frequency, and verifying the three genotype frequencies sum to 0.49 + 0.42 + 0.09 = 1.00 |
| 4 | AO3.1 | Commenting on the unusually high heterozygote frequency — about 42% of the population carries one H^S allele, far higher than would persist under simple selection against H^S H^S homozygotes |
| 5 | AO3.2 | Identifying heterozygote advantage (balancing selection / overdominance) as the mechanism: H^A H^S heterozygotes are resistant to Plasmodium falciparum malaria (the parasite is impaired in red cells containing some HbS), so heterozygotes have higher fitness than either H^A H^A (susceptible to malaria) or H^S H^S (anaemia) in malaria-endemic regions |
| 6 | AO3.2 | Concluding that the population is NOT in true H–W equilibrium: the no-selection assumption is violated by balancing selection at this locus, and the allele frequencies sit at a stable equilibrium maintained by heterozygote advantage (not by mutation) for as long as malaria pressure persists; in non-malarial regions H^S declines |
Total: 6 marks (AO2 = 3, AO3 = 3). Specimen question modelled on the Edexcel 9BI0 paper format. Candidates who compute genotype frequencies but cannot explicitly link heterozygote advantage to violation of the no-selection assumption lose marks 5 and 6.