Phylogenetics is the study of evolutionary relationships among organisms. Cladistics is a method of classification that groups organisms based on shared derived characteristics. Together, they provide a powerful framework for understanding the tree of life and have revolutionised how we classify living things.
Phylogeny refers to the evolutionary history and relationships of a group of organisms. A phylogenetic tree (or evolutionary tree) is a branching diagram that represents these relationships, showing how species have diverged from common ancestors over time.
Key features of a phylogenetic tree:
Understanding how to read phylogenetic trees is a critical skill. Several important principles:
Sister groups — Two groups that share an immediate common ancestor are called sister groups. They are more closely related to each other than either is to any other group.
Rotation does not matter — Branches can be rotated around any node without changing the relationships depicted. What matters is the branching pattern, not the left-to-right order of tips.
Recency of common ancestor — The more recently two species shared a common ancestor, the more closely related they are. This is shown by how far back you must trace to find their node.
Monophyletic groups — A group is monophyletic if it includes an ancestor and ALL of its descendants. This is the ideal grouping in cladistics.
Exam Tip: Exam questions often present a phylogenetic tree and ask you to identify which organisms are most closely related. Always look for the most recent common ancestor (the nearest shared node), not for which organisms appear closest together on the page.
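The "nearest shared node" rule can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the tips A to D, the node names `n1`, `n2` and `root`, and the child-to-parent dictionary encoding are all assumptions made up for the example. The tree encoded is (((A, B), C), D).

```python
# Hypothetical rooted tree stored as child -> parent links.
# Topology: (((A, B), C), D). All names are invented for illustration.
PARENT = {
    "A": "n1", "B": "n1",
    "n1": "n2", "C": "n2",
    "n2": "root", "D": "root",
}

def ancestors(node):
    """Return the nodes passed when tracing back to the root, nearest first."""
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def mrca(a, b):
    """Most recent common ancestor: the first ancestor of a shared with b."""
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):
        if node in b_ancestors:
            return node
    return None

def steps_to_mrca(a, b):
    """How many nodes back from tip a the shared node lies."""
    return ancestors(a).index(mrca(a, b)) + 1

# C may be drawn next to D on the page, but it shares a more recent
# node with A (n2) than it does with D (root).
print(mrca("C", "A"), mrca("C", "D"))
```

Fewer steps back to the shared node means a closer relationship, whatever the left-to-right order of the tips.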
Cladistics is a method of classification that groups organisms into clades based on shared derived characteristics (synapomorphies). A clade is a group consisting of an ancestor and all its descendants — a monophyletic group.
| Term | Definition |
|---|---|
| Clade | A group consisting of a common ancestor and all its descendants |
| Cladogram | A branching diagram showing the relationships between clades |
| Synapomorphy | A shared derived characteristic — a feature that evolved in the common ancestor of a clade and is shared by all members |
| Plesiomorphy | An ancestral (primitive) characteristic shared by a broader group, not unique to the clade in question |
| Autapomorphy | A derived characteristic unique to a single taxon |
| Outgroup | A species or group outside the clade of interest, used as a reference point to determine which characters are ancestral and which are derived |
| Parsimony | The principle that the simplest explanation (requiring the fewest evolutionary changes) is preferred |
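To see parsimony at work, the sketch below counts the minimum number of changes that one two-state character needs on two candidate trees, using Fitch's counting method. The taxa A to D, the character states and the nested-tuple tree encoding are all made up for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes).

    tree   -- a tip name (str) or a pair (left_subtree, right_subtree)
    states -- dict mapping each tip name to its character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, states)
    right_states, right_changes = fitch(right, states)
    shared = left_states & right_states
    if shared:
        # The two branches can agree on a state: no extra change needed here.
        return shared, left_changes + right_changes
    # No state in common: at least one change happened at this node.
    return left_states | right_states, left_changes + right_changes + 1

# Hypothetical character: derived state 1 in A and B, ancestral state 0 in C and D.
STATES = {"A": 1, "B": 1, "C": 0, "D": 0}

tree_1 = ((("A", "B"), "C"), "D")   # A and B are sister groups
tree_2 = ((("A", "C"), "B"), "D")   # A and C are sister groups

changes_1 = fitch(tree_1, STATES)[1]
changes_2 = fitch(tree_2, STATES)[1]
print(changes_1, changes_2)
```

The first tree explains the character with a single change and the second needs two, so parsimony prefers the first for this character. A real analysis would sum the counts over many characters.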
Modern phylogenetics relies heavily on molecular data — DNA, RNA and protein sequences. Molecular evidence has several advantages over morphological evidence:
| Advantage | Explanation |
|---|---|
| Objectivity | DNA sequences are unambiguous — there is no subjective interpretation of form |
| Universality | All organisms have DNA, RNA and proteins, so molecular comparisons can be made between any two organisms |
| Quantitative | The number of sequence differences can be counted precisely, allowing statistical analysis |
| Large datasets | Genomes contain millions of base pairs, providing vast amounts of data |
| Less convergent evolution | At the molecular level, convergent evolution is far less likely than for morphological features |
A molecular clock is based on the principle that mutations accumulate at a roughly constant rate over time for a given gene. By counting the number of differences between homologous sequences from two species, and calibrating with known fossil dates, we can estimate when species diverged.
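As a rough worked example of the arithmetic, the sketch below assumes a strict clock (one constant rate) and uses invented numbers throughout: the 1000-site alignment, the difference counts and the 30-million-year fossil date are hypothetical. The factor of 2 appears because differences accumulate along both lineages after the split.

```python
def clock_rate(differences, sites, calibration_myr):
    """Substitutions per site per million years along ONE lineage,
    calibrated from a pair of species with a known divergence date."""
    return (differences / sites) / (2 * calibration_myr)

def divergence_time(differences, sites, rate):
    """Estimated time since divergence (million years) under a strict clock."""
    return (differences / sites) / (2 * rate)

# Hypothetical calibration pair: 60 differences over 1000 sites,
# with fossils dating the split to 30 million years ago.
rate = clock_rate(60, 1000, 30.0)

# Hypothetical pair of interest: 20 differences over the same 1000 sites.
estimate = divergence_time(20, 1000, rate)
print(f"rate = {rate:.4f} per site per Myr, divergence = {estimate:.1f} Myr ago")
```

With a third of the differences of the calibration pair, the second pair is estimated to have diverged a third as long ago. The estimate is only as good as the constant-rate assumption.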
Limitations of molecular clocks:
Exam Tip: When describing how molecular evidence is used to construct phylogenetic trees, be specific: state that homologous DNA/protein sequences are compared, the number of differences is counted, and more differences indicate a longer time since divergence from a common ancestor.
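The counting step can be illustrated with a short sketch. The three 12-base sequences are invented and assumed to be already aligned; skipping gap positions (`-`) is one simple convention among several.

```python
def count_differences(seq_a, seq_b):
    """Count positions at which two aligned sequences differ.

    Positions with a gap ('-') in either sequence are skipped,
    which is an assumption of this sketch rather than a fixed rule.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(seq_a, seq_b)
        if x != y and "-" not in (x, y)
    )

# Hypothetical aligned fragments of one homologous gene.
species_1 = "ATGGCCTTAGCA"
species_2 = "ATGGCCTTAGCG"
species_3 = "ATGACCTTCGCG"

print(count_differences(species_1, species_2))
print(count_differences(species_1, species_3))
print(count_differences(species_2, species_3))
```

Species 1 and 2 show the fewest differences, so on this evidence they share the most recent common ancestor of the three.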
Sometimes morphological and molecular evidence give different results. Important concepts:
Convergent evolution occurs when unrelated organisms independently evolve similar features in response to similar environmental pressures. Examples:
| Feature | Organisms | Explanation |
|---|---|---|
| Streamlined body shape | Dolphins (mammals) and sharks (fish) | Both evolved in aquatic environments where streamlining reduces drag |
| Wings | Birds, bats (mammals), and insects | Flight evolved independently in these lineages |
| Camera eyes | Vertebrates and octopuses (molluscs) | Complex eyes evolved independently in these very different lineages |
| Spines | Hedgehogs (placental mammals) and echidnas (monotreme mammals) | Defensive spines evolved independently |
Convergent evolution can mislead morphological classification by making unrelated organisms appear similar. Molecular evidence helps to resolve these problems because convergence across long stretches of DNA sequence is far less likely than convergence in outward form.
Consider five organisms: lamprey, salmon, lizard, pigeon, and cat. We can classify them using shared derived characters:
| Character | Lamprey | Salmon | Lizard | Pigeon | Cat |
|---|---|---|---|---|---|
| Vertebral column | ✓ | ✓ | ✓ | ✓ | ✓ |
| Jaws | ✗ | ✓ | ✓ | ✓ | ✓ |
| Bony skeleton | ✗ | ✓ | ✓ | ✓ | ✓ |
| Four limbs (tetrapod) | ✗ | ✗ | ✓ | ✓ | ✓ |
| Amniotic egg | ✗ | ✗ | ✓ | ✓ | ✓ |
| Feathers | ✗ | ✗ | ✗ | ✓ | ✗ |
| Fur/hair + mammary glands | ✗ | ✗ | ✗ | ✗ | ✓ |
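From this table, we can construct a cladogram. In nested-bracket form it is (((lizard, pigeon, cat), salmon), lamprey): the lamprey branches off first, then the salmon, leaving lizard, pigeon and cat as a group that the characters in this table do not resolve further.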
Each branching point represents the evolution of a new derived character.
Exam Tip: In an exam, you may be given a table of characteristics and asked to construct a cladogram. Start by identifying the outgroup (the organism with the fewest derived characters), then progressively group organisms by shared derived features.
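The grouping procedure can be sketched in Python using the five-organism table above. This is a minimal illustration that assumes the characters nest perfectly inside one another, as they do here; conflicting characters would need a proper parsimony analysis. The function names and the Newick-style output string are choices made for the example.

```python
TAXA = ["Lamprey", "Salmon", "Lizard", "Pigeon", "Cat"]

# 1 = character present, 0 = absent, in the same order as TAXA.
CHARACTERS = {
    "Vertebral column":          [1, 1, 1, 1, 1],
    "Jaws":                      [0, 1, 1, 1, 1],
    "Bony skeleton":             [0, 1, 1, 1, 1],
    "Four limbs (tetrapod)":     [0, 0, 1, 1, 1],
    "Amniotic egg":              [0, 0, 1, 1, 1],
    "Feathers":                  [0, 0, 0, 1, 0],
    "Fur/hair + mammary glands": [0, 0, 0, 0, 1],
}

def clades_from_characters(taxa, characters):
    """Map each group of two or more taxa to the characters its members share."""
    clades = {}
    for name, states in characters.items():
        members = frozenset(t for t, s in zip(taxa, states) if s)
        if len(members) >= 2:  # a character in one taxon only groups nothing
            clades.setdefault(members, []).append(name)
    return clades

def to_newick(taxa, clades):
    """Nest the groups, largest first, into a Newick-style string."""
    def build(group):
        inner = sorted((c for c in clades if c < group), key=len, reverse=True)
        children, covered = [], set()
        for clade in inner:
            if not (clade & covered):  # skip groups already nested deeper
                children.append(build(clade))
                covered |= clade
        children += [t for t in taxa if t in group and t not in covered]
        return "(" + ",".join(children) + ")"
    return build(frozenset(taxa)) + ";"

clades = clades_from_characters(TAXA, CHARACTERS)
print(to_newick(TAXA, clades))
```

Feathers and fur/hair each occur in a single organism (autapomorphies), so they define no group and the lizard, pigeon and cat remain an unresolved three-way branch on this table alone.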
Cladistics has led to significant reclassifications. Notable examples:
Birds are dinosaurs — Cladistic analysis places birds within the theropod dinosaurs. The traditional class "Reptilia" is monophyletic only if it includes birds; defined that way, it is effectively synonymous with the clade Sauropsida.
Reclassification of whales — Molecular and morphological cladistic evidence shows that whales (Cetacea) are nested within the order Artiodactyla (even-toed ungulates), most closely related to hippos. The combined order is now called Cetartiodactyla.
Fungi are more closely related to animals than plants — Molecular phylogenetics revealed that fungi and animals share a more recent common ancestor than either does with plants.
| Limitation | Explanation |
|---|---|
| Horizontal gene transfer | In prokaryotes especially, genes can transfer between unrelated species, complicating tree-building |
| Hybridisation | When species interbreed, their genomes merge, making it difficult to represent relationships as a simple branching tree |
| Incomplete data | Fossil organisms often lack molecular data; many living species have not been sequenced |
| Long-branch attraction | Rapidly evolving lineages can be artefactually grouped together in analyses |
| Key Concept | Detail |
|---|---|
| Phylogenetics | The study of evolutionary relationships among organisms |
| Cladistics | Classification based on shared derived characteristics (synapomorphies) |
| Clade | An ancestor and all its descendants (monophyletic group) |
| Cladogram | A branching diagram showing clade relationships |
| Molecular phylogenetics | Using DNA/RNA/protein sequences to determine evolutionary relationships |
| Convergent evolution | Unrelated organisms independently evolving similar features |
| Molecular clock | Using mutation accumulation rate to estimate divergence times |
Exam Tip: Cladistics questions often ask you to interpret or construct cladograms. Remember that the key to cladistics is shared DERIVED characters — not shared ancestral characters. An ancestral character shared by all organisms in a group tells you nothing about relationships within that group.
The Edexcel 9BI0 specification places phylogenetics and cladistics within Topic 4: Biodiversity and Natural Resources, with substantial synoptic overlap elsewhere in the course. The previous lesson covered the five-kingdom-to-three-domain reorganisation, which was driven directly by Woese's 16S rRNA phylogenetics and is the founding case study. The previous-but-one lesson covered classification and taxonomy: the Linnaean hierarchy is the descriptive scaffold whose evolutionary content phylogenetics supplies. The next lessons, on biodiversity and species richness and on measuring biodiversity, depend on it too, because quantifying diversity presupposes a phylogenetic framework that decides what counts as a species and how distinct two species are. Topic 1: Lifestyle, Health and Risk supplies the DNA double helix and Watson–Crick base-pairing rules that underpin all sequence-based phylogenetics. Topic 5: On the Wild Side covers natural selection and molecular evolution, which generate the sequence variation that phylogenetics reads. Topic 8: Genetics, Populations, Evolution and Ecosystems covers PCR, Sanger sequencing and next-generation sequencing, the technological pipeline by which molecular phylogenetic data are obtained at scale.
The relevant statements concern: defining phylogeny, clade, cladogram, monophyly, paraphyly and polyphyly; explaining how molecular sequence data (DNA, RNA, protein) are used alongside morphology to infer evolutionary relationships; outlining the use of homologous features and the misleading effect of analogous features arising from convergent evolution; and discussing the molecular clock and its limits (refer to the official Pearson Edexcel 9BI0 specification document for exact wording).
Question (8 marks):
A research group sequences a 600-base-pair fragment of the mitochondrial cytochrome c oxidase I (COI) gene — the standard animal "DNA barcoding" locus — from four mammal species. The pairwise number of base differences between the aligned sequences is shown below.
| | Species P | Species Q | Species R | Species S |
|---|---|---|---|---|
| Species P | — | 12 | 78 | 145 |
| Species Q | 12 | — | 80 | 147 |
| Species R | 78 | 80 | — | 152 |
| Species S | 145 | 147 | 152 | — |
Species S is a known marsupial; species P, Q and R are placental mammals.
(a) Construct a phylogenetic tree consistent with these data, identifying the outgroup and explaining your choice. (4)
(b) Discuss two reasons why the divergence times read directly from these sequence differences may misrepresent the true evolutionary timescale, and explain how a researcher would mitigate each. (4)
Solution with mark scheme:
(a) Step 1 — choose the outgroup. Species S is the marsupial; placental and marsupial lineages diverged roughly 160 million years ago, far earlier than any divergence among placentals. The data corroborate the choice: S differs from each of P, Q and R by 145–152 substitutions, the largest pairwise distances in the matrix.
M1 (AO2.1) — identifies S as the outgroup, citing both the biological context (marsupial vs placental) and the matrix evidence (the largest pairwise distances involve S). The outgroup is what allows derived characters to be polarised against the ancestral state.
M1 (AO2.1) — identifies P and Q as a sister-pair clade. Their pairwise distance (12) is far smaller than any other pairwise distance involving P or Q (P–R = 78, P–S = 145), so neighbour-joining or any distance-based clustering joins them first.
M1 (AO2.1) — joins R to the (P, Q) clade next, on evidence that R is closer to (P, Q) (P–R = 78, Q–R = 80, mean 79) than to S (R–S = 152). The internal node grouping (P, Q, R) is supported.
A1 (AO3.1a) — the rooted tree therefore has topology (((P,Q),R),S) with S as the outgroup. Branch lengths are scaled to substitution counts: short P–Q branch (recent divergence), longer R branch (older placental divergence), longest S branch (marsupial outgroup).
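The clustering in part (a) can be checked with a short sketch. Note the swap: the mark scheme names neighbour-joining, whereas the code below uses UPGMA (average-linkage clustering), a simpler distance method that assumes roughly clock-like rates. The function name and the Newick-style output are choices made for this illustration; the distances are those in the question's matrix.

```python
from itertools import combinations

# Pairwise base differences from the question's matrix.
DIST = {
    ("P", "Q"): 12, ("P", "R"): 78, ("P", "S"): 145,
    ("Q", "R"): 80, ("Q", "S"): 147, ("R", "S"): 152,
}

def upgma(dist):
    """Average-linkage clustering; returns the topology as a Newick-style string."""
    d = {frozenset(pair): float(value) for pair, value in dist.items()}
    size = {name: 1 for pair in dist for name in pair}  # cluster -> number of tips
    while len(size) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(sorted(size), 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        # Distance from the new cluster to every other cluster is the
        # size-weighted mean of the distances from its two parts.
        for c in size:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"

print(upgma(DIST))
```

P and Q join first (distance 12), R joins that pair next (mean distance 79), and S joins last, which matches the outgroup choice made on biological grounds.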
(b) Step 1 — identify and develop two limits.
M1 (AO3.1a) — first limit: mutation rates are not strictly constant ("non-clocklike behaviour"). Mitochondrial rates vary by taxon (rodents accumulate mitochondrial substitutions faster than primates) and by lineage-specific selection on COI. Mitigation: estimate substitution rates per branch using a relaxed molecular clock (e.g. uncorrelated lognormal) rather than a strict clock; calibrate against multiple fossil dates rather than one.
M1 (AO3.1a) — second limit: saturation of substitutions. Once enough time has passed, many sites have mutated more than once, and some have reverted to the original base, so the observed difference underestimates the true number of substitutions. Mitigation: apply a substitution-model correction (e.g. Jukes–Cantor, Kimura two-parameter, GTR + Γ) that infers the expected number of changes given the observed difference, and use slowly evolving genes (e.g. ribosomal RNA) for very deep divergences where COI is saturated.
M1 (AO3.2a) — concludes that the raw distance matrix is a starting point, not an answer: any divergence-time estimate must be reported with confidence intervals from a model-based analysis, not as a single number from sequence-difference counts.
A1 (AO3.2a) — connects method to limit: the four-taxon dataset is also too small to give well-supported deep nodes; modern practice would concatenate dozens of single-copy genes and report bootstrap values (or Bayesian posterior probabilities) at each node to quantify confidence.
Total: 8 marks.
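The saturation correction named in part (b) can be illustrated with the simplest of the listed models, Jukes–Cantor, which assumes all base changes are equally likely. Under that model the corrected distance is d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. The sketch below applies it to two entries of the question's matrix; the function name is hypothetical.

```python
import math

def jukes_cantor(differences, sites):
    """Jukes-Cantor corrected distance (expected substitutions per site).

    p is the observed proportion of differing sites. The correction is
    undefined once p reaches 0.75, the level expected between unrelated
    random sequences.
    """
    p = differences / sites
    if p >= 0.75:
        raise ValueError("sequences are saturated; correction undefined")
    return -0.75 * math.log(1 - 4 * p / 3)

SITES = 600  # length of the COI fragment in the question

for pair, diffs in {"P-Q": 12, "P-S": 145}.items():
    observed = diffs / SITES
    corrected = jukes_cantor(diffs, SITES)
    print(f"{pair}: observed {observed:.4f}, corrected {corrected:.4f}")
```

The correction barely moves the small P–Q distance but raises the P–S distance noticeably, which is the pattern saturation predicts: the deeper the divergence, the more hidden changes the raw count misses.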
Question (6 marks): Explain how a phylogenetic tree is constructed from molecular sequence data, and discuss why such a tree is best understood as a statistical hypothesis rather than a literal record of evolutionary history.
Mark scheme decomposition by AO:
| Marking point | AO | Credit-worthy content |
|---|---|---|
| 1 | AO1.1 | States the construction pipeline: align homologous sequences, count substitutions, estimate evolutionary distances, cluster taxa using a method such as neighbour-joining or maximum likelihood, root the tree using an outgroup. |
| 2 | AO1.2 | States that branch lengths represent evolutionary distance (substitutions per site or, after calibration, time) and nodes represent inferred common ancestors. |
| 3 | AO2.1 | Applies the principle that more sequence differences indicate longer time since divergence (calibrated by the molecular clock against fossil dates). |
| 4 | AO2.1 | Applies the homology vs homoplasy distinction: only homologous similarity (shared inheritance) gives true phylogenetic signal; convergent evolution (homoplasy) is noise to be filtered. |
| 5 | AO3.1a | Evaluates the "statistical hypothesis" framing: different methods (parsimony, distance, likelihood, Bayesian) and substitution models can produce different trees from the same alignment; bootstrap or posterior probabilities quantify support at each node. |
| 6 | AO3.2a | Concludes by acknowledging that a single-gene tree (a gene tree) may disagree with the species tree due to incomplete lineage sorting and horizontal gene transfer, so a robust species phylogeny rests on many concatenated loci. |
Total: 6 marks split AO1 = 2, AO2 = 2, AO3 = 2. This is a typical Section B "explain and evaluate" question — Edexcel rewards candidates who use the construction pipeline to justify the statistical reading (AO2 + AO3) rather than merely listing the steps (AO1).