Spec Mapping — OCR H420 Module 6.1.3 — Manipulating genomes, content statements covering DNA profiling, the polymerase chain reaction (PCR), gel electrophoresis, DNA sequencing (Sanger and next-generation), and the role of bioinformatics in interpreting sequence data (refer to the official OCR H420 specification document for exact wording). This lesson opens the molecular-biotechnology arc of Module 6 and supplies the analytical toolkit on which every subsequent lesson — genetic engineering, GMOs, gene therapy, CRISPR — depends.
The ability to read, compare and analyse DNA has transformed biology, medicine and forensic science. OCR A-Level Biology A specification 6.1.3 requires you to understand how scientists manipulate genomes — beginning with the techniques used to profile individuals, amplify minute samples, and sequence entire genomes. This opening lesson of Module 6.1.3 sets out the molecular toolkit underlying the whole chapter, from the polymerase chain reaction that copies DNA exponentially to the next-generation sequencers that can read a human genome in under a day.
The intellectual heritage of this chapter is unusually concentrated. The American biochemist Kary Mullis conceived the polymerase chain reaction in 1983, paraphrased by his contemporaries as a "molecular photocopier" — a way to amplify a vanishingly small DNA target into a workable analytical quantity by harnessing the thermostable polymerase of Thermus aquaticus. Mullis received the Nobel Prize in Chemistry in 1993. The British biochemist Frederick Sanger developed the chain-termination sequencing method in the late 1970s — a paradigm in which dideoxynucleotides act as molecular full stops — and shared the 1980 Nobel Prize for the work. Sanger is also one of only a handful of scientists ever to win two Nobel Prizes (his first, in 1958, was for sequencing insulin). British geneticist Sir Alec Jeffreys at Leicester in 1984 saw that the variable-number tandem repeats lurking in the non-coding genome could form a unique "DNA fingerprint" per individual — a discovery that revolutionised forensic science within five years.
Key Definitions:
- DNA profiling — the process of producing an image of the patterns in a person's DNA (their "genetic fingerprint").
- VNTR (Variable Number Tandem Repeat) — a non-coding DNA sequence repeated in tandem, with the number of repeats varying between individuals.
- STR (Short Tandem Repeat) — a shorter VNTR (2–6 bp repeat units), used routinely in modern forensic DNA profiling.
- PCR (Polymerase Chain Reaction) — an in vitro technique for amplifying a specific DNA sequence into millions of copies.
- Gel electrophoresis — a technique that separates DNA fragments by size using an electric field.
- Sequencing — determining the order of nucleotide bases in a DNA molecule.
- Bioinformatics — the use of computing to store, retrieve and analyse biological data, particularly genome and protein sequences.
Over 99% of the human genome is identical between individuals, yet the remaining <1% contains enough variation to identify a person uniquely (apart from monozygotic twins). Much of this variation lies in non-coding DNA, particularly in regions of repeated sequences. OCR wants you to understand that these non-coding regions — once dismissed as "junk DNA" — are precisely what make profiling possible, because the number of repeats at each locus varies so widely between people.
DNA profiling is used for:
DNA must first be extracted from a cell sample (blood, saliva, hair root, semen, bone). Cells are lysed with detergent to dissolve membranes, proteins are digested with protease, and DNA is precipitated with cold ethanol. Modern forensic kits automate this using silica columns that bind DNA while contaminants wash through.
Very small samples (a single hair, a speck of dried blood) contain too little DNA to analyse directly. PCR solves this by copying a target sequence exponentially. Each cycle doubles the amount of DNA, so 30 cycles produce over a billion copies from a single molecule.
flowchart TD
A[Sample DNA + primers + nucleotides + Taq polymerase] --> B[Denaturation 95 degrees C]
B --> C[Annealing 55 degrees C primers bind]
C --> D[Extension 72 degrees C Taq synthesises new strand]
D --> E{Repeat 25-35 cycles}
E -->|yes| B
E -->|no| F[Millions of copies]
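A PCR reaction contains the sample (template) DNA, primers, free nucleotides and Taq polymerase. The temperature stages of each cycle are: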
| Stage | Temperature | What happens |
|---|---|---|
| Denaturation | 95 °C | Hydrogen bonds break; DNA strands separate |
| Annealing | 50–65 °C | Primers bind (hybridise) to complementary sequences flanking the target |
| Extension | 72 °C | Taq polymerase extends the primer, synthesising a new strand 5' → 3' |
Exam Tip: OCR often asks why Taq polymerase is used rather than human DNA polymerase. Give two reasons: (1) Taq is not denatured at 95 °C, so it survives repeated heating cycles, and (2) its optimum is about 72 °C, matching the extension temperature.
After 30 cycles, a single starting molecule theoretically becomes 2³⁰ ≈ 10⁹ copies. In practice efficiency is lower, but billions of copies are routine.
For an idealised PCR with 100% efficiency the copy number after n cycles is given by:
$$N = N_0 \cdot 2^{n}$$
where $N_0$ is the starting number of target template molecules and $n$ is the number of completed cycles. With realistic per-cycle efficiency $E$ (where $0 < E \le 1$):
$$N = N_0 \cdot (1 + E)^{n}$$
For $E = 0.9$ (a typical well-optimised reaction) and $n = 30$ cycles, the amplification factor is approximately $1.9^{30} \approx 2.3 \times 10^{8}$: still vast, but roughly five times smaller than the theoretical maximum of $2^{30} \approx 1.1 \times 10^{9}$.
The 95 °C denaturation step makes clear why a thermostable polymerase is essential: an ordinary mesophilic polymerase would denature irreversibly at that temperature and have to be replenished every cycle, as was the case in the very earliest PCR experiments (Mullis originally used the Klenow fragment of E. coli DNA polymerase I, which was slow, fragile and expensive to use in this way). The substitution of Taq polymerase from Thermus aquaticus, introduced by Mullis's colleagues, made PCR practicable.
A forensic technician extracts DNA from a saliva swab and estimates the target-locus copy number at 50 molecules. The PCR programme runs 32 cycles at a per-cycle efficiency of 0.85. How many target copies are produced?
$$N = 50 \cdot (1 + 0.85)^{32} = 50 \cdot 1.85^{32}$$
Using logarithms, $\log_{10}(1.85) \approx 0.267$, so $1.85^{32} \approx 10^{0.267 \times 32} = 10^{8.54} \approx 3.5 \times 10^{8}$. Therefore $N \approx 50 \times 3.5 \times 10^{8} \approx 1.8 \times 10^{10}$ copies, comfortably enough for downstream electrophoresis. The same calculation under perfect (100%) efficiency would give $50 \times 2^{32} \approx 2.1 \times 10^{11}$, illustrating how cycle efficiency dominates the practical yield.
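The worked example above can be checked numerically. The following is a minimal sketch in Python; the function name `pcr_copies` is illustrative only and simply evaluates the $(1+E)^n$ model, assuming the efficiency stays constant across all cycles (which real reactions do not quite achieve).

```python
def pcr_copies(n0: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number N = N0 * (1 + E)**n for a constant efficiency E."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must satisfy 0 < E <= 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return n0 * (1 + efficiency) ** cycles


# Saliva-swab example: 50 starting molecules, 32 cycles.
realistic = pcr_copies(50, 32, efficiency=0.85)
ideal = pcr_copies(50, 32, efficiency=1.0)

print(f"E = 0.85: {realistic:.2e} copies")
print(f"E = 1.00: {ideal:.2e} copies")
print(f"ideal / realistic = {ideal / realistic:.1f}")
```

Changing `efficiency` by only a few hundredths shifts the final yield noticeably, because the difference is compounded over every cycle.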
Amplified DNA fragments are separated by size using gel electrophoresis. The gel is a mesh of agarose (for large fragments) or polyacrylamide (for finer resolution). DNA is negatively charged because of its phosphate backbone, so when an electric field is applied, fragments migrate towards the anode (positive electrode). Smaller fragments move faster through the gel matrix, so after a set time fragments are separated by size — small at the far end, large near the wells.
The DNA is visualised by staining (e.g. with ethidium bromide or SYBR green) and viewing under UV light, or by using fluorescently labelled primers that appear in different colours.
| Component | Function |
|---|---|
| Agarose gel | Porous matrix that separates fragments by size |
| Buffer (TAE or TBE) | Maintains pH and conducts current |
| Loading dye | Weighs down sample, tracks migration |
| DNA ladder | Fragments of known size for comparison |
| Power supply | Creates the electric field |
Early DNA profiling (developed by Sir Alec Jeffreys at Leicester in 1984) used VNTRs: long tandem repeats cut out by restriction enzymes and separated on a gel, producing a pattern of bands resembling a barcode. Modern forensic profiling uses STRs: shorter repeats (e.g. the sequence GATA repeated 6–15 times). The UK National DNA Database uses the DNA-17 system, a set of 16 STR loci plus amelogenin (for sex determination). The chance that two unrelated people share an identical profile at all of these loci is less than 1 in a billion.
STRs are amplified by PCR using fluorescent primers, and the products are analysed by capillary electrophoresis — a high-resolution form of gel electrophoresis. Each locus produces one or two peaks (homozygous or heterozygous) on a readout called an electropherogram.
Sequencing determines the order of bases in a DNA molecule. The original technique, developed by Fred Sanger in 1977, is still used for short reads (up to about 900 bp).
Principle: a mixture of normal dNTPs and a small proportion of fluorescently labelled ddNTPs (dideoxynucleotides) is added to a PCR-like reaction. Whenever a ddNTP is incorporated, chain extension stops because it lacks the 3' OH needed for the next bond. Over millions of molecules, every possible stopping point is represented by fragments of different lengths, each labelled by colour according to the terminating base.
The fragments are separated by capillary electrophoresis and a laser reads the colour of each peak as it passes. The resulting chromatogram gives the sequence directly.
In the chromatogram, each coloured peak identifies the terminating base of one fragment length, so reading the peaks from shortest to longest fragment spells out the newly synthesised strand. Modern automated Sanger sequencers can resolve fragments differing by a single nucleotide up to lengths of about 900 bp.
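The logic of chain termination can be illustrated with a toy simulation. This is a sketch of the principle only, not a model of a real sequencer: the function names, the 5% ddNTP fraction, the number of copies and the example strand are all assumptions chosen for illustration. Each simulated copy extends base by base until a ddNTP happens to be incorporated; sorting the resulting fragments by length (the job of capillary electrophoresis) and reading each terminating base recovers the sequence.

```python
import random


def chain_termination_fragments(new_strand: str, copies: int = 5000,
                                dd_fraction: float = 0.05, seed: int = 0):
    """Return the set of (fragment_length, terminating_base) pairs produced."""
    rng = random.Random(seed)
    fragments = set()
    for _ in range(copies):
        for length, base in enumerate(new_strand, start=1):
            # With probability dd_fraction a ddNTP is added and extension stops.
            if rng.random() < dd_fraction:
                fragments.add((length, base))
                break
    return fragments


def read_sequence(fragments) -> str:
    """Order fragments by length and read off each terminating base."""
    return "".join(base for _, base in sorted(fragments))


strand = "ATGGCCATTGTAATGGGCCG"  # hypothetical newly synthesised strand
fragments = chain_termination_fragments(strand)
print(read_sequence(fragments))
```

With enough copies every stopping point is represented, which is why the ddNTPs must be only a small proportion of the nucleotide mix: too many, and long fragments would rarely form.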
Sanger sequencing is slow and expensive at the genome scale. Next-generation sequencing — also called high-throughput sequencing — reads millions of short fragments in parallel. A typical Illumina run produces hundreds of billions of bases in a single day. The Human Genome Project took 13 years and $3 billion (1990–2003) using Sanger methods; today an entire human genome can be sequenced for under £500.
Key advantages of NGS over Sanger:
| Platform | Read length | Throughput | Best use |
|---|---|---|---|
| Sanger capillary | up to ~900 bp | ~1 kb/hour | Single-locus validation, plasmid checks |
| Illumina short-read | 150–300 bp | 100s of Gb/run | Whole-genome resequencing, RNA-seq |
| PacBio SMRT | 10–25 kb | 10s of Gb/run | De novo genome assembly, structural variation |
| Oxford Nanopore | 1 kb – 4 Mb | Variable, portable | Field genomics, real-time pathogen ID |
Newer third-generation methods such as Oxford Nanopore directly thread a single DNA molecule through a protein nanopore and infer base identity from minute changes in ionic current — they avoid PCR amplification altogether, give read lengths into the megabase range, and run on a USB-stick-sized device. During the 2014 West African Ebola outbreak, nanopore sequencers were carried into clinics and produced viral sequences within hours of sample collection, illustrating the public-health value of portable real-time sequencing.
The flood of sequence data from NGS could not be analysed without bioinformatics — the computational handling of biological data. OCR specifically asks you to understand why bioinformatics is needed.
Bioinformatics is used to:
Genomics is the study of whole genomes; proteomics is the study of the complete set of proteins (the proteome). Because gene expression varies between tissues and over time, the proteome is much more complex than the genome. Understanding the proteome is the next frontier of personalised medicine.
Pharmacogenomics — the marriage of genomics with drug response — promises to predict individual variation in how patients metabolise medication, based on cytochrome P450 (CYP) gene variants. Metabolomics extends this further to the small-molecule output of cells (metabolites). The OCR specification places the foundational sequencing techniques in 6.1.3 precisely because the same chain-termination and parallel-read platforms feed all four of these "-omics" disciplines.
A genome-wide association study (GWAS) typically compares hundreds of thousands of single-nucleotide polymorphisms (SNPs) across thousands of cases and controls to identify variants statistically associated with a trait or disease. The UK Biobank — a longitudinal study of half a million volunteers genotyped and clinically tracked since 2006 — has become the world's most powerful platform for such studies, illustrating the synergy of sequencing technology with bioinformatic and statistical infrastructure.
At a single STR locus, a mother has alleles 8, 10 (meaning 8 and 10 repeats), a child has alleles 10, 13, and the alleged father has alleles 13, 14. Could he be the father?
The child inherited allele 10 from the mother (consistent with her 8, 10 genotype) and allele 13 from the father. The alleged father has allele 13, so he is consistent with paternity at this locus. In a real test, 15+ loci would be examined; a man is excluded if he lacks an obligate paternal allele at any one locus, and confirmed if he matches at all of them.
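The reasoning in this example can be written as a short check. The sketch below is illustrative only: the function names and the locus labels are hypothetical, genotypes are given as pairs of repeat counts, and mutation is ignored (real casework has to allow for it).

```python
from typing import Dict, List, Set, Tuple

Genotype = Tuple[int, int]  # repeat counts of the two alleles at one STR locus


def obligate_paternal_alleles(mother: Genotype, child: Genotype) -> Set[int]:
    """Alleles the child must have received from the biological father."""
    paternal = set()
    for i, allele in enumerate(child):
        if allele in mother:            # this allele could be the maternal one,
            paternal.add(child[1 - i])  # so the other one must be paternal
    return paternal  # empty if the child shares no allele with the mother


def consistent_at_locus(mother: Genotype, child: Genotype,
                        alleged_father: Genotype) -> bool:
    """True if the alleged father carries an obligate paternal allele."""
    return any(a in alleged_father
               for a in obligate_paternal_alleles(mother, child))


def excluded_loci(trios: Dict[str, Tuple[Genotype, Genotype, Genotype]]) -> List[str]:
    """Loci at which the alleged father lacks every obligate paternal allele."""
    return [locus for locus, (m, c, f) in trios.items()
            if not consistent_at_locus(m, c, f)]


# The locus from the example: mother 8,10; child 10,13; alleged father 13,14.
print(consistent_at_locus((8, 10), (10, 13), (13, 14)))

# Hypothetical two-locus comparison with made-up genotypes at the second locus.
trios = {
    "locus_1": ((8, 10), (10, 13), (13, 14)),
    "locus_2": ((6, 9), (9, 12), (7, 11)),
}
print(excluded_loci(trios))
```

An empty list from `excluded_loci` means the man is not excluded at any tested locus; a single entry flags the locus at which he lacks the obligate paternal allele.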
Specimen question modelled on the OCR H420 paper format. 6 marks total.
A forensic scientist amplifies a single short tandem repeat (STR) locus from 100 picograms of crime-scene DNA. The PCR programme runs 28 cycles with an estimated per-cycle efficiency of 0.80. (a) Calculate the amplification factor for the 28-cycle run, assuming the same efficiency throughout. [2 marks] (b) Explain why Taq polymerase is preferred over E. coli DNA polymerase I for PCR. [2 marks] (c) Suggest one reason why the realised amplification factor in the laboratory is likely to be lower than the value you calculated in part (a). [2 marks]
AO breakdown: AO1 = 2 marks (Taq's thermostability, standard PCR theory); AO2 = 2 marks (application of the $(1+E)^{n}$ formula); AO3 = 2 marks (analysis of why empirical yield diverges from theory: primer-dimer formation, reagent depletion in late cycles, target sequence secondary structure).
(a) Amplification factor = $(1 + 0.80)^{28} = 1.80^{28}$. $\log_{10}(1.80) \approx 0.255$, so $1.80^{28} \approx 10^{0.255 \times 28} = 10^{7.14} \approx 1.4 \times 10^{7}$. (M1, M2) (b) Taq polymerase comes from Thermus aquaticus so it is heat-stable and survives the 95 °C denaturation step. E. coli polymerase would be denatured and would have to be replaced every cycle. (M3) (c) The efficiency drops in later cycles because the dNTPs and primers are used up. (M4)
Examiner commentary: The candidate scores M1 and M2 by setting up the calculation correctly and reaching the correct order of magnitude, though the logarithm working is carried to only three decimal places. M3 gives one solid reason for using Taq; the answer would be stronger if it also stated Taq's optimum temperature of ~72 °C, which matches the extension step. M4 hits a recognised reason for falling efficiency. The answer is competent but lacks the depth needed for the upper bands.
(a) Amplification factor = $(1 + 0.80)^{28} = 1.80^{28}$. Using $\log_{10}(1.80) \approx 0.2553$, $1.80^{28} = 10^{0.2553 \times 28} = 10^{7.148} \approx 1.41 \times 10^{7}$. (M1, M2) (b) Taq polymerase is thermostable (isolated from a thermophilic prokaryote, Thermus aquaticus) so it survives the 95 °C denaturation step without irreversible denaturation of its tertiary structure. Its optimum temperature of approximately 72 °C also matches the extension step, maximising the rate of phosphodiester-bond formation. E. coli polymerase I has an optimum near 37 °C and would be denatured each cycle, requiring fresh enzyme to be added; this is laborious and expensive, and is why early PCR was impractical at scale. (M3) (c) Real-world per-cycle efficiency decreases as the reaction progresses. In late cycles, dNTPs and primers are partially depleted; the rising product DNA concentration favours product-product reannealing over primer binding; and the buffer's Mg²⁺ activity may shift. Additionally, primer-dimer formation can sequester primers into off-target products, lowering the proportion of useful template extension. The aggregate result is that empirical yields plateau (the "plateau phase") well below the geometric prediction. (M4, M5)
Examiner commentary: This response demonstrates the AO3 depth that distinguishes A* from grade B. M1 and M2 show the calculation set out explicitly with logarithm working. M3 not only names the thermostability point but ties it to Taq's optimum temperature, providing the two-reason answer OCR expects. M4 and M5 give two distinct mechanistic reasons for the empirical drop in efficiency — substrate depletion AND primer-dimer competition — rather than a single hand-waved "the enzymes get tired" answer. This is the kind of mechanistic discrimination examiners reward.
Practical Activity Group anchor: PAG 6 — Chromatography or electrophoresis. OCR centres routinely run a gel-electrophoresis practical separating PCR products or restriction digests, exactly mirroring step 3 of the profiling workflow described here. Students load samples of known and unknown size into wells, run the gel at constant voltage, and stain to visualise bands. The technical objective — relating migration distance inversely to log(fragment length) — gives a quantitative AO2 anchor for the qualitative theory.
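The inverse relationship between migration distance and log(fragment length) is what lets a DNA ladder act as a calibration curve. The sketch below is an assumption-laden illustration, not a prescribed PAG method: the ladder sizes and distances are invented, idealised values, the function names are hypothetical, and the straight-line fit only holds over the roughly linear middle range of a real gel.

```python
import math
from typing import Sequence, Tuple


def fit_ladder(distances_mm: Sequence[float],
               sizes_bp: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line log10(size) = slope * distance + intercept."""
    logs = [math.log10(s) for s in sizes_bp]
    n = len(distances_mm)
    mean_d = sum(distances_mm) / n
    mean_y = sum(logs) / n
    sxx = sum((d - mean_d) ** 2 for d in distances_mm)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in zip(distances_mm, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_d


def estimate_size(distance_mm: float, slope: float, intercept: float) -> float:
    """Fragment length (bp) predicted for a band at the given distance."""
    return 10 ** (slope * distance_mm + intercept)


# Hypothetical ladder: band sizes (bp) and measured migration distances (mm).
ladder_sizes = [10000, 3000, 1000, 300, 100]
ladder_distances = [20.0, 30.5, 40.0, 50.5, 60.0]

slope, intercept = fit_ladder(ladder_distances, ladder_sizes)
unknown = estimate_size(45.0, slope, intercept)
print(f"slope = {slope:.4f} log10(bp) per mm")
print(f"band at 45 mm is roughly {unknown:.0f} bp")
```

The negative slope is the quantitative form of "smaller fragments travel further"; plotting the ladder on semi-log graph paper and reading off the unknown band does the same job by hand.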
For undergraduate study, look into the biophysics of DNA melting (the Marmur–Doty equation relating Tm to GC content and salt concentration), the structural biology of Taq polymerase versus high-fidelity proof-reading polymerases (Pfu, Phusion), and the bioinformatics of next-generation read assembly (de Bruijn graphs, sequence alignment with BLAST and Bowtie). A classic Oxbridge interview prompt: "You have £100,000 to sequence a particular cancer patient's genome. How would you spend it, and what would you do differently if you had only £100?" — a question that probes your understanding of platform trade-offs, exome vs whole-genome, depth of coverage, and bioinformatic costs.
Reference: OCR A-Level Biology A (H420) specification 6.1.3.