DNA Profiling and Sequencing

The ability to read, compare and analyse DNA has transformed biology, medicine and forensic science. OCR A-Level Biology A specification 6.1.3 (a)–(e) requires you to understand how scientists manipulate genomes — beginning with the techniques used to profile individuals, amplify minute samples, and sequence entire genomes. This opening lesson of Module 6.1.3 sets out the molecular toolkit underlying the whole chapter, from the polymerase chain reaction that copies DNA exponentially to the next-generation sequencers that can read a human genome in under a day.

Key Definitions:

DNA profiling — the process of producing an image of the patterns in a person's DNA (their "genetic fingerprint").

VNTR (Variable Number Tandem Repeat) — a non-coding DNA sequence repeated in tandem, with the number of repeats varying between individuals.

STR (Short Tandem Repeat) — a shorter VNTR (2–6 bp repeat units), used routinely in modern forensic DNA profiling.

PCR (Polymerase Chain Reaction) — an in vitro technique for amplifying a specific DNA sequence into millions of copies.

Gel electrophoresis — a technique that separates DNA fragments by size using an electric field.

Sequencing — determining the order of nucleotide bases in a DNA molecule.

Bioinformatics — the use of computing to store, retrieve and analyse biological data, particularly genome and protein sequences.

Why Profile DNA?

Over 99% of the human genome is identical between individuals, yet the remaining <1% contains enough variation to identify a person uniquely (apart from monozygotic twins). Much of this variation lies in non-coding DNA, particularly in regions of repeated sequences. OCR wants you to understand that these non-coding regions — once dismissed as "junk DNA" — are precisely what make profiling possible, because the number of repeats at each locus varies so widely between people.

DNA profiling is used for:

Forensic identification — matching a suspect to a crime scene, or exonerating the innocent.
Paternity testing — comparing alleles between child, mother and alleged father.
Identifying remains — disaster victims, historical figures (e.g. Richard III), missing persons.
Medical diagnosis — detecting genetic disorders before symptoms appear.
Animal breeding and conservation — verifying pedigrees, assessing genetic diversity.

Step 1: Extracting DNA

DNA must first be extracted from a cell sample (blood, saliva, hair root, semen, bone). Cells are lysed with detergent to dissolve membranes, proteins are digested with protease, and DNA is precipitated with cold ethanol. Modern forensic kits automate this using silica columns that bind DNA while contaminants wash through.

Step 2: Amplifying DNA by PCR

Very small samples (a single hair, a speck of dried blood) contain too little DNA to analyse directly. PCR solves this by copying a target sequence exponentially. Each cycle doubles the amount of DNA, so 30 cycles produce over a billion copies from a single molecule.

flowchart TD
    A[Sample DNA + primers + nucleotides + Taq polymerase] --> B[Denaturation 95 degrees C]
    B --> C[Annealing 55 degrees C primers bind]
    C --> D[Extension 72 degrees C Taq synthesises new strand]
    D --> E{Repeat 25-35 cycles}
    E -->|yes| B
    E -->|no| F[Millions of copies]

The PCR Cycle

A PCR reaction contains:

Template DNA — the sample to be copied.
Primers — short single-stranded DNA sequences (about 20 nucleotides) that flank the target region.
DNA nucleotides — the A, T, C and G building blocks.
Taq polymerase — a thermostable DNA polymerase isolated from Thermus aquaticus, a bacterium from hot springs. Its heat resistance is essential because ordinary polymerases are denatured at 95 °C.
Buffer with Mg²⁺ ions.
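How a primer pair marks out the region that gets copied can be sketched in a few lines of Python. This is a toy illustration only: the template and primer sequences are invented, the primers are shortened to 6 bases for readability (real ones are about 20 nucleotides), and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (read 5' to 3')."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def in_silico_pcr(template, forward_primer, reverse_primer):
    """Return the stretch of template flanked by the two primers, or None.

    template is the top strand, 5' to 3'. The forward primer matches the top
    strand directly. The reverse primer binds the bottom strand, so its
    reverse complement is what appears on the top strand.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None  # forward primer has no binding site
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None  # reverse primer has no binding site downstream
    return template[start:end + len(reverse_site)]


# Invented toy template (top strand, 5' to 3') containing four GATA repeats
template = "TTGACCATGCGATAGATAGATAGATACCGTTAGCAA"
product = in_silico_pcr(template, "CCATGC", "TAACGG")
print(product, len(product))
```

Only the sequence between (and including) the two primer sites is returned, which mirrors why PCR amplifies a specific target and not the whole sample.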

Stage	Temperature	What happens
Denaturation	95 °C	Hydrogen bonds break; DNA strands separate
Annealing	50–65 °C	Primers bind (hybridise) to complementary sequences flanking the target
Extension	72 °C	Taq polymerase extends the primer, synthesising a new strand 5' → 3'

Exam Tip: OCR often asks why Taq polymerase is used rather than human DNA polymerase. Give two reasons: (1) Taq is not denatured at 95 °C, so it survives repeated heating cycles, and (2) its optimum is about 72 °C, matching the extension temperature.

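After 30 cycles, a single starting molecule theoretically becomes 2³⁰ ≈ 10⁹ copies. In practice each cycle copies fewer than 100% of the molecules, and the reaction slows in later cycles as primers and nucleotides are used up. Because a real sample usually contains many template molecules, billions of copies are still routine.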
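The arithmetic can be checked with a short Python sketch. The efficiency parameter and the 90% figure are assumptions chosen for illustration, not measured values.

```python
def pcr_copies(start_copies, cycles, efficiency=1.0):
    """Number of copies after a given number of PCR cycles.

    efficiency is the fraction of molecules copied in each cycle:
    1.0 means perfect doubling, 0.9 means each cycle multiplies by 1.9.
    """
    return start_copies * (1 + efficiency) ** cycles


ideal = pcr_copies(1, 30)            # perfect doubling: 2**30
realistic = pcr_copies(1, 30, 0.9)   # assumed 90% efficiency per cycle

print(f"Ideal:          {ideal:,.0f} copies")
print(f"90% efficiency: {realistic:,.0f} copies")
```

Under the assumed 90% efficiency a single molecule gives roughly a fifth of the ideal yield, which shows how a small loss per cycle compounds over 30 cycles.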

Step 3: Gel Electrophoresis

Amplified DNA fragments are separated by size using gel electrophoresis. The gel is a mesh of agarose (for large fragments) or polyacrylamide (for finer resolution). DNA is negatively charged because of its phosphate backbone, so when an electric field is applied, fragments migrate towards the anode (positive electrode). Smaller fragments move faster through the gel matrix, so after a set time fragments are separated by size — small at the far end, large near the wells.

The DNA is visualised by staining (e.g. with ethidium bromide or SYBR green) and viewing under UV light, or by using fluorescently labelled primers that appear in different colours.

Component	Function
Agarose gel	Porous matrix that separates fragments by size
Buffer (TAE or TBE)	Maintains pH and conducts current
Loading dye	Weighs down sample, tracks migration
DNA ladder	Fragments of known size for comparison
Power supply	Creates the electric field
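The DNA ladder is what turns a band position into a fragment size. A minimal Python sketch of that comparison is given here, assuming that migration distance falls roughly linearly with the logarithm of fragment size. That relationship is only an approximation, and the ladder readings are invented.

```python
import math

# Invented ladder readings: (fragment size in bp, distance migrated in mm)
ladder = [(1000, 12.0), (500, 24.0), (250, 36.0), (125, 48.0)]


def estimate_size(distance_mm, ladder):
    """Estimate fragment size by interpolating log10(size) against distance.

    Only valid for bands that lie between two ladder bands.
    """
    points = sorted(ladder, key=lambda point: point[1])  # order by distance
    for (size_a, dist_a), (size_b, dist_b) in zip(points, points[1:]):
        if dist_a <= distance_mm <= dist_b:
            fraction = (distance_mm - dist_a) / (dist_b - dist_a)
            log_a, log_b = math.log10(size_a), math.log10(size_b)
            return 10 ** (log_a + fraction * (log_b - log_a))
    raise ValueError("distance lies outside the ladder range")


# A band that ran 30 mm sits between the 500 bp and 250 bp ladder bands
print(round(estimate_size(30.0, ladder)), "bp")
```

A band halfway between the 500 bp and 250 bp markers comes out nearer 350 bp than 375 bp, because the scale is logarithmic, not linear.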

Step 4: From VNTRs to STRs

Early DNA profiling (developed by Sir Alec Jeffreys at Leicester in 1984) used VNTRs: long tandem repeats cut out by restriction enzymes and separated on a gel, producing a pattern of bands resembling a barcode. Modern forensic profiling uses STRs: shorter repeats (e.g. the sequence GATA repeated 6–15 times). The UK National DNA Database now uses the DNA-17 system, which examines 16 STR loci plus amelogenin (for sex determination). The chance that two unrelated people share an identical profile at all of these loci is less than 1 in a billion.
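The very small match probability comes from multiplying together the chance of a match at each locus. The Python sketch below shows this product rule under two stated assumptions: the loci are inherited independently, and the genotype frequencies are invented round numbers, not real database values. Ten loci are used simply to keep the list short.

```python
import math

# Invented genotype frequencies: the fraction of the population assumed to
# share a given genotype at each of ten independent STR loci.
genotype_frequencies = [0.10, 0.08, 0.12, 0.05, 0.10, 0.09, 0.11, 0.07, 0.10, 0.06]

# Product rule: probabilities at independent loci multiply.
match_probability = math.prod(genotype_frequencies)

print(f"Match probability: {match_probability:.2e}")
print(f"About 1 in {1 / match_probability:,.0f}")
```

Even with these modest per-locus frequencies, ten loci already push the chance of a coincidental match well below 1 in a billion. Adding more loci makes it smaller still.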

STRs are amplified by PCR using fluorescent primers, and the products are analysed by capillary electrophoresis — a high-resolution form of gel electrophoresis. Each locus produces one or two peaks (homozygous or heterozygous) on a readout called an electropherogram.
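An allele at an STR locus is named by its number of repeats. The Python sketch below counts repeats in two invented alleles to show how a genotype such as "8, 11" arises. The flanking sequences and the function name are hypothetical.

```python
import re


def longest_repeat_run(sequence, unit):
    """Return the largest number of back-to-back copies of unit in sequence."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


# Invented alleles at one hypothetical locus: flank + GATA repeats + flank
allele_1 = "CCTG" + "GATA" * 8 + "TCAA"
allele_2 = "CCTG" + "GATA" * 11 + "TCAA"

repeat_counts = {longest_repeat_run(allele_1, "GATA"),
                 longest_repeat_run(allele_2, "GATA")}
genotype = sorted(repeat_counts)
print(genotype)  # [8, 11]: two different alleles, so two peaks (heterozygous)
```

If both alleles had the same repeat count, the set would hold a single value, corresponding to the single peak of a homozygous locus.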

DNA Sequencing: The Sanger Method

Sequencing determines the order of bases in a DNA molecule. The original technique, developed by Fred Sanger in 1977, is still used for short reads (up to about 900 bp).

Principle: a mixture of normal dNTPs and a small proportion of fluorescently labelled ddNTPs (dideoxynucleotides) is added to a PCR-like reaction. Whenever a ddNTP is incorporated, chain extension stops because it lacks the 3' OH needed for the next bond. Over millions of molecules, every possible stopping point is represented by fragments of different lengths, each labelled by colour according to the terminating base.

The fragments are separated by capillary electrophoresis and a laser reads the colour of each peak as it passes. The resulting chromatogram gives the sequence directly.
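The logic of chain termination can be mimicked with a small simulation. This is a sketch under simplifying assumptions, not a model of real chemistry: the template is invented, every position has the same assumed 10% chance of taking up a ddNTP, and each fragment is described only by its length and its final base.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_fragments(template, n_molecules=5000, dd_fraction=0.1, seed=1):
    """Simulate chain termination while copying a template strand.

    Returns a dict mapping fragment length -> terminating base. dd_fraction is
    the assumed chance that a ddNTP, not a normal dNTP, is added at each step.
    """
    rng = random.Random(seed)
    new_strand = [COMPLEMENT[base] for base in reversed(template)]  # 5' to 3'
    fragments = {}
    for _ in range(n_molecules):
        for length, base in enumerate(new_strand, start=1):
            if rng.random() < dd_fraction:  # a ddNTP went in: the chain stops
                fragments[length] = base    # label colour identifies this base
                break
    return fragments


def read_sequence(fragments):
    """Read the terminal bases in order of fragment length, shortest first."""
    return "".join(fragments[length] for length in sorted(fragments))


template = "TACGGATCAGTC"  # invented template strand
print(read_sequence(sanger_fragments(template)))
```

Sorting by length plays the role of capillary electrophoresis, and reading the last base of each fragment plays the role of the laser detecting the dye colour. The sequence printed is that of the newly synthesised strand, which is complementary to the template.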

Next-Generation Sequencing (NGS)

Sanger sequencing is slow and expensive at the genome scale. Next-generation sequencing — also called high-throughput sequencing — reads millions of short fragments in parallel. A typical Illumina run produces hundreds of billions of bases in a single day. The Human Genome Project took 13 years and $3 billion (1990–2003) using Sanger methods; today an entire human genome can be sequenced for under £500.

Key advantages of NGS over Sanger:

Speed — parallel reads of millions of fragments simultaneously.
Cost — orders of magnitude cheaper per base.
Coverage — whole genomes, not just selected regions.
Sensitivity — detects rare variants in mixed samples.
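To see why throughput matters, it helps to estimate how many short reads a whole genome needs. The Python sketch below uses assumed round figures that are not taken from the text: a genome of about 3.2 billion base pairs, reads of 150 bases, and an average depth of 30 (each base read about 30 times).

```python
GENOME_SIZE_BP = 3_200_000_000  # approximate human genome size (assumed round figure)
READ_LENGTH_BP = 150            # assumed short-read length
TARGET_DEPTH = 30               # assumed average number of times each base is read


def mean_depth(n_reads, read_length, genome_size):
    """Average number of times each base in the genome is read."""
    return n_reads * read_length / genome_size


def reads_needed(depth, read_length, genome_size):
    """Reads required to reach a given average depth, rounded up."""
    total_bases = depth * genome_size
    return (total_bases + read_length - 1) // read_length  # ceiling division


n_reads = reads_needed(TARGET_DEPTH, READ_LENGTH_BP, GENOME_SIZE_BP)
total_billion = n_reads * READ_LENGTH_BP / 1e9
print(f"{n_reads:,} reads, about {total_billion:.0f} billion bases")
```

Under these assumptions one genome needs hundreds of millions of reads, which is only practical when fragments are read in parallel.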

Bioinformatics

The flood of sequence data from NGS could not be analysed without bioinformatics — the computational handling of biological data. OCR specifically asks you to understand why bioinformatics is needed.

Bioinformatics is used to:

Assemble short reads into complete genomes by finding overlaps.
Compare sequences to databases (e.g. BLAST at NCBI) to identify genes.
Construct phylogenetic trees from sequence similarities.
Predict protein structure and function.
Identify disease-associated mutations.
Store and share data through public databases (GenBank, Ensembl, UniProt).
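The first task in the list, assembling reads by finding overlaps, can be illustrated with a toy greedy approach in Python. This is a simplified sketch, not how production assemblers work: the reads are invented, there are no sequencing errors, and the function names are hypothetical.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)  # (overlap length, index of a, index of b)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    length = overlap(a, b)
                    if length > best[0]:
                        best = (length, i, j)
        length, i, j = best
        if length == 0:
            break  # no overlaps left, so nothing more can be merged
        merged = reads[i] + reads[j][length:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


reads = ["ATGGCGTAC", "GTACCTTAG", "TTAGGCAAT"]  # invented overlapping reads
print(greedy_assemble(reads))
```

Each merge joins two reads where the end of one matches the start of another, so three 9-base reads collapse into one 19-base sequence. Real genomes contain repeats and errors that make the problem far harder, which is part of why specialised software is needed.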

Computational Biology and Genomics

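Genomics is the study of whole genomes; proteomics is the study of the complete set of proteins (the proteome). The genome is essentially the same in every cell of an organism, but the proteome is not: gene expression varies between tissues and over time, and a single gene can give rise to several different proteins through alternative splicing and post-translational modification. The proteome is therefore much more complex than the genome, and characterising it is widely seen as an important next step for personalised medicine.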

Worked Example: Paternity Test

At a single STR locus, a mother has alleles 8, 10 (meaning 8 and 10 repeats), a child has alleles 10, 13, and the alleged father has alleles 13, 14. Could he be the father?

The child inherited allele 10 from the mother (consistent with her 8, 10 genotype), so allele 13 must have come from the biological father. The alleged father carries allele 13, so he is consistent with paternity at this locus. In a real test, 15+ loci would be examined. A man who lacks the obligate paternal allele at several loci is excluded (a single mismatch could be the result of a mutation), while a match at every locus supports paternity very strongly but is reported as a probability, not as absolute proof.
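The reasoning in this worked example can be written as a short Python check for a single locus. The function names are hypothetical, and the sketch ignores mutation and assumes the stated mother is the biological mother.

```python
def obligate_paternal_alleles(mother, child):
    """Alleles the child must have received from the biological father."""
    child_alleles = set(child)
    not_from_mother = child_alleles - set(mother)
    if not not_from_mother:
        # Both of the child's alleles are also present in the mother,
        # so either of them could be the paternal one.
        return child_alleles
    return not_from_mother


def consistent_with_paternity(mother, child, alleged_father):
    """True if the man carries at least one obligate paternal allele."""
    return bool(obligate_paternal_alleles(mother, child) & set(alleged_father))


mother, child, alleged_father = (8, 10), (10, 13), (13, 14)
print(obligate_paternal_alleles(mother, child))                  # {13}
print(consistent_with_paternity(mother, child, alleged_father))  # True
```

A man with alleles 9, 14 at this locus would return False, because he carries no allele 13 to pass on.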

Common Exam Mistakes

Confusing denaturation with digestion. PCR denatures DNA by breaking hydrogen bonds, not by cutting it.
Wrong temperature for annealing. Annealing is around 50–65 °C, not 95 °C.
Forgetting why Taq is used. State both thermostability and optimum temperature.
Calling VNTRs "genes". VNTRs are non-coding repetitive sequences.
Mixing up Sanger and NGS. Sanger uses chain-terminating ddNTPs; NGS reads millions of fragments in parallel.
Describing DNA as positive. DNA is negatively charged and migrates to the positive electrode.

Quick Recap

DNA profiling exploits variation in non-coding repeats (VNTRs, STRs) to identify individuals.
PCR amplifies DNA by cycles of denaturation, annealing and extension, using Taq polymerase.
Gel electrophoresis separates DNA fragments by size; small fragments move fastest.
Sanger sequencing uses fluorescent ddNTPs to terminate chains; NGS reads millions of fragments in parallel.
Bioinformatics is essential for assembling, comparing and interpreting sequence data.

Reference: OCR A-Level Biology A (H420) specification 6.1.3 (a)–(e).
