Gene Structure and the Genetic Code

This lesson covers the structure of genes and the nature of the genetic code as required by the Edexcel A-Level Biology specification (9BI0, Topic 7). You need to understand how genes are organised within DNA, the triplet nature of the genetic code, and its key properties.

What Is a Gene?

A gene is a sequence of nucleotides on a DNA molecule that codes for a functional polypeptide (or a functional RNA molecule such as tRNA or rRNA). Genes represent only a small proportion of the total DNA in a eukaryotic cell — in humans, protein-coding sequences account for roughly 1.5% of the genome.

Each gene occupies a specific position on a chromosome called its locus (plural: loci). Alternative versions of the same gene are called alleles. A diploid cell carries two copies of each gene, one at the same locus on each homologous chromosome; these two alleles may be identical (homozygous) or different (heterozygous).

Exam Tip: Be precise with definitions. A gene codes for a polypeptide, not necessarily a whole protein — many functional proteins are composed of multiple polypeptide chains coded by different genes (e.g. haemoglobin has two alpha and two beta globin chains).

DNA Structure Recap

DNA is a double-stranded polynucleotide with an antiparallel arrangement. Each nucleotide consists of:

Component	Details
Phosphate group	Links to the 5' carbon of deoxyribose
Deoxyribose sugar	A five-carbon (pentose) sugar
Nitrogenous base	Adenine (A), Thymine (T), Guanine (G) or Cytosine (C)

The two strands are held together by hydrogen bonds between complementary base pairs:

A — T (two hydrogen bonds)
G — C (three hydrogen bonds)

The strands run in opposite directions — one strand runs 5' to 3' and the other 3' to 5'. This antiparallel arrangement is critical for replication and transcription.
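The pairing rules and the antiparallel orientation can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `complementary_strand` and `hydrogen_bonds` and the six-base example sequence are assumptions made for this example.

```python
# Pairing rules and hydrogen bonds per base pair, as listed above.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5'->3'.

    Because the two strands are antiparallel, the partner is the
    complement of the input read in reverse order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3))


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds holding this strand to its partner."""
    return sum(H_BONDS[base] for base in strand)


strand = "ATGGCA"                     # hypothetical example sequence
print(complementary_strand(strand))   # TGCCAT
print(hydrogen_bonds(strand))         # 15
```

Applying `complementary_strand` twice returns the original sequence, which is a quick check that the pairing rules are symmetrical.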

The following diagram summarises the key events at the replication fork during DNA replication:

graph TD
    A["Double-Stranded DNA"] -->|"Helicase unwinds"| B["Replication Fork"]
    B --> C["Leading Strand<br/>(continuous, 5’→3’)"]
    B --> D["Lagging Strand<br/>(discontinuous Okazaki fragments, each made 5’→3’)"]
    C --> E["DNA Polymerase III"]
    D --> E
    E -->|"Proofreading"| F["Two Identical<br/>DNA Molecules"]

The Triplet Code

The genetic code is a triplet code — each sequence of three consecutive nucleotide bases on the coding (sense) strand of DNA specifies one amino acid. These triplets are called codons when referring to mRNA.

With four bases and three positions per codon, there are 4 × 4 × 4 = 64 possible codons. Since there are only 20 amino acids commonly used in proteins, the code is said to be degenerate (redundant) — most amino acids are coded for by more than one codon.
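The counting argument can be checked by enumerating every possible "word" of a given length. The sketch below is illustrative; the helper name `possible_words` is an assumption for this example.

```python
from itertools import product

BASES = "UCAG"  # the four mRNA bases


def possible_words(length: int) -> list[str]:
    """List every sequence of the given length built from the four bases."""
    return ["".join(p) for p in product(BASES, repeat=length)]


for length in (1, 2, 3):
    print(length, len(possible_words(length)))
# 1 4
# 2 16
# 3 64
```

A doublet code would give only 16 combinations, too few for 20 amino acids, whereas a triplet code gives 64, which is more than enough and leaves room for degeneracy.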

Key Properties of the Genetic Code

Property	Meaning
Triplet	Three bases code for one amino acid
Degenerate	Most amino acids have more than one codon
Non-overlapping	Each base is read only once as part of one triplet
Universal	The same codons code for the same amino acids in almost all organisms
Comma-free	Codons are read sequentially with no gaps or punctuation between them
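Two of these properties, degeneracy and non-overlapping comma-free reading, can be illustrated with a short Python sketch. It assumes the standard genetic code written in one-letter amino acid symbols (with `*` for stop) in the conventional U, C, A, G ordering; that layout string and the helper `read_codons` are introduced here for illustration and do not appear in the lesson.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Degenerate: group the codons by the amino acid they specify.
codons_for = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    codons_for[aa].append(codon)

print(len(codons_for["L"]))  # 6 codons for leucine
print(codons_for["M"])       # ['AUG'], methionine has a single codon
print(codons_for["W"])       # ['UGG'], tryptophan has a single codon


def read_codons(mrna: str) -> list[str]:
    """Non-overlapping, comma-free: step three bases at a time with no gaps."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


print(read_codons("AUGGCAUUUUAA"))  # ['AUG', 'GCA', 'UUU', 'UAA']
```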

Exam Tip: The universality of the genetic code is strong evidence for a common evolutionary origin of all life. However, there are rare exceptions — for example, mitochondria and some protists use slightly different codes.

Exons and Introns

In eukaryotic genes, the coding sequence is not continuous. Genes contain:

Exons — sections of DNA that are expressed (code for parts of the polypeptide)
Introns — non-coding intervening sequences that are transcribed into pre-mRNA but are removed before translation

The proportion of intronic DNA varies enormously between genes. Some genes have no introns at all, while others (such as the dystrophin gene) have dozens of large introns that make up over 99% of the gene's length.
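Removal of an intron from pre-mRNA can be sketched as simple string slicing. Everything in this example is hypothetical: the toy sequences, the `splice` helper and the exon coordinates are invented for illustration, and real splicing is carried out by the spliceosome rather than by fixed coordinates.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon slices (start inclusive, end exclusive) in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy gene: two exons separated by one intron.
exon1, intron, exon2 = "AUGGCA", "GUAAGUUUUCAG", "UUUUAA"
pre_mrna = exon1 + intron + exon2

# Exon coordinates worked out from the lengths of the pieces above.
exons = [
    (0, len(exon1)),
    (len(exon1) + len(intron), len(pre_mrna)),
]

mature_mrna = splice(pre_mrna, exons)
print(len(pre_mrna), len(mature_mrna))  # 24 12
print(mature_mrna)                      # AUGGCAUUUUAA
```

In this toy gene the intron accounts for half of the transcribed length; in a gene such as dystrophin the proportion is far higher.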

Why Do Introns Exist?

Introns allow alternative splicing — different combinations of exons can be joined together to produce different mRNA molecules from the same gene. This means a single gene can code for multiple different polypeptides, greatly increasing the coding potential of the genome.

For example, the Drosophila DSCAM gene can produce over 38,000 different mRNA variants through alternative splicing.
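The combinatorial effect of alternative splicing can be sketched with a toy gene, followed by the arithmetic usually quoted for DSCAM. The exon labels are hypothetical, and the DSCAM cluster sizes (12, 48, 33 and 2 mutually exclusive alternatives for exons 4, 6, 9 and 17) are the commonly cited literature figures rather than values given in this lesson, so treat them as an assumption.

```python
from itertools import product
from math import prod

# Toy gene: each position offers one or more mutually exclusive exon choices.
exon_choices = [["E1"], ["E2a", "E2b"], ["E3"], ["E4a", "E4b", "E4c"]]
isoforms = ["-".join(combo) for combo in product(*exon_choices)]
print(len(isoforms))   # 6
print(isoforms[0])     # E1-E2a-E3-E4a

# Commonly quoted DSCAM cluster sizes (assumption, not from the lesson text).
dscam_clusters = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}
print(prod(dscam_clusters.values()))  # 38016
```

The number of possible transcripts is the product of the number of choices at each variable position, which is why a handful of exon clusters can generate tens of thousands of variants.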

The Template Strand and Coding Strand

DNA is double-stranded, but only one strand is used as a template during transcription:

Strand	Also known as	Role
Template strand	Antisense strand, non-coding strand	Read 3' → 5' by RNA polymerase
Coding strand	Sense strand, non-template strand	Has the same base sequence as the mRNA (with T instead of U)

The mRNA produced during transcription has the same base sequence as the coding strand, except that thymine (T) is replaced by uracil (U).

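Exam Tip: When asked to write the mRNA sequence from a DNA strand, first check which strand you have been given and in which direction it is written. If it is the template strand, write the complementary bases (A→U, T→A, G→C, C→G); because the strands are antiparallel, a template written 3' → 5' gives the mRNA 5' → 3'. If it is the coding strand, simply replace T with U.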
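The two routes to the mRNA sequence can be sketched as follows. The function names and the example strands are assumptions made for illustration; the template strand is taken to be written 3'→5' from left to right, so that the mRNA comes out 5'→3'.

```python
# DNA template base -> complementary RNA base.
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def mrna_from_coding(coding_5_to_3: str) -> str:
    """The mRNA matches the coding strand, with U in place of T."""
    return coding_5_to_3.replace("T", "U")


def mrna_from_template(template_3_to_5: str) -> str:
    """Pair each template base (written 3'->5') with its RNA complement."""
    return "".join(RNA_PAIR[base] for base in template_3_to_5)


coding = "ATGGCATTTTAA"     # hypothetical coding strand, 5'->3'
template = "TACCGTAAAATT"   # its partner, written 3'->5'

print(mrna_from_coding(coding))      # AUGGCAUUUUAA
print(mrna_from_template(template))  # AUGGCAUUUUAA
```

Both routes give the same mRNA, which is a useful self-check in exam questions that supply both strands.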

Start and Stop Codons

Translation of mRNA into a polypeptide begins at a start codon and ends at a stop codon:

Codon type	Codon(s)	Function
Start codon	AUG	Signals the start of translation; codes for methionine
Stop codons	UAA, UAG, UGA	Signal the end of translation; do not code for any amino acid

The sequence of codons from the start codon to the stop codon is called the open reading frame (ORF). The reading frame is established by the position of the start codon — if the frame shifts by even one base, completely different amino acids will be specified.
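The effect of a frameshift can be demonstrated with a small translation sketch. It assumes the standard genetic code in one-letter symbols (`*` for stop); the helper `translate_orf` and both mRNA sequences are hypothetical examples, and the function simply starts at the first AUG it finds.

```python
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no reading frame is set
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


original = "GGAUGGCAUUUUAACC"
# Delete the single base immediately after the start codon.
shifted = original[:5] + original[6:]

print(translate_orf(original))  # MAF
print(translate_orf(shifted))   # MHFN
```

After the deletion every downstream codon is different, and the original stop codon is no longer read in frame.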

Non-Coding DNA

A large proportion of eukaryotic DNA does not code for polypeptides. This includes:

Introns within genes
Regulatory sequences — promoters, enhancers and silencers that control gene expression
Repetitive DNA — short tandem repeats (STRs) used in genetic fingerprinting
Telomeric DNA — repetitive sequences (TTAGGG in humans) that protect chromosome ends
Pseudogenes — non-functional copies of genes that have accumulated mutations

Historically, non-coding DNA was dismissed as "junk DNA", but research has revealed that much of it has important regulatory and structural functions.

The Human Genome Project

The Human Genome Project (HGP) was an international collaborative project completed in 2003 that determined the complete nucleotide sequence of the human genome. Key findings include:

The human genome contains approximately 3.2 billion base pairs
There are approximately 20,000–25,000 protein-coding genes
Protein-coding sequences make up only about 1.5% of the genome
Over 50% of the genome consists of repetitive sequences
All humans share approximately 99.9% of their DNA sequence

The HGP has had profound implications for medicine, forensics and our understanding of evolution.
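The percentages above are easier to appreciate as absolute numbers. The short calculation below uses only the approximate figures quoted in this section; the variable names are illustrative, and integer arithmetic is used to avoid rounding artefacts.

```python
GENOME_BP = 3_200_000_000  # approximate size of the human genome in base pairs

# About 1.5% of the genome is protein-coding sequence.
coding_bp = GENOME_BP * 15 // 1000

# Humans share about 99.9% of their sequence, so about 0.1% differs.
differing_bp = GENOME_BP // 1000

print(f"{coding_bp:,}")     # 48,000,000
print(f"{differing_bp:,}")  # 3,200,000
```

So roughly 48 million base pairs code for protein, and two people differ at roughly 3 million positions.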

Summary

Concept	Key Detail
Gene	Sequence of nucleotides coding for a polypeptide or functional RNA
Locus	Specific position of a gene on a chromosome
Triplet code	Three bases = one amino acid
Degenerate	Multiple codons for most amino acids
Universal	Same code in (almost) all organisms
Exons	Coding sequences
Introns	Non-coding sequences removed during splicing
Start codon	AUG (methionine)
Stop codons	UAA, UAG, UGA

Exam Tip: Questions on the genetic code frequently ask you to explain why it is described as degenerate, non-overlapping or universal. Always give a clear definition and then an example or explanation of the biological significance.

A-Level Deep Dive: Gene Structure and the Genetic Code

Spec mapping

This material sits in Edexcel 9BI0 Topic 8 (Grey Matter — Coordination, Response and Gene Technology), which expects candidates to define a gene as a length of DNA (or RNA in some viruses) coding for a polypeptide or a functional RNA, to describe eukaryotic gene architecture (promoter, exons, introns, terminator), and to state the four formal properties of the genetic code: triplet, universal, degenerate, non-overlapping (with the unspoken fifth property of being commaless). Synoptic links run backwards to Topic 1 (Lifestyle, Health and Risk — biological molecules) for the antiparallel double helix, the sugar–phosphate backbone, complementary base pairing (A–T two H-bonds, G–C three) and the 5'/3' directionality that determines how genes are read; to Topic 2 (Genes and Health — cell biology) for the location of nuclear DNA on chromosomes inside the nuclear envelope and the separate, circular mitochondrial DNA; to Topic 4 (Biodiversity and Natural Resources) for molecular phylogeny, where DNA sequence comparisons rest on the conserved genetic code; to Topic 6 (Infection, Immunity and Forensics) for recombinant DNA techniques that exploit the universality of the code to express human genes in bacteria; and forwards within Topic 8 to the next lessons on transcription (which uses the promoter and produces pre-mRNA from the template strand) and translation (which decodes codons via tRNA at the ribosome). Refer to the official Pearson Edexcel 9BI0 specification document for exact wording.

Worked example with full mark scheme

Question (8 marks):

(a) Describe the structure of a typical eukaryotic gene, identifying the function of each region. (4)

(b) An mRNA molecule has the sequence 5'-AUG GCA UUU UAA-3'. Using the genetic code (AUG = Met, GCA = Ala, UUU = Phe, UAA = stop), state the polypeptide sequence produced and predict the consequence of a single base substitution that changes the third codon to UUC. Explain your reasoning using the formal properties of the code. (4)

Solution with mark scheme:

(a) M1 (AO1) — definition and overall architecture. A gene is a length of DNA on a chromosome that codes for a polypeptide or a functional RNA molecule (such as tRNA or rRNA). In a typical eukaryotic gene, the coding sequence is interrupted by non-coding regions, and the whole locus is flanked by regulatory sequences.

A1 (AO1) — promoter and terminator. The promoter lies upstream (5' side) of the coding sequence and contains the binding site for RNA polymerase and associated transcription factors; it determines where transcription begins and on which strand. The terminator lies downstream (3' side) and signals the end of transcription.

A1 (AO1) — exons and introns. The transcribed region contains alternating exons (sequences retained in mature mRNA and translated into polypeptide) and introns (non-coding intervening sequences that are transcribed into pre-mRNA and then removed by splicing in the nucleus before mature mRNA is exported to the cytoplasm).

A1 (AO1) — UTRs. The 5' untranslated region (UTR) lies between the 5' cap and the start codon (AUG), and the 3' UTR lies between the stop codon and the poly-A tail; both regulate translation efficiency and mRNA stability but are not translated into protein.

(b) M1 (AO2) — translate the original sequence. Reading the mRNA 5' to 3' in codons of three: AUG–GCA–UUU–UAA gives Met–Ala–Phe–stop, so the polypeptide is Met–Ala–Phe (a tripeptide; the stop codon is not translated).

A1 (AO2) — predict the mutation effect. UUU → UUC changes only the third base of the third codon. Both UUU and UUC code for phenylalanine, so the polypeptide is unchanged (still Met–Ala–Phe).

A1 (AO3.1) — name the property of the code responsible. This is a silent (synonymous) mutation, possible only because the genetic code is degenerate: most amino acids have more than one codon, often differing in the third base ("wobble" position).

A1 (AO3.2) — biological significance. Degeneracy is evolutionarily protective: many point substitutions in the third base produce no change in primary structure, so the polypeptide's function is preserved. The non-random clustering of synonymous codons is one reason the code is described as error-tolerant rather than random.

Total: 8 marks (M2 A6).
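The reasoning in part (b) can be checked with a small sketch that translates the mRNA and classifies a single-codon substitution. It assumes the standard genetic code in one-letter symbols (`*` for stop); the function `classify_substitution` and its category labels are illustrative choices.

```python
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(original: str, mutant: str) -> str:
    """Classify a codon change by its effect on the amino acid specified."""
    before, after = CODON_TABLE[original], CODON_TABLE[mutant]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "stop-loss"
    return "missense"


codons = ["AUG", "GCA", "UUU", "UAA"]
print("".join(CODON_TABLE[c] for c in codons))  # MAF*

print(classify_substitution("UUU", "UUC"))  # silent (Phe -> Phe)
print(classify_substitution("UUU", "UUA"))  # missense (Phe -> Leu)
print(classify_substitution("UAU", "UAA"))  # nonsense (Tyr -> stop)
```

The UUU → UUC change in the question is classified as silent, matching the mark scheme.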

Specimen question modelled on the Edexcel 9BI0 paper format

Question (6 marks): Researchers compared the nuclear DNA and mitochondrial DNA of human cells. They found that the codon UGA specified stop in nuclear-encoded mRNA but specified the amino acid tryptophan in mitochondrial-encoded mRNA from the same cell. Mitochondrial DNA is circular, contains around 37 genes, and lacks introns; nuclear DNA is linear, contains roughly 20,000 protein-coding genes, and contains many introns.

Discuss what these observations show about the structure of genes and the universality of the genetic code, using the data above.

Mark scheme decomposition by AO:

Mark	AO	Earned by
1	AO1.1	Stating that the standard ("universal") genetic code uses UGA as a stop codon
2	AO1.2	Stating that mitochondrial DNA is circular, lacks introns and resembles bacterial DNA — consistent with an endosymbiotic origin
3	AO2.1	Recognising that "universal" really means "near-universal" — mitochondria are a documented exception
4	AO2.7	Linking the absence of introns in mtDNA to compactness (small genome, high coding density) and to the prokaryotic ancestry
5	AO3.1	Concluding that the mitochondrial code is a variant, supporting the idea that the standard code, while strongly conserved, is not absolutely fixed
6	AO3.2	Justifying that codon reassignment is biologically possible because the code is a convention enforced by tRNA–aminoacyl-tRNA-synthetase pairings, not by chemistry

Total: 6 marks (AO1 = 2, AO2 = 2, AO3 = 2). Edexcel reliably tests the genetic code through "compare nuclear and mitochondrial / compare standard and variant" prompts; candidates who treat universality as absolute lose AO3 marks.
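The UGA observation can be modelled by translating the same mRNA with two codon tables. This is a deliberately partial sketch: the variant table changes only UGA and AUA (the two vertebrate mitochondrial reassignments named in this lesson) and should not be read as a complete mitochondrial code. The mRNA sequence and the names `STANDARD`, `VERT_MITO_PARTIAL` and `translate` are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Partial variant table: only the two reassignments named in this lesson.
VERT_MITO_PARTIAL = {**STANDARD, "UGA": "W", "AUA": "M"}


def translate(mrna: str, table: dict[str, str]) -> str:
    """Read codons from the first base until a stop codon is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


mrna = "AUGUGAGCAUAA"  # AUG UGA GCA UAA
print(translate(mrna, STANDARD))           # M
print(translate(mrna, VERT_MITO_PARTIAL))  # MWA
```

The same message is truncated after methionine under the standard code but read through to Met-Trp-Ala under the variant table, which is why a mitochondrial gene cannot simply be expressed from the nucleus without recoding.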

Synoptic links

Topic 1 (Lifestyle, Health and Risk) — DNA structure underpins gene structure. A gene is only intelligible against the antiparallel double helix: complementary base pairing (A–T, G–C) means the two strands carry complementary copies of the same information running in opposite directions, and because RNA polymerase reads the template strand 3'→5', the mRNA is always synthesised 5'→3'. The sugar–phosphate backbone is identical along the gene; what differs from one gene to the next is the base sequence within exons and the regulatory motifs in the promoter.
Topic 2 (Genes and Health) — location and the nuclear / mitochondrial split. Nuclear genes sit on linear chromosomes inside the nuclear envelope; transcription happens there, splicing removes introns, mature mRNA is exported through nuclear pores, and translation happens at cytoplasmic 80S ribosomes. Mitochondrial DNA is a separate, circular genome of ~37 genes, transcribed and translated inside the mitochondrion at 70S ribosomes — a hallmark of the endosymbiotic origin from a bacterial ancestor.
Topic 4 (Biodiversity and Natural Resources) — molecular phylogeny rests on a conserved code. Comparing DNA or amino-acid sequences between species (cytochrome c, rRNA, mitochondrial COI for "barcoding") only generates meaningful phylogenies because the code is overwhelmingly conserved — the same codons mean the same amino acids in (almost) all life. The few documented exceptions (mitochondria, ciliates, mycoplasmas) are noted but do not undermine the comparative method.
Topic 6 (Infection, Immunity and Forensics) — recombinant DNA exploits universality. Inserting a human insulin gene into E. coli on a plasmid works because E. coli reads the human codons the same way human cells do. Restriction enzymes cut at specific palindromic sequences, DNA ligase seals the recombinant molecule, and the bacterial transcription/translation machinery — using the same code — produces functional human insulin. Removal of introns (using cDNA made from mRNA via reverse transcriptase) is essential because E. coli cannot splice.
Topic 8 next lessons — transcription, translation, gene expression. The promoter and terminator regions defined here are the substrates of transcription (next lesson); the codon table defined here is the operational rule of translation (lesson 3); and the way exon/intron architecture allows alternative splicing is the foundation of the gene-expression and gene-regulation lessons later in the topic.

Mark-scheme literacy

AO	Typical share on gene-structure / genetic-code questions	Earned by
AO1 (knowledge)	35–45%	Defining gene, allele, locus, exon, intron, promoter, terminator, codon; stating the four formal properties of the code
AO2 (application)	35–50%	Decoding a given mRNA to a polypeptide; predicting silent / missense / nonsense / frameshift outcomes from named substitutions; mapping a feature (intron, promoter) onto a labelled diagram
AO3 (analysis / evaluation)	10–20%	Interpreting comparative data (e.g. nuclear vs mitochondrial codes); evaluating "universal" as "near-universal"; arguing that degeneracy is adaptive

Examiner-rewarded phrasing: "a gene codes for a polypeptide or a functional RNA, not necessarily a whole protein"; "the code is degenerate, so multiple codons can specify the same amino acid"; "introns are removed by splicing of the pre-mRNA in the nucleus before export"; "the code is near-universal: the same codons specify the same amino acids in almost all organisms, with documented exceptions in mitochondria and a few protists"; "a silent mutation is possible because of third-base degeneracy".

Phrases that lose marks: "a gene codes for a protein" (insufficient — say polypeptide or functional RNA); "introns are junk DNA" (they are not — alternative splicing of exons across introns expands proteome diversity); "the code is universal" stated as absolute (lose AO3 marks; correct phrasing is "near-universal"); "degeneracy means the code makes mistakes" (degeneracy means multiple codons per amino acid — the opposite of error); "DNA codes directly for protein" (it does not — mRNA is the intermediate, and codons strictly speaking are mRNA triplets).

A common pitfall is confusing gene with allele: a gene is the locus and its DNA sequence; an allele is one of the variant versions of that gene at that locus in the population. A second pitfall is confusing introns (non-coding sequences within a gene, removed by splicing) with intergenic regions (non-coding DNA between genes, never transcribed as part of the gene).

Grade-band model answers

3-mark question

Question: State three properties of the genetic code.

Mid-band response (~60 words):

The genetic code is a triplet code, which means that three bases code for one amino acid. It is also degenerate, because more than one codon can code for the same amino acid. The genetic code is universal, which means that the same codons code for the same amino acids in nearly all organisms, although mitochondria use a slightly different code.

Examiner-style commentary: 3/3. All three properties named (triplet, degenerate, universal) with brief but correct explanations. The qualifier "nearly all", which flags the mitochondrial exception, is the sort of precision that earns AO2/AO3 credit on extended questions.

Top-band response (~120 words):

The genetic code has four formal properties. (i) It is a triplet code: each three-base codon on mRNA specifies one amino acid, with $4^3 = 64$ possible codons (61 specifying the 20 amino acids plus 3 stop codons). (ii) It is degenerate: most amino acids are encoded by multiple synonymous codons, often differing in the third (wobble) base, which makes silent mutations possible and confers error-tolerance. (iii) It is non-overlapping and commaless: each base is read once, in one codon only, with no separators between codons. (iv) It is near-universal: the same codons specify the same amino acids in essentially all organisms, with documented variants in mitochondria (UGA = Trp) and a few protists.

Examiner-style commentary: 3/3. Four properties named (the canonical three plus non-overlapping/commaless), each with a one-line mechanism, plus the mitochondrial caveat for AO3 polish.

6-mark question

Question: Describe the structure of a typical eukaryotic protein-coding gene and explain how its architecture supports the production of a functional polypeptide.

Mid-band response (~150 words):

A gene is a sequence of DNA bases that codes for a polypeptide. In a eukaryotic gene there is a promoter at the start, then exons and introns, and a terminator at the end. The promoter is where RNA polymerase binds to start transcription. The exons code for parts of the polypeptide, and the introns are non-coding sequences. When the gene is transcribed, the introns are removed by splicing so that only the exons end up in the mRNA. The terminator marks the end of transcription so the polymerase falls off.

The mRNA leaves the nucleus and goes to a ribosome where it is translated into a polypeptide. The codons on the mRNA are read three bases at a time and each codon codes for an amino acid. The start codon AUG is used to begin translation and a stop codon UAA, UAG or UGA is used to end translation.

Examiner-style commentary: 3/6. The architecture is correctly identified at C-grade level, but the answer slides into transcription/translation rather than explaining how the architecture supports a functional polypeptide. Splicing is named but not linked to alternative splicing; UTRs are missing; the link from gene to polypeptide is described rather than analysed.

Stronger response (~250 words):

A typical eukaryotic protein-coding gene has, from 5' to 3' on the coding strand: a promoter region (the binding site for RNA polymerase II and transcription factors that determines where transcription begins); a 5' UTR that is transcribed but not translated; a series of alternating exons (coding sequences retained in mature mRNA) and introns (non-coding intervening sequences that are transcribed but spliced out before export); a 3' UTR; and a terminator region that signals the end of transcription.

This architecture supports the production of a functional polypeptide in three ways. First, the promoter ensures transcription is initiated at the correct nucleotide and on the correct (template) strand, so the whole coding sequence is copied into pre-mRNA. Second, splicing of the pre-mRNA removes the introns and joins the exons in the correct order to give a continuous coding sequence; the start codon AUG within the first exon sets the reading frame, and a stop codon (UAA, UAG or UGA) terminates translation. Third, alternative splicing of exons allows a single gene to produce multiple polypeptide variants, expanding the proteome.

The mature mRNA is exported to the cytoplasm and translated at a ribosome using the genetic code's triplet, non-overlapping, near-universal rules. The 5' and 3' UTRs regulate mRNA stability and translational efficiency without contributing to the polypeptide sequence.

Examiner-style commentary: 5/6. Architecture, splicing and the alternative-splicing payoff are all there. Loses one mark for not stating that the genetic code's degeneracy also contributes to robust polypeptide production (silent mutations preserve sequence) and for not naming any specific transcription factor or polymerase mechanism.

Top-band response (~270 words):

A eukaryotic protein-coding gene has a layered architecture that maps onto the multi-stage path from DNA to functional polypeptide. The promoter (e.g. the TATA box ~25 bp upstream) recruits general transcription factors and RNA polymerase II, fixing the transcription start site and the strand that is transcribed. Downstream lie alternating exons and introns: introns are transcribed into pre-mRNA but are excised by the spliceosome at conserved 5' GU and 3' AG splice sites; exons are joined in defined order to give a continuous coding sequence flanked by 5' and 3' UTRs. A terminator signals end-of-transcription; the mature mRNA receives a 5' cap (m7G) and a 3' poly-A tail before nuclear export.

This architecture supports faithful polypeptide production through four mechanisms. (i) The promoter, with the transcription factors it binds, sets the transcription start site and tissue-specific expression; the reading frame itself is set later by the start codon AUG. (ii) Splicing removes intronic noise and, via alternative splicing, lets one gene produce multiple polypeptide isoforms: a single human gene averages roughly four isoforms, expanding ~20,000 genes into a far larger proteome. (iii) The genetic code's properties (triplet, non-overlapping, commaless, near-universal) make codon-to-amino-acid mapping unambiguous at the ribosome. (iv) Degeneracy of the code (synonymous codons clustered around third-base differences) makes most third-position substitutions silent, conferring evolutionary error-tolerance on the polypeptide product.

The result is a regulated, modular, error-tolerant pipeline: a fixed genomic locus produces, via splicing variants and post-translational modifications, a tunable repertoire of functional polypeptides. The architecture is therefore not just descriptive but mechanistic — every region (promoter, exon, intron, UTR, terminator) earns its place in the journey from gene to protein.

Examiner-style commentary: 6/6. Names every region with mechanism, distinguishes splice sites at the molecular level, links architecture to alternative splicing and proteome expansion, and finishes with degeneracy as an error-tolerance argument. This is A* synthesis.

9-mark question

Question: Discuss how the structural and informational properties of a gene allow a single locus to specify a functional polypeptide reliably yet flexibly across cell types and environments.

Top-band response (~290 words):

A gene's reliability and flexibility both flow from its layered architecture and the rules of the genetic code.

Reliability comes from four features. The promoter fixes the transcription start site and the template strand, and the start codon AUG then fixes the reading frame, so translation always begins in register; mis-initiation is rare. Splicing at conserved GU/AG splice sites removes introns precisely, giving a defined exon order in mature mRNA. The genetic code is triplet, non-overlapping and commaless, so each codon specifies exactly one amino acid with no ambiguity. And degeneracy of the code (synonymous codons clustered around the third-base "wobble" position) means many point substitutions are silent, preserving the polypeptide sequence in the face of routine mutation. Reliability is therefore architectural (correct start site, correct splice junctions) and informational (one-codon-one-amino-acid plus degeneracy).

Flexibility comes from three features. Alternative splicing allows different combinations of exons to be joined, generating multiple polypeptide isoforms from one locus. Tissue-specific transcription factors binding to the promoter (and to enhancers and silencers further away) regulate when and where the gene is expressed — the same gene can be silent in one cell type and abundantly transcribed in another. 5' and 3' UTRs carry binding sites for regulatory RNAs (miRNAs targeting 3' UTRs) and for translation-initiation factors, tuning protein output without changing the polypeptide.

Synthesis. The gene therefore behaves as a regulated module: its core information (codon sequence in exons) is reliably transmitted to polypeptide via the universal code, while its regulatory regions (promoter, UTRs, splice sites) provide the flexibility to produce different amounts of different isoforms in different contexts. Reliability and flexibility are not in tension — they are produced by different parts of the same architecture.

Examiner-style commentary: 9/9. The candidate separates reliability (architecture + code) from flexibility (regulation + alternative splicing), maps each to specific gene features, and closes with a synthesis that resolves the apparent paradox. This is the level of structured argument that distinguishes a top-band A*.

A-Level-depth misconceptions

Confusing gene with allele. A gene is a length of DNA at a specific locus that codes for a polypeptide or functional RNA; an allele is one of the variant DNA sequences at that locus found in the population. Two organisms can both have "the haemoglobin beta gene" but carry different alleles (e.g. HbA vs HbS). A* candidates use the two terms precisely.
Confusing introns with intergenic regions. Introns are non-coding sequences within a single gene, transcribed into pre-mRNA and then spliced out. Intergenic regions are non-coding DNA between genes, never transcribed as part of any gene. Both are non-coding, but only introns are part of the gene.
Treating the genetic code as random. The code shows clear non-random clustering: physically similar amino acids tend to share codons differing by one base, so many point mutations are silent or conservative. The code is error-tolerant, not random. This pattern is often cited as evidence that the code itself was selected for robustness early in evolution.
Missing degeneracy as protective. Candidates often state that the code is degenerate without explaining the consequence. The consequence is that silent mutations (synonymous substitutions) leave the polypeptide unchanged — most third-base substitutions, and some first-base substitutions, are silent. Degeneracy buffers the polypeptide against routine DNA damage.
Treating "universal" as absolute. The code is near-universal, not universal. Mitochondrial DNA uses a variant code: for example, UGA codes for tryptophan in vertebrate mitochondria (whereas it is a stop codon in nuclear-encoded mRNA), and AUA codes for methionine rather than isoleucine. Some ciliated protists and mycoplasmas have other reassignments. State "near-universal, with documented exceptions" for AO3 marks.
Confusing "gene codes for a protein" with "gene codes for a polypeptide". Many proteins are multi-subunit (haemoglobin = 2α + 2β globin chains coded by separate genes; many enzymes are oligomeric). The gene codes for a polypeptide; the protein is assembled from one or more polypeptides plus any cofactors and post-translational modifications. Use "polypeptide" in your gene-definition sentence.
Ignoring functional RNA genes. Not every gene codes for a polypeptide. rRNA, tRNA, snRNA, miRNA, lncRNA genes are transcribed but never translated; their products are themselves the functional molecules. The full definition of a gene is "a DNA sequence coding for a polypeptide or a functional RNA".
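The claim that degeneracy is concentrated at the third base can be tested by counting, for each codon position, how many single-base substitutions leave the amino acid unchanged. This sketch assumes the standard genetic code in one-letter symbols (`*` for stop) and counts substitutions starting from the 61 sense codons only; the function name `silent_substitutions` is illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def silent_substitutions(position: int) -> tuple[int, int]:
    """Return (silent, total) single-base changes at a codon position (0, 1 or 2)."""
    silent = total = 0
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # start from sense codons only
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            silent += CODON_TABLE[mutant] == amino_acid
    return silent, total


for position in range(3):
    silent, total = silent_substitutions(position)
    print(f"base {position + 1}: {silent}/{total} silent")
# base 1: 8/183 silent
# base 2: 0/183 silent
# base 3: 126/183 silent
```

About two thirds of third-base substitutions are silent, a few first-base substitutions are silent (in leucine and arginine codons), and no second-base substitution in a sense codon is silent.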

Common errors and mark-loss patterns

Vague definitions of a gene. "A bit of DNA" or "a section of DNA" loses AO1 marks. The full mark-scheme phrasing is "a length of DNA that codes for a polypeptide or a functional RNA". Cure: memorise this sentence verbatim and use it at the start of every gene-definition answer.
Skipping the four properties of the code. Answers that name only "triplet" and "universal" miss the marks for degenerate and non-overlapping. Cure: build a four-property checklist (triplet → universal → degenerate → non-overlapping) and tick them off in any code-properties question.
Confusing template strand with coding strand. The mRNA has the same sequence as the coding (sense) strand with U replacing T; the template (antisense) strand is the one read by RNA polymerase. Candidates frequently invert this. Cure: remember that mRNA is complementary to the template strand and identical (apart from U/T) to the coding strand.
Forgetting that introns are transcribed. Introns are present in the pre-mRNA and are removed by splicing; they are not skipped during transcription. Cure: state explicitly "transcribed into pre-mRNA, then spliced out before export".
Missing the mitochondrial exception. "The genetic code is universal" without qualification loses AO3 marks. Cure: always write "near-universal, with documented exceptions in mitochondria and some protists".

Going further — university and academic signposting

Molecular biology (years 1–2): the molecular details of the spliceosome — five small nuclear ribonucleoproteins (snRNPs: U1, U2, U4, U5, U6) recognising 5' GU and 3' AG splice sites and the branch-point adenosine — are central to eukaryotic gene expression. Self-splicing introns (Group I and II) in some organelles and ancient protists hint at a pre-protein RNA-only splicing world.
Bioinformatics: modern gene annotation pipelines (Ensembl, GENCODE) identify genes computationally from genome sequence using exon/intron splice-site signatures, conservation across species, and RNA-seq evidence for expression. Roughly 20,000 protein-coding genes are catalogued in the human genome, with around four splice variants per gene on average.
Origin and evolution of the code: a leading hypothesis is that the genetic code is the product of selection for error-minimisation. Codons that specify physicochemically similar amino acids tend to cluster together, so many point mutations are conservative. Alternative theories invoke direct stereochemical affinity between amino acids and their codons or anticodons, possibly a remnant of an "RNA world".
Mitochondrial genetics: human mitochondrial DNA is 16,569 bp, circular, contains 37 genes (13 polypeptides, 22 tRNAs, 2 rRNAs), has no introns, and uses a variant code. Mitochondrial inheritance is almost exclusively maternal, which underpins the use of mtDNA in human-population phylogenies and forensic identification.
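As a rough illustration of what "a variant code" means, the sketch below compares the four codons usually cited as reassigned in the vertebrate mitochondrial code against the standard code. Only those codons are listed (this is not a full codon table), and the dictionary and function names are hypothetical.

```python
# Only the codons commonly cited as differing are listed here.
STANDARD_CODE = {"UGA": "Stop", "AUA": "Ile", "AGA": "Arg", "AGG": "Arg"}
VERTEBRATE_MITO_CODE = {"UGA": "Trp", "AUA": "Met", "AGA": "Stop", "AGG": "Stop"}


def reassigned_codons(standard: dict, variant: dict) -> dict:
    """Map each codon whose meaning differs to a (standard, variant) pair."""
    return {
        codon: (standard[codon], variant[codon])
        for codon in standard
        if standard[codon] != variant[codon]
    }


differences = reassigned_codons(STANDARD_CODE, VERTEBRATE_MITO_CODE)
for codon, (nuclear, mito) in sorted(differences.items()):
    print(f"{codon}: {nuclear} -> {mito}")
```

A handful of reassignments out of 64 codons is why "near-universal" is the safer wording in an exam answer.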

Oxbridge-style interview prompt: "If the genetic code is degenerate, why are there exactly 64 codons rather than fewer? Could life function with a doublet code? With a quadruplet code? What constraints fix the triplet?"
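One way into the arithmetic behind this prompt is to count how many distinct codons each codon length allows. The sketch below assumes 4 bases and 20 amino acids plus at least one stop signal; it addresses only the counting argument, not the evolutionary constraints.

```python
BASES = 4                 # A, U, G, C
AMINO_ACIDS = 20
MEANINGS_NEEDED = AMINO_ACIDS + 1   # 20 amino acids plus at least one stop signal


def codon_count(length: int) -> int:
    """Number of distinct codons for a code that reads `length` bases at a time."""
    return BASES ** length


for length in (1, 2, 3, 4):
    total = codon_count(length)
    verdict = "sufficient" if total >= MEANINGS_NEEDED else "too few"
    print(f"{length}-base code: {total} codons ({verdict})")

# Shortest codon length that can encode every required meaning.
shortest = next(n for n in range(1, 10) if codon_count(n) >= MEANINGS_NEEDED)
print(shortest)  # 3
```

A doublet code gives only 16 codons, which is fewer than 20 amino acids. A triplet code gives 64, and the surplus over 21 is what makes the code degenerate. A quadruplet code would give 256 codons at the cost of a third more DNA per amino acid.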

Required practical reference

Edexcel 9BI0 has no Core Practical that directly probes gene structure or the genetic code. The closest indirect anchor is Core Practical 8 (preparation of stained microscope sections, e.g. chromosome squashes from root tips of garlic or onion using the Feulgen reaction), which visualises DNA at the chromosome level — Feulgen stain reacts specifically with the deoxyribose of DNA after acid hydrolysis, giving a deep magenta colour proportional to DNA content. This reveals chromosomes condensed during mitosis and locates the bulk of the cell's genetic material to the nucleus.

Examiners reward candidates who connect the macroscopic (chromosome at light-microscope resolution) to the molecular (gene as a length of DNA on that chromosome): one chromosome carries hundreds to thousands of genes, each occupying a distinct locus. Karyotype analysis (counting and ordering chromosomes by size and centromere position) is the cytogenetic level above gene structure. Connect to Topic 6 forensics: DNA profiling by gel electrophoresis of PCR-amplified short-tandem-repeat (STR) regions exploits the fact that repeat number in non-coding regions (intergenic DNA and introns) varies between individuals, while exon sequences within genes are highly conserved.

Edexcel A-Level alignment footer

This content is aligned with the Pearson Edexcel GCE A Level Biology B (9BI0) specification, Topic 7: Modern Genetics, which is assessed in Paper 1 (Advanced Biochemistry, Microbiology and Genetics). For the most accurate and up-to-date information, please refer to the official Pearson Edexcel specification document.

Visual summary

graph LR
    B["Promoter<br/>(RNA pol II<br/>binding site)"] --> A["5' UTR<br/>(transcribed,<br/>not translated)"]
    A --> C["Exon 1<br/>(coding,<br/>retained in mRNA)"]
    C --> D["Intron 1<br/>(non-coding,<br/>spliced out)"]
    D --> E["Exon 2<br/>(coding)"]
    E --> F["Intron 2<br/>(spliced out)"]
    F --> G["Exon 3<br/>(coding)"]
    G --> I["3' UTR<br/>(transcribed,<br/>not translated)"]
    I --> H["Terminator<br/>(end of<br/>transcription)"]
    C -.->|"Codons read 5' to 3'"| J["Genetic code:<br/>triplet,<br/>non-overlapping,<br/>commaless,<br/>near-universal,<br/>degenerate"]
    E -.-> J
    G -.-> J

    style B fill:#3498db,color:#fff
    style C fill:#27ae60,color:#fff
    style E fill:#27ae60,color:#fff
    style G fill:#27ae60,color:#fff
    style D fill:#e74c3c,color:#fff
    style F fill:#e74c3c,color:#fff
    style J fill:#f39c12,color:#fff
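
The exon and intron layout in the diagram can also be sketched in code. The segments below are invented for illustration (each intron starts with GU and ends with AG, matching the splice-site signals mentioned earlier), and the function names are hypothetical.

```python
# Hypothetical pre-mRNA, already divided into labelled segments.
pre_mrna_segments = [
    ("exon", "AUGGCC"),
    ("intron", "GUAAGUCCUAG"),
    ("exon", "UUUGGA"),
    ("intron", "GUGAGUUUCAG"),
    ("exon", "CACUAA"),
]


def splice(segments):
    """Join the exons in order, discarding the introns."""
    return "".join(seq for kind, seq in segments if kind == "exon")


def codons(mrna):
    """Split mRNA into non-overlapping triplets, read 5'->3' from the first base."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


pre_mrna = "".join(seq for _, seq in pre_mrna_segments)
mature_mrna = splice(pre_mrna_segments)

print(len(pre_mrna), len(mature_mrna))  # 40 18
print(codons(mature_mrna))  # ['AUG', 'GCC', 'UUU', 'GGA', 'CAC', 'UAA']
```

The introns are present in the pre-mRNA (40 bases here) and absent from the mature mRNA (18 bases), which is then read as six non-overlapping triplets from the start codon AUG to the stop codon UAA.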
