Amino Acids, Proteins and DNA

α-Amino acids are the twenty molecular monomers from which every protein in every living organism is constructed — a chemical alphabet that, in different sequences, encodes the catalytic machinery of enzymes, the structural fabric of collagen and keratin, the oxygen-carrying capacity of haemoglobin, and the immune recognition of antibodies. In this lesson we develop the chemistry that underpins biology. We first establish the general α-amino acid structure H₂N–CH(R)–COOH and the consequence of carrying both an acidic and a basic group: the zwitterion. We then quantify acid-base behaviour through the isoelectric point pI, examine peptide-bond formation as a condensation reaction whose product is the amide linkage already familiar from lesson 2, and build the hierarchy of protein structure — primary, secondary, tertiary, quaternary — from sequence through hydrogen-bonded helices and sheets to multi-subunit assemblies. We close with the molecular structure of DNA: a pentose-phosphate backbone, four bases, and the Watson-Crick complementary base-pairing scheme (1953) that allows the molecule to copy itself.

Spec mapping (AQA 7405): This lesson maps to §3.3.13 (amino acids, proteins and DNA). It draws explicitly on §3.3.11 (lesson 5 of this course, amines) for the nucleophilic and basic chemistry of the –NH₂ group, on §3.3.9 (lesson 2 of this course, amides and acyl chlorides) for the chemistry of the peptide bond, and on §3.3.12 (lesson 7, condensation polymers), since proteins are biopolyamides built by exactly the same condensation chemistry as nylon. Secondary and tertiary protein structure rely on hydrogen bonding, the principles of which are developed in §3.1.3 lesson 4 of the AQA AS bonding course. Refer to the official AQA specification document for the exact wording of each section.

Assessment objectives: AO1 recall items include the general α-amino acid structure, the definitions of zwitterion and isoelectric point, the four levels of protein structure, and the Watson-Crick complementary base-pairing scheme A=T and G≡C. AO2 application questions require students to draw the zwitterion form at a stated pH, to identify the peptide bond in a given dipeptide or tripeptide, and to assign the complementary base sequence of a short DNA strand. AO3 reasoning questions ask students to predict protein folding from a tabulated set of R-group properties, to explain why DNA forms a stable double helix in terms of the number and geometry of inter-strand hydrogen bonds, and to rationalise the difference in pI between an acidic, neutral, and basic amino acid using the side-chain pKa values.

α-Amino Acids: the Building Blocks

A 2-amino acid (or α-amino acid) has the general structure H₂N–CH(R)–COOH: a central carbon (the α-carbon) bonded to a primary amine –NH₂, a carboxylic acid –COOH, a hydrogen atom, and a variable side chain R that defines the identity of the amino acid. Twenty α-amino acids are encoded by the standard genetic code; all share the same α-carbon framework and differ only in R.

The α-carbon carries four different substituents in nineteen of the twenty proteinogenic amino acids, making it a stereocentre. The exception is glycine (R = H), which is achiral. The remaining nineteen amino acids exist as enantiomers L and D; proteins are built exclusively from the L-enantiomer. In CIP nomenclature, L-amino acids correspond to (S)-configuration for eighteen of the nineteen; L-cysteine is formally (R) because the sulfur side-chain alters priority order.

Side chain class	Examples (3-letter code)	R-group character
Aliphatic, non-polar	Gly, Ala, Val, Leu, Ile	Hydrophobic
Aromatic	Phe, Tyr, Trp	Hydrophobic (Tyr/Trp can H-bond)
Polar uncharged	Ser, Thr, Asn, Gln	H-bond donor/acceptor
Acidic (negative at pH 7)	Asp, Glu	–COOH side chain
Basic (Lys, Arg positive at pH 7)	Lys, Arg, His	–NH₂ (Lys), guanidinium (Arg) or imidazole (His) side chain; His only partly protonated at pH 7
Sulfur-containing	Cys, Met	Cys forms disulfide bridges

AQA do not require recall of all twenty structures, but students should be able to interpret data tables and recognise the side-chain category from a given structure.

The Zwitterion

The amine group is basic (in an amino acid its conjugate acid, –NH₃⁺, has pKa ≈ 9-10) and the carboxylic acid group is acidic (pKa ≈ 2-3). This is more acidic than a typical aliphatic carboxylic acid (pKa ≈ 4.8) because the neighbouring –NH₃⁺ group withdraws electron density inductively. At any pH between roughly 3 and 9, which includes all biologically relevant pH values, the dominant species is the product of an internal proton transfer from –COOH to –NH₂:

H₂N–CH(R)–COOH → ⁺H₃N–CH(R)–COO⁻

The product is a zwitterion: a dipolar ion with both a positive and a negative formal charge but a net charge of zero. The zwitterion is the dominant species in water and in cellular fluids. Amino acids exist as zwitterions in the solid state too, which is why crystalline amino acids have surprisingly high melting points (200-300 °C with decomposition) for their modest molecular mass: the solid is an ionic lattice held together by electrostatic attraction between –NH₃⁺ and –COO⁻ centres. Amino acids are also highly soluble in water and almost insoluble in non-polar solvents — both consequences of the zwitterion.

Acid-base behaviour at different pH

Adding strong acid (lowering pH) protonates the carboxylate; adding strong base (raising pH) deprotonates the ammonium. For a simple neutral amino acid such as alanine:

pH region	Dominant species	Net charge
Very low (pH < 2)	⁺H₃N–CH(R)–COOH	+1
Intermediate (~pH 5-6)	⁺H₃N–CH(R)–COO⁻ (zwitterion)	0
Very high (pH > 10)	H₂N–CH(R)–COO⁻	−1

The two pKa values bracket the zwitterion region. For alanine, pKa1 = 2.34 (–COOH ⇌ –COO⁻) and pKa2 = 9.69 (–NH₃⁺ ⇌ –NH₂).
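The pH table can be reproduced numerically. The sketch below is a minimal illustration and assumes ideal Henderson-Hasselbalch behaviour, with concentrations standing in for activities. It uses the alanine pKa values quoted above. The function name `species_fractions` is our own.

```python
def species_fractions(pH, pKa1, pKa2):
    """Return (cation, zwitterion, anion) mole fractions for a neutral amino acid.

    Equilibria assumed (ideal solution, concentrations used for activities):
      cation     <=> zwitterion + H+   Ka1 (the -COOH group)
      zwitterion <=> anion + H+        Ka2 (the -NH3+ group)
    """
    h = 10 ** (-pH)
    ka1 = 10 ** (-pKa1)
    ka2 = 10 ** (-pKa2)
    # Unnormalised amounts chosen so that [Z]/[C] = Ka1/h and [A]/[Z] = Ka2/h
    cation = h * h
    zwitterion = ka1 * h
    anion = ka1 * ka2
    total = cation + zwitterion + anion
    return cation / total, zwitterion / total, anion / total


ALANINE = (2.34, 9.69)
labels = ("cation (+1)", "zwitterion (0)", "anion (-1)")
for pH in (1.0, 6.0, 12.0):
    fractions = species_fractions(pH, *ALANINE)
    dominant = labels[fractions.index(max(fractions))]
    print(f"pH {pH:4.1f}: dominant species is the {dominant}")
```

The loop reports the cation, the zwitterion and the anion in turn, matching the three rows of the table.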

The Isoelectric Point pI

The isoelectric point pI is the pH at which the net charge of the amino acid is zero — that is, the pH at which the concentration of the cationic form equals the concentration of the anionic form and the zwitterion is at maximum concentration.

For a neutral amino acid (no ionisable side chain — Gly, Ala, Val, Leu, Ile, Phe, etc.) the isoelectric point is simply the arithmetic mean of the two backbone pKa values:

pI = (pKa1 + pKa2) / 2

Worked example: pI of glycine

Glycine has pKa1 = 2.34 (–COOH) and pKa2 = 9.60 (–NH₃⁺).

pI = (2.34 + 9.60) / 2 = 5.97

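At pH 5.97 glycine exists almost entirely as the zwitterion. Below pH 5.97 the cation becomes more abundant than the anion, giving a small net positive charge. However, the zwitterion remains the major species until the pH falls below pKa1 (2.34). Likewise, above pH 5.97 the anion outnumbers the cation, but it only becomes the major species once the pH rises above pKa2 (9.60).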
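The arithmetic of the worked example can be checked with a short sketch. It assumes ideal Henderson-Hasselbalch behaviour, and the helper names are our own. The second function compares cation and zwitterion below the pI, showing that the cation stays minor until the pH approaches pKa1.

```python
def pI_neutral(pKa1, pKa2):
    """Isoelectric point of an amino acid with no ionisable side chain."""
    return (pKa1 + pKa2) / 2


def cation_to_zwitterion_ratio(pH, pKa1):
    """[cation]/[zwitterion] from the Henderson-Hasselbalch equation for -COOH."""
    return 10 ** (pKa1 - pH)


gly_pI = pI_neutral(2.34, 9.60)
print(f"pI of glycine = {gly_pI:.2f}")   # pI of glycine = 5.97

# At pH 4.0 (below the pI) the cation is still only a minor species
ratio = cation_to_zwitterion_ratio(4.0, 2.34)
print(f"[cation]/[zwitterion] at pH 4.0 = {ratio:.3f}")   # about 0.022
```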

Acidic and basic amino acids

For amino acids with an ionisable side chain, three pKa values exist and pI is the mean of the two pKa values flanking the zwitterion form.

Aspartic acid (Asp, side chain –CH₂–COOH): pKa1 = 1.88 (α-COOH), pKa-side = 3.65 (side-chain COOH), pKa2 = 9.60 (α-NH₃⁺). The two acidic pKa values flank the neutral form. pI = (1.88 + 3.65)/2 ≈ 2.77 — strongly acidic.
Glutamic acid (Glu, –CH₂CH₂–COOH): pI ≈ 3.22, similarly low.
Lysine (Lys, side chain –(CH₂)₄–NH₂): pKa1 = 2.18, pKa2 = 8.95 (α-NH₃⁺), pKa-side = 10.53 (side-chain NH₃⁺). The two basic pKa values flank the neutral form. pI = (8.95 + 10.53)/2 ≈ 9.74 — strongly basic.
Arginine (Arg, guanidinium side chain): pI ≈ 10.76.

The pattern: acidic amino acids have low pI (around 3); neutral amino acids have pI around 5-6; basic amino acids have high pI (around 10). This is the foundation of separating proteins by isoelectric focusing (a technique used in proteomics, beyond the A-Level syllabus but a natural extension).
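The "two flanking pKa values" rule can be written as one general function. This is a sketch that assumes the groups lose their protons strictly in order of increasing pKa, which is a simplification. The pKa values are those quoted above, and the parameter `n_basic_groups` is our own device for locating the net-neutral form.

```python
def isoelectric_point(pkas, n_basic_groups):
    """Mean of the two pKa values that flank the net-neutral form.

    pkas: pKa of every ionisable group (alpha-COOH, alpha-NH3+, side chain).
    n_basic_groups: groups carrying +1 when fully protonated (-NH3+, guanidinium).
    Assumes protons are lost strictly in order of increasing pKa, so the fully
    protonated form (charge +n_basic_groups) reaches net charge 0 after
    n_basic_groups deprotonations.
    """
    ordered = sorted(pkas)
    k = n_basic_groups
    return (ordered[k - 1] + ordered[k]) / 2


asp = isoelectric_point([1.88, 3.65, 9.60], n_basic_groups=1)   # alpha-NH3+ only
lys = isoelectric_point([2.18, 8.95, 10.53], n_basic_groups=2)  # alpha- and side-chain NH3+
gly = isoelectric_point([2.34, 9.60], n_basic_groups=1)
print(f"Asp {asp:.3f}, Gly {gly:.3f}, Lys {lys:.3f}")   # Asp 2.765, Gly 5.970, Lys 9.740
```

For aspartic acid the two lowest pKa values are averaged. For lysine the two highest are averaged. This reproduces the hand calculations above.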

Sketching the titration curve

A titration of a neutral amino acid (such as glycine) with strong base starts at low pH with the fully protonated cation and ends at high pH with the fully deprotonated anion. The curve shows two buffering regions, where the pH changes only slowly. One is centred on pKa1 (where –COOH is half-deprotonated) and the other on pKa2 (where –NH₃⁺ is half-deprotonated). Between them the pH rises steeply. The inflection point at the centre of this steep section is reached after exactly one equivalent of base has been added, and it is the isoelectric point: the pH at which the net charge is zero.
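The titration described above can be sketched numerically. Net charge is computed from Henderson-Hasselbalch fractions, and the pI is then located by bisection. The following are our own simplifying assumptions: ideal behaviour, and neglect of free H⁺ and OH⁻ when converting charge into equivalents of base added.

```python
def net_charge(pH, acid_pkas, base_pkas):
    """Average net charge at a given pH (ideal Henderson-Hasselbalch behaviour).

    acid_pkas: groups that are 0 when protonated and -1 when deprotonated (-COOH).
    base_pkas: groups that are +1 when protonated and 0 when deprotonated (-NH3+).
    """
    charge = 0.0
    for pka in acid_pkas:
        charge -= 1 / (1 + 10 ** (pka - pH))   # fraction present as -COO-
    for pka in base_pkas:
        charge += 1 / (1 + 10 ** (pH - pka))   # fraction present as -NH3+
    return charge


def find_pI(acid_pkas, base_pkas, lo=0.0, hi=14.0, tol=1e-6):
    """Bisection: net charge falls steadily with pH, so bracket the zero crossing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(mid, acid_pkas, base_pkas) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Glycine: equivalents of OH- consumed from the cation = 1 - net charge
# (ignores free H+ and OH- in solution, a simplifying assumption)
for pH in (1.0, 2.34, 4.0, 5.97, 8.0, 9.60, 12.0):
    eq_base = 1 - net_charge(pH, [2.34], [9.60])
    print(f"pH {pH:5.2f}: {eq_base:.2f} equivalents of base")

print(f"pI of glycine by bisection: {find_pI([2.34], [9.60]):.2f}")   # 5.97
```

The table shows 0.50 equivalents at pKa1, 1.00 at the pI and 1.50 at pKa2. These are the half-neutralisation and equivalence points of the sketched curve.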

Peptide-Bond Formation

Two amino acids condense by losing one water molecule between the –COOH of the first and the –NH₂ of the second:

H₂N–CH(R₁)–COOH + H₂N–CH(R₂)–COOH → H₂N–CH(R₁)–CO–NH–CH(R₂)–COOH + H₂O

The new –CO–NH– linkage is the peptide bond. Chemically, it is identical to the amide bond developed in lesson 2 of this course: a carbonyl carbon bonded to a nitrogen, with partial double-bond character distributed across the C–N bond through delocalisation of the nitrogen lone pair into the C=O π-system. The peptide bond is therefore planar (six atoms — Cα, C, O, N, H, Cα — lie in one plane) and rotation about the C–N axis is hindered. This planarity is essential for the regular geometry of secondary structures.

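A chain of two amino acids is a dipeptide and a chain of three is a tripeptide. Short chains (up to roughly 20 residues) are usually called oligopeptides and longer chains polypeptides, while chains of more than about 50 residues are generally called proteins; these cut-offs are conventions rather than sharp definitions. Peptides are written from the N-terminus (free –NH₂, left) to the C-terminus (free –COOH, right). Order matters: Ala–Gly and Gly–Ala are different molecules.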
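Condensation bookkeeping follows directly from the equation above: n residues need n − 1 peptide bonds and release n − 1 water molecules. The sketch uses a small illustrative table of molar masses, computed from the molecular formulas of three amino acids. It shows that Ala–Gly and Gly–Ala have the same mass but different names.

```python
# Molar masses (g mol^-1) of free amino acids, from their molecular formulas:
# Gly C2H5NO2, Ala C3H7NO2, Ser C3H7NO3. A small illustrative subset only.
MOLAR_MASS = {"Gly": 75.07, "Ala": 89.09, "Ser": 105.09}
WATER = 18.02


def peptide(residues):
    """Name (N-terminus first), peptide-bond count and molar mass of a linear peptide."""
    n_bonds = len(residues) - 1                  # one condensation per peptide bond
    mass = sum(MOLAR_MASS[r] for r in residues) - n_bonds * WATER
    return "-".join(residues), n_bonds, round(mass, 2)


print(peptide(["Ala", "Gly"]))          # ('Ala-Gly', 1, 146.14)
print(peptide(["Gly", "Ala"]))          # ('Gly-Ala', 1, 146.14)
print(peptide(["Ala", "Gly", "Ser"]))
```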

Practical note: Direct condensation of free amino acids would give an uncontrolled mixture of every possible dipeptide plus higher oligomers, so it is of no synthetic use. Real peptide synthesis uses protecting groups (Merrifield's solid-phase method, beyond A-Level). In cells, the ribosome assembles peptides with very high sequence fidelity, using messenger RNA as the template.
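The size of the uncontrolled mixture is easy to count. With n different amino acids present, each dipeptide is an ordered (N→C) pair, and a residue may pair with itself, so n² dipeptides are possible. A minimal sketch using `itertools.product` from the standard library follows.

```python
from itertools import product


def possible_dipeptides(amino_acids):
    """All ordered (N-terminus -> C-terminus) pairs; each residue may pair with itself."""
    return ["-".join(pair) for pair in product(amino_acids, repeat=2)]


print(possible_dipeptides(["Ala", "Gly"]))
# ['Ala-Ala', 'Ala-Gly', 'Gly-Ala', 'Gly-Gly']
print(len(possible_dipeptides(["Ala", "Gly", "Ser"])))   # 9
```

Even before any tripeptides form, all twenty proteinogenic amino acids would give 400 different dipeptides.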

Hydrolysis of the Peptide Bond

Peptide bonds are kinetically stable in neutral solution at room temperature (estimates of the uncatalysed hydrolysis half-life run to years or longer), but they hydrolyse under harsh acidic or basic conditions. The standard procedure to break a protein down to its constituent amino acids is:

6 mol dm⁻³ HCl, 110 °C, 24 hours, sealed tube under nitrogen.

This is the inverse of peptide-bond formation: water is added across the –CO–NH– bond, regenerating the –COOH of one residue and the –NH₂ of the next. The end products are a mixture of free amino acids. However, tryptophan is destroyed under these conditions, and asparagine and glutamine are converted to their parent acids (Asp and Glu). The amino acid mixture can then be identified by paper chromatography or thin-layer chromatography (with the spots visualised using ninhydrin), or by electrophoresis at a chosen pH.
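The side reactions listed above change what is actually recovered from a hydrolysate. The sketch encodes only those stated changes. The pentapeptide is hypothetical, and the helper name is our own.

```python
from collections import Counter

# Side reactions under 6 mol dm^-3 HCl, 110 C, 24 h (as stated in the text)
CONVERTED = {"Asn": "Asp", "Gln": "Glu"}   # amide side chains hydrolysed to -COOH
DESTROYED = {"Trp"}                        # not recovered


def acid_hydrolysate(residues):
    """Free amino acids expected after complete acid hydrolysis of a peptide."""
    recovered = Counter()
    for residue in residues:
        if residue in DESTROYED:
            continue
        recovered[CONVERTED.get(residue, residue)] += 1
    return recovered


# A hypothetical pentapeptide, written N -> C
print(acid_hydrolysate(["Ala", "Asn", "Trp", "Gly", "Asp"]))
```

The result contains two Asp, one Ala and one Gly, and no Trp. The original Asn can no longer be distinguished from Asp.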

Primary Structure

The primary structure of a protein is the sequence of amino acid residues along the polypeptide chain, read from the N-terminus to the C-terminus. The sequence is genetically encoded. Each amino acid is specified by a triplet of bases (a codon) in the messenger RNA transcribed from the DNA, and the ribosome translates the mRNA codon by codon to assemble the chain. Primary structure determines the higher levels of structure. For many proteins, secondary, tertiary and quaternary structure fold spontaneously from the primary sequence under physiological conditions, while others need helper proteins called chaperones. The classic demonstration was Christian Anfinsen's experiment on ribonuclease (Nobel Prize 1972), in which the denatured enzyme refolded to its native shape simply on removal of the denaturant.
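Codon-by-codon reading can be sketched as a loop over triplets. The dictionary holds only a few real entries of the standard genetic code, those needed for the example; the full table has 64 codons. The mRNA sequence is hypothetical.

```python
# A few entries from the standard genetic code (mRNA codons); the full table has 64.
CODONS = {
    "AUG": "Met", "GCU": "Ala", "GGC": "Gly", "AAA": "Lys",
    "GAU": "Asp", "UUU": "Phe", "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def translate(mrna):
    """Read mRNA 5'->3' one codon at a time, building the chain N -> C until a stop codon."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODONS[mrna[i:i + 3]]
        if residue == "Stop":
            break
        chain.append(residue)
    return "-".join(chain)


print(translate("AUGGCUGGCAAAUAA"))   # Met-Ala-Gly-Lys
```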

Secondary Structure

Secondary structure describes the local folding of the polypeptide backbone into regular, repeating geometries stabilised by hydrogen bonds between backbone –C=O and –N–H groups (not between R-groups — that's tertiary). Two motifs dominate:

α-helix. The chain coils into a right-handed helix with 3.6 residues per turn. Each backbone C=O hydrogen-bonds to the N–H of the residue four positions further along the chain. The hydrogen bonds run roughly parallel to the helix axis, and the R-groups project outward from the helix surface. A classic example is the α-keratin of hair and nails.
β-pleated sheet. Extended strands of the polypeptide chain lie side by side, with hydrogen bonds running perpendicular to the chain direction between the C=O groups of one strand and the N–H groups of the next. The strands can be parallel (running in the same N→C direction) or antiparallel (alternating directions). Antiparallel sheets are slightly more stable because their N–H···O=C hydrogen bonds are nearly linear, whereas in parallel sheets they are angled and therefore weaker. Silk fibroin is largely β-sheet.

Both motifs are stabilised by many weak hydrogen bonds acting cooperatively: any one hydrogen bond is worth only ~20 kJ mol⁻¹, but a typical α-helix of 20 residues has 16 such bonds, and the cumulative effect is substantial. The principle — that secondary structure arises from regular, repetitive backbone hydrogen bonding — links directly to the broader treatment of hydrogen bonding in §3.1.3 lesson 4 of the AQA bonding course.
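The count of 16 hydrogen bonds follows from the i → i + 4 pattern. Only the C=O groups of residues 1 to n − 4 have an N–H partner four residues along, giving n − 4 bonds. The sketch below is crude: it simply sums the ~20 kJ mol⁻¹ figure and ignores competing hydrogen bonds to water.

```python
def helix_hbonds(n_residues, residues_per_turn=3.6, kj_per_hbond=20.0):
    """Backbone H-bonds, turns and crude summed H-bond energy for an ideal alpha-helix.

    The C=O of residue i bonds to the N-H of residue i + 4, so only the C=O
    groups of residues 1 .. n-4 have a partner: n - 4 hydrogen bonds in total.
    The energy is a simple sum; it ignores competing H-bonds to water.
    """
    n_bonds = max(0, n_residues - 4)
    turns = n_residues / residues_per_turn
    return n_bonds, turns, n_bonds * kj_per_hbond


bonds, turns, energy = helix_hbonds(20)
print(f"{bonds} H-bonds, {turns:.1f} turns, about {energy:.0f} kJ/mol summed")
# 16 H-bonds, 5.6 turns, about 320 kJ/mol summed
```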

Tertiary Structure

Tertiary structure is the overall three-dimensional folded shape of a single polypeptide chain. Unlike secondary structure (which involves backbone-to-backbone hydrogen bonds), tertiary structure is determined by interactions between R-groups that may be far apart in the primary sequence. Four kinds of R-group interaction stabilise the tertiary fold:

Disulfide bridges (S–S). Two cysteine residues (R = –CH₂–SH) can be oxidised to form a covalent –CH₂–S–S–CH₂– bridge. This is a true covalent bond (~250 kJ mol⁻¹) and is the strongest tertiary-structure interaction. Insulin contains three disulfide bridges: two link its A and B chains and the third lies within the A chain. Hair perming relies on reducing and then re-oxidising the disulfide bridges in keratin.
Ionic interactions (salt bridges). A negatively charged side chain (Asp, Glu, –COO⁻) electrostatically attracts a positively charged side chain (Lys –NH₃⁺, Arg guanidinium, protonated His). At pH 7 most acidic side chains are deprotonated and most basic side chains are protonated, so salt bridges are widespread.
Hydrogen bonds (R-group to R-group). Polar side chains such as Ser, Thr, Asn, Gln, Tyr can act as hydrogen-bond donors or acceptors with each other or with the backbone.
Hydrophobic interactions. Non-polar side chains (Val, Leu, Ile, Phe, Met, Trp) cluster together in the interior of the folded protein, excluding water. Strictly the driving force is the entropy gain of water when ordered water molecules around hydrophobic surfaces are released to the bulk solvent — but at A-Level it suffices to say that hydrophobic side chains cluster on the inside and hydrophilic side chains face the surface.
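The claim that salt bridges are widespread at pH 7 can be checked with the side-chain pKa values quoted earlier in this lesson. This is a sketch assuming ideal Henderson-Hasselbalch behaviour, and the function name is our own.

```python
def fraction_charged(pH, pKa, group):
    """Fraction of a side chain carrying its charge at a given pH (Henderson-Hasselbalch).

    group="acid": charged when deprotonated (-COO-), e.g. Asp.
    group="base": charged when protonated (-NH3+), e.g. Lys.
    """
    if group == "acid":
        return 1 / (1 + 10 ** (pKa - pH))
    if group == "base":
        return 1 / (1 + 10 ** (pH - pKa))
    raise ValueError("group must be 'acid' or 'base'")


# Side-chain pKa values quoted earlier in this lesson
print(f"Asp -COO- at pH 7: {fraction_charged(7.0, 3.65, 'acid'):.4f}")
print(f"Lys -NH3+ at pH 7: {fraction_charged(7.0, 10.53, 'base'):.4f}")
```

Both fractions come out above 0.999. At physiological pH essentially every Asp side chain is negative and every Lys side chain positive, ready to pair.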

Tertiary structure is destroyed by denaturation, which can be caused by heat (breaks H-bonds and disrupts hydrophobic clustering), strong acid or base (protonates or deprotonates side chains and breaks salt bridges), heavy-metal ions such as Hg²⁺ and Pb²⁺ (bind to –SH groups), reducing agents (cleave disulfide bridges) and detergents (disrupt hydrophobic packing). Once denatured, most proteins lose their biological function.

Quaternary Structure

Quaternary structure is the assembly of two or more polypeptide chains (subunits) into a functional multi-subunit protein. The subunits are held together by the same kinds of R-group interactions that stabilise tertiary structure — hydrogen bonds, ionic interactions, hydrophobic contacts, and occasionally disulfide bridges between chains. The classic example is haemoglobin: a tetramer of two α-chains and two β-chains, each carrying a haem prosthetic group and binding one O₂. Cooperative binding between the four subunits gives haemoglobin its characteristic sigmoidal oxygen-binding curve, central to oxygen delivery from lungs to tissues. Insulin, antibodies (IgG: two heavy and two light chains held together by disulfide bridges), and the photosynthetic reaction centre are all quaternary-structure assemblies.

Not all proteins have quaternary structure — many enzymes are monomeric (a single chain) and require no subunit assembly. Quaternary structure is therefore an optional level, present only in oligomeric proteins.

Electrophoresis

Electrophoresis separates charged molecules by their migration in an electric field. A small spot of an amino acid mixture is applied to a gel (typically agarose or polyacrylamide) saturated with buffer at a chosen pH, and an electric field is applied across the gel:

Amino acids whose pI is below the buffer pH carry net negative charge and migrate toward the anode (positive electrode).
Amino acids whose pI is above the buffer pH carry net positive charge and migrate toward the cathode (negative electrode).
Amino acids whose pI equals the buffer pH carry zero net charge and remain at the origin.

If a buffer at pH 6 is used on a mixture of Asp (pI 2.77), Gly (pI 5.97) and Lys (pI 9.74), then at pH 6:

Asp (pI < pH) is anionic → moves toward the anode.
Gly (pI ≈ pH) is near-neutral → stays at the origin.
Lys (pI > pH) is cationic → moves toward the cathode.
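The three migration rules can be expressed as a small decision function. The `near_neutral` threshold is an illustrative assumption of ours: strictly, glycine at pH 6 carries a very slight negative charge, but within this window the charge is treated as effectively zero.

```python
def migration(pI, buffer_pH, near_neutral=0.05):
    """Predict electrophoretic movement from pI and buffer pH.

    near_neutral is an illustrative threshold (an assumption): within it the
    net charge is treated as effectively zero.
    """
    if abs(pI - buffer_pH) <= near_neutral:
        return "stays at the origin"
    if pI < buffer_pH:
        return "anion: moves toward the anode (+)"
    return "cation: moves toward the cathode (-)"


for name, pI in (("Asp", 2.77), ("Gly", 5.97), ("Lys", 9.74)):
    print(name, migration(pI, 6.0))
```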

Spots are visualised by spraying with ninhydrin and heating: primary amines react with ninhydrin to give a purple-violet condensation product (Ruhemann's purple). Proline, with its secondary amine, gives a yellow product instead.

Practical-skills box: paper chromatography of amino acids. Spot the amino acid mixture and reference samples onto chromatography paper near the bottom edge. Develop in a polar solvent (butan-1-ol / ethanoic acid / water, typically 4:1:5 by volume) until the solvent front nears the top. Dry, spray with 0.2% ninhydrin in propan-2-ol, and warm in an oven at 100 °C for 5 minutes. Purple-violet spots appear at characteristic Rf values that can be compared with reference standards. The technique is mentioned by AQA as an example of a chromatographic separation; quantitative use of Rf values is covered more fully under chromatography (§3.3.16).
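The Rf comparison can be sketched as below. The distances are hypothetical measurements from a single chromatogram, on which the unknown and a reference standard share the same solvent front. The 0.03 matching tolerance is our own assumption, not an AQA figure.

```python
def rf_value(spot_cm, solvent_front_cm):
    """Rf = distance moved by the spot / distance moved by the solvent front (both from the origin)."""
    if solvent_front_cm <= 0 or not 0 <= spot_cm <= solvent_front_cm:
        raise ValueError("need 0 <= spot distance <= solvent-front distance")
    return spot_cm / solvent_front_cm


# Hypothetical measurements from one chromatogram (same paper, same solvent front)
front = 8.0
unknown = rf_value(3.1, front)
reference = rf_value(3.0, front)     # a co-spotted reference standard
print(f"unknown Rf = {unknown:.3f}, reference Rf = {reference:.3f}")
print("consistent with the reference" if abs(unknown - reference) < 0.03 else "no match")
```

Co-spotting the reference on the same sheet avoids comparing Rf values measured under different solvent conditions.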

DNA Structure

Deoxyribonucleic acid (DNA) is the polymeric biomolecule that stores the genetic information of every cell. Its structure was deduced in 1953 by James Watson and Francis Crick at Cambridge, building on the X-ray fibre-diffraction patterns of Rosalind Franklin and Maurice Wilkins at King's College London. AQA require only that the names be recognised — not biographical detail — but the model is one of the great triumphs of structural chemistry.

The monomer: nucleotide

A nucleotide has three parts:

a deoxyribose sugar (a 5-carbon pentose, the 2'-deoxy form of ribose),
a phosphate group (–OPO₃²⁻) at the 5' carbon,
a nitrogen-containing base at the 1' carbon, joined by an N-glycosidic bond.

The four bases are: adenine (A) and guanine (G) — two-ring purines — and thymine (T) and cytosine (C) — single-ring pyrimidines. (RNA, beyond this lesson, uses uracil in place of thymine.)

The polymer: phosphodiester backbone

Nucleotides polymerise by phosphodiester bonds between the 3'-OH of one sugar and the 5'-phosphate of the next. The polymer therefore has a directional sugar-phosphate backbone with a 5'-end (free phosphate) and a 3'-end (free hydroxyl). The bases project sideways from the backbone.

The double helix and base pairing

Two antiparallel strands wind around a common axis to form a right-handed double helix with one full turn every ~3.4 nm and ten base pairs per turn. The backbones run on the outside; the bases stack on the inside. Antiparallel means one strand runs 5' → 3' while the partner runs 3' → 5'.
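The textbook figures of ten base pairs and 3.4 nm per turn convert directly into helix length, which works out to 0.34 nm per base pair. The sketch below uses a hypothetical 1500 bp fragment. Real B-DNA in solution deviates slightly from these idealised values.

```python
def helix_geometry(n_base_pairs, bp_per_turn=10, nm_per_turn=3.4):
    """Turns and length of a B-DNA double helix from the textbook figures above."""
    turns = n_base_pairs / bp_per_turn
    length_nm = turns * nm_per_turn          # equivalently 0.34 nm per base pair
    return turns, length_nm


turns, length = helix_geometry(1500)   # a hypothetical 1500 bp fragment
print(f"{turns:.0f} turns, {length:.0f} nm long")   # 150 turns, 510 nm long
```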

The two strands are held together by complementary base pairing:

Base pair	H-bonds	Geometry
A = T	2 hydrogen bonds	A donates one N–H to T's C=O; T donates one N–H to A's N
G ≡ C	3 hydrogen bonds	G donates two N–H bonds (to C's C=O and C's N) and accepts one from C's N–H
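Assigning a complementary sequence (an AO2 task) needs two steps: pair each base, then reverse the result, because the strands are antiparallel. The sketch also totals the inter-strand hydrogen bonds using the 2 and 3 counts from the table. The strand is hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}   # per base pair, from the table


def complementary_strand(strand_5_to_3):
    """Partner strand, also written 5' -> 3'.

    Because the strands are antiparallel, the complement of each base is taken
    and the result is reversed so that it too reads 5' -> 3'.
    """
    return "".join(PAIR[base] for base in reversed(strand_5_to_3))


def interstrand_h_bonds(strand):
    """Total hydrogen bonds holding this strand to its complement."""
    return sum(H_BONDS[base] for base in strand)


template = "ATGCGT"                       # hypothetical strand, 5' -> 3'
print(complementary_strand(template))     # ACGCAT
print(interstrand_h_bonds(template))      # 15
```

A G/C-rich strand therefore forms more inter-strand hydrogen bonds per base pair than an A/T-rich strand of the same length.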
