Normalisation (1NF, 2NF, 3NF)

Normalisation is the formal, step-by-step process of organising the attributes of a database to reduce redundancy (the same fact stored in more than one place) and eliminate update, insertion and deletion anomalies. It works by applying a sequence of rules called normal forms that progressively decompose poorly-structured tables into a set of well-structured relations linked by keys. The result is a design in which every non-key fact about an entity is stored exactly once. Normalisation is one of the most heavily examined topics in the database unit, and questions almost always ask you to carry a single example all the way through UNF → 1NF → 2NF → 3NF, stating at each stage which dependency you are removing and why. This lesson does exactly that with one carried-through example, then reinforces it with a second.

Spec Mapping

This lesson covers the AQA A-Level Computer Science (7517) Fundamentals of databases requirement to normalise relations to Third Normal Form (3NF). It addresses the concept of functional dependency, the distinction between partial and transitive dependencies, the definitions of First, Second and Third Normal Form, the data anomalies that normalisation removes, and the trade-offs of normalised design. It builds directly on the keys covered in the relational-concepts lesson (partial and transitive dependencies are defined relative to the primary key) and feeds into SQL joins (a normalised schema is the reason joins are needed).

Why Normalise? The Anomalies

An un-normalised table that bundles several entities together suffers from three classic anomalies plus outright redundancy:

Problem	Description
Insertion anomaly	You cannot record one fact without inventing another. E.g. you cannot add a new course to the catalogue until at least one student enrols on it, because course data only exists on enrolment rows.
Deletion anomaly	Deleting one fact unintentionally destroys another. E.g. deleting the last student on a course also erases the only record of that course's title.
Update anomaly	A fact stored in many places must be changed in every place. Miss one and the data becomes inconsistent (two different titles for the same course).
Data redundancy	The same value is physically repeated, wasting storage and creating the opportunity for the update anomaly.

Normalisation removes the cause of all three anomalies: redundancy. Once each fact is stored once, there is nothing to forget to update, nothing to accidentally delete, and no precondition for insertion.

Functional Dependency — the Core Idea

Every normal form is defined in terms of functional dependency. Attribute B is functionally dependent on attribute A (written A → B) if, knowing the value of A, you can determine exactly one value of B. A is the determinant.

StudentID → Surname — given a StudentID, there is exactly one surname. ✓
Surname → StudentID — given a surname, there could be several students. ✗ (not a functional dependency)

Two special cases drive normalisation:

A partial dependency exists when a non-key attribute depends on only part of a composite primary key (removed in 2NF).
A transitive dependency exists when a non-key attribute depends on another non-key attribute rather than directly on the key (removed in 3NF).

Keep this in mind: the memory aid "the key, the whole key, and nothing but the key" maps exactly onto 1NF (a key exists), 2NF (the whole key — no partial dependency) and 3NF (nothing but the key — no transitive dependency).

The Carried-Through Example

We will normalise one un-normalised table all the way to 3NF. A bookshop records each customer's orders like this:

Unnormalised Form (UNF)

OrderID	OrderDate	CustomerID	CustomerName	CustomerCity	CityPostcodePrefix	Products (Code, Title, Qty)
501	2024-03-01	C20	Alice Chen	Leeds	LS	(P1, Widget, 3), (P2, Gadget, 1)
502	2024-03-02	C21	Bob Patel	York	YO	(P1, Widget, 1)
503	2024-03-02	C20	Alice Chen	Leeds	LS	(P3, Sprocket, 2)

This is unnormalised because the Products column holds a repeating group (several products per order packed into one cell). There is also obvious redundancy: Alice Chen / Leeds / LS is repeated on every order she places.

First Normal Form (1NF)

A relation is in 1NF if: all attribute values are atomic (one value per cell), there are no repeating groups, and a primary key exists that uniquely identifies every tuple.

Applying 1NF — flatten the repeating group so each product on each order is its own row:

OrderID	OrderDate	CustomerID	CustomerName	CustomerCity	CityPostcodePrefix	ProductCode	ProductTitle	Qty
501	2024-03-01	C20	Alice Chen	Leeds	LS	P1	Widget	3
501	2024-03-01	C20	Alice Chen	Leeds	LS	P2	Gadget	1
502	2024-03-02	C21	Bob Patel	York	YO	P1	Widget	1
503	2024-03-02	C20	Alice Chen	Leeds	LS	P3	Sprocket	2

Every cell is now atomic. Because one order can contain several products and one product appears on several orders, neither OrderID nor ProductCode alone is unique — the composite primary key is (OrderID, ProductCode).

The relation is now in 1NF but riddled with redundancy: order and customer details repeat for every product line. We address that next.

Second Normal Form (2NF)

A relation is in 2NF if: it is already in 1NF and every non-key attribute is fully functionally dependent on the whole primary key — i.e. there are no partial dependencies.

2NF is only relevant when the primary key is composite. If the primary key is a single attribute, a 1NF relation is automatically in 2NF.

Identify the partial dependencies against the composite key (OrderID, ProductCode):

OrderID → OrderDate, CustomerID, CustomerName, CustomerCity, CityPostcodePrefix — these depend on only part of the key (OrderID), not ProductCode. Partial dependency.
ProductCode → ProductTitle — the title depends on only part of the key (ProductCode), not OrderID. Partial dependency.
(OrderID, ProductCode) → Qty — the quantity genuinely depends on the whole key (the quantity of this product on this order). Full dependency — stays.

Applying 2NF — split out each partial dependency into its own relation:

Order (key: OrderID)

OrderID	OrderDate	CustomerID	CustomerName	CustomerCity	CityPostcodePrefix
501	2024-03-01	C20	Alice Chen	Leeds	LS
502	2024-03-02	C21	Bob Patel	York	YO
503	2024-03-02	C20	Alice Chen	Leeds	LS

Product (key: ProductCode)

ProductCode	ProductTitle
P1	Widget
P2	Gadget
P3	Sprocket

OrderLine (composite key: OrderID + ProductCode)

OrderID	ProductCode	Qty
501	P1	3
501	P2	1
502	P1	1
503	P3	2

Product titles are no longer repeated per order line. But the Order relation still has a problem: customer details repeat whenever a customer places more than one order (Alice's details appear on orders 501 and 503).

Third Normal Form (3NF)

A relation is in 3NF if: it is already in 2NF and there are no transitive dependencies — no non-key attribute depends on another non-key attribute. Equivalently: every non-key attribute depends on the key, the whole key, and nothing but the key.

Identify the transitive dependencies in the Order relation (key: OrderID):

OrderID → CustomerID (direct) and CustomerID → CustomerName, CustomerCity — so OrderID → CustomerName through CustomerID. Transitive dependency (CustomerName depends on the non-key attribute CustomerID).
CustomerCity → CityPostcodePrefix — the postcode prefix depends on the city, another non-key attribute. Transitive dependency.

Applying 3NF — remove each transitive dependency into its own relation:

Customer (key: CustomerID)

CustomerID	CustomerName	CustomerCity
C20	Alice Chen	Leeds
C21	Bob Patel	York

City (key: CustomerCity)

CustomerCity	CityPostcodePrefix
Leeds	LS
York	YO

Order (key: OrderID) — now references Customer by foreign key

OrderID	OrderDate	CustomerID
501	2024-03-01	C20
502	2024-03-02	C21
503	2024-03-02	C20

(Product and OrderLine from the 2NF step are unchanged and already in 3NF.)

The final 3NF schema

Customer(CustomerID, CustomerName, CustomerCity*) — *CustomerCity is a foreign key to City
City(CustomerCity, CityPostcodePrefix)
Order(OrderID, OrderDate, CustomerID*) — *FK to Customer
Product(ProductCode, ProductTitle)
OrderLine(OrderID, ProductCode, Qty) — both attributes are FKs to Order and Product

Now every fact is stored exactly once. Change Alice's city in one place; correct a product title in one place; the anomalies are gone.

Summary of Normal Forms

Normal form	Rule (in addition to the previous form)	What it eliminates
1NF	Atomic values, no repeating groups, a primary key exists	Repeating groups / multi-valued fields
2NF	No partial dependencies (non-key attributes depend on the whole composite key)	Partial dependencies on part of a composite key
3NF	No transitive dependencies (no non-key attribute depends on another non-key attribute)	Transitive (indirect) dependencies

"The key (1NF), the whole key (2NF), and nothing but the key (3NF) — so help me Codd."

Advantages and Disadvantages of Normalisation

Advantages	Disadvantages
Eliminates redundancy → less storage	More tables → schema is harder to grasp at a glance
Removes update, insertion and deletion anomalies	Queries need joins, which add processing cost
Improves data integrity and consistency	Read-heavy reporting systems may run slower
Easier, safer maintenance (change a fact once)	Sometimes deliberately de-normalised for performance

This is why some high-read systems (e.g. data warehouses) deliberately de-normalise — accepting controlled redundancy in exchange for fewer joins — a trade-off you should be ready to discuss.

Determinants and Dependency Diagrams

A clear way to reason about normalisation is to write out the functional dependencies of a relation as a list, with each determinant on the left. For the un-normalised OrderLine of our carried-through example the dependencies were:

(OrderID, ProductCode) → Qty (full dependency on the whole composite key)
OrderID → OrderDate, CustomerID (partial — depends on part of the key)
ProductCode → ProductTitle (partial — depends on the other part of the key)
CustomerID → CustomerName, CustomerCity (transitive — determinant is a non-key attribute)
CustomerCity → CityPostcodePrefix (transitive)

Reading this list tells you immediately what to do: the partial dependencies are removed at 2NF (each determinant that is part of the key becomes its own relation), and the transitive dependencies are removed at 3NF (each non-key determinant becomes its own relation). In effect, 3NF says: every determinant should be a candidate key of its relation. If you can find a determinant that is neither the whole key nor a candidate key, the relation is not yet fully normalised.

This determinant-listing technique is the fastest reliable method in an exam: list the dependencies, classify each as full / partial / transitive, and the decomposition almost writes itself.

Worked Example — Identifying the Normal Form of a Given Relation

Exam questions sometimes give you a relation and ask "what is the highest normal form this satisfies, and why?" Consider:

Booking(BookingRef, RoomNumber, RoomType, GuestName, GuestEmail)

with the dependencies:

BookingRef → RoomNumber, GuestName, GuestEmail
RoomNumber → RoomType
GuestEmail → GuestName

Is it in 1NF? Yes — all values are atomic, no repeating groups, and BookingRef is a single-attribute primary key.

Is it in 2NF? Yes — the primary key is a single attribute, so no partial dependency is possible.

Is it in 3NF? No. There are two transitive dependencies: BookingRef → RoomNumber → RoomType (RoomType depends on the non-key attribute RoomNumber) and BookingRef → GuestEmail → GuestName (GuestName depends on the non-key attribute GuestEmail).

So the relation is in 2NF but not 3NF. To reach 3NF:

Room(RoomNumber, RoomType)
Guest(GuestEmail, GuestName)
Booking(BookingRef, RoomNumber*, GuestEmail*) — *foreign keys

The skill being tested is recognising that a single-attribute key makes 2NF automatic, then spotting non-key→non-key dependencies for 3NF. Always justify your answer with the specific dependency that does (or does not) violate the form.

Normalisation and E-R Modelling — Two Routes to the Same Place

Normalisation and entity-relationship modelling are two complementary routes to a good schema:

Top-down (E-R modelling): start from the real-world entities, draw the diagram, translate to relations. A carefully drawn diagram usually lands in 3NF directly.
Bottom-up (normalisation): start from a messy collection of attributes (perhaps an existing flat file or a paper form) and derive the entities by removing dependencies.

They should agree. If you E-R model a problem and then normalise the resulting relations and find further decomposition is needed, that is a hint your diagram missed an entity. In our carried-through example, the City relation that emerged at 3NF corresponds to a "City" entity that a hasty E-R diagram might have overlooked. Using both techniques as a cross-check is exactly what professional database designers do.

Second Carried-Through Example — A Music Streaming Log

To consolidate the full UNF → 3NF journey on a different scenario, consider a raw log a streaming service keeps of which tracks a user played:

Normalisation (1NF, 2NF, 3NF)

Normalisation (1NF, 2NF, 3NF)

Spec Mapping

Why Normalise? The Anomalies

Functional Dependency — the Core Idea

The Carried-Through Example

Unnormalised Form (UNF)

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

The final 3NF schema

Summary of Normal Forms

Advantages and Disadvantages of Normalisation

Determinants and Dependency Diagrams

Worked Example — Identifying the Normal Form of a Given Relation

Normalisation and E-R Modelling — Two Routes to the Same Place

Second Carried-Through Example — A Music Streaming Log

UNF

More in Computer Science