You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
Normalisation is the formal, step-by-step process of organising the attributes of a database to reduce redundancy (the same fact stored in more than one place) and eliminate update, insertion and deletion anomalies. It works by applying a sequence of rules called normal forms that progressively decompose poorly-structured tables into a set of well-structured relations linked by keys. The result is a design in which every non-key fact about an entity is stored exactly once. Normalisation is one of the most heavily examined topics in the database unit, and questions almost always ask you to carry a single example all the way through UNF → 1NF → 2NF → 3NF, stating at each stage which dependency you are removing and why. This lesson does exactly that with one carried-through example, then reinforces it with a second.
This lesson covers the AQA A-Level Computer Science (7517) Fundamentals of databases requirement to normalise relations to Third Normal Form (3NF). It addresses the concept of functional dependency, the distinction between partial and transitive dependencies, the definitions of First, Second and Third Normal Form, the data anomalies that normalisation removes, and the trade-offs of normalised design. It builds directly on the keys covered in the relational-concepts lesson (partial and transitive dependencies are defined relative to the primary key) and feeds into SQL joins (a normalised schema is the reason joins are needed).
An un-normalised table that bundles several entities together suffers from three classic anomalies plus outright redundancy:
| Problem | Description |
|---|---|
| Insertion anomaly | You cannot record one fact without inventing another. E.g. you cannot add a new course to the catalogue until at least one student enrols on it, because course data only exists on enrolment rows. |
| Deletion anomaly | Deleting one fact unintentionally destroys another. E.g. deleting the last student on a course also erases the only record of that course's title. |
| Update anomaly | A fact stored in many places must be changed in every place. Miss one and the data becomes inconsistent (two different titles for the same course). |
| Data redundancy | The same value is physically repeated, wasting storage and creating the opportunity for the update anomaly. |
Normalisation removes the cause of all three anomalies: redundancy. Once each fact is stored once, there is nothing to forget to update, nothing to accidentally delete, and no precondition for insertion.
Every normal form is defined in terms of functional dependency. Attribute B is functionally dependent on attribute A (written A → B) if, knowing the value of A, you can determine exactly one value of B. A is the determinant.
StudentID → Surname — given a StudentID, there is exactly one surname. ✓Surname → StudentID — given a surname, there could be several students. ✗ (not a functional dependency)Two special cases drive normalisation:
Keep this in mind: the memory aid "the key, the whole key, and nothing but the key" maps exactly onto 1NF (a key exists), 2NF (the whole key — no partial dependency) and 3NF (nothing but the key — no transitive dependency).
We will normalise one un-normalised table all the way to 3NF. A bookshop records each customer's orders like this:
| OrderID | OrderDate | CustomerID | CustomerName | CustomerCity | CityPostcodePrefix | Products (Code, Title, Qty) |
|---|---|---|---|---|---|---|
| 501 | 2024-03-01 | C20 | Alice Chen | Leeds | LS | (P1, Widget, 3), (P2, Gadget, 1) |
| 502 | 2024-03-02 | C21 | Bob Patel | York | YO | (P1, Widget, 1) |
| 503 | 2024-03-02 | C20 | Alice Chen | Leeds | LS | (P3, Sprocket, 2) |
This is unnormalised because the Products column holds a repeating group (several products per order packed into one cell). There is also obvious redundancy: Alice Chen / Leeds / LS is repeated on every order she places.
A relation is in 1NF if: all attribute values are atomic (one value per cell), there are no repeating groups, and a primary key exists that uniquely identifies every tuple.
Applying 1NF — flatten the repeating group so each product on each order is its own row:
| OrderID | OrderDate | CustomerID | CustomerName | CustomerCity | CityPostcodePrefix | ProductCode | ProductTitle | Qty |
|---|---|---|---|---|---|---|---|---|
| 501 | 2024-03-01 | C20 | Alice Chen | Leeds | LS | P1 | Widget | 3 |
| 501 | 2024-03-01 | C20 | Alice Chen | Leeds | LS | P2 | Gadget | 1 |
| 502 | 2024-03-02 | C21 | Bob Patel | York | YO | P1 | Widget | 1 |
| 503 | 2024-03-02 | C20 | Alice Chen | Leeds | LS | P3 | Sprocket | 2 |
Every cell is now atomic. Because one order can contain several products and one product appears on several orders, neither OrderID nor ProductCode alone is unique — the composite primary key is (OrderID, ProductCode).
The relation is now in 1NF but riddled with redundancy: order and customer details repeat for every product line. We address that next.
A relation is in 2NF if: it is already in 1NF and every non-key attribute is fully functionally dependent on the whole primary key — i.e. there are no partial dependencies.
2NF is only relevant when the primary key is composite. If the primary key is a single attribute, a 1NF relation is automatically in 2NF.
Identify the partial dependencies against the composite key (OrderID, ProductCode):
OrderID → OrderDate, CustomerID, CustomerName, CustomerCity, CityPostcodePrefix — these depend on only part of the key (OrderID), not ProductCode. Partial dependency.ProductCode → ProductTitle — the title depends on only part of the key (ProductCode), not OrderID. Partial dependency.(OrderID, ProductCode) → Qty — the quantity genuinely depends on the whole key (the quantity of this product on this order). Full dependency — stays.Applying 2NF — split out each partial dependency into its own relation:
Order (key: OrderID)
| OrderID | OrderDate | CustomerID | CustomerName | CustomerCity | CityPostcodePrefix |
|---|---|---|---|---|---|
| 501 | 2024-03-01 | C20 | Alice Chen | Leeds | LS |
| 502 | 2024-03-02 | C21 | Bob Patel | York | YO |
| 503 | 2024-03-02 | C20 | Alice Chen | Leeds | LS |
Product (key: ProductCode)
| ProductCode | ProductTitle |
|---|---|
| P1 | Widget |
| P2 | Gadget |
| P3 | Sprocket |
OrderLine (composite key: OrderID + ProductCode)
| OrderID | ProductCode | Qty |
|---|---|---|
| 501 | P1 | 3 |
| 501 | P2 | 1 |
| 502 | P1 | 1 |
| 503 | P3 | 2 |
Product titles are no longer repeated per order line. But the Order relation still has a problem: customer details repeat whenever a customer places more than one order (Alice's details appear on orders 501 and 503).
A relation is in 3NF if: it is already in 2NF and there are no transitive dependencies — no non-key attribute depends on another non-key attribute. Equivalently: every non-key attribute depends on the key, the whole key, and nothing but the key.
Identify the transitive dependencies in the Order relation (key: OrderID):
OrderID → CustomerID (direct) and CustomerID → CustomerName, CustomerCity — so OrderID → CustomerName through CustomerID. Transitive dependency (CustomerName depends on the non-key attribute CustomerID).CustomerCity → CityPostcodePrefix — the postcode prefix depends on the city, another non-key attribute. Transitive dependency.Applying 3NF — remove each transitive dependency into its own relation:
Customer (key: CustomerID)
| CustomerID | CustomerName | CustomerCity |
|---|---|---|
| C20 | Alice Chen | Leeds |
| C21 | Bob Patel | York |
City (key: CustomerCity)
| CustomerCity | CityPostcodePrefix |
|---|---|
| Leeds | LS |
| York | YO |
Order (key: OrderID) — now references Customer by foreign key
| OrderID | OrderDate | CustomerID |
|---|---|---|
| 501 | 2024-03-01 | C20 |
| 502 | 2024-03-02 | C21 |
| 503 | 2024-03-02 | C20 |
(Product and OrderLine from the 2NF step are unchanged and already in 3NF.)
Now every fact is stored exactly once. Change Alice's city in one place; correct a product title in one place; the anomalies are gone.
| Normal form | Rule (in addition to the previous form) | What it eliminates |
|---|---|---|
| 1NF | Atomic values, no repeating groups, a primary key exists | Repeating groups / multi-valued fields |
| 2NF | No partial dependencies (non-key attributes depend on the whole composite key) | Partial dependencies on part of a composite key |
| 3NF | No transitive dependencies (no non-key attribute depends on another non-key attribute) | Transitive (indirect) dependencies |
"The key (1NF), the whole key (2NF), and nothing but the key (3NF) — so help me Codd."
| Advantages | Disadvantages |
|---|---|
| Eliminates redundancy → less storage | More tables → schema is harder to grasp at a glance |
| Removes update, insertion and deletion anomalies | Queries need joins, which add processing cost |
| Improves data integrity and consistency | Read-heavy reporting systems may run slower |
| Easier, safer maintenance (change a fact once) | Sometimes deliberately de-normalised for performance |
This is why some high-read systems (e.g. data warehouses) deliberately de-normalise — accepting controlled redundancy in exchange for fewer joins — a trade-off you should be ready to discuss.
A clear way to reason about normalisation is to write out the functional dependencies of a relation as a list, with each determinant on the left. For the un-normalised OrderLine of our carried-through example the dependencies were:
Reading this list tells you immediately what to do: the partial dependencies are removed at 2NF (each determinant that is part of the key becomes its own relation), and the transitive dependencies are removed at 3NF (each non-key determinant becomes its own relation). In effect, 3NF says: every determinant should be a candidate key of its relation. If you can find a determinant that is neither the whole key nor a candidate key, the relation is not yet fully normalised.
This determinant-listing technique is the fastest reliable method in an exam: list the dependencies, classify each as full / partial / transitive, and the decomposition almost writes itself.
Exam questions sometimes give you a relation and ask "what is the highest normal form this satisfies, and why?" Consider:
Booking(BookingRef, RoomNumber, RoomType, GuestName, GuestEmail)
with the dependencies:
Is it in 1NF? Yes — all values are atomic, no repeating groups, and BookingRef is a single-attribute primary key.
Is it in 2NF? Yes — the primary key is a single attribute, so no partial dependency is possible.
Is it in 3NF? No. There are two transitive dependencies: BookingRef → RoomNumber → RoomType (RoomType depends on the non-key attribute RoomNumber) and BookingRef → GuestEmail → GuestName (GuestName depends on the non-key attribute GuestEmail).
So the relation is in 2NF but not 3NF. To reach 3NF:
The skill being tested is recognising that a single-attribute key makes 2NF automatic, then spotting non-key→non-key dependencies for 3NF. Always justify your answer with the specific dependency that does (or does not) violate the form.
Normalisation and entity-relationship modelling are two complementary routes to a good schema:
They should agree. If you E-R model a problem and then normalise the resulting relations and find further decomposition is needed, that is a hint your diagram missed an entity. In our carried-through example, the City relation that emerged at 3NF corresponds to a "City" entity that a hasty E-R diagram might have overlooked. Using both techniques as a cross-check is exactly what professional database designers do.
To consolidate the full UNF → 3NF journey on a different scenario, consider a raw log a streaming service keeps of which tracks a user played:
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.