You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
Normalisation is the engineering discipline that turns a messy, redundant table into a clean set of related tables that store each fact exactly once. It is the formal answer to the anomalies you met when comparing flat files with relational databases: rather than splitting tables by eye, normalisation gives you a procedure — a sequence of normal forms (1NF, 2NF, 3NF) — and a test at each stage so you can prove your design is sound. This lesson takes a single unnormalised table all the way to 3NF, showing every intermediate table, and pins down the functional-dependency reasoning that justifies each split.
This lesson addresses the H446 1.3.2 Databases content on normalisation:
(Phrasing here paraphrases the specification content; it is not a verbatim quote.)
Normalisation exists to kill three specific bugs that infest redundant tables. They are worth stating crisply because OCR examines them by name.
| Anomaly | What it is | Why redundancy causes it |
|---|---|---|
| Insertion anomaly | You cannot record one fact without also supplying an unrelated one. | If teacher details live on the student row, you cannot add a teacher who has no students yet — there is no row to hold them. |
| Deletion anomaly | Removing one fact silently destroys another. | Deleting the last student of a subject deletes the only copy of that subject and its teacher. |
| Update anomaly | One logical change requires editing many rows, and any miss leaves the data contradicting itself. | If a teacher's email is repeated on 30 rows, changing it means 30 edits; miss one and the database now holds two emails for one teacher. |
Every one of these is a symptom of the same root cause — a single table storing facts about more than one entity, so facts get duplicated. Normalisation removes the duplication, and the anomalies go with it. So the goals are:
Normalisation proceeds through a ladder of normal forms, each adding one rule on top of the previous.
Normal forms are defined in terms of functional dependencies, so this is the key idea to master first.
A functional dependency exists when the value of one attribute (or set of attributes) determines the value of another. We write A → B and read it "A determines B", meaning: if you know A, the value of B is fixed.
Example: StudentID → StudentName. Knowing a student's id fixes their name — there is exactly one name for id 001. The reverse is not a dependency: a name does not fix an id (two students could be called Alice), so StudentName → StudentID does not hold.
| Notation | Meaning |
|---|---|
A → B | A functionally determines B (knowing A fixes B) |
A → B, C | A determines both B and C |
A, B → C | Only the combination of A and B determines C |
Two flavours of dependency matter for the normal forms, and naming them is the whole game:
key → A → B. (3NF removes these.)If you can identify the functional dependencies of a table, the normal forms become almost mechanical.
Unnormalised data typically shows one or more of:
| StudentID | Name | Tutor | TutorRoom | Subject1 | Grade1 | Subject2 | Grade2 | Subject3 | Grade3 |
|---|---|---|---|---|---|---|---|---|---|
| 001 | Alice | Mr Khan | R12 | Maths | A | English | B | Science | A |
| 002 | Bob | Mr Khan | R12 | Maths | C | English | B | ||
| 003 | Charlie | Ms Bell | R09 | English | A | Science | B | History | C |
This single table will be our worked example for the entire lesson. Note three faults straight away: the repeating group (Subjectn/Graden), the empty cells when a student takes fewer subjects, and the redundancy in the tutor data (Mr Khan / R12 is stored twice). Each fault is removed by a different normal form, which is exactly why we proceed in stages.
The three anomalies are not abstract here — you can see each one in this exact table:
Hold these three concrete examples in mind: by the time the table reaches 3NF, all three will have become impossible, and that is the whole point of the exercise.
A table is in 1NF when:
The repeating group is removed by creating one row per subject rather than columns Subject1/Subject2/Subject3:
| StudentID | Name | Tutor | TutorRoom | Subject | Grade |
|---|---|---|---|---|---|
| 001 | Alice | Mr Khan | R12 | Maths | A |
| 001 | Alice | Mr Khan | R12 | English | B |
| 001 | Alice | Mr Khan | R12 | Science | A |
| 002 | Bob | Mr Khan | R12 | Maths | C |
| 002 | Bob | Mr Khan | R12 | English | B |
| 003 | Charlie | Ms Bell | R09 | English | A |
| 003 | Charlie | Ms Bell | R09 | Science | B |
| 003 | Charlie | Ms Bell | R09 | History | C |
Primary key: the composite (StudentID, Subject). Neither alone is unique — student 001 appears three times, subject Maths appears twice — but the pair identifies exactly one row (student 001's Maths result).
The table is now in 1NF: atomic values, no repeating groups, a defined key. But look at the redundancy it has introduced: Name, Tutor and TutorRoom are now repeated on every one of a student's rows. Change Alice's name and three rows need editing — an update anomaly. That redundancy is the cue for 2NF.
A table is in 2NF when:
A partial dependency only can exist when the primary key is composite; a table with a single-attribute key is automatically in 2NF if it is in 1NF.
Our key is (StudentID, Subject). Test each non-key attribute against it:
| Dependency | Type | Reasoning |
|---|---|---|
(StudentID, Subject) → Grade | full | The grade needs both — student 001's grade differs by subject. |
StudentID → Name | partial | Name depends on the student alone, not the subject. |
StudentID → Tutor, TutorRoom | partial | The tutor is fixed by the student, ignoring the subject. |
Name, Tutor and TutorRoom depend on only part of the key (StudentID). That is the partial dependency 2NF outlaws.
Remove each partial dependency into its own table, keyed on the part of the key it depended on:
Student (key StudentID):
| StudentID (PK) | Name | Tutor | TutorRoom |
|---|---|---|---|
| 001 | Alice | Mr Khan | R12 |
| 002 | Bob | Mr Khan | R12 |
| 003 | Charlie | Ms Bell | R09 |
Result (composite key (StudentID, Subject); StudentID is a foreign key → Student):
| StudentID (PK, FK) | Subject (PK) | Grade |
|---|---|---|
| 001 | Maths | A |
| 001 | English | B |
| 001 | Science | A |
| 002 | Maths | C |
| 002 | English | B |
| 003 | English | A |
| 003 | Science | B |
| 003 | History | C |
Now each student's name and tutor are stored once, in Student. The grade — which genuinely needs both the student and the subject — stays in Result. The 1NF update anomaly on Name is gone.
But the Student table still hides a fault: Tutor and TutorRoom travel together, and the room is really a fact about the tutor, not the student. Mr Khan / R12 is still duplicated (rows 001 and 002). That is a transitive dependency, and it is 3NF's job.
A table is in 3NF when:
A transitive dependency has the shape key → A → B where A is a non-key attribute: the key determines A, and A in turn determines B, so B depends on the key only indirectly.
In the Student table the dependencies are:
StudentID → Tutor (the student fixes the tutor), andTutor → TutorRoom (the tutor fixes the room — Mr Khan is always in R12).Chaining these gives StudentID → Tutor → TutorRoom: TutorRoom depends on StudentID only through Tutor. That is a transitive dependency, and it is why R12 is duplicated wherever Mr Khan appears.
Split the transitively dependent attribute(s) into a table keyed on the non-key attribute they really depend on (Tutor):
Student (key StudentID; Tutor is now a foreign key → Tutor):
| StudentID (PK) | Name | Tutor (FK) |
|---|---|---|
| 001 | Alice | Mr Khan |
| 002 | Bob | Mr Khan |
| 003 | Charlie | Ms Bell |
Tutor (key Tutor):
| Tutor (PK) | TutorRoom |
|---|---|
| Mr Khan | R12 |
| Ms Bell | R09 |
Result (unchanged — already in 3NF):
| StudentID (PK, FK) | Subject (PK) | Grade |
|---|---|---|
| 001 | Maths | A |
| … | … | … |
Now Mr Khan's room is stored once. If he moves to R15, one edit fixes it everywhere. The original ten-column UNF table has become three clean tables — Student, Tutor, Result — each describing one entity, each fact stored once, and the whole design in 3NF. The informal three-letter summary students memorise:
A neat mnemonic for 3NF: every non-key attribute must depend on the key, the whole key, and nothing but the key.
To prove the method is general, here is a different unnormalised table — a shop's orders — taken straight to 3NF.
| OrderID | CustomerID | CustomerName | CustomerEmail | Product | Qty |
|---|---|---|---|---|---|
| 1001 | C01 | Alice | alice@mail.com | Pen | 5 |
| 1001 | C01 | Alice | alice@mail.com | Pad | 2 |
| 1002 | C02 | Bob | bob@mail.com | Pen | 1 |
1NF: values are atomic; the key is the composite (OrderID, Product).
2NF: test for partial dependencies on (OrderID, Product):
(OrderID, Product) → Qty — full (the quantity needs both).OrderID → CustomerID, CustomerName, CustomerEmail — partial (the customer is fixed by the order alone).Split out the partial dependency:
Order: OrderID (PK), CustomerID, CustomerName, CustomerEmail.
OrderLine: (OrderID FK, Product) PK, Qty.
3NF: inside Order, OrderID → CustomerID → CustomerName, CustomerEmail is a transitive dependency (the customer's name and email depend on the customer, not directly on the order). Split it:
Order: OrderID (PK), CustomerID (FK).
Customer: CustomerID (PK), CustomerName, CustomerEmail.
OrderLine: (OrderID FK, Product) PK, Qty.
Customer details are now stored once however many orders Alice places — the same shape of result as the exam-results example, reached by the same two tests.
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.