Normalisation

This lesson covers database normalisation for the OCR A-Level Computer Science (H446) specification. Normalisation is the process of organising a relational database to reduce redundancy and prevent anomalies.

What is Normalisation?

Normalisation is a systematic process of organising the attributes and tables of a relational database to:

Reduce data redundancy (storing the same data in multiple places).
Eliminate insertion, update, and deletion anomalies.
Ensure data integrity and consistency.
Simplify maintenance, because each fact is stored in one place only.

Normalisation is performed through a series of normal forms, each building on the previous one.

Functional Dependencies

Before understanding normal forms, you need to understand functional dependencies.

A functional dependency exists when the value of one attribute (or set of attributes) determines the value of another attribute.

Written as: A -> B (A determines B)

Example: StudentID -> StudentName (knowing the StudentID uniquely determines the StudentName).

Notation	Meaning
A -> B	A functionally determines B
A -> B, C	A determines both B and C
A, B -> C	The combination of A and B determines C
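
A dependency such as StudentID -> StudentName can be tested against sample data. The sketch below is illustrative only: the function name holds and the sample rows are assumptions made for this example. A check like this can only show that a dependency is violated by the data; whether it truly holds depends on what the data means.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not violated by rows.

    rows is a list of dicts; determinant and dependent are lists of
    attribute names (hypothetical helper, written for this example).
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Sample rows invented for illustration.
rows = [
    {"StudentID": "001", "StudentName": "Alice", "Subject": "Maths"},
    {"StudentID": "001", "StudentName": "Alice", "Subject": "English"},
    {"StudentID": "002", "StudentName": "Bob", "Subject": "Maths"},
]

print(holds(rows, ["StudentID"], ["StudentName"]))  # True
print(holds(rows, ["StudentID"], ["Subject"]))      # False
```

The second call returns False because student 001 appears with two different subjects, so StudentID alone does not determine Subject.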

Unnormalised Form (UNF)

Unnormalised data may contain:

Repeating groups (multiple values in a single field, or numbered columns such as Subject1, Subject2 that hold the same kind of data).
Redundant data.
No defined primary key.

Example: Student Exam Results (UNF)

StudentID	Name	Subject1	Grade1	Subject2	Grade2	Subject3	Grade3
001	Alice	Maths	A	English	B	Science	A
002	Bob	Maths	C	English	B
003	Charlie	English	A	Science	B	History	C

Problems:

Repeating groups (Subject1/Grade1, Subject2/Grade2, etc.)
Empty fields when a student takes fewer subjects.
Cannot easily add a fourth subject.

First Normal Form (1NF)

A table is in 1NF if:

All fields contain atomic values (no repeating groups or multiple values in a single field).
Each record is unique (has a primary key).
All entries in a column are of the same data type.

Conversion to 1NF

Remove repeating groups by creating one row per subject:

StudentID	Name	Subject	Grade
001	Alice	Maths	A
001	Alice	English	B
001	Alice	Science	A
002	Bob	Maths	C
002	Bob	English	B
003	Charlie	English	A
003	Charlie	Science	B
003	Charlie	History	C

Composite primary key: (StudentID, Subject)

The data is now in 1NF: no repeating groups, atomic values, unique records identified by the composite key.
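
The conversion above can be sketched in Python. This is a minimal illustration, assuming the UNF rows are held as tuples with None for the empty fields; the names unf_rows and to_1nf are hypothetical.

```python
# UNF rows: StudentID, Name, then up to three Subject/Grade pairs.
unf_rows = [
    ("001", "Alice", "Maths", "A", "English", "B", "Science", "A"),
    ("002", "Bob", "Maths", "C", "English", "B", None, None),
    ("003", "Charlie", "English", "A", "Science", "B", "History", "C"),
]


def to_1nf(rows):
    """Produce one (StudentID, Name, Subject, Grade) row per subject taken."""
    flat = []
    for student_id, name, *pairs in rows:
        # pairs alternates subject, grade, subject, grade, ...
        for subject, grade in zip(pairs[0::2], pairs[1::2]):
            if subject is not None:  # skip the empty repeating-group slots
                flat.append((student_id, name, subject, grade))
    return flat


first_nf = to_1nf(unf_rows)
print(len(first_nf))  # 8

# The composite key (StudentID, Subject) identifies every row uniquely.
keys = [(student_id, subject) for student_id, _, subject, _ in first_nf]
print(len(keys) == len(set(keys)))  # True
```

Because each subject now has its own row, a fourth subject for any student is simply one more row rather than a new pair of columns.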

Remaining problems: Name is repeated for each subject a student takes (redundancy). If Alice changes her name, every one of her rows must be updated; missing one leaves the table inconsistent (update anomaly).
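
The update anomaly can be demonstrated with a short sketch. The new name "Alicia" and the variable names are assumptions chosen for illustration; the code deliberately updates only one of Alice's rows, as a careless edit might.

```python
columns = ("StudentID", "Name", "Subject", "Grade")
rows = [
    dict(zip(columns, values))
    for values in [
        ("001", "Alice", "Maths", "A"),
        ("001", "Alice", "English", "B"),
        ("001", "Alice", "Science", "A"),
        ("002", "Bob", "Maths", "C"),
    ]
]

# Change the name in the first matching row only, then stop.
for row in rows:
    if row["StudentID"] == "001":
        row["Name"] = "Alicia"  # hypothetical new name
        break

# Student 001 now has two different names stored: the data is inconsistent.
names_for_001 = sorted({row["Name"] for row in rows if row["StudentID"] == "001"})
print(names_for_001)  # ['Alice', 'Alicia']
```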

Second Normal Form (2NF)

A table is in 2NF if:

It is already in 1NF.
All non-key attributes are fully functionally dependent on the entire primary key (no partial dependencies).

Partial dependency: A non-key attribute depends on only PART of a composite primary key.

In our 1NF table:

(StudentID, Subject) -> Grade (full dependency -- both parts needed)
StudentID -> Name (partial dependency -- Name depends only on StudentID, not Subject)
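
One way to look for partial dependencies is to test every proper subset of the composite key against each non-key attribute. The sketch below does this for the 1NF table; the helper names holds and partial_dependencies are hypothetical, and the result only reflects this sample data.

```python
from itertools import combinations


def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not violated by rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (key_subset, attribute) pairs where only part of the key is needed."""
    found = []
    for attribute in non_key:
        for size in range(1, len(key)):  # proper subsets of the key only
            for subset in combinations(key, size):
                if holds(rows, subset, [attribute]):
                    found.append((subset, attribute))
    return found


columns = ("StudentID", "Name", "Subject", "Grade")
rows = [
    dict(zip(columns, values))
    for values in [
        ("001", "Alice", "Maths", "A"),
        ("001", "Alice", "English", "B"),
        ("001", "Alice", "Science", "A"),
        ("002", "Bob", "Maths", "C"),
        ("002", "Bob", "English", "B"),
        ("003", "Charlie", "English", "A"),
        ("003", "Charlie", "Science", "B"),
        ("003", "Charlie", "History", "C"),
    ]
]

result = partial_dependencies(rows, ("StudentID", "Subject"), ["Name", "Grade"])
print(result)  # [(('StudentID',), 'Name')]
```

Only Name is reported, which matches the analysis above: Grade needs both parts of the key, while Name depends on StudentID alone.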

Conversion to 2NF

Remove partial dependencies by splitting into separate tables:

Students table:

StudentID	Name
001	Alice
002	Bob
003	Charlie

Primary key: StudentID

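Results table:

StudentID	Subject	Grade
001	Maths	A
001	English	B
001	Science	A
002	Maths	C
002	English	B
003	English	A
003	Science	B
003	History	C

Composite primary key: (StudentID, Subject)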
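
The split into a Students table and a Results table can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the column types, the foreign key from Results to Students, and the new name "Alicia" are all choices made for illustration.

```python
import sqlite3

# The 1NF rows: (StudentID, Name, Subject, Grade).
first_nf = [
    ("001", "Alice", "Maths", "A"),
    ("001", "Alice", "English", "B"),
    ("001", "Alice", "Science", "A"),
    ("002", "Bob", "Maths", "C"),
    ("002", "Bob", "English", "B"),
    ("003", "Charlie", "English", "A"),
    ("003", "Charlie", "Science", "B"),
    ("003", "Charlie", "History", "C"),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE Students (StudentID TEXT PRIMARY KEY, Name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE Results (
        StudentID TEXT NOT NULL REFERENCES Students(StudentID),
        Subject   TEXT NOT NULL,
        Grade     TEXT NOT NULL,
        PRIMARY KEY (StudentID, Subject)
    )
    """
)

# INSERT OR IGNORE keeps a single row per StudentID in Students.
conn.executemany(
    "INSERT OR IGNORE INTO Students VALUES (?, ?)",
    [(student_id, name) for student_id, name, _, _ in first_nf],
)
conn.executemany(
    "INSERT INTO Results VALUES (?, ?, ?)",
    [(student_id, subject, grade) for student_id, _, subject, grade in first_nf],
)

# A name change now touches exactly one row.
cur = conn.execute(
    "UPDATE Students SET Name = ? WHERE StudentID = ?", ("Alicia", "001")
)
print(cur.rowcount)  # 1

# A join recombines the two tables when the full picture is needed.
joined = conn.execute(
    """
    SELECT s.Name, r.Subject, r.Grade
    FROM Results r JOIN Students s ON s.StudentID = r.StudentID
    WHERE r.StudentID = '001'
    ORDER BY r.Subject
    """
).fetchall()
print(joined)
# [('Alicia', 'English', 'B'), ('Alicia', 'Maths', 'A'), ('Alicia', 'Science', 'A')]
```

Each student's name is stored once, so the update anomaly from the 1NF table cannot occur, at the cost of a join when names and grades are needed together.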
