You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
A relational database organises data into tables (formally called relations) and represents the connections between them using shared values rather than physical pointers. Each table stores data about a single entity — a class of real-world things or events such as students, courses, or enrolments. The relational model, proposed by E. F. Codd in 1970, is built on a simple but powerful idea: all data is held as values in tables, and relationships are expressed by matching those values (keys). This lesson establishes the vocabulary and structural rules that the rest of the database unit depends on — get the terminology and the key concepts exactly right here and every later topic (normalisation, SQL, transactions) becomes far easier.
This lesson addresses the AQA A-Level Computer Science (7517) Fundamentals of databases content, specifically the relational data model: relation/table, tuple/record, attribute/field and domain; the role of primary, composite, foreign, candidate and secondary keys; referential integrity; and the contrast between a relational design and a single flat file. It provides the foundation for the entity-relationship modelling, normalisation and SQL sub-topics that follow in the same area of the specification.
Before relational databases, data was often stored in a single flat file — one large table holding everything. Suppose a school records every enrolment in one file:
| StudentID | StudentName | TutorName | CourseCode | CourseTitle | RoomNo |
|---|---|---|---|---|---|
| 1001 | Alice Chen | Mrs Jones | CS101 | Intro to Computing | IT1 |
| 1001 | Alice Chen | Mrs Jones | MA210 | Pure Mathematics | M4 |
| 1002 | Bob Patel | Mr Smith | CS101 | Intro to Computing | IT1 |
This single flat file suffers from serious problems:
The relational model solves all of these by separating each entity into its own table and linking them with keys. The same data, relationally modelled, becomes three small tables (Students, Courses, Enrolments) where each fact is stored exactly once. The whole discipline of normalisation (covered later in this unit) is the formal procedure for achieving this separation.
The specification uses both the formal relational terms and their everyday equivalents. You should be able to use either set and recognise that they are synonyms.
| Formal term | Everyday term | Definition |
|---|---|---|
| Relation | Table | A structured collection of data about one entity, organised into rows and columns |
| Tuple | Record / row | A single entry in a relation representing one instance of the entity |
| Attribute | Field / column | A single named property stored for each tuple — e.g. FirstName, DateOfBirth |
| Domain | — | The set of all permitted values an attribute may take (its data type plus any further constraints) |
| Degree | — | The number of attributes (columns) in a relation |
| Cardinality | — | The number of tuples (rows) currently in a relation |
A relation is formally written in the notation Relation(PrimaryKey, attribute2, attribute3, …) with the primary key underlined. For example:
Student(StudentID, FirstName, Surname, DateOfBirth, FormGroup)
The domain of an attribute is the pool of legal values it may hold. FormGroup might have the domain {10A, 10B, 10C}; Score might have the domain "integers from 0 to 100". Domains are important because the database can reject any value outside the domain, protecting data integrity. A data type such as INTEGER defines a broad domain; a CHECK constraint narrows it further.
Consider a simple Students relation:
| StudentID | FirstName | Surname | DateOfBirth | FormGroup |
|---|---|---|---|---|
| 1001 | Alice | Chen | 2007-09-15 | 10A |
| 1002 | Bob | Patel | 2007-03-22 | 10B |
| 1003 | Carol | Smith | 2008-01-10 | 10A |
FirstName, Surname, DateOfBirth and FormGroup are all attributes.StudentID is the primary key because it uniquely identifies each student.A key property of the relational model is that the order of rows is not significant and the order of columns is not significant. The data means the same thing however it is physically stored; you address data by name (attribute) and by key value, never by position.
Every attribute has a data type that defines its base domain:
| Data Type | Example Use | Example Value |
|---|---|---|
INTEGER | StudentID | 1001 |
VARCHAR(n) | FirstName (up to n characters) | 'Alice' |
CHAR(n) | FormGroup (fixed length) | '10A' |
DATE | DateOfBirth | '2007-09-15' |
BOOLEAN | IsActive | TRUE |
REAL / FLOAT | AverageScore | 72.5 |
Choosing the correct data type matters for data integrity (a DATE field cannot hold "banana"), for storage efficiency (an INTEGER uses less space than text), and for valid operations (you can perform arithmetic on numbers, not on strings).
Keys are the mechanism by which the relational model identifies tuples and links relations. The specification distinguishes several kinds, and exam questions frequently test whether you can tell them apart.
A primary key is the attribute (or set of attributes) chosen to uniquely identify every tuple in a relation. It must satisfy three properties:
A candidate key is any attribute or combination of attributes that could serve as the primary key — i.e. any minimal set of attributes guaranteeing uniqueness. A relation may have several candidate keys. For a Students relation we might have two candidate keys: StudentID and Email (assuming every student has a unique email). The designer selects one candidate key to act as the primary key; the others are then sometimes called alternate keys. "Minimal" is important — a candidate key contains no unnecessary attributes; if you can remove an attribute and still guarantee uniqueness, the original set was not a candidate key.
StudentID).Surrogate keys are usually preferred because natural data can change (people change names), can be reused, or can contain unexpected duplicates, all of which would break the "unique, not null, stable" requirement.
Sometimes no single attribute is unique, and a primary key must be formed by combining two or more attributes. This is a composite key. Consider a relation recording which students take which modules:
| StudentID | ModuleID | EnrolmentDate |
|---|---|---|
| 1001 | CS101 | 2024-09-01 |
| 1001 | CS102 | 2024-09-01 |
| 1002 | CS101 | 2024-09-01 |
Here StudentID repeats (student 1001 takes two modules) and ModuleID repeats (CS101 is taken by two students), so neither alone is unique. The composite primary key is (StudentID, ModuleID) — the combination is unique. Composite keys are central to resolving many-to-many relationships and to understanding Second Normal Form.
A foreign key is an attribute in one relation that references the primary key of another relation, creating a link between them. The foreign key value in the "child" table must match an existing primary key value in the "parent" table (or be null where permitted). The direction matters: the foreign key lives in the table on the many side of a one-to-many relationship.
A secondary key is an attribute that is indexed to speed up searching but is not the primary key and need not be unique. If users frequently search students by surname, the DBMS can build an index on Surname so that "find all students called Patel" does not require scanning every row. This is a direct synoptic link to indexing and hashing.
Why does a secondary key need an index? Without one, finding every student with a given surname requires a linear search of the whole table — O(n) comparisons, where n is the number of tuples. For a table of a million rows that is a million comparisons every time.
An index is a separate, ordered structure that maps each value of the indexed attribute to the location(s) of the matching tuple(s). Two common implementations are:
| Index structure | Lookup behaviour | Notes |
|---|---|---|
| Balanced tree (B-tree) | O(log n) | Keeps keys in sorted order; also supports range queries (e.g. surnames A–C) |
| Hash table | ~O(1) average | Excellent for exact-match lookups; cannot answer range queries efficiently |
So indexing the secondary key Surname turns "find all Patels" from an O(n) scan into an O(log n) tree descent or an O(1) hash lookup — exactly the data-structures content from elsewhere in the specification, applied to databases. The trade-off is that every index must itself be updated whenever a row is inserted, updated or deleted, and each index consumes extra storage. This is why a designer indexes the few attributes that are searched often (secondary keys) rather than every column.
The primary key is automatically indexed by virtually every DBMS, because referential integrity checks and joins constantly look rows up by their primary key; that index is what makes a foreign-key match fast.
A foreign key creates a relationship between two relations. Consider:
Departments
| DeptID | DeptName |
|---|---|
| D1 | Computer Science |
| D2 | Mathematics |
Teachers
| TeacherID | Name | DeptID |
|---|---|---|
| T1 | Dr Williams | D1 |
| T2 | Ms Johnson | D2 |
| T3 | Mr Lee | D1 |
In Teachers, DeptID is a foreign key referencing Departments.DeptID. This means:
| Relationship | Description | Example |
|---|---|---|
| One-to-one (1:1) | Each tuple in A relates to at most one tuple in B and vice versa | Person ↔ Passport |
| One-to-many (1:M) | One tuple in A relates to many tuples in B; each B relates to one A | Department → Teachers |
| Many-to-many (M:N) | Many tuples in A relate to many tuples in B | Students ↔ Modules |
A many-to-many relationship cannot be represented directly in the relational model, because a single foreign key column can only point at one parent row. It must be decomposed using a junction table (also called a link, linking, or bridge table) that holds a foreign key to each of the two original tables. The junction table's primary key is typically the composite of those two foreign keys.
Referential integrity is the rule that every foreign key value must either match an existing primary key value in the referenced table or be null (where nulls are allowed). It keeps relationships consistent. The DBMS enforces it by preventing:
This is distinct from entity integrity (the rule that primary keys are unique and not null). Together, entity integrity and referential integrity are the two core integrity rules of the relational model.
Scenario: A school must store data about students, the courses on offer, and which students are enrolled on which courses. A student may take many courses; a course may have many students.
Step 1 — Identify entities: Student, Course, Enrolment.
Step 2 — Recognise the many-to-many: Student ↔ Course is many-to-many, so it requires a junction table. Enrolment is that junction table.
Step 3 — Define the relations (primary keys in bold):
In Enrolment, (StudentID, CourseID) is the composite primary key that resolves the many-to-many relationship. Both StudentID and CourseID are also foreign keys, referencing Student and Course respectively — a single attribute can be part of the primary key and a foreign key at the same time. TeacherID in Course is a foreign key to a (separate) Teacher relation.
This three-table design stores each student once, each course once, and each enrolment fact once — eliminating the redundancy of the flat file we started with.
The relational model rests on two complementary integrity rules. Knowing which is which is a frequent short-answer question.
| Rule | What it guarantees | Enforced by |
|---|---|---|
| Entity integrity | Every tuple has a primary-key value that is unique and not null | The primary-key constraint |
| Referential integrity | Every foreign-key value matches an existing primary key in the referenced relation (or is null where permitted) | The foreign-key constraint |
A neat way to remember the distinction: entity integrity is about a row being able to identify itself (it has a proper key); referential integrity is about a row's reference to another row being valid (the thing it points at really exists). Both are examples of the broader idea of data integrity — keeping data accurate and consistent — which also includes domain constraints (a value lies within its attribute's domain) and is the database-level expression of the Consistency guarantee in the ACID model.
A database is the structured collection of data; a Database Management System (DBMS) is the software that sits between users (or application programs) and the data, providing the services that make a relational database usable and safe. Key responsibilities of a DBMS include:
A crucial benefit the DBMS provides is data independence: applications interact with the logical schema (tables and columns) and are insulated from how the data is physically stored on disk. The storage engine can change, indexes can be added or dropped, and existing queries keep working — something a hand-coded flat-file system cannot offer.
The relational model is dominant but not the only approach; knowing the contrast sharpens your understanding of why the relational rules exist.
| Model | How relationships are represented | Strengths | Weaknesses |
|---|---|---|---|
| Flat file | None — all data in one table | Simple; no software needed | Redundancy and anomalies (as shown above) |
| Relational | Shared key values (foreign keys) | Integrity, flexible ad-hoc queries (SQL), no redundancy | Joins cost time; rigid schema |
| Document / NoSQL | Nested documents; references by ID | Horizontal scalability; flexible schema | Weaker integrity guarantees; limited joins |
The relational model's insistence on values, not pointers is exactly what enables the powerful, declarative querying of SQL and the strong integrity guarantees — at the price of needing joins to reassemble normalised data. Every design decision in this unit is a point on that trade-off.
Scenario: A library lends copies of books to members. A member may borrow many books over time; a particular physical copy is on loan to at most one member at a time, but over its life it is loaned to many members.
Entities: Member, Book (the title), Copy (a physical copy of a book), Loan.
Relations (primary keys in bold, foreign keys marked):
Points of interest:
ISBN is a natural key for Book (it is genuinely unique per title and externally assigned), whereas CopyID, MemberID and LoanID are surrogate keys.Email is a candidate key of Member (assuming uniqueness) but MemberID was chosen as the primary key because it is guaranteed stable.Surname-style searching of members would be accelerated by a secondary key / index on a name attribute.This single example therefore exercises every key concept in the lesson: relations, tuples and attributes; primary, candidate, foreign, composite (via Loan if dated multiply), natural vs surrogate, and secondary keys; and both integrity rules.
Given a relation, which attribute should be the primary key? Work through the candidates against the three properties (unique, not null, stable). For a Member relation with attributes MemberID, Email, Phone, Name:
| Candidate | Unique? | Never null? | Stable? | Verdict |
|---|---|---|---|---|
| Likely | Maybe not (some have none) | No (people change email) | Reject — nullable and changeable | |
| Phone | Not guaranteed (shared lines) | Maybe not | No | Reject — not reliably unique |
| Name | No (duplicates) | Yes | No | Reject — not unique |
| MemberID (surrogate) | Yes (system-assigned) | Yes (always set) | Yes (never reused/changed) | Choose |
Email and Phone are at best candidate keys with weaknesses; Name is not even a candidate key. The surrogate MemberID wins precisely because it is engineered to satisfy all three properties. This is the reasoning examiners expect when a question asks you to justify a choice of primary key — not merely "MemberID is unique" but a comparison that rejects the alternatives for concrete reasons.
A related point: because other relations will hold MemberID as a foreign key, its stability is doubly important — if the primary key value changed, every foreign-key reference to it would have to change too, which is exactly the brittleness that natural keys risk and surrogate keys avoid.
WHERE and column lists, connecting to set theory and logic elsewhere in the specification.A leisure centre records its members and the fitness classes they book. A member can book many classes, and a class can be booked by many members. Each class is led by one instructor, and an instructor can lead many classes.
(a) Identify the entities and state the relationship (with degree) between Member and Class. [2 marks]
(b) Explain why a junction table is required, and give a suitable relation (with a composite primary key) to resolve the Member–Class relationship. [4 marks]
(c) For your Booking relation, identify the foreign keys and explain how referential integrity protects the data. [3 marks]
AO breakdown: (a) is AO1 — recall of entities and relationship degree. (b) is AO2 — applying knowledge of junction tables and composite keys to this scenario. (c) is AO2/AO3 — applying the concept of referential integrity and reasoning about the protection it provides.
The entities are Member, Class and Instructor. Member to Class is many-to-many. You cannot store a many-to-many relationship directly so you need a junction table called Booking. Booking has MemberID and ClassID in it. These are foreign keys. Referential integrity stops you adding a booking for a member who does not exist.
The entities are Member, Class and Instructor. The relationship between Member and Class is many-to-many (M:N), because one member books many classes and one class is booked by many members. A relational database cannot represent M:N directly, so we introduce a junction table:
Booking(MemberID, ClassID, BookingDate)
The composite primary key is (MemberID, ClassID). MemberID is a foreign key referencing Member, and ClassID is a foreign key referencing Class. Referential integrity ensures every MemberID and ClassID in Booking matches an existing member and class, so you cannot book a non-existent member onto a non-existent class.
Entities: Member, Class, Instructor.
Member–Class relationship: many-to-many (M:N) — one member books many classes and one class is booked by many members. A single foreign-key column can reference only one parent row, so M:N cannot be held directly; it must be decomposed into two one-to-many relationships via a junction (link) table.
Resolving relation:
Booking(MemberID, ClassID, BookingDate)
with composite primary key (MemberID, ClassID), which is both unique (a member books a given class at most once) and not null (entity integrity). MemberID is a foreign key referencing Member(MemberID); ClassID is a foreign key referencing Class(ClassID). The Class–Instructor relationship is one-to-many and is handled by placing InstructorID as a foreign key inside Class, on the many side.
Referential integrity guarantees that every foreign key value in Booking matches an existing primary key in its parent table. On insert, the DBMS rejects a booking that names a member or class that does not exist; on delete, it prevents (or cascades) removal of a member or class while bookings still reference them. This blocks "orphan" rows and keeps the relationships internally consistent.
Examiner-style commentary: The mid-band answer identifies the entities and the need for a junction table but is vague — it never gives the composite primary key explicitly, does not state the relationship degree using the M:N convention, and explains referential integrity only by a single one-directional example. The stronger answer adds the formal relation notation, names the composite key and both foreign keys, and covers integrity on insertion. The top-band answer earns full marks by additionally explaining why M:N cannot be stored directly (single foreign-key column), addressing referential integrity on both insertion and deletion (including cascade), invoking entity integrity for the composite key, and resolving the second relationship correctly. Precise, conventional terminology (M:N, composite primary key, entity vs referential integrity) is what separates the bands.
A final clarification of vocabulary that examiners value: an entity (or entity set) such as Student is the whole class of things, while an entity instance (or occurrence) such as "Alice Chen" is a single member of that class, stored as one tuple. The relation Students is the entity set; each row is an instance. Keeping these levels straight matters when reasoning about cardinality ("one instance of Department relates to many instances of Teacher").
Relational databases also operate under the closed-world assumption: any fact not present in the database is taken to be false. If there is no enrolment tuple linking student 1003 to course C1, the database concludes that student 1003 is not enrolled on C1 — it does not treat the absence as "unknown". This is distinct from a NULL value, which explicitly records that a particular attribute's value is unknown or not applicable. The interplay of these two ideas (absent rows = false; present-but-NULL attribute = unknown) underlies a great deal of careful querying.
DISTINCT exists and why duplicate rows are theoretically forbidden.ON DELETE CASCADE, ON DELETE SET NULL, and ON DELETE RESTRICT, and reason about which is appropriate for a Booking row when a Class is cancelled versus when a Member leaves.This content is aligned with the AQA A-Level Computer Science (7517) specification.