Relational Database Concepts

A relational database organises data into tables (formally called relations) and represents the connections between them using shared values rather than physical pointers. Each table stores data about a single entity — a class of real-world things or events such as students, courses, or enrolments. The relational model, proposed by E. F. Codd in 1970, is built on a simple but powerful idea: all data is held as values in tables, and relationships are expressed by matching those values (keys). This lesson establishes the vocabulary and structural rules that the rest of the database unit depends on — get the terminology and the key concepts exactly right here and every later topic (normalisation, SQL, transactions) becomes far easier.

Spec Mapping

This lesson addresses the AQA A-Level Computer Science (7517) Fundamentals of databases content, specifically the relational data model: relation/table, tuple/record, attribute/field and domain; the role of primary, composite, foreign, candidate and secondary keys; referential integrity; and the contrast between a relational design and a single flat file. It provides the foundation for the entity-relationship modelling, normalisation and SQL sub-topics that follow in the same area of the specification.

Flat Files vs. the Relational Model

Before relational databases, data was often stored in a single flat file — one large table holding everything. Suppose a school records every enrolment in one file:

StudentID	StudentName	TutorName	CourseCode	CourseTitle	RoomNo
1001	Alice Chen	Mrs Jones	CS101	Intro to Computing	IT1
1001	Alice Chen	Mrs Jones	MA210	Pure Mathematics	M4
1002	Bob Patel	Mr Smith	CS101	Intro to Computing	IT1

This single flat file suffers from serious problems:

Redundancy: "Alice Chen" and "Intro to Computing" are repeated on every row that mentions them, wasting space.
Update anomaly: if Intro to Computing moves to room IT2, you must edit every row mentioning it; miss one and the data becomes inconsistent.
Insertion anomaly: you cannot record a brand-new course until at least one student enrols on it, because there is no row to put it on.
Deletion anomaly: if Bob Patel withdraws and his is the only row mentioning room IT1 for CS101, deleting it loses the room information entirely.

The relational model solves all of these by separating each entity into its own table and linking them with keys. The same data, relationally modelled, becomes three small tables (Students, Courses, Enrolments) where each fact is stored exactly once. The whole discipline of normalisation (covered later in this unit) is the formal procedure for achieving this separation.

Key Terminology

The specification uses both the formal relational terms and their everyday equivalents. You should be able to use either set and recognise that they are synonyms.

Formal term	Everyday term	Definition
Relation	Table	A structured collection of data about one entity, organised into rows and columns
Tuple	Record / row	A single entry in a relation representing one instance of the entity
Attribute	Field / column	A single named property stored for each tuple — e.g. `FirstName`, `DateOfBirth`
Domain	—	The set of all permitted values an attribute may take (its data type plus any further constraints)
Degree	—	The number of attributes (columns) in a relation
Cardinality	—	The number of tuples (rows) currently in a relation

A relation is formally written in the notation Relation(PrimaryKey, attribute2, attribute3, …) with the primary key underlined. For example:

Student(StudentID, FirstName, Surname, DateOfBirth, FormGroup)

Domains

The domain of an attribute is the pool of legal values it may hold. FormGroup might have the domain {10A, 10B, 10C}; Score might have the domain "integers from 0 to 100". Domains are important because the database can reject any value outside the domain, protecting data integrity. A data type such as INTEGER defines a broad domain; a CHECK constraint narrows it further.

Tables, Records, and Fields in Practice

Consider a simple Students relation:

StudentID	FirstName	Surname	DateOfBirth	FormGroup
1001	Alice	Chen	2007-09-15	10A
1002	Bob	Patel	2007-03-22	10B
1003	Carol	Smith	2008-01-10	10A

The relation is called Students. Its degree is 5 (five attributes) and its cardinality is 3 (three tuples).
Each tuple is one record — the row for Alice Chen is a single tuple.
Each attribute is a column — FirstName, Surname, DateOfBirth and FormGroup are all attributes.
StudentID is the primary key because it uniquely identifies each student.

A key property of the relational model is that the order of rows is not significant and the order of columns is not significant. The data means the same thing however it is physically stored; you address data by name (attribute) and by key value, never by position.

Data Types and Domains

Every attribute has a data type that defines its base domain:

Data Type	Example Use	Example Value
`INTEGER`	StudentID	1001
`VARCHAR(n)`	FirstName (up to n characters)	'Alice'
`CHAR(n)`	FormGroup (fixed length)	'10A'
`DATE`	DateOfBirth	'2007-09-15'
`BOOLEAN`	IsActive	TRUE
`REAL` / `FLOAT`	AverageScore	72.5

Choosing the correct data type matters for data integrity (a DATE field cannot hold "banana"), for storage efficiency (an INTEGER uses less space than text), and for valid operations (you can perform arithmetic on numbers, not on strings).

Keys in Depth

Keys are the mechanism by which the relational model identifies tuples and links relations. The specification distinguishes several kinds, and exam questions frequently test whether you can tell them apart.

Primary Key

A primary key is the attribute (or set of attributes) chosen to uniquely identify every tuple in a relation. It must satisfy three properties:

Unique — no two tuples may share the same primary key value.
Not null — every tuple must have a value (this is called entity integrity).
Stable — ideally the value should never change, because other tables refer to it.

Candidate Key

A candidate key is any attribute or combination of attributes that could serve as the primary key — i.e. any minimal set of attributes guaranteeing uniqueness. A relation may have several candidate keys. For a Students relation we might have two candidate keys: StudentID and Email (assuming every student has a unique email). The designer selects one candidate key to act as the primary key; the others are then sometimes called alternate keys. "Minimal" is important — a candidate key contains no unnecessary attributes; if you can remove an attribute and still guarantee uniqueness, the original set was not a candidate key.

Surrogate vs. Natural Keys

A natural key is drawn from real-world data (e.g. National Insurance number, ISBN).
A surrogate key is an artificial, system-generated value with no meaning outside the database (e.g. an auto-incrementing StudentID).

Surrogate keys are usually preferred because natural data can change (people change names), can be reused, or can contain unexpected duplicates, all of which would break the "unique, not null, stable" requirement.

Composite Key

Sometimes no single attribute is unique, and a primary key must be formed by combining two or more attributes. This is a composite key. Consider a relation recording which students take which modules:

StudentID	ModuleID	EnrolmentDate
1001	CS101	2024-09-01
1001	CS102	2024-09-01
1002	CS101	2024-09-01

Here StudentID repeats (student 1001 takes two modules) and ModuleID repeats (CS101 is taken by two students), so neither alone is unique. The composite primary key is (StudentID, ModuleID) — the combination is unique. Composite keys are central to resolving many-to-many relationships and to understanding Second Normal Form.

Foreign Key

A foreign key is an attribute in one relation that references the primary key of another relation, creating a link between them. The foreign key value in the "child" table must match an existing primary key value in the "parent" table (or be null where permitted). The direction matters: the foreign key lives in the table on the many side of a one-to-many relationship.

Secondary Key

A secondary key is an attribute that is indexed to speed up searching but is not the primary key and need not be unique. If users frequently search students by surname, the DBMS can build an index on Surname so that "find all students called Patel" does not require scanning every row. This is a direct synoptic link to indexing and hashing.

Indexing and the cost of searching

Why does a secondary key need an index? Without one, finding every student with a given surname requires a linear search of the whole table — O(n) comparisons, where n is the number of tuples. For a table of a million rows that is a million comparisons every time.

An index is a separate, ordered structure that maps each value of the indexed attribute to the location(s) of the matching tuple(s). Two common implementations are:

Index structure	Lookup behaviour	Notes
Balanced tree (B-tree)	O(log n)	Keeps keys in sorted order; also supports range queries (e.g. surnames A–C)
Hash table	~O(1) average	Excellent for exact-match lookups; cannot answer range queries efficiently

So indexing the secondary key Surname turns "find all Patels" from an O(n) scan into an O(log n) tree descent or an O(1) hash lookup — exactly the data-structures content from elsewhere in the specification, applied to databases. The trade-off is that every index must itself be updated whenever a row is inserted, updated or deleted, and each index consumes extra storage. This is why a designer indexes the few attributes that are searched often (secondary keys) rather than every column.

The primary key is automatically indexed by virtually every DBMS, because referential integrity checks and joins constantly look rows up by their primary key; that index is what makes a foreign-key match fast.

Foreign Keys and Relationships

A foreign key creates a relationship between two relations. Consider:

Departments

DeptID	DeptName
D1	Computer Science
D2	Mathematics

Teachers

TeacherID	Name	DeptID
T1	Dr Williams	D1
T2	Ms Johnson	D2
T3	Mr Lee	D1

In Teachers, DeptID is a foreign key referencing Departments.DeptID. This means:

Each teacher belongs to exactly one department (the foreign key holds a single value).
One department can have many teachers (D1 appears against both T1 and T3) — this is a one-to-many relationship.
The DBMS enforces referential integrity: you cannot assign a teacher to department "D9" because no such department exists.

Types of Relationship

Relationship	Description	Example
One-to-one (1:1)	Each tuple in A relates to at most one tuple in B and vice versa	Person ↔ Passport
One-to-many (1:M)	One tuple in A relates to many tuples in B; each B relates to one A	Department → Teachers
Many-to-many (M:N)	Many tuples in A relate to many tuples in B	Students ↔ Modules

A many-to-many relationship cannot be represented directly in the relational model, because a single foreign key column can only point at one parent row. It must be decomposed using a junction table (also called a link, linking, or bridge table) that holds a foreign key to each of the two original tables. The junction table's primary key is typically the composite of those two foreign keys.

Referential Integrity

Referential integrity is the rule that every foreign key value must either match an existing primary key value in the referenced table or be null (where nulls are allowed). It keeps relationships consistent. The DBMS enforces it by preventing:

Inserting a tuple whose foreign key value has no matching primary key (e.g. an enrolment for a non-existent course).
Deleting a parent tuple while child tuples still reference it (e.g. deleting a course while students are enrolled), unless a defined cascade rule deletes or nulls the children automatically.

This is distinct from entity integrity (the rule that primary keys are unique and not null). Together, entity integrity and referential integrity are the two core integrity rules of the relational model.

Worked Example — Designing Relations for a School

Scenario: A school must store data about students, the courses on offer, and which students are enrolled on which courses. A student may take many courses; a course may have many students.

Step 1 — Identify entities: Student, Course, Enrolment.

Step 2 — Recognise the many-to-many: Student ↔ Course is many-to-many, so it requires a junction table. Enrolment is that junction table.

Step 3 — Define the relations (primary keys in bold):

Student(StudentID, FirstName, Surname, Email)
Course(CourseID, CourseName, TeacherID)
Enrolment(StudentID, CourseID, EnrolmentDate, Grade)

In Enrolment, (StudentID, CourseID) is the composite primary key that resolves the many-to-many relationship. Both StudentID and CourseID are also foreign keys, referencing Student and Course respectively — a single attribute can be part of the primary key and a foreign key at the same time. TeacherID in Course is a foreign key to a (separate) Teacher relation.

This three-table design stores each student once, each course once, and each enrolment fact once — eliminating the redundancy of the flat file we started with.

The Two Integrity Rules Compared

The relational model rests on two complementary integrity rules. Knowing which is which is a frequent short-answer question.

Rule	What it guarantees	Enforced by
Entity integrity	Every tuple has a primary-key value that is unique and not null	The primary-key constraint
Referential integrity	Every foreign-key value matches an existing primary key in the referenced relation (or is null where permitted)	The foreign-key constraint

A neat way to remember the distinction: entity integrity is about a row being able to identify itself (it has a proper key); referential integrity is about a row's reference to another row being valid (the thing it points at really exists). Both are examples of the broader idea of data integrity — keeping data accurate and consistent — which also includes domain constraints (a value lies within its attribute's domain) and is the database-level expression of the Consistency guarantee in the ACID model.

The Role of the DBMS

A database is the structured collection of data; a Database Management System (DBMS) is the software that sits between users (or application programs) and the data, providing the services that make a relational database usable and safe. Key responsibilities of a DBMS include:

Enforcing the schema and constraints — rejecting data that violates types, domains, entity integrity or referential integrity.
Processing queries — accepting declarative SQL and choosing an efficient execution plan (using indexes where available).
Managing concurrency — allowing many users to read and write simultaneously without corrupting data, via the transaction/locking mechanisms covered later.
Providing security — authenticating users and granting or revoking privileges on tables.
Ensuring recovery — restoring a consistent state after a crash, supporting the Durability guarantee.

A crucial benefit the DBMS provides is data independence: applications interact with the logical schema (tables and columns) and are insulated from how the data is physically stored on disk. The storage engine can change, indexes can be added or dropped, and existing queries keep working — something a hand-coded flat-file system cannot offer.

Relational vs. Other Data Models

The relational model is dominant but not the only approach; knowing the contrast sharpens your understanding of why the relational rules exist.

Model	How relationships are represented	Strengths	Weaknesses
Flat file	None — all data in one table	Simple; no software needed	Redundancy and anomalies (as shown above)
Relational	Shared key values (foreign keys)	Integrity, flexible ad-hoc queries (SQL), no redundancy	Joins cost time; rigid schema
Document / NoSQL	Nested documents; references by ID	Horizontal scalability; flexible schema	Weaker integrity guarantees; limited joins

The relational model's insistence on values, not pointers is exactly what enables the powerful, declarative querying of SQL and the strong integrity guarantees — at the price of needing joins to reassemble normalised data. Every design decision in this unit is a point on that trade-off.

A Second Worked Example — A Library

Scenario: A library lends copies of books to members. A member may borrow many books over time; a particular physical copy is on loan to at most one member at a time, but over its life it is loaned to many members.

Entities: Member, Book (the title), Copy (a physical copy of a book), Loan.

Relations (primary keys in bold, foreign keys marked):

Member(MemberID, Name, Email, JoinDate)
Book(ISBN, Title, Author)
Copy(CopyID, ISBN*) — *foreign key to Book; one book title has many physical copies (1:M)
Loan(LoanID, CopyID*, MemberID*, LoanDate, ReturnDate) — *both foreign keys; resolves the Member ↔ Copy many-to-many

Points of interest:

ISBN is a natural key for Book (it is genuinely unique per title and externally assigned), whereas CopyID, MemberID and LoanID are surrogate keys.
Email is a candidate key of Member (assuming uniqueness) but MemberID was chosen as the primary key because it is guaranteed stable.
Surname-style searching of members would be accelerated by a secondary key / index on a name attribute.
Referential integrity prevents recording a Loan for a CopyID or MemberID that does not exist, and prevents deleting a Member while their Loans remain — protecting against "orphan" loan rows.

This single example therefore exercises every key concept in the lesson: relations, tuples and attributes; primary, candidate, foreign, composite (via Loan if dated multiply), natural vs surrogate, and secondary keys; and both integrity rules.

Choosing a Primary Key — A Worked Decision

Given a relation, which attribute should be the primary key? Work through the candidates against the three properties (unique, not null, stable). For a Member relation with attributes MemberID, Email, Phone, Name:

Candidate	Unique?	Never null?	Stable?	Verdict
Email	Likely	Maybe not (some have none)	No (people change email)	Reject — nullable and changeable
Phone	Not guaranteed (shared lines)	Maybe not	No	Reject — not reliably unique
Name	No (duplicates)	Yes	No	Reject — not unique
MemberID (surrogate)	Yes (system-assigned)	Yes (always set)	Yes (never reused/changed)	Choose

Email and Phone are at best candidate keys with weaknesses; Name is not even a candidate key. The surrogate MemberID wins precisely because it is engineered to satisfy all three properties. This is the reasoning examiners expect when a question asks you to justify a choice of primary key — not merely "MemberID is unique" but a comparison that rejects the alternatives for concrete reasons.

A related point: because other relations will hold MemberID as a foreign key, its stability is doubly important — if the primary key value changed, every foreign-key reference to it would have to change too, which is exactly the brittleness that natural keys risk and surrogate keys avoid.

Synoptic Links

Secondary keys and indexing (Fundamentals of data structures): a secondary key is implemented as an index. Indexes are frequently built using hash tables or balanced trees, linking directly to the data-structures unit. A hash table gives near O(1) lookup by key, which is exactly why indexing a secondary key accelerates searches.
Primary/foreign keys ↔ normalisation: correctly identified keys are a prerequisite for normalisation; partial and transitive dependencies are defined relative to the primary key.
Referential integrity ↔ transactions and the ACID model: the Consistency guarantee in ACID is precisely the promise that integrity constraints (including referential integrity) hold before and after every transaction.
Relational model ↔ Boolean logic and sets: a relation is mathematically a set of tuples; selection and projection in relational algebra map onto SQL's WHERE and column lists, connecting to set theory and logic elsewhere in the specification.

Common Misconceptions

"A foreign key points from the parent to the child." It is the reverse. The foreign key lives in the child (many-side) table and points up to the primary key in the parent (one-side) table.
"Primary key and foreign key are different kinds of column that can never overlap." A single attribute can be simultaneously part of a primary key and a foreign key — this is exactly what happens in a junction table such as Enrolment.
"A composite key means the table has two primary keys." No — it is one primary key that happens to be made up of two or more attributes combined. A relation has at most one primary key.
"Candidate key and primary key mean the same thing." Every primary key is a candidate key, but a relation may have several candidate keys; the primary key is the one chosen from among them.
"A secondary key must be unique." It need not be; it exists only to speed up searching and may have many matching rows.
"Tables must be stored in primary-key order." Row order is not significant in the relational model; logical access is by value, not position.

Specimen Question

A leisure centre records its members and the fitness classes they book. A member can book many classes, and a class can be booked by many members. Each class is led by one instructor, and an instructor can lead many classes.

(a) Identify the entities and state the relationship (with degree) between Member and Class. [2 marks]

(b) Explain why a junction table is required, and give a suitable relation (with a composite primary key) to resolve the Member–Class relationship. [4 marks]

(c) For your Booking relation, identify the foreign keys and explain how referential integrity protects the data. [3 marks]

AO breakdown: (a) is AO1 — recall of entities and relationship degree. (b) is AO2 — applying knowledge of junction tables and composite keys to this scenario. (c) is AO2/AO3 — applying the concept of referential integrity and reasoning about the protection it provides.

Mid-band response

The entities are Member, Class and Instructor. Member to Class is many-to-many. You cannot store a many-to-many relationship directly so you need a junction table called Booking. Booking has MemberID and ClassID in it. These are foreign keys. Referential integrity stops you adding a booking for a member who does not exist.

Stronger response

The entities are Member, Class and Instructor. The relationship between Member and Class is many-to-many (M:N), because one member books many classes and one class is booked by many members. A relational database cannot represent M:N directly, so we introduce a junction table:

Booking(MemberID, ClassID, BookingDate)

The composite primary key is (MemberID, ClassID). MemberID is a foreign key referencing Member, and ClassID is a foreign key referencing Class. Referential integrity ensures every MemberID and ClassID in Booking matches an existing member and class, so you cannot book a non-existent member onto a non-existent class.

Top-band response

Entities: Member, Class, Instructor.

Member–Class relationship: many-to-many (M:N) — one member books many classes and one class is booked by many members. A single foreign-key column can reference only one parent row, so M:N cannot be held directly; it must be decomposed into two one-to-many relationships via a junction (link) table.

Resolving relation:

Booking(MemberID, ClassID, BookingDate)

with composite primary key (MemberID, ClassID), which is both unique (a member books a given class at most once) and not null (entity integrity). MemberID is a foreign key referencing Member(MemberID); ClassID is a foreign key referencing Class(ClassID). The Class–Instructor relationship is one-to-many and is handled by placing InstructorID as a foreign key inside Class, on the many side.

Referential integrity guarantees that every foreign key value in Booking matches an existing primary key in its parent table. On insert, the DBMS rejects a booking that names a member or class that does not exist; on delete, it prevents (or cascades) removal of a member or class while bookings still reference them. This blocks "orphan" rows and keeps the relationships internally consistent.

Examiner-style commentary: The mid-band answer identifies the entities and the need for a junction table but is vague — it never gives the composite primary key explicitly, does not state the relationship degree using the M:N convention, and explains referential integrity only by a single one-directional example. The stronger answer adds the formal relation notation, names the composite key and both foreign keys, and covers integrity on insertion. The top-band answer earns full marks by additionally explaining why M:N cannot be stored directly (single foreign-key column), addressing referential integrity on both insertion and deletion (including cascade), invoking entity integrity for the composite key, and resolving the second relationship correctly. Precise, conventional terminology (M:N, composite primary key, entity vs referential integrity) is what separates the bands.

Entity Sets, Instances and the Closed-World Assumption

A final clarification of vocabulary that examiners value: an entity (or entity set) such as Student is the whole class of things, while an entity instance (or occurrence) such as "Alice Chen" is a single member of that class, stored as one tuple. The relation Students is the entity set; each row is an instance. Keeping these levels straight matters when reasoning about cardinality ("one instance of Department relates to many instances of Teacher").

Relational databases also operate under the closed-world assumption: any fact not present in the database is taken to be false. If there is no enrolment tuple linking student 1003 to course C1, the database concludes that student 1003 is not enrolled on C1 — it does not treat the absence as "unknown". This is distinct from a NULL value, which explicitly records that a particular attribute's value is unknown or not applicable. The interplay of these two ideas (absent rows = false; present-but-NULL attribute = unknown) underlies a great deal of careful querying.

Going Further

Relational algebra: SQL queries can be expressed mathematically using the operations selection (σ), projection (π), and join (⋈). Reading a relation as a set of tuples explains why DISTINCT exists and why duplicate rows are theoretically forbidden.
NoSQL contrast: investigate why document and key-value stores (e.g. MongoDB, Redis) deliberately abandon the rigid relational schema for horizontal scalability, and what they trade away (joins, strict integrity) to get it.
Indexes under the hood: explore how a B-tree index on a secondary key turns an O(n) table scan into an O(log n) lookup, and why too many indexes slow down inserts and updates.
Cascade options: read about ON DELETE CASCADE, ON DELETE SET NULL, and ON DELETE RESTRICT, and reason about which is appropriate for a Booking row when a Class is cancelled versus when a Member leaves.

This content is aligned with the AQA A-Level Computer Science (7517) specification.

Relational Database Concepts

Relational Database Concepts

Spec Mapping

Flat Files vs. the Relational Model

Key Terminology

Domains

Tables, Records, and Fields in Practice

Data Types and Domains

Keys in Depth

Primary Key

Candidate Key

Surrogate vs. Natural Keys

Composite Key

Foreign Key

Secondary Key

Indexing and the cost of searching

Foreign Keys and Relationships

Types of Relationship

Referential Integrity

Worked Example — Designing Relations for a School

The Two Integrity Rules Compared

The Role of the DBMS

Relational vs. Other Data Models

A Second Worked Example — A Library

Choosing a Primary Key — A Worked Decision

Synoptic Links

Common Misconceptions

Specimen Question

Mid-band response

Stronger response

Top-band response

Entity Sets, Instances and the Closed-World Assumption

Going Further

More in Computer Science