You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
Almost every non-trivial program eventually needs to remember things between runs, share them with other programs, and let several people read and change them at once. A database is the disciplined answer to that need: an organised, queryable, integrity-checked store of structured data. This lesson builds the conceptual vocabulary the rest of the 1.3.2 topic depends on — what an entity is, how records and fields are organised, what the different kinds of key do, and how relationships between entities are captured in an entity-relationship diagram. Get this language precise and normalisation, SQL and transactions all become far easier; get it vague and everything downstream wobbles.
This lesson addresses the H446 1.3.2 Databases content on database fundamentals:
(Phrasing here paraphrases the specification content; it is not a verbatim quote.)
Before the vocabulary, the motivation. You could store data in ordinary variables or a single text file, so what does a database buy you?
A relational database delivers all of these and tames redundancy. That last point is what separates it from a flat file, so we start there.
A flat file is a single table of data held in one file — typically CSV or TSV — with no notion of relationships between different kinds of thing. Everything you want to record about everything is crammed into one wide table.
Example flat file (students.csv):
| StudentID | Name | Subject | TeacherName | TeacherEmail |
|---|---|---|---|---|
| 001 | Alice | Maths | Mr Smith | smith@school.com |
| 002 | Bob | Maths | Mr Smith | smith@school.com |
| 003 | Charlie | English | Ms Jones | jones@school.com |
| 004 | Alice | English | Ms Jones | jones@school.com |
Notice what has happened. There are really three different kinds of thing tangled together here — students, subjects and teachers — but because there is only one table, facts about teachers are repeated on every row that mentions them. That single design decision causes a cascade of problems.
| Problem | What goes wrong | Concrete example above |
|---|---|---|
| Data redundancy | The same fact is stored many times. | Mr Smith's email is duplicated on rows 1 and 2. |
| Data inconsistency | A repeated fact gets changed in some copies but not others, so the file now disagrees with itself. | If Mr Smith's email changes and one row is missed, the file holds two different emails for one teacher. |
| Insertion anomaly | You cannot record one fact without inventing another. | You cannot add a new teacher until at least one student is enrolled with them — there is no row to put them on. |
| Deletion anomaly | Removing one fact accidentally destroys an unrelated one. | Deleting the last student of a subject also erases the only record of that subject and its teacher. |
| Update anomaly | A single logical change needs many physical edits, and any miss corrupts the data. | Renaming a teacher means editing every row they appear on. |
These last three — the insertion, deletion and update anomalies — are examined by name. They are not three separate misfortunes but three symptoms of one disease: storing facts about different entities in the same table. The cure is to split the entities apart, which is exactly what a relational database does and what normalisation (next lesson) formalises.
A relational database organises data into several related tables (formally, relations). Each table describes exactly one kind of entity, and tables are connected through keys rather than by duplicating data. The same information as above, split sensibly:
Students
| StudentID (PK) | Name |
|---|---|
| 001 | Alice |
| 002 | Bob |
| 003 | Charlie |
Subjects
| SubjectID (PK) | SubjectName | TeacherID (FK) |
|---|---|---|
| S1 | Maths | T1 |
| S2 | English | T2 |
Teachers
| TeacherID (PK) | TeacherName | TeacherEmail |
|---|---|---|
| T1 | Mr Smith | smith@school.com |
| T2 | Ms Jones | jones@school.com |
Now each teacher's email is stored exactly once. Change it in the Teachers table and every subject that references that teacher instantly sees the new value, because they reference the teacher by key, not by copy. The anomalies vanish: you can add a teacher with no subjects, delete a subject without losing the teacher, and update an email in one place.
| Flat file | Relational database | |
|---|---|---|
| Structure | One wide table | Many narrow, focused tables |
| Redundancy | High — facts repeat | Low — each fact stored once |
| Integrity | Hard to enforce | Enforced by keys and constraints |
| Querying across kinds | Manual, error-prone | JOINs (see the SQL lessons) |
| Good for | Tiny, throwaway, single-user data | Anything shared, large or long-lived |
Exam Tip: When you are asked to compare flat files and relational databases, do not just say "relational is better". Name the specific problems — redundancy, and the insertion/deletion/update anomalies — give a concrete example of each from the scenario, and explain that relational design removes them by separating entities and linking with keys.
This terminology is examined precisely, and the synonyms matter because OCR may use either word.
| Term (and synonyms) | Definition |
|---|---|
| Entity | A real-world thing or concept the database stores data about (e.g., Student, Subject, Order). One entity type becomes one table. |
| Attribute | A single property of an entity (e.g., a student's Name or DOB). Becomes a column. |
| Table (relation) | The whole grid of data about one entity type, organised as rows and columns. |
| Record (row, tuple) | One row — all the attribute values for a single instance of the entity (one particular student). |
| Field (column, attribute value) | One cell, or the column it sits in — a single attribute of a single record. |
| Schema | The structure of the database: which tables exist, their fields, the data types, the keys and the relationships. The schema is the design; the records are the data filling it. |
| Entity occurrence / instance | One specific entity, i.e. one record (student "Alice", 001). |
A useful mental model: an entity is the idea "Student"; an attribute is a property such as Name; a record is one actual student; a field is one value of one property of one student; the table is all the students together; the schema is the blueprint that says a Student has these fields of these types.
A key is an attribute, or set of attributes, used either to identify records uniquely or to link tables together. Keys are the mechanism that lets a relational database replace duplicated data with references.
A primary key (PK) is the attribute (or attribute combination) chosen to uniquely identify each record in a table. Every table should have one. A primary key must be:
Good primary keys are often surrogate identifiers minted by the system (StudentID, OrderNumber, ISBN) precisely because they are stable and meaningless — a person's name or email can change, so they make poor keys.
A candidate key is any attribute or set of attributes that could serve as the primary key — i.e. that is unique and minimal. A table can have several candidate keys (a student might be uniquely identified by StudentID or by national insurance number). The designer picks one candidate to be the primary key; the rest are alternate keys.
A foreign key (FK) is an attribute in one table that holds the primary-key value of a record in another table, thereby creating a relationship between the two. It is the relational glue: instead of copying a teacher's whole record into the Subjects table, we store just the teacher's key.
Students
| StudentID (PK) | Name |
|---|---|
| 001 | Alice |
| 002 | Bob |
Enrolments (links students to subjects):
| StudentID (FK) | SubjectID (FK) |
|---|---|
| 001 | S1 |
| 001 | S2 |
| 002 | S1 |
Here StudentID in Enrolments is a foreign key referencing the Students table, and SubjectID is a foreign key referencing the Subjects table. The database can enforce referential integrity: it will refuse an enrolment for student 999 if no such student exists, so foreign keys cannot point at nothing.
A composite key (compound key) is a primary key made of two or more attributes together, where no single one of them is unique on its own. In the Enrolments table, neither StudentID nor SubjectID alone identifies a row (student 001 appears twice; subject S1 appears twice), but the pair (StudentID, SubjectID) is unique — student 001 is enrolled in S1 exactly once. So the composite key is (StudentID, SubjectID).
A secondary key is an attribute (not the primary key) that the database indexes so it can be searched quickly. Name might be a secondary key so that "find all students called Alice" is fast even though Name is not unique. This is the direct bridge to the indexing material later in the topic.
| Key type | Role | Example |
|---|---|---|
| Primary key | Uniquely identifies each record. | StudentID |
| Candidate key | Any attribute(s) that could be the PK (unique, minimal). | StudentID or NI number |
| Alternate key | A candidate key not chosen as PK. | NI number (if StudentID is PK) |
| Foreign key | Holds another table's PK to form a link. | StudentID in Enrolments |
| Composite key | PK made of two or more attributes. | (StudentID, SubjectID) |
| Secondary key | Indexed for fast searching; not the PK. | Name |
Two entities are related when records of one are associated with records of the other. The cardinality of a relationship describes how many of each can be associated.
| Cardinality | Meaning | Example |
|---|---|---|
| One-to-one (1:1) | One record in A relates to at most one in B, and vice versa. | Person — Passport (each person has one passport; each passport belongs to one person). |
| One-to-many (1:M) | One record in A relates to many in B, but each B relates to one A. | Teacher — Students (a teacher has many students; each student has one form teacher). |
| Many-to-many (M:N) | Many records in A relate to many in B. | Students — Subjects (a student takes many subjects; a subject is taken by many students). |
The cardinality of a relationship dictates where the foreign key goes. There are only three cases to learn, and each has a fixed recipe.
1:M — put a foreign key on the "many" side. This is the natural, commonest case. Each record on the many side stores the primary key of its single partner on the one side. Many records can store the same value, which is precisely what makes that side "many":
Teacher
| TeacherID (PK) | Name |
|---|---|
| T1 | Mr Smith |
| T2 | Ms Jones |
Student (TeacherID is a foreign key — note it repeats, and that is allowed):
| StudentID (PK) | Name | TeacherID (FK) |
|---|---|---|
| 001 | Alice | T1 |
| 002 | Bob | T1 |
| 003 | Charlie | T2 |
Two students reference teacher T1, so the student side is the "many" and the teacher side is the "one" — which is exactly why the foreign key lives on Student, never on Teacher. (You could not store the relationship by putting a single StudentID on Teacher, because a teacher has more than one student.)
1:1 — merge, or put a unique foreign key on one side. Because each side relates to at most one of the other, you have a choice. Often the cleanest design is to merge both entities into a single table (a person and their passport details in one Person table). If you keep them separate — perhaps because the passport data is optional or sensitive — put the foreign key on either side and add a uniqueness constraint so the "many" cannot creep back in:
Person / Passport (1:1, FK on Passport, marked UNIQUE so no two passports share a person):
| PassportNo (PK) | PersonID (FK, UNIQUE) | Expiry |
|---|---|---|
| P883421 | 001 | 2031-04-02 |
| P119027 | 002 | 2029-11-18 |
M:N — cannot be stored directly; introduce a third table. You cannot put a single foreign key on either side, because each side relates to many of the other and one column holds only one value. The solution is the junction table below.
A junction table (also called a linking, bridge or associative table) sits between the two entities and holds one row per actual pairing. Its primary key is usually the composite of the two foreign keys. A single M:N relationship becomes two 1:M relationships pointing into the junction table:
erDiagram
STUDENT ||--o{ ENROLMENT : "is enrolled in"
SUBJECT ||--o{ ENROLMENT : "is taken by"
STUDENT {
string StudentID PK
string Name
date DOB
}
ENROLMENT {
string StudentID PK_FK
string SubjectID PK_FK
date DateEnrolled
}
SUBJECT {
string SubjectID PK
string SubjectName
string TeacherID FK
}
The ENROLMENT table contains foreign keys from both Students and Subjects; their composite is its primary key. As a bonus, the junction table is the natural home for any attribute that belongs to the pairing rather than to either entity alone — for example DateEnrolled, which is a property of "this student in this subject", not of the student or the subject by itself.
Exam Tip: A many-to-many relationship in a scenario is a red flag that almost always wants a junction table. Spot it, name it, say its primary key is the composite of the two foreign keys, and note that it resolves the M:N into two 1:M relationships. This is a reliable source of marks in database-design questions.
Foreign keys do more than describe relationships on paper — the database engine actively enforces them. Referential integrity is the rule that every foreign-key value must match an existing primary-key value in the referenced table (or be null where that is permitted). In other words, a relationship may never point at a record that does not exist.
Consider the Enrolments junction table referencing Students:
| StudentID (FK) | SubjectID (FK) |
|---|---|
| 001 | S1 |
| 002 | S1 |
If a program tries to insert the row (999, S1) but there is no student 001…999, the engine rejects the insertion, because student 999 does not exist. Equally, if someone tries to delete student 001 while enrolments for 001 remain, the engine must do one of three things, configured per relationship:
| Policy | Behaviour on deleting the parent |
|---|---|
| RESTRICT / NO ACTION | Refuse the deletion while any child row references it (the safe default). |
| CASCADE | Automatically delete the child rows too (delete the student → delete their enrolments). |
| SET NULL | Leave the child rows but blank their foreign key (only valid where the FK may be null). |
These choices are examinable as part of "how does the database keep data consistent". The headline point: referential integrity guarantees that the links in a relational database never dangle, which is one of the concrete ways a relational store delivers the integrity a flat file cannot. It is also a direct down-payment on the consistency guarantee you will meet again in the transaction-processing lesson.
Designing a schema from a word description is the highest-value skill this lesson teaches, so here is the full method on a small scenario.
A library lends books to members. A book has a title and author; a member has a name and join date. A member may borrow many books over time, and a popular book is borrowed by many members.
Step 1 — find the entities (the nouns that have data of their own). Book and Member clearly qualify. Loan — the act of one member borrowing one book on a date — is also an entity, even though the word does not appear, because it has its own data (the date) and links the other two.
Step 2 — list each entity's attributes and choose a primary key. Book: BookID (PK), Title, Author. Member: MemberID (PK), Name, JoinDate. We mint surrogate ...ID keys because titles and names are neither unique nor stable.
Step 3 — establish the relationships and their cardinality. A member borrows many books and a book is borrowed by many members: Book ↔ Member is many-to-many.
Step 4 — resolve any M:N. The M:N is resolved by the Loan entity, whose foreign keys to Book and Member (plus the date) form its composite key — turning the M:N into two 1:M relationships.
erDiagram
MEMBER ||--o{ LOAN : "borrows"
BOOK ||--o{ LOAN : "is borrowed in"
MEMBER {
string MemberID PK
string Name
date JoinDate
}
BOOK {
string BookID PK
string Title
string Author
}
LOAN {
string MemberID PK_FK
string BookID PK_FK
date LoanDate PK
}
Follow these four steps on any design question — nouns→entities, attributes→PK, relationships→cardinality, resolve M:N — and you will produce a defensible schema every time.
OCR uses crow's-foot notation. The symbols at the end of a line touching an entity tell you how many of that entity participate:
| Symbol at entity end | Reads as |
|---|---|
| ` | |
| ` | o` (bar + circle) |
}o (crow's foot + circle) | zero or many |
| `} | ` (crow's foot + bar) |
So STUDENT ||--o{ ENROLMENT reads: each enrolment belongs to exactly one student (the || end), and each student has zero or many enrolments (the o{ end) — a clean 1:M.
When you draw one by hand: box each entity, list its attributes with the PK underlined and FKs marked, draw a line for each relationship, and put the correct cardinality symbol at each end. The commonest lost mark is labelling only one end of the line.
Each field has a data type fixing what it may store; choosing well is part of integrity (a type stops a date going into an age column).
| Data type | Stores | Example |
|---|---|---|
| INTEGER | Whole numbers | Quantity: 12 |
| REAL / FLOAT | Decimal numbers | Price: 29.99 |
| VARCHAR(n) | Variable-length text up to n characters | Name: "Alice" |
| CHAR(n) | Fixed-length text of exactly n characters | PostCode: "SW1A 1AA" |
| BOOLEAN | True / False | IsActive: True |
| DATE / DATETIME | A calendar date (and time) | DOB: 2007-03-15 |
| TEXT | Long, unbounded text | Description |
A subtle but examinable point: identifiers such as StudentID or a phone number are often best stored as text, not integers — they may have leading zeros (001) or never take part in arithmetic, and storing "007" as the number 7 silently loses information.
StudentID "001" stored as a number becomes 1, losing the leading zeros and implying arithmetic that is never done.A driving school wants a database. It records instructors, learners and lessons. Each lesson is taught by one instructor to one learner at a given date and time. An instructor teaches many learners; a learner may take lessons from more than one instructor over time. (Total: 9 marks)
(a) Define the terms entity, attribute and primary key. [3]
(b) State the relationship (cardinality) between instructors and learners, and explain how it must be implemented in the relational database. [3]
(c) Draw an entity-relationship diagram for the three entities, marking primary keys, foreign keys and the cardinality of each relationship. [3]
| Mark | AO | Awarded for |
|---|---|---|
| 1 | AO1 | (a) Correct definition of entity |
| 2 | AO1 | (a) Correct definition of attribute |
| 3 | AO1 | (a) Correct definition of primary key (unique identifier) |
| 4 | AO2 | (b) Identifying the instructor–learner relationship as many-to-many |
| 5 | AO2 | (b) Stating a linking/junction table is required |
| 6 | AO2 | (b) Recognising LESSON resolves it into two 1:M relationships |
| 7 | AO3 | (c) Three correct entities with sensible attributes and PKs marked |
| 8 | AO3 | (c) Foreign keys in the linking entity shown correctly |
| 9 | AO3 | (c) Correct cardinality symbols at both ends of each relationship |
AO split: AO1 = 3, AO2 = 3, AO3 = 3.
(a) An entity is something we store data about. An attribute is a piece of data about it. A primary key is a field that is unique for each record.
(b) The relationship between instructors and learners is many-to-many, because one instructor teaches many learners and one learner can have several instructors. This is done with a linking table.
(c)
erDiagram
INSTRUCTOR ||--o{ LESSON : teaches
LEARNER ||--o{ LESSON : has
INSTRUCTOR {
string InstructorID PK
string Name
}
LEARNER {
string LearnerID PK
string Name
}
LESSON {
string InstructorID FK
string LearnerID FK
date LessonDate
}
Examiner-style commentary: The calculable, recall-based parts are mostly secure. In (a) the definitions of entity and primary key are sound (marks 1 and 3); the attribute definition is a little loose ("a piece of data about it") but would usually be accepted (mark 2). In (b) the cardinality is correctly identified as many-to-many (mark 4) and a linking table is mentioned (mark 5), but the answer never says that LESSON is that linking table or that it resolves the M:N into two 1:M relationships, so mark 6 is not awarded. The diagram in (c) earns marks 7–9: three entities with PKs, foreign keys in LESSON, and correct crow's-foot cardinality at both ends. Around 8/9.
(a) An entity is a distinct real-world thing or concept that the database stores data about and that becomes a table — here, Instructor, Learner and Lesson. An attribute is a single named property of an entity, which becomes a column — for example a learner's Name or DateOfBirth. A primary key is the attribute (or minimal combination of attributes) chosen to identify each record uniquely; it must be unique across the table and never null, e.g. LearnerID.
(b) The instructor–learner relationship is many-to-many (M:N): each instructor teaches many learners and each learner may, over time, be taught by several instructors. A M:N relationship cannot be stored directly in a relational database, because a single foreign key on either side could only record one partner. It must be resolved with a junction (linking) table — here the LESSON entity. LESSON holds a foreign key to INSTRUCTOR and a foreign key to LEARNER; together with the date/time these form its composite primary key. This turns the one M:N relationship into two 1:M relationships (one instructor → many lessons; one learner → many lessons), each implementable with an ordinary foreign key, and it also gives a natural home to attributes of the pairing such as the lesson's date and time.
(c)
erDiagram
INSTRUCTOR ||--o{ LESSON : "teaches"
LEARNER ||--o{ LESSON : "is taught in"
INSTRUCTOR {
string InstructorID PK
string Name
string Phone
}
LEARNER {
string LearnerID PK
string Name
date DateOfBirth
}
LESSON {
string InstructorID PK_FK
string LearnerID PK_FK
datetime LessonDateTime PK
}
Each LESSON belongs to exactly one instructor and exactly one learner (the || ends), while each instructor and each learner has zero or many lessons (the o{ ends).
Examiner-style commentary: Full 9/9. The discriminators over the mid-band script are: in (a) each definition is precise and exemplified (the primary-key answer states both the uniqueness and not-null rules); in (b) the answer explicitly identifies LESSON as the junction table, explains why a M:N cannot be stored directly, and names the resolution into two 1:M relationships — securing mark 6, the one the mid-band script lost — and even notes that pairing-level attributes live in the junction table; and in (c) every relationship is marked with correct crow's-foot cardinality at both ends, with the composite primary key of the junction entity shown.
ID (a surrogate key) even when a natural key (like ISBN) exists — stability, privacy and the cost of cascading changes.ON DELETE CASCADE versus ON DELETE RESTRICT — what should happen to a learner's lessons when the learner is deleted?Exam Tip: The dependable marks in this topic are crisp definitions (entity/attribute/record/field/key), spotting a M:N and resolving it with a junction table, and a fully labelled ER diagram with cardinality at both ends and keys marked. Drill those three and most database-concepts questions fall out quickly.