You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
A database is only as valuable as it is trustworthy. The data it holds is frequently sensitive and valuable — personal details, financial records, medical histories — and two distinct things can go wrong with it. It can become incorrect (a foreign key pointing nowhere, an impossible date of birth, a half-finished update), or it can fall into the wrong hands or be lost entirely (an attacker reads the user table, a disk fails, a careless query wipes a table). The first family of concerns is data integrity: keeping data accurate, consistent and valid. The second is security: controlling who may do what, defending against attacks such as SQL injection, and ensuring the data can be recovered after a disaster. This lesson treats both, with a fully worked SQL-injection attack and its parameterised-query defence as the centrepiece, because injection is the attack that most directly exploits how a database executes SQL. The running schema remains Student / Course / Enrolment, with a Users login table added for the injection example.
This lesson addresses the AQA A-Level Computer Science (7517) Fundamentals of databases content on protecting data: the integrity rules (entity integrity, referential integrity and domain/validation constraints); access control through user privileges (GRANT/REVOKE) and views; defending against SQL injection using parameterised queries and input validation; and backup and recovery using the transaction log. It draws together the keys and integrity rules of the relational-model lesson, the CREATE TABLE constraints, the transaction log from the transactions lesson, and the input-validation ideas from the programming-fundamentals area.
Data integrity means the stored data is accurate, internally consistent and reliable. The relational model and DBMS enforce it through three complementary rules.
Every table must have a primary key, and no part of a primary-key value may be NULL. This guarantees that every row is uniquely identifiable — without it, you could not reliably refer to, update or delete a specific record. It is enforced by the PRIMARY KEY constraint.
Every foreign-key value must either match an existing primary-key value in the referenced table, or be NULL where the column permits it. This keeps relationships between tables valid and prevents "orphan" rows.
-- Student 9999 does not exist
INSERT INTO Enrolment (StudentID, CourseID) VALUES (9999, 'CS101');
Effect: rejected. The foreign key Enrolment.StudentID has no matching Student row, so referential integrity blocks the insert. It likewise blocks deleting a Student who still has Enrolment rows (unless a cascade rule is defined).
Every value must conform to its column's data type, format and constraints — the column's domain.
INSERT INTO Enrolment (StudentID, CourseID, Score) VALUES (1001, 'CS101', 150);
-- with Score defined as INTEGER CHECK (Score >= 0 AND Score <= 100)
Effect: rejected — 150 fails the CHECK. Domain integrity is the database-layer form of validation: range checks, type checks, presence checks (NOT NULL), format checks and lookup/referential checks. It is worth distinguishing two related ideas examiners often pair:
| Concept | Meaning | Example |
|---|---|---|
| Validation | Is the data reasonable and within allowed bounds? | Range check (0–100); type check (integer); presence check (NOT NULL) |
| Verification | Has the data been entered correctly, matching the source? | Double entry; visual check against a paper form |
Validation cannot guarantee correctness — "age 35" passes a range check even if the person is 53 — only plausibility; verification addresses transcription accuracy.
Security starts with controlling who can do what. Each database user account is granted a set of privileges. A coarse view:
SELECT but not change data.SELECT, INSERT, UPDATE, DELETE.The guiding rule is the principle of least privilege: each account gets the minimum rights it needs and no more, so a compromised account can do limited damage. Privileges are managed with the Data Control Language statements GRANT and REVOKE:
-- Let teacher1 read the Student table only
GRANT SELECT ON Student TO teacher1;
-- Give admin1 full rights on Enrolment
GRANT ALL PRIVILEGES ON Enrolment TO admin1;
-- Take away student1's ability to insert into Student
REVOKE INSERT ON Student FROM student1;
Effect: teacher1 can now query Student but any UPDATE/DELETE they attempt is refused; admin1 can do anything to Enrolment; student1 can no longer add rows to Student.
A view is a virtual table defined by a stored query. Granting access to a view rather than the base table lets you expose only some rows or columns:
-- Expose names and form groups, but NOT email or date of birth
CREATE VIEW PublicStudentInfo AS
SELECT StudentID, FirstName, Surname, FormGroup
FROM Student;
GRANT SELECT ON PublicStudentInfo TO teacher1;
Effect: teacher1 querying PublicStudentInfo sees four columns; Email and DateOfBirth are simply unreachable through the view, even though they exist in Student. Views thus implement column-level (and, with a WHERE clause, row-level) access control.
A view can also restrict rows. To let a sixth-form tutor see only the students in their own form group, define the view with a WHERE clause and grant access to that:
CREATE VIEW Form10A AS
SELECT StudentID, FirstName, Surname
FROM Student
WHERE FormGroup = '10A';
Effect: querying Form10A returns only the 10A students — rows for other form groups are invisible through the view. Together, column-restricting and row-restricting views let an administrator expose exactly the slice of data each role legitimately needs, which is the practical implementation of least privilege at the data level rather than just the table level.
Granting privileges to each individual user quickly becomes unmanageable in a real organisation with hundreds of staff. The standard solution is role-based access control: privileges are granted to a named role (e.g. teacher, receptionist, dba), and each user is then assigned a role. When a new teacher joins, the administrator simply gives them the teacher role rather than re-granting a dozen individual privileges; when the school's policy on who may see grades changes, the privilege is altered once on the role rather than on every teacher. This indirection — users → roles → privileges — is the same principle as defining a function once and calling it many times, applied to security administration, and it makes the least-privilege principle practical at scale.
SQL injection is an attack in which an application builds a SQL statement by concatenating untrusted user input directly into the query string, allowing the attacker to inject their own SQL that the database then executes. It is the textbook example of why input must never be trusted.
Imagine a login backed by a Users(Username, PasswordHash) table, where the application builds the query by string concatenation:
query = "SELECT * FROM Users WHERE Username = '" + input_user
+ "' AND Password = '" + input_pass + "';"
With honest input Username = alice, Password = secret, the query is harmless:
SELECT * FROM Users WHERE Username = 'alice' AND Password = 'secret';
The attacker types, in the username box:
Username: ' OR 1=1 --
Password: (anything)
Because the input is pasted into the SQL text, the assembled statement becomes:
SELECT * FROM Users WHERE Username = '' OR 1=1 --' AND Password = 'anything';
Trace what the database sees:
' closes the Username string early.OR 1=1 is a condition that is always true, so the WHERE matches every row.-- begins a SQL comment, so the rest of the line ( AND Password = ...) is ignored.Effect: the query returns all users and the password check is commented out — authentication is bypassed entirely, typically logging the attacker in as the first user (often an administrator). A more destructive payload such as '; DROP TABLE Users; -- could attempt to delete the table outright. The root cause is always the same: user data was treated as executable SQL.
The defence is to never concatenate input into SQL. Instead, the SQL is sent to the DBMS with placeholders, and the user values are sent separately as parameters. The DBMS then treats each parameter strictly as a data value, never as SQL syntax:
-- The statement is prepared with placeholders (? marks bind variables)
SELECT * FROM Users WHERE Username = ? AND Password = ?;
-- The values are bound separately: param1 = "' OR 1=1 --", param2 = "anything"
Effect: the malicious string ' OR 1=1 -- is now compared as a literal username. The database looks for a user literally called ' OR 1=1 --, finds none, and returns no rows — the injection is inert because the quotes and -- are data, not code. The structure of the query was fixed before any user value was attached, so the attacker can no longer alter it.
Parameterised queries are the primary and decisive defence, but layer others behind them:
| Method | How it helps |
|---|---|
| Parameterised queries / prepared statements | Input is bound as data; it can never change the SQL structure (primary defence) |
| Input validation | Reject implausible input (e.g. usernames may be alphanumeric only) as an extra filter |
| Least privilege | The app's DB account cannot DROP tables, so even a successful injection is limited |
| Stored procedures | Fixed, pre-defined routines reduce the surface for ad-hoc injected SQL |
Note that validation alone is not a reliable defence — attackers find encodings that slip past filters — which is exactly why parameterisation, removing the concatenation entirely, is the real fix.
The encryption-vs-hashing distinction is frequently examined: encryption is reversible with the key (you decrypt to recover the data); hashing is one-way (you cannot un-hash). That is precisely why passwords are hashed — the system never needs the original back; it just re-hashes the entered password and compares.
Walk through a login to see why this is secure. When a user registers with password p, the system stores hash(salt + p) together with the random salt — never p itself. At login, the user submits a password q; the system fetches their stored salt, computes hash(salt + q), and checks whether it equals the stored hash. If q = p the hashes match and login succeeds; the plaintext password was never stored and is never reconstructed. The salt ensures that two users who happen to choose the same password get different stored hashes, so an attacker who steals the table cannot tell which accounts share a password and cannot use a precomputed "rainbow table" of common-password hashes — they would have to brute-force each salted hash individually. This is the exact reasoning an exam answer should give for "explain why passwords are stored as salted hashes rather than encrypted": one-way hashing means a stolen table does not reveal passwords, and salting defeats precomputation.
Even a perfectly secured database can be lost to hardware failure, a fire, or a WHERE-less DELETE. Backup strategies provide copies to restore from:
| Strategy | Copies | Restore | Trade-off |
|---|---|---|---|
| Full | The entire database | Single step | Slow to take, most storage |
| Incremental | Only data changed since the last backup of any kind | Full backup + every increment in order | Fast/small to take; slower, more fragile restore |
| Differential | Data changed since the last full backup | Full backup + latest differential | Restore simpler than incremental; backups grow over time |
The transaction log records every change made to the database. Combined with a backup it supports two complementary recovery actions:
Replication (keeping live copies of the database on additional servers) gives both fault tolerance and improved read performance, and is distinct from backup: a replica is current, whereas a backup is a point-in-time snapshot.
Suppose a shop takes a full backup every night at 02:00 and keeps the transaction log continuously. At 14:30 the main disk fails. Recovery proceeds in two stages:
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.