Database Security and Integrity

A database is only as valuable as it is trustworthy. The data it holds is frequently sensitive and valuable — personal details, financial records, medical histories — and two distinct things can go wrong with it. It can become incorrect (a foreign key pointing nowhere, an impossible date of birth, a half-finished update), or it can fall into the wrong hands or be lost entirely (an attacker reads the user table, a disk fails, a careless query wipes a table). The first family of concerns is data integrity: keeping data accurate, consistent and valid. The second is security: controlling who may do what, defending against attacks such as SQL injection, and ensuring the data can be recovered after a disaster. This lesson treats both, with a fully worked SQL-injection attack and its parameterised-query defence as the centrepiece, because injection is the attack that most directly exploits how a database executes SQL. The running schema remains Student / Course / Enrolment, with a Users login table added for the injection example.

Spec Mapping

This lesson addresses the AQA A-Level Computer Science (7517) Fundamentals of databases content on protecting data: the integrity rules (entity integrity, referential integrity and domain/validation constraints); access control through user privileges (GRANT/REVOKE) and views; defending against SQL injection using parameterised queries and input validation; and backup and recovery using the transaction log. It draws together the keys and integrity rules of the relational-model lesson, the CREATE TABLE constraints, the transaction log from the transactions lesson, and the input-validation ideas from the programming-fundamentals area.

Data Integrity

Data integrity means the stored data is accurate, internally consistent and reliable. The relational model and DBMS enforce it through three complementary rules.

Entity integrity

Every table must have a primary key, and no part of a primary-key value may be NULL. This guarantees that every row is uniquely identifiable — without it, you could not reliably refer to, update or delete a specific record. It is enforced by the PRIMARY KEY constraint.

Referential integrity

Every foreign-key value must either match an existing primary-key value in the referenced table, or be NULL where the column permits it. This keeps relationships between tables valid and prevents "orphan" rows.

-- Student 9999 does not exist
INSERT INTO Enrolment (StudentID, CourseID) VALUES (9999, 'CS101');

Effect: rejected. The foreign key Enrolment.StudentID has no matching Student row, so referential integrity blocks the insert. It likewise blocks deleting a Student who still has Enrolment rows (unless a cascade rule is defined).

Domain integrity (and validation)

Every value must conform to its column's data type, format and constraints — the column's domain.

INSERT INTO Enrolment (StudentID, CourseID, Score) VALUES (1001, 'CS101', 150);
-- with Score defined as INTEGER CHECK (Score >= 0 AND Score <= 100)

Effect: rejected — 150 fails the CHECK. Domain integrity is the database-layer form of validation: range checks, type checks, presence checks (NOT NULL), format checks and lookup/referential checks. It is worth distinguishing two related ideas examiners often pair:

Concept	Meaning	Example
Validation	Is the data reasonable and within allowed bounds?	Range check (0–100); type check (integer); presence check (NOT NULL)
Verification	Has the data been entered correctly, matching the source?	Double entry; visual check against a paper form

Validation cannot guarantee correctness — "age 35" passes a range check even if the person is 53 — only plausibility; verification addresses transcription accuracy.

Access Control — Privileges and Least Privilege

Security starts with controlling who can do what. Each database user account is granted a set of privileges. A coarse view:

Read-only: may SELECT but not change data.
Read–write: may SELECT, INSERT, UPDATE, DELETE.
Administrative: may also create/drop tables and manage other users.

The guiding rule is the principle of least privilege: each account gets the minimum rights it needs and no more, so a compromised account can do limited damage. Privileges are managed with the Data Control Language statements GRANT and REVOKE:

-- Let teacher1 read the Student table only
GRANT SELECT ON Student TO teacher1;

-- Give admin1 full rights on Enrolment
GRANT ALL PRIVILEGES ON Enrolment TO admin1;

-- Take away student1's ability to insert into Student
REVOKE INSERT ON Student FROM student1;

Effect: teacher1 can now query Student but any UPDATE/DELETE they attempt is refused; admin1 can do anything to Enrolment; student1 can no longer add rows to Student.

Views as a security layer

A view is a virtual table defined by a stored query. Granting access to a view rather than the base table lets you expose only some rows or columns:

-- Expose names and form groups, but NOT email or date of birth
CREATE VIEW PublicStudentInfo AS
SELECT StudentID, FirstName, Surname, FormGroup
FROM Student;

GRANT SELECT ON PublicStudentInfo TO teacher1;

Effect: teacher1 querying PublicStudentInfo sees four columns; Email and DateOfBirth are simply unreachable through the view, even though they exist in Student. Views thus implement column-level (and, with a WHERE clause, row-level) access control.

A view can also restrict rows. To let a sixth-form tutor see only the students in their own form group, define the view with a WHERE clause and grant access to that:

CREATE VIEW Form10A AS
SELECT StudentID, FirstName, Surname
FROM Student
WHERE FormGroup = '10A';

Effect: querying Form10A returns only the 10A students — rows for other form groups are invisible through the view. Together, column-restricting and row-restricting views let an administrator expose exactly the slice of data each role legitimately needs, which is the practical implementation of least privilege at the data level rather than just the table level.

Scaling access control with roles

Granting privileges to each individual user quickly becomes unmanageable in a real organisation with hundreds of staff. The standard solution is role-based access control: privileges are granted to a named role (e.g. teacher, receptionist, dba), and each user is then assigned a role. When a new teacher joins, the administrator simply gives them the teacher role rather than re-granting a dozen individual privileges; when the school's policy on who may see grades changes, the privilege is altered once on the role rather than on every teacher. This indirection — users → roles → privileges — is the same principle as defining a function once and calling it many times, applied to security administration, and it makes the least-privilege principle practical at scale.

SQL Injection — A Worked Attack and Its Fix

SQL injection is an attack in which an application builds a SQL statement by concatenating untrusted user input directly into the query string, allowing the attacker to inject their own SQL that the database then executes. It is the textbook example of why input must never be trusted.

The vulnerable code

Imagine a login backed by a Users(Username, PasswordHash) table, where the application builds the query by string concatenation:

query = "SELECT * FROM Users WHERE Username = '" + input_user
      + "' AND Password = '" + input_pass + "';"

With honest input Username = alice, Password = secret, the query is harmless:

SELECT * FROM Users WHERE Username = 'alice' AND Password = 'secret';

The attack

The attacker types, in the username box:

Username:  ' OR 1=1 --
Password:  (anything)

Because the input is pasted into the SQL text, the assembled statement becomes:

SELECT * FROM Users WHERE Username = '' OR 1=1 --' AND Password = 'anything';

Trace what the database sees:

The attacker's leading ' closes the Username string early.
OR 1=1 is a condition that is always true, so the WHERE matches every row.
-- begins a SQL comment, so the rest of the line ( AND Password = ...) is ignored.

Effect: the query returns all users and the password check is commented out — authentication is bypassed entirely, typically logging the attacker in as the first user (often an administrator). A more destructive payload such as '; DROP TABLE Users; -- could attempt to delete the table outright. The root cause is always the same: user data was treated as executable SQL.

The fix — parameterised queries (prepared statements)

The defence is to never concatenate input into SQL. Instead, the SQL is sent to the DBMS with placeholders, and the user values are sent separately as parameters. The DBMS then treats each parameter strictly as a data value, never as SQL syntax:

-- The statement is prepared with placeholders (? marks bind variables)
SELECT * FROM Users WHERE Username = ? AND Password = ?;
-- The values are bound separately:  param1 = "' OR 1=1 --",  param2 = "anything"

Effect: the malicious string ' OR 1=1 -- is now compared as a literal username. The database looks for a user literally called ' OR 1=1 --, finds none, and returns no rows — the injection is inert because the quotes and -- are data, not code. The structure of the query was fixed before any user value was attached, so the attacker can no longer alter it.

Defence in depth

Parameterised queries are the primary and decisive defence, but layer others behind them:

Method	How it helps
Parameterised queries / prepared statements	Input is bound as data; it can never change the SQL structure (primary defence)
Input validation	Reject implausible input (e.g. usernames may be alphanumeric only) as an extra filter
Least privilege	The app's DB account cannot `DROP` tables, so even a successful injection is limited
Stored procedures	Fixed, pre-defined routines reduce the surface for ad-hoc injected SQL

Note that validation alone is not a reliable defence — attackers find encodings that slip past filters — which is exactly why parameterisation, removing the concatenation entirely, is the real fix.

Protecting Stored Data — Encryption and Hashing

Encryption at rest: stored data is encrypted so that stealing the disk or files yields unreadable ciphertext without the key.
Encryption in transit: data travelling between client and server is encrypted (e.g. with TLS) so it cannot be read if intercepted on the network.
Password storage: passwords must never be stored as plaintext. They are hashed with a one-way function and a per-user salt (random data added before hashing) so that identical passwords hash differently and precomputed "rainbow table" attacks fail.

The encryption-vs-hashing distinction is frequently examined: encryption is reversible with the key (you decrypt to recover the data); hashing is one-way (you cannot un-hash). That is precisely why passwords are hashed — the system never needs the original back; it just re-hashes the entered password and compares.

Walk through a login to see why this is secure. When a user registers with password p, the system stores hash(salt + p) together with the random salt — never p itself. At login, the user submits a password q; the system fetches their stored salt, computes hash(salt + q), and checks whether it equals the stored hash. If q = p the hashes match and login succeeds; the plaintext password was never stored and is never reconstructed. The salt ensures that two users who happen to choose the same password get different stored hashes, so an attacker who steals the table cannot tell which accounts share a password and cannot use a precomputed "rainbow table" of common-password hashes — they would have to brute-force each salted hash individually. This is the exact reasoning an exam answer should give for "explain why passwords are stored as salted hashes rather than encrypted": one-way hashing means a stolen table does not reveal passwords, and salting defeats precomputation.

Backup and Recovery

Even a perfectly secured database can be lost to hardware failure, a fire, or a WHERE-less DELETE. Backup strategies provide copies to restore from:

Strategy	Copies	Restore	Trade-off
Full	The entire database	Single step	Slow to take, most storage
Incremental	Only data changed since the last backup of any kind	Full backup + every increment in order	Fast/small to take; slower, more fragile restore
Differential	Data changed since the last full backup	Full backup + latest differential	Restore simpler than incremental; backups grow over time

The transaction log and recovery

The transaction log records every change made to the database. Combined with a backup it supports two complementary recovery actions:

Roll-forward (redo): start from a backup and replay the log's committed changes up to the moment of failure, recovering work done since the backup — this is how durability is realised after a crash.
Roll-back (undo): discard the effects of transactions that were in progress (uncommitted) at the crash, returning to a consistent state — exactly the atomicity guarantee from the transactions lesson.

Replication (keeping live copies of the database on additional servers) gives both fault tolerance and improved read performance, and is distinct from backup: a replica is current, whereas a backup is a point-in-time snapshot.

A worked recovery scenario

Suppose a shop takes a full backup every night at 02:00 and keeps the transaction log continuously. At 14:30 the main disk fails. Recovery proceeds in two stages:

Restore the most recent full backup (last night's 02:00 snapshot) onto replacement hardware — this recovers the database as it stood at 02:00.
Roll forward by replaying the transaction log's committed entries from 02:00 up to the last entry written before the 14:30 failure — recovering all the day's committed work.

Database Security and Integrity

Database Security and Integrity

Spec Mapping

Data Integrity

Entity integrity

Referential integrity

Domain integrity (and validation)

Access Control — Privileges and Least Privilege

Views as a security layer

Scaling access control with roles

SQL Injection — A Worked Attack and Its Fix

The vulnerable code

The attack

The fix — parameterised queries (prepared statements)

Defence in depth

Protecting Stored Data — Encryption and Hashing

Backup and Recovery

The transaction log and recovery

A worked recovery scenario

More in Computer Science