You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
SQL — Structured Query Language — is how you actually talk to the relational databases the previous lessons designed. It splits into two jobs: DDL (Data Definition Language) builds the structure — CREATE TABLE and its constraints — and DML (Data Manipulation Language) reads and changes the data — SELECT, INSERT, UPDATE, DELETE. This lesson works through both against a single small school database, showing every query in a real language-tagged sql code fence and, where it helps, the result set it returns as a table so you can see exactly what the query produced. By the end you should be able to write SQL from a worded scenario, not just recognise it.
This lesson addresses the H446 1.3.2 Databases content on SQL:
CREATE TABLE, choosing data types and stating PRIMARY KEY (and where relevant FOREIGN KEY) constraints.SELECT … FROM … WHERE, with comparison/logical operators, pattern matching via LIKE, and ordering via ORDER BY.INSERT, UPDATE and DELETE, understanding the consequences of omitting a WHERE clause.COUNT, SUM, AVG (and MAX/MIN) and partition rows for per-group aggregation with GROUP BY (filtered by HAVING).(Phrasing here paraphrases the specification content; it is not a verbatim quote.)
SQL is a declarative language: you state what result you want, not how the machine should compute it. SELECT Name FROM Students WHERE Grade = 'A' describes the answer ("the names of grade-A students"); the database's query optimiser decides whether to scan the whole table, use an index, and in what order — the how. This is a sharp contrast with the imperative pseudocode of the programming topic, where you spell out every loop and comparison yourself. The synoptic pay-off is that the same declarative query can run fast or slow depending on indexing, which is why keys and indexes (the data-structures link) matter so much.
Every query below runs against these two tables, so you can check each result by hand.
Students
| StudentID | Name | Year | Grade | Score |
|---|---|---|---|---|
| 1 | Alice | 12 | A | 88 |
| 2 | Bob | 12 | B | 72 |
| 3 | Charlie | 13 | A | 91 |
| 4 | Diana | 13 | C | 58 |
| 5 | Eve | 12 | A | 95 |
Results (one row per subject a student takes):
| StudentID | Subject | Mark |
|---|---|---|
| 1 | Maths | 88 |
| 1 | Physics | 76 |
| 2 | Maths | 72 |
| 3 | Maths | 91 |
| 3 | Physics | 84 |
| 4 | Maths | 58 |
Before any data exists, DDL defines the structure. CREATE TABLE names the table, lists each column with its data type, and declares constraints that protect integrity.
CREATE TABLE Students (
StudentID INTEGER NOT NULL,
Name VARCHAR(40) NOT NULL,
Year INTEGER,
Grade CHAR(2),
Score INTEGER,
PRIMARY KEY (StudentID)
);
The constraints carry real meaning, and each is examinable:
| Constraint | Effect |
|---|---|
PRIMARY KEY (StudentID) | StudentID uniquely identifies a row; it cannot be null or duplicated. |
NOT NULL | The column must always hold a value (no blanks). |
FOREIGN KEY (…) REFERENCES … | The column must match a primary-key value in another table (referential integrity). |
UNIQUE | No two rows may share this value (e.g. an email column). |
The Results table carries a foreign key back to Students, written in the table definition:
CREATE TABLE Results (
StudentID INTEGER NOT NULL,
Subject VARCHAR(30) NOT NULL,
Mark INTEGER,
PRIMARY KEY (StudentID, Subject),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID)
);
Note the composite primary key (StudentID, Subject) — exactly the design that came out of normalisation — and the FOREIGN KEY clause, which makes the database refuse a result for a student who does not exist. DDL is where the integrity rules you reasoned about on paper become rules the engine enforces.
Picking the type for each column is part of designing for integrity, and OCR can ask you to justify a choice. A type both limits what can be stored and documents the intent of the column:
StudentID is an INTEGER here because the school uses plain numeric ids and never does arithmetic that would lose information — but an id with leading zeros (007) or letters (AB12) should be VARCHAR instead, because storing it as a number would silently drop the zeros.Name is VARCHAR(40) (variable-length text up to 40 characters) rather than CHAR(40) (fixed length) because names differ in length, and VARCHAR does not pad short values with wasted spaces.Grade is CHAR(2) because grades are short and (almost) fixed in width ("A", "A*"), so a fixed-length type is a reasonable, slightly faster choice.Score/Mark are INTEGER because they are whole-number percentages; a column holding money or a measurement would be REAL/DECIMAL instead.A good answer to "why is this column that type?" always ties the type to the nature of the data: range of values, whether arithmetic is performed, fixed versus variable width, and whether leading zeros or non-digits must survive.
It is worth stressing that the constraints in CREATE TABLE are not mere documentation — the engine enforces every one on every INSERT and UPDATE. If a program tries to insert a second student with StudentID = 1, the PRIMARY KEY constraint rejects it; if it tries to insert a Results row for a non-existent student, the FOREIGN KEY rejects it; if it tries to insert a student with no name, the NOT NULL rejects it. This is the database doing, automatically and for every client, the validation that an application would otherwise have to re-implement (and could forget). It is the most concrete sense in which a relational database protects its own integrity.
The SELECT statement retrieves data. Its skeleton:
SELECT column1, column2 -- which columns
FROM table_name -- from which table
WHERE condition -- keeping which rows
ORDER BY column ASC | DESC; -- in which order
Worked examples against the sample data, with their result sets so you can verify them:
-- Every column, every row
SELECT * FROM Students;
returns all five rows unchanged. Selecting specific columns and filtering:
SELECT Name, Score
FROM Students
WHERE Grade = 'A';
| Name | Score |
|---|---|
| Alice | 88 |
| Charlie | 91 |
| Eve | 95 |
Only the three grade-A students appear, and only the two requested columns. Combining conditions with AND:
SELECT Name
FROM Students
WHERE Year = 12 AND Grade = 'A';
| Name |
|---|
| Alice |
| Eve |
Charlie is excluded — he is grade A but Year 13, so he fails the AND.
Two everyday refinements of SELECT round out the basics. SELECT DISTINCT removes duplicate rows from the result — SELECT DISTINCT Grade FROM Students returns each grade once (A, B, C) rather than once per student, which is handy when you want the set of values that occur rather than every occurrence. And the AS keyword aliases an output column to a friendlier name, which is essential once you start returning computed values: SELECT Name, Score * 1.0 AS Percentage labels the computed column Percentage instead of an unreadable expression. Aliases are a presentation device — they rename the output, not the underlying table — but they make result sets self-documenting, and examiners expect aggregate results in particular to be aliased.
WHERE keeps only the rows for which its condition is true. The operators you must know:
| Operator | Meaning | Example |
|---|---|---|
= | Equal to | WHERE Year = 12 |
<> or != | Not equal to | WHERE Grade <> 'U' |
> < >= <= | Numeric/text comparison | WHERE Score >= 80 |
BETWEEN … AND … | Inclusive range | WHERE Score BETWEEN 70 AND 90 |
LIKE | Pattern matching | WHERE Name LIKE 'A%' |
IN (…) | Matches any in a list | WHERE Grade IN ('A', 'B') |
IS NULL / IS NOT NULL | Tests for missing data | WHERE Grade IS NULL |
AND OR NOT | Combine / negate conditions | WHERE Year = 12 AND Grade = 'A' |
A subtle but heavily examined point: you test for missing data with IS NULL, never = NULL. NULL means "unknown", and "anything = unknown" is itself unknown (not true), so WHERE Grade = NULL matches no rows even where the grade is genuinely blank.
Three of these operators are worth a worked look, because they are concise ways of writing conditions that would otherwise be clumsy. BETWEEN tests an inclusive range — both endpoints count:
SELECT Name, Score FROM Students WHERE Score BETWEEN 70 AND 90;
| Name | Score |
|---|---|
| Alice | 88 |
| Bob | 72 |
Eve (95) and Charlie (91) are above the range; Diana (58) is below. Note that BETWEEN 70 AND 90 is shorthand for Score >= 70 AND Score <= 90 — the endpoints 70 and 90 are included, which is the commonest point of confusion.
IN tests membership of a list, and is a tidy alternative to a chain of ORs:
SELECT Name, Grade FROM Students WHERE Grade IN ('A', 'C');
| Name | Grade |
|---|---|
| Alice | A |
| Charlie | A |
| Diana | C |
| Eve | A |
This returns the same rows as WHERE Grade = 'A' OR Grade = 'C' but is shorter and clearer, especially for long lists. Finally NOT negates a condition, so WHERE NOT Grade = 'A' (equivalently WHERE Grade <> 'A') returns just Bob and Diana — the two non-A students. Combining operators with AND/OR follows ordinary precedence (AND binds tighter than OR); when in doubt, parenthesise to make the intent explicit, e.g. WHERE (Grade = 'A' OR Grade = 'B') AND Year = 12.
LIKE uses two wildcards:
| Wildcard | Matches |
|---|---|
% | Any sequence of characters, including none |
_ | Exactly one character |
SELECT Name FROM Students WHERE Name LIKE 'A%'; -- starts with A -> Alice
SELECT Name FROM Students WHERE Name LIKE '%e'; -- ends with e -> Charlie, Eve, (Alice? no -> Alice ends 'e'? "Alice" ends 'e') Alice, Charlie, Eve
SELECT Name FROM Students WHERE Name LIKE '_o%'; -- 2nd char is o -> Bob
(The middle query matches Alice, Charlie and Eve — every name ending in "e".)
ORDER BY sorts the result set — ascending (ASC, the default) or descending (DESC). It is the last thing to run, so it sees the final columns.
SELECT Name, Score
FROM Students
ORDER BY Score DESC;
| Name | Score |
|---|---|
| Eve | 95 |
| Charlie | 91 |
| Alice | 88 |
| Bob | 72 |
| Diana | 58 |
You can sort by several columns; earlier columns are the major sort key:
SELECT Name, Grade, Score
FROM Students
ORDER BY Grade ASC, Score DESC; -- by grade A->C, ties broken by score high->low
An aggregate function collapses many rows into a single value. The five examined:
| Function | Returns |
|---|---|
COUNT(*) / COUNT(col) | Number of rows (COUNT(col) ignores nulls) |
SUM(col) | Total of a numeric column |
AVG(col) | Mean of a numeric column |
MAX(col) / MIN(col) | Largest / smallest value |
SELECT COUNT(*) AS NumStudents,
AVG(Score) AS MeanScore,
MAX(Score) AS TopScore
FROM Students;
| NumStudents | MeanScore | TopScore |
|---|---|---|
| 5 | 80.8 | 95 |
The whole table becomes one row: five students, mean score (88+72+91+58+95)/5=80.8, top 95. Using AS to alias each result gives the output column a readable name.
Two refinements of counting are worth knowing. First, COUNT(*) counts rows, whereas COUNT(column) counts only the rows where that column is not null — so if some students had no recorded grade, COUNT(Grade) would be smaller than COUNT(*). This is the practical reason the distinction is examined: counting "students" (rows) is not the same as counting "recorded grades" (non-null values). Second, COUNT(DISTINCT column) counts the number of different values, ignoring duplicates — SELECT COUNT(DISTINCT Subject) FROM Results returns 2 (Maths and Physics) even though there are six result rows. A related subtlety with AVG: it averages over the non-null values only, so a null mark is excluded from the mean rather than treated as zero — which is usually what you want, and another reason never to store a missing value as 0.
Aggregates over the whole table are useful, but more often you want them per group — average mark per subject, count per year. GROUP BY partitions the rows by a column's value and applies the aggregate to each partition separately.
SELECT Subject, COUNT(*) AS Entries, AVG(Mark) AS AvgMark
FROM Results
GROUP BY Subject;
| Subject | Entries | AvgMark |
|---|---|---|
| Maths | 4 | 77.25 |
| Physics | 2 | 80.0 |
The four Maths rows collapse into one group (mean (88+72+91+58)/4=77.25) and the two Physics rows into another. Every column in the SELECT must be either grouped or aggregated — you cannot also select StudentID here, because a group spans several students.
HAVING then filters the groups (after aggregation), exactly as WHERE filters rows (before):
SELECT Subject, AVG(Mark) AS AvgMark
FROM Results
GROUP BY Subject
HAVING AVG(Mark) > 78;
| Subject | AvgMark |
|---|---|
| Physics | 80.0 |
Maths is dropped because its group average (77.25) fails the HAVING test.
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.