You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
Most of the SELECT queries you have written so far return rows as they are stored — one output row per matching table row. But a huge proportion of real questions are about summaries: "how many students are in each form group?", "what is the average grade per subject?", "which courses have more than 30 enrolments?". These are answered by aggregate functions, which collapse many rows into a single value, and by GROUP BY, which applies those functions to each group of rows separately rather than to the whole table. This lesson builds the complete summarising toolkit and — critically — establishes the difference between WHERE (which filters individual rows before aggregation) and HAVING (which filters whole groups after aggregation), one of the most reliably examined distinctions in the SQL part of the specification. Throughout we reuse the Student / Course / Enrolment schema and show every query with its result table.
This lesson covers the AQA A-Level Computer Science (7517) Fundamentals of databases content on aggregating and summarising data with SQL: the aggregate functions COUNT, SUM, AVG, MIN and MAX; grouping rows with GROUP BY; and filtering groups with HAVING as distinct from filtering rows with WHERE. It builds on the SELECT/WHERE and join lessons (aggregates are routinely applied across joined tables) and supports the exam-practice synthesis lesson, where summarising queries appear frequently.
So that every result is concrete, assume this small Enrolment table joined to Student and Course. The raw Enrolment rows are:
| StudentID | CourseID | Subject | Score |
|---|---|---|---|
| 1001 | CS101 | Computing | 88 |
| 1002 | CS101 | Computing | 74 |
| 1001 | MA210 | Maths | 92 |
| 1003 | MA210 | Maths | 84 |
| 1002 | EN105 | English | 71 |
| 1003 | EN105 | English | 73 |
We will run summarising queries against this data and show the exact output each one produces.
Notice the shift in granularity that aggregation brings. Every SELECT you have written so far returns data at the granularity of the row — one output line per matching enrolment. Aggregation deliberately changes that granularity: a whole-table aggregate returns one line for the entire table, and a GROUP BY returns one line per group. Keeping clear in your mind "what does one output row represent here?" is the single most useful habit when reading or writing these queries — for a plain SELECT it is one enrolment; for GROUP BY Subject it is one subject; for GROUP BY FormGroup, Subject it is one form-group-and-subject combination. Almost every mistake in this topic comes from losing track of that question.
An aggregate function takes a column of many values and returns a single value. The five in the specification are:
| Function | Returns | Typical use |
|---|---|---|
COUNT() | The number of rows | How many enrolments? |
SUM() | The total of numeric values | Total revenue |
AVG() | The arithmetic mean | Average score |
MIN() | The smallest value | Lowest score / earliest date |
MAX() | The largest value | Highest score / latest date |
It is worth being precise about what each of the five functions does and where each is appropriate, because exam questions often hinge on choosing the right one:
COUNT answers "how many?". It is the only aggregate that works equally on text, dates and numbers, because it counts rows (or non-NULL values), not magnitudes. COUNT(*) counts every row; COUNT(column) counts non-NULL values; COUNT(DISTINCT column) counts distinct non-NULL values. Use it for "number of students", "number of orders", "number of different cities".SUM answers "what is the total?" and applies only to numeric columns. A classic use is total revenue: SUM(Quantity * UnitPrice), computing the product per row first, then totalling. Summing a text column is meaningless and is rejected.AVG answers "what is the mean?" — the sum divided by the count of non-NULL values. Because NULLs are excluded from both the sum and the divisor, AVG over a column with missing values is not the same as treating those values as zero.MIN and MAX answer "what is the smallest / largest?". They work on numbers (lowest/highest score), on dates (MIN(DateOfBirth) is the oldest student, MAX(LoanDate) is the most recent loan), and even on text (alphabetical first/last). This date behaviour is a frequent exam point: the earliest date is the minimum, which can feel counter-intuitive.A small worked contrast — the difference between SUM and COUNT on the same column:
SELECT COUNT(Score) AS HowMany, SUM(Score) AS TotalMarks, AVG(Score) AS Mean
FROM Enrolment;
Result:
| HowMany | TotalMarks | Mean |
|---|---|---|
| 6 | 482 | 80.33 |
COUNT(Score) reports there are 6 scores; SUM(Score) adds them to 482; AVG(Score) is 482 ÷ 6 = 80.33. Picking COUNT when the question wants a total — or vice versa — is a common slip that these distinct results guard against.
Without GROUP BY, an aggregate condenses the entire table to one row:
SELECT COUNT(*) AS TotalEnrolments,
AVG(Score) AS MeanScore,
MAX(Score) AS TopScore,
MIN(Score) AS LowestScore
FROM Enrolment;
Result:
| TotalEnrolments | MeanScore | TopScore | LowestScore |
|---|---|---|---|
| 6 | 80.33 | 92 | 71 |
One input table of six rows becomes one summary row. Note you can compute several aggregates in a single SELECT.
This distinction matters because of NULLs:
COUNT(*) counts every row, including rows where columns are NULL.COUNT(column) counts only rows where that column is not NULL.COUNT(DISTINCT column) counts the distinct non-NULL values.Suppose Student has 6 rows but only 4 have an Email:
SELECT COUNT(*) AS AllStudents,
COUNT(Email) AS WithEmail,
COUNT(DISTINCT FormGroup) AS FormGroups
FROM Student;
Result:
| AllStudents | WithEmail | FormGroups |
|---|---|---|
| 6 | 4 | 2 |
The two NULL emails are excluded from COUNT(Email). This is a frequent exam trap: COUNT(*) and COUNT(Email) give different answers precisely when NULLs are present. (SUM, AVG, MIN and MAX also ignore NULLs — an important subtlety for AVG, since a NULL is skipped, not treated as zero.)
GROUP BY partitions the rows into groups that share a value, then applies the aggregate to each group separately, producing one output row per group.
SELECT Subject, AVG(Score) AS AvgScore
FROM Enrolment
GROUP BY Subject;
Result:
| Subject | AvgScore |
|---|---|
| Computing | 81.0 |
| Maths | 88.0 |
| English | 72.0 |
The six raw rows are grouped into three subjects, and AVG(Score) is computed within each: Computing = (88+74)/2 = 81.0, Maths = (92+84)/2 = 88.0, English = (71+73)/2 = 72.0.
SELECT FormGroup, COUNT(*) AS NumberOfStudents
FROM Student
GROUP BY FormGroup;
Result (for our sample):
| FormGroup | NumberOfStudents |
|---|---|
| 10A | 3 |
| 10B | 3 |
Every column in the
SELECTlist must either be inside an aggregate function or be named in theGROUP BYclause.
The reason is logical: each output row represents a whole group, so a bare column that is not grouped has no single value to show. This query is invalid:
-- INVALID: Score is neither aggregated nor grouped
SELECT Subject, Score, AVG(Score)
FROM Enrolment
GROUP BY Subject;
There are two scores in the Computing group (88 and 74) — which would Score display? The DBMS rejects it. Either aggregate Score (e.g. MAX(Score)) or add it to GROUP BY.
You may group by a combination, producing one row per distinct combination:
SELECT s.FormGroup, e.Subject, AVG(e.Score) AS AvgScore
FROM Student s
JOIN Enrolment e ON s.StudentID = e.StudentID
GROUP BY s.FormGroup, e.Subject
ORDER BY s.FormGroup, e.Subject;
This reports the average score for each (form group, subject) pair — a two-dimensional summary, and a natural place where grouping meets the join skills from the previous lesson.
A summary query often needs to show only some of the groups — "subjects whose average is above 75", "courses with more than 30 enrolments". You cannot do this with WHERE, because WHERE is evaluated before the rows are grouped and so cannot see a group's aggregate. That is the job of HAVING:
SELECT Subject, AVG(Score) AS AvgScore
FROM Enrolment
GROUP BY Subject
HAVING AVG(Score) > 75;
Result:
| Subject | AvgScore |
|---|---|
| Computing | 81.0 |
| Maths | 88.0 |
English (average 72.0) is filtered out after grouping because its group average fails the HAVING test. Computing and Maths remain.
This is the single most examined point in the lesson. They filter at different stages:
| Clause | Filters | When | Can it use an aggregate? |
|---|---|---|---|
WHERE | Individual rows | Before grouping | No |
HAVING | Whole groups | After grouping | Yes |
A query may use both, and they do different jobs. The following asks: for enrolments scored 70 or above (row filter), what is the average per subject, showing only subjects whose qualifying average exceeds 80 (group filter)?
SELECT Subject, AVG(Score) AS AvgScore
FROM Enrolment
WHERE Score >= 70
GROUP BY Subject
HAVING AVG(Score) > 80;
WHERE Score >= 70 discards individual low rows first; the survivors are grouped; HAVING then drops whole subjects whose average is not above 80. Crucially, the two clauses can change the answer independently — WHERE changes which rows enter a group; HAVING changes which groups survive.
-- INVALID
SELECT Subject, AVG(Score)
FROM Enrolment
WHERE AVG(Score) > 75
GROUP BY Subject;
At the moment WHERE runs, the rows have not yet been grouped, so AVG(Score) does not exist — there is no group to average over. Move the condition to HAVING, which runs after grouping.
A simple decision rule resolves almost every "WHERE or HAVING?" choice: ask whether the condition can be judged by looking at a single row on its own, or whether it needs the whole group. Score >= 70 and Subject = 'Maths' can be decided from one row, so they belong in WHERE. AVG(Score) > 80 and COUNT(*) > 5 can only be decided once the group exists, so they belong in HAVING. When a query needs both kinds of condition — keep certain rows, then keep certain groups — use both clauses together, exactly as in the combined example above. Internalising "single row → WHERE; whole group → HAVING" makes the distinction automatic under exam pressure rather than something to puzzle out each time.
A summarising query is written in one order but processed in another. Knowing the processing order explains every rule above:
SELECT columns / aggregates -- 5: choose what to output
FROM table(s) -- 1: gather and join rows
WHERE row_conditions -- 2: filter individual rows
GROUP BY columns -- 3: partition survivors into groups
HAVING group_conditions -- 4: filter whole groups
ORDER BY columns; -- 6: sort the final rows
Processing order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. Because WHERE (step 2) precedes GROUP BY (step 3), it cannot reference an aggregate; because HAVING (step 4) follows it, it can. Because SELECT aliases are created at step 5, some DBMSs forbid using a SELECT alias inside HAVING — repeat the aggregate to be safe.
Scenario: the school wants management reports from the Student, Course and Enrolment tables.
Task 1 — How many students are enrolled on each course (including courses with none)? A LEFT JOIN keeps courses that have no enrolments:
SELECT c.CourseName, COUNT(e.StudentID) AS EnrolledStudents
FROM Course c
LEFT JOIN Enrolment e ON c.CourseID = e.CourseID
GROUP BY c.CourseName
ORDER BY EnrolledStudents DESC;
Result (illustrative):
| CourseName | EnrolledStudents |
|---|---|
| Intro to Computing | 2 |
| Pure Mathematics | 2 |
| English Literature | 2 |
| Further Maths | 0 |
COUNT(e.StudentID) correctly returns 0 for Further Maths because the unmatched LEFT JOIN row has a NULL StudentID, which COUNT(column) does not count — exactly the NULL behaviour discussed above. COUNT(*) here would have wrongly returned 1.
Task 2 — Which subjects have an average score above 80?
SELECT Subject, ROUND(AVG(Score), 1) AS AvgScore, COUNT(*) AS Entries
FROM Enrolment
GROUP BY Subject
HAVING AVG(Score) > 80;
Result:
| Subject | AvgScore | Entries |
|---|---|---|
| Computing | 81.0 | 2 |
| Maths | 88.0 | 2 |
Task 3 — Which courses have more than 30 enrolments (popular courses)?
SELECT c.CourseName, COUNT(e.StudentID) AS Total
FROM Course c
JOIN Enrolment e ON c.CourseID = e.CourseID
GROUP BY c.CourseName
HAVING COUNT(e.StudentID) > 30
ORDER BY Total DESC;
This combines a join, GROUP BY, and a HAVING on an aggregate — the multi-skill pattern examiners favour for higher-mark questions.
Examiners reward candidates who can predict the output of a grouped query, so it pays to be able to "execute" one by hand. Take this query against our six-row Enrolment sample, asking for the count and average per subject but only for subjects with more than one entry:
SELECT Subject, COUNT(*) AS Entries, AVG(Score) AS AvgScore
FROM Enrolment
WHERE Score >= 70
GROUP BY Subject
HAVING COUNT(*) > 1
ORDER BY AvgScore DESC;
Walk the processing order:
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.