AQA GCSE Maths: Statistics and Probability Revision Guide
Statistics and Probability together account for approximately 15% of the marks on the AQA GCSE Maths qualification at both tiers -- roughly 36 of the 240 marks available across the three papers. These topics appear on all three papers, including the non-calculator Paper 1, so you cannot afford to leave them to the last minute.
The good news is that Statistics and Probability are often more accessible than Algebra or Geometry. Many of the skills involved -- reading charts, calculating averages, working out simple probabilities -- build on ideas you already use in everyday life. For students aiming to maximise their grade, this is an excellent area to secure reliable marks.
This guide covers every Statistics and Probability topic on the AQA specification, from data collection through to conditional probability at Higher tier.
Data Collection
Before you can analyse data, you need to understand how it is collected and classified. The AQA specification expects you to know the different types of data and the methods used to gather it.
Types of data. Data can be classified in two ways. First, it is either qualitative (descriptive, such as colour or favourite subject) or quantitative (numerical, such as height or number of siblings). Second, quantitative data is either discrete (takes specific values, often counted -- for example, the number of pets someone owns) or continuous (can take any value within a range, often measured -- for example, weight in kilograms).
Primary and secondary data. Primary data is data you collect yourself through experiments, surveys, or observations. Secondary data has already been collected by someone else -- for example, from the internet or a government database. Primary data gives you control over accuracy but is time-consuming. Secondary data is quicker to obtain but may contain errors or bias you cannot verify.
Sampling methods. When a population is too large to survey entirely, you take a sample. Random sampling gives every member an equal chance of selection. Systematic sampling selects every nth item from an ordered list. Stratified sampling divides the population into groups (strata) and selects from each in proportion to its size -- for example, if 60% of a year group is female, 60% of the sample should be female.
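The proportional calculation behind stratified sampling can be sketched in a few lines of Python. The function name and the year-group figures below are hypothetical, chosen only to illustrate the method.

```python
def stratified_sample_sizes(strata_sizes, sample_size):
    """Return how many to sample from each stratum, in proportion to its size."""
    population = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / population)
        for name, size in strata_sizes.items()
    }

# Hypothetical year group: 120 female and 80 male students, sample of 50.
year_group = {"female": 120, "male": 80}
sizes = stratified_sample_sizes(year_group, 50)
print(sizes)  # {'female': 30, 'male': 20}
```

Here 60% of the year group is female, so 60% of the sample (30 out of 50) is female. When the proportions do not give whole numbers, the rounded values may need a small adjustment so that they still add up to the sample size.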
Bias and questionnaire design. A biased sample does not represent the population fairly. Bias can arise from the sampling method (for example, only surveying people at one location) or from the questions themselves. When designing questionnaires, avoid leading questions, overlapping response categories (such as "1-5" and "5-10"), and vague time frames. Questions should be clear, neutral, and provide non-overlapping response options.
Averages and Spread
Averages and measures of spread are among the most frequently tested topics in the entire GCSE Maths specification. You need to be able to calculate them from raw data, from frequency tables, and from grouped frequency tables.
Mean -- add all the values and divide by the number of values. From a frequency table, multiply each value by its frequency, sum the products, and divide by the total frequency. From a grouped frequency table, use the midpoint of each class -- this gives an estimated mean, because the exact values within each group are unknown.
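As a minimal sketch of the two table methods described above, the following Python computes a mean from a frequency table and an estimated mean from a grouped table. The function names and the data are made up for illustration.

```python
def mean_from_frequency_table(table):
    """table is a list of (value, frequency) pairs."""
    total_frequency = sum(freq for _, freq in table)
    total = sum(value * freq for value, freq in table)
    return total / total_frequency

def estimated_mean_grouped(groups):
    """groups is a list of ((lower, upper), frequency) pairs."""
    # Replace each class by its midpoint, then proceed as for a frequency table.
    midpoint_table = [((lower + upper) / 2, freq) for (lower, upper), freq in groups]
    return mean_from_frequency_table(midpoint_table)

# Hypothetical data: number of pets owned by 20 students.
pets = [(0, 4), (1, 7), (2, 6), (3, 3)]
print(mean_from_frequency_table(pets))  # 1.4

# Hypothetical data: heights in cm, grouped into classes.
heights = [((150, 160), 5), ((160, 170), 12), ((170, 180), 3)]
print(estimated_mean_grouped(heights))  # 164.0
```

The second result is only an estimate, because each height is treated as if it sat exactly at the midpoint of its class.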
Median -- the middle value when data is arranged in order. For an even number of values, take the mean of the two middle values. From a frequency table, use cumulative frequencies to locate the median position at (n + 1) / 2.
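The cumulative frequency approach to the median can be sketched as follows. The helper names and the data are hypothetical; the idea is simply to keep a running total until it reaches the required position.

```python
def median_from_frequency_table(table):
    """table is a list of (value, frequency) pairs, sorted by value."""
    n = sum(freq for _, freq in table)

    def value_at(position):
        # Walk through the running total until it reaches the position (1-based).
        running_total = 0
        for value, freq in table:
            running_total += freq
            if running_total >= position:
                return value

    # For odd n both positions are the same; for even n they are the two middle ones.
    lower_middle = (n + 1) // 2
    upper_middle = (n + 2) // 2
    return (value_at(lower_middle) + value_at(upper_middle)) / 2

# Hypothetical data: number of pets owned by 20 students.
pets = [(0, 4), (1, 7), (2, 6), (3, 3)]
print(median_from_frequency_table(pets))  # 1.0
```

With 20 values the median position is (20 + 1) / 2 = 10.5, so the code averages the 10th and 11th values. The cumulative frequencies are 4, 11, 17, 20, so both of those values are 1.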
Mode -- the most frequently occurring value or class. From a grouped frequency table, identify the modal class (the interval with the highest frequency).
Range -- the difference between the largest and smallest values. It is a simple measure of spread but heavily influenced by outliers.
Quartiles and interquartile range (Higher). The lower quartile (Q1) is the value one quarter of the way through the ordered data. The upper quartile (Q3) is the value three quarters of the way through. The interquartile range (IQR) is Q3 minus Q1 and measures the spread of the middle 50% of the data, making it more resistant to outliers than the range.
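To see why the IQR resists outliers, here is a short sketch using Python's standard `statistics` module on a made-up data set of 11 values. Note that several conventions exist for locating quartiles; this example assumes the (n + 1)/4 and 3(n + 1)/4 position rule, which is what the default "exclusive" method of `statistics.quantiles` uses.

```python
import statistics

# Hypothetical data set with one obvious outlier (40).
data = [3, 5, 7, 8, 9, 11, 13, 14, 16, 18, 40]

# n=4 splits the data into quarters and returns Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1
data_range = max(data) - min(data)

print(q1, median, q3)  # 7.0 11.0 16.0
print(iqr)             # 9.0
print(data_range)      # 37
```

With 11 values, Q1 is the 3rd value, the median is the 6th, and Q3 is the 9th. The outlier of 40 inflates the range to 37 but has no effect on the IQR of 9.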
Comparing distributions. A common exam question asks you to compare two data sets. Always compare an average (mean or median) and a measure of spread (range or IQR). State which data set has the higher or lower value and explain what this means in context -- for example: "The median for Class A is higher, suggesting they generally performed better. The range for Class B is larger, suggesting their results were more varied (less consistent)."
Representing Data
AQA expects you to draw, read, and interpret a range of statistical diagrams. At Foundation tier you will encounter the standard chart types, while Higher tier introduces cumulative frequency diagrams, box plots, and histograms with unequal class widths.
Bar charts, pie charts, and pictograms. These are straightforward but still tested, often in the context of interpretation rather than construction. For pie charts, remember that the angles must add up to 360 degrees and each angle is found by dividing the frequency by the total and multiplying by 360.
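The pie chart angle calculation can be checked with a few lines of Python. The travel survey figures and the function name are invented for this example.

```python
def pie_chart_angles(frequencies):
    """Return the angle in degrees for each category."""
    total = sum(frequencies.values())
    # angle = frequency / total * 360 (multiplying first keeps the arithmetic exact here)
    return {category: freq * 360 / total for category, freq in frequencies.items()}

# Hypothetical survey of how 36 students travel to school.
travel = {"walk": 12, "bus": 18, "car": 6}
angles = pie_chart_angles(travel)
print(angles)                # {'walk': 120.0, 'bus': 180.0, 'car': 60.0}
print(sum(angles.values()))  # 360.0
```

Adding the angles and confirming they total 360 degrees is a quick check worth doing in the exam as well.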
Frequency polygons. These are drawn by plotting the frequency against the midpoint of each class interval and joining the points with straight lines. They are useful for comparing two distributions on the same axes.
Cumulative frequency diagrams (Higher). Plot the running total of frequencies against the upper class boundaries. The curve lets you estimate the median (at n/2 on the y-axis), lower quartile (at n/4), and upper quartile (at 3n/4). Draw horizontal lines from these positions to the curve, then read down to the x-axis.
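As a rough sketch of the preparation for a cumulative frequency diagram, the code below builds the running totals, pairs them with the upper class boundaries, and works out where on the y-axis to read off the median and quartiles. The class boundaries and frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40, 40-50.
upper_boundaries = [10, 20, 30, 40, 50]
frequencies = [4, 10, 16, 7, 3]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [4, 14, 30, 37, 40]

# Points to plot: (upper class boundary, cumulative frequency).
points = list(zip(upper_boundaries, cumulative))

# Positions on the y-axis from which to read across to the curve.
n = cumulative[-1]
read_off = {"lower quartile": n / 4, "median": n / 2, "upper quartile": 3 * n / 4}
print(read_off)  # {'lower quartile': 10.0, 'median': 20.0, 'upper quartile': 30.0}
```

The code only finds the y-axis positions. The estimates themselves still come from reading across to the curve and down to the x-axis.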
Box plots (Higher). A box plot displays the minimum, lower quartile, median, upper quartile, and maximum. Box plots are excellent for comparing distributions -- compare the medians for central tendency and the box lengths (IQR) for spread. Always refer to specific values when comparing.
Histograms with unequal class widths (Higher). The y-axis shows frequency density (frequency divided by class width), not frequency. The area of each bar represents the frequency. To find frequency from a histogram, multiply frequency density by class width.
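The relationship between frequency, class width, and frequency density can be illustrated with a small sketch. The classes and frequencies below are made up, and the function name is only illustrative.

```python
def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)

# Hypothetical grouped data with unequal class widths: ((lower, upper), frequency).
classes = [((0, 10), 8), ((10, 15), 12), ((15, 20), 9), ((20, 40), 6)]

densities = [frequency_density(lo, hi, f) for (lo, hi), f in classes]
print(densities)  # [0.8, 2.4, 1.8, 0.3]

# Going the other way: frequency = frequency density x class width (the bar's area).
recovered = [d * (hi - lo) for d, ((lo, hi), _) in zip(densities, classes)]
```

Notice that the 10-15 class has the tallest bar even though the 0-10 class is twice as wide, because height shows density rather than frequency.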
Scatter graphs and correlation. Scatter graphs plot two variables against each other. Positive correlation -- as one increases, the other tends to increase. Negative correlation -- as one increases, the other tends to decrease. No correlation -- no clear relationship. A line of best fit should pass through the mean point with roughly equal numbers of points on each side. Use it for interpolation (estimating within the data range -- reliable) but be cautious about extrapolation (estimating beyond the data range -- less reliable as the trend may not continue).
Probability Basics
Probability questions appear on every AQA GCSE Maths paper. The fundamentals are tested at both tiers, and a secure understanding of the basics is essential before moving on to combined events.
The probability scale. Probability is measured on a scale from 0 (impossible) to 1 (certain). A probability can be expressed as a fraction, a decimal, or a percentage. In the exam, fractions and decimals are most common.
Calculating theoretical probability. The probability of an event is the number of favourable outcomes divided by the total number of possible outcomes, provided all outcomes are equally likely. For example, the probability of rolling a 3 on a fair six-sided die is 1/6.
Relative frequency and experimental probability. When theoretical probability cannot be calculated (for example, a biased spinner), estimate the probability from experimental data. The relative frequency is the number of times the event occurred divided by the total number of trials. More trials give a more reliable estimate -- this is the law of large numbers.
Expected outcomes. Multiply the probability by the number of trials. If P(red) = 0.3 and you spin 200 times, you expect 0.3 x 200 = 60 reds. Note that expected outcomes are theoretical predictions and may not match actual results.
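Both calculations, relative frequency and expected outcomes, are short enough to sketch in Python. The `Fraction` type from the standard library keeps the arithmetic exact. The spinner results in the first part are hypothetical; the second part uses the P(red) = 0.3 example from the text.

```python
from fractions import Fraction

# Relative frequency: hypothetical experiment with a biased spinner,
# 64 reds observed in 200 spins.
relative_frequency = Fraction(64, 200)
print(relative_frequency)         # 8/25
print(float(relative_frequency))  # 0.32

# Expected outcomes: probability x number of trials.
p_red = Fraction(3, 10)           # P(red) = 0.3
expected_reds = p_red * 200
print(expected_reds)              # 60
```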
Sample space diagrams. A sample space diagram lists all possible outcomes for two combined events. Rolling two dice, for example, produces a 6-by-6 grid of 36 combinations. Count the outcomes satisfying a condition to find the probability.
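The two-dice sample space can be generated and counted in a few lines, as a way of checking answers found from the grid. The condition used here (a total of 7) is just one illustrative choice.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for two fair six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))
print(len(sample_space))  # 36

# Count the outcomes that satisfy the condition: the two scores total 7.
totals_of_seven = [(a, b) for a, b in sample_space if a + b == 7]
p_seven = Fraction(len(totals_of_seven), len(sample_space))
print(p_seven)  # 1/6
```

Six of the 36 cells give a total of 7, so the probability is 6/36, which simplifies to 1/6.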
Combined Events
Once you can calculate single-event probabilities, the specification requires you to handle situations where two or more events are combined.
Independent and dependent events. Two events are independent if the outcome of one does not affect the other -- for example, flipping a coin and rolling a die. They are dependent if the first outcome changes the probabilities of the second -- for example, drawing two sweets from a bag without replacement.
The AND rule and the OR rule. For independent events, the probability of both A and B occurring is P(A) x P(B) -- you multiply. For mutually exclusive events (events that cannot happen at the same time), the probability of A or B is P(A) + P(B) -- you add. The general OR rule is P(A or B) = P(A) + P(B) - P(A and B), but at GCSE the questions are usually structured so that events are either mutually exclusive or you use a tree diagram.
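Each of the three rules can be checked with exact fractions. The events below (a fair coin and a fair six-sided die) are chosen purely for illustration.

```python
from fractions import Fraction

# AND rule for independent events: a head on a coin AND a 6 on a die.
p_head = Fraction(1, 2)
p_six = Fraction(1, 6)
p_head_and_six = p_head * p_six
print(p_head_and_six)  # 1/12

# OR rule for mutually exclusive events: rolling a 1 OR a 2 on one die.
p_one_or_two = Fraction(1, 6) + Fraction(1, 6)
print(p_one_or_two)  # 1/3

# General OR rule when the events overlap: rolling an even number OR a number above 4.
p_even = Fraction(3, 6)        # 2, 4, 6
p_above_four = Fraction(2, 6)  # 5, 6
p_both = Fraction(1, 6)        # 6 is in both events
p_even_or_above_four = p_even + p_above_four - p_both
print(p_even_or_above_four)  # 2/3
```

The last answer can be confirmed by listing the outcomes: 2, 4, 5, and 6 satisfy the condition, which is 4 out of 6, or 2/3. Without subtracting the overlap, the 6 would be counted twice.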
Tree diagrams. Tree diagrams are the most important tool for combined probability at GCSE. Each branch represents a possible outcome with its probability written alongside. Draw the first set of branches for the first event, then from each outcome draw a second set for the second event. Multiply along the branches to find each combined probability, and add the relevant results to answer the question.
With and without replacement. If an item is replaced after the first selection, the probabilities on the second branches stay the same. If it is not replaced, the probabilities change because the total has decreased by one. Always read the question carefully to check whether replacement occurs -- this is where many students lose marks.
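The effect of replacement on a tree diagram can be sketched by multiplying along each pair of branches. The function name and the bag of 5 red and 3 blue counters are assumptions made for this example.

```python
from fractions import Fraction

def two_draws(red, blue, replace):
    """Return the probability of each two-draw outcome, e.g. 'RB' = red then blue."""
    total = red + blue
    outcomes = {}
    for first, first_count in (("R", red), ("B", blue)):
        p_first = Fraction(first_count, total)
        # Work out what is left in the bag for the second draw.
        red_left, blue_left, total_left = red, blue, total
        if not replace:
            total_left -= 1
            if first == "R":
                red_left -= 1
            else:
                blue_left -= 1
        for second, second_count in (("R", red_left), ("B", blue_left)):
            # Multiply along the branches.
            outcomes[first + second] = p_first * Fraction(second_count, total_left)
    return outcomes

without = two_draws(5, 3, replace=False)
with_replacement = two_draws(5, 3, replace=True)

print(without["RR"])                  # 5/14  (5/8 x 4/7)
print(without["RB"] + without["BR"])  # 15/28 (one of each colour)
print(with_replacement["RR"])         # 25/64 (5/8 x 5/8)
print(sum(without.values()))          # 1
```

The final line is the same check you would do by hand: the probabilities at the ends of all the branches must add up to 1.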
Venn diagrams and set notation. Venn diagrams represent events and their overlap. The intersection (A and B) represents outcomes in both sets. The union (A or B) represents outcomes in at least one set. The complement (A') represents outcomes not in A. To find probabilities from a Venn diagram, divide the relevant count by the total.
Two-way tables. A two-way table displays frequencies for two categorical variables in rows and columns with totals. To find a probability, identify the relevant cell or cells and divide by the total.
Conditional Probability (Higher)
Conditional probability is one of the more demanding topics on the Higher tier, but it follows logically from the combined events work above.
What is conditional probability? Conditional probability is the probability of an event occurring given that another event has already occurred. It is written as P(A|B), which reads as "the probability of A given B."
Using tree diagrams for conditional probability. When a tree diagram involves dependent events (without replacement), the second set of branches already shows conditional probabilities. To find P(A|B), identify the branch where B occurs and focus on the sub-branches from there.
Using Venn diagrams for conditional probability. To find P(A|B) from a Venn diagram, divide the number of outcomes in both A and B (the intersection) by the total number of outcomes in B. You are restricting your attention to the "B" circle and asking what proportion also falls in A.
The formula. P(A|B) = P(A and B) / P(B). This is the Venn diagram method expressed algebraically.
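To show that the Venn diagram method and the formula agree, here is a short sketch using invented counts for a class of 30 students, where A is "plays football" and B is "plays tennis".

```python
from fractions import Fraction

# Hypothetical Venn diagram regions for 30 students.
only_a = 8     # football only
both = 6       # football and tennis (the intersection)
only_b = 4     # tennis only
neither = 12   # outside both circles
total = only_a + both + only_b + neither

# Venn diagram method: restrict attention to the B circle.
p_a_given_b_counts = Fraction(both, both + only_b)

# Formula: P(A|B) = P(A and B) / P(B).
p_a_and_b = Fraction(both, total)
p_b = Fraction(both + only_b, total)
p_a_given_b_formula = p_a_and_b / p_b

print(p_a_given_b_counts)   # 3/5
print(p_a_given_b_formula)  # 3/5
```

Both routes give 3/5, because dividing the two probabilities cancels the total of 30 and leaves the same ratio of counts.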
Conditional probability questions are often worth 4-5 marks and appear toward the end of a paper. The method is consistent once you understand it, and the marks are very achievable with careful working.
Common Mistakes
Statistics and Probability questions have a number of recurring pitfalls that cost students marks every year. Being aware of them is half the battle.
- Not reading frequency tables carefully. Confusing data values with frequencies is one of the most common errors. When calculating the mean, make sure you multiply each value by its frequency -- do not simply add the values.
- Confusing mean and median for grouped data. From a grouped frequency table, you can calculate an estimated mean (using midpoints) and identify the modal class, but you cannot find the exact median directly. The median from grouped data is estimated using a cumulative frequency diagram.
- Drawing scatter graphs without labels. Always label both axes and use a consistent scale. Plot points with neat crosses, not large blobs.
- Probability answers greater than 1 or less than 0. A probability must always be between 0 and 1. If your answer falls outside this range, go back and check.
- Forgetting to simplify probabilities. Some questions specifically require the simplest form. Get into the habit of simplifying.
- Tree diagram errors with dependent events. The most common mistake is failing to adjust totals on the second branches. If you start with 5 red and 3 blue counters and draw one red, the second branches should show 4 red and 3 blue from a total of 7 -- not 8.
Exam Technique for Statistics and Probability
Statistics and Probability questions at GCSE often require you to interpret data and explain your findings, not just calculate. Here is how to approach them effectively.
Interpretation questions. When asked to compare data sets or explain what a chart shows, refer to specific values. "Class A did better" is not enough -- write "The median for Class A is 56, compared to 43 for Class B, suggesting Class A generally scored higher." Always tie your answer to the context of the question.
Check your probabilities. On any tree diagram, the probabilities on each set of branches must sum to 1. Similarly, in a Venn diagram, the values in all regions (including the region outside the circles but inside the rectangle) must add up to the total. Use these checks to verify your working.
"Explain" and "Give a reason" questions. These appear frequently in Statistics. If asked to criticise a sampling method or comment on an extrapolation, be specific. Name the problem ("The sample is biased because it only includes students from one tutor group") and explain why it matters ("so the results may not represent the whole year group").
Show your working. Even when the answer seems obvious, write down the calculation. Show the multiplication for the AND rule. Draw the lines on cumulative frequency diagrams. Method marks are available on most multi-mark questions and can save you if you make a small arithmetic error.
Time management. Statistics questions are often among the more time-efficient questions on the paper -- they tend to involve fewer steps than complex algebra or geometry problems. Do not rush through them, but recognise that they are an opportunity to bank marks efficiently. For more detail on managing your time across all three papers, see our AQA GCSE Maths exam technique guide.
Prepare with LearningBro
LearningBro offers targeted courses to help you master these topics. Our Statistics course covers data collection, averages, spread, and every type of statistical diagram on the specification. Our Probability course takes you from basic probability through tree diagrams, Venn diagrams, and conditional probability with practice questions that mirror the real exam.
Once you are confident with the content, the AQA GCSE Maths exam preparation course will help you develop the exam technique to convert knowledge into marks.
Start with a free lesson preview to see how the courses work. Statistics and Probability are among the most rewarding areas to revise -- the methods are consistent, the question types are predictable, and the marks are there for the taking.
Good luck with your revision.