Statistics Exam Practice
This capstone lesson brings together the main statistics topics from the Edexcel GCSE Mathematics (1MA1) specification in a bank of full exam-style questions with complete mark-scheme-style solutions. Mark allocations are shown for every question, and after each solution an Examiner's note highlights where students commonly lose marks. Use this lesson before sitting a past paper to sharpen your technique.
How Statistics Appears on the Edexcel GCSE
- Statistics questions appear on all three papers (Paper 1 non-calculator, Papers 2 and 3 calculator).
- Paper 1 often tests averages and chart interpretation without a calculator; Papers 2 and 3 carry the heavier calculation-based questions (estimated means, cumulative frequency graphs, histograms).
- Questions are typically worth 3–6 marks. Some problem-solving questions are worth 5–7 marks and combine two or more topics.
- Method marks are available even if the final numerical answer is wrong — always show working.
Question 1: Averages from a Frequency Table [4 marks]
The table shows the number of goals scored by a football team in 20 matches.
| Goals (x) | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Frequency (f) | 3 | 7 | 5 | 4 | 1 |
(a) Calculate the mean number of goals per match. [2]
(b) Find the median. [1]
(c) State the mode. [1]
Mark-scheme solution
(a)
- Σfx = (0×3) + (1×7) + (2×5) + (3×4) + (4×1) = 0 + 7 + 10 + 12 + 4 = 33 (M1 for a correct fx method)
- Σf = 20
- Mean = 33 ÷ 20 = 1.65 goals (A1)
(b) n = 20 → median position = (20 + 1) ÷ 2 = 10.5th value
- Cumulative frequencies: 3, 10, 15, 19, 20
- The 10th value is 1 goal (CF reaches 10 at x = 1) and the 11th is 2 goals (CF reaches 15 at x = 2)
- Median = (1 + 2) ÷ 2 = 1.5 goals (B1)
(c) Mode = 1 goal (highest frequency = 7) (B1)
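To check your arithmetic for Question 1, the short Python sketch below expands the frequency table into its 20 individual values and recomputes the three averages. The function name `table_stats` is illustrative only, and if two x-values share the highest frequency it simply returns the first one.

```python
# Sketch: mean, median and mode from an ungrouped frequency table
# (the goals data from Question 1).
goals = [0, 1, 2, 3, 4]
freq = [3, 7, 5, 4, 1]

def table_stats(values, freqs):
    n = sum(freqs)  # total frequency, NOT the number of columns
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    # Expand the table into the full ordered list of n values.
    data = [x for x, f in zip(values, freqs) for _ in range(f)]
    if n % 2 == 1:
        median = data[n // 2]
    else:
        # Even n: average the (n/2)th and (n/2 + 1)th values (1-based positions).
        median = (data[n // 2 - 1] + data[n // 2]) / 2
    # Mode is the x-value with the highest frequency, not the frequency itself.
    mode = values[freqs.index(max(freqs))]
    return mean, median, mode

mean, median, mode = table_stats(goals, freq)
print(mean, median, mode)  # prints 1.65 1.5 1
```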
Examiner's note
- Many students incorrectly divide by 5 (the number of rows) instead of by Σf = 20. Always divide by the total frequency.
- For the median, it is common to forget to look at two values (the 10th and 11th) when n is even, and simply pick one.
- "Mode" means the x-value with the highest frequency — not the frequency itself. Don't write "7" as the mode.
Question 2: Estimated Mean from Grouped Data [5 marks]
The table shows the heights (h cm) of 50 plants.
| Height (h cm) | Frequency |
|---|---|
| 0<h≤10 | 4 |
| 10<h≤20 | 11 |
| 20<h≤30 | 18 |
| 30<h≤40 | 12 |
| 40<h≤50 | 5 |
(a) Write down the modal class. [1]
(b) Calculate an estimate for the mean height. [3]
(c) Explain why your answer to (b) is an estimate. [1]
Mark-scheme solution
(a) Modal class = 20<h≤30 (B1)
(b) Midpoints: 5, 15, 25, 35, 45 (M1 for using midpoints)
| Class | f | Midpoint (x) | fx |
|---|---|---|---|
| 0<h≤10 | 4 | 5 | 20 |
| 10<h≤20 | 11 | 15 | 165 |
| 20<h≤30 | 18 | 25 | 450 |
| 30<h≤40 | 12 | 35 | 420 |
| 40<h≤50 | 5 | 45 | 225 |
| Total | 50 | | 1,280 |
Σfx = 1,280 (M1 for correct total)
Estimated mean = 1,280 ÷ 50 = 25.6 cm (A1)
(c) It is an estimate because midpoints are used to represent the values in each class; we do not know the exact heights. (B1)
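The midpoint method from part (b) can be sketched in a few lines of Python. Each class is stored as a (lower, upper) pair in cm; this representation is an assumption made for the example, not part of the question.

```python
# Sketch: estimated mean from grouped data using class midpoints.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]  # (lower, upper) in cm
freq = [4, 11, 18, 12, 5]

midpoints = [(lo + hi) / 2 for lo, hi in classes]
fx = [m * f for m, f in zip(midpoints, freq)]
estimated_mean = sum(fx) / sum(freq)  # divide by total frequency (50)

modal_class = classes[freq.index(max(freq))]

print(midpoints)       # [5.0, 15.0, 25.0, 35.0, 45.0]
print(sum(fx))         # 1280.0
print(estimated_mean)  # 25.6
print(modal_class)     # (20, 30)
```

The result is still only an estimate, because every plant in a class is treated as if it had the midpoint height.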
Examiner's note
- Biggest loss of marks: students forget to use midpoints and instead use the lower boundary or the class width. Always show the midpoint column.
- For (c), a vague statement like "because the data is grouped" is not enough. You need to reference the use of midpoints.
- Always include units (cm, minutes, £, etc.) for full marks.
Question 3: Cumulative Frequency and Box Plot [6 marks]
The table shows the time (t minutes) 60 commuters take to travel to work.
| Time (t minutes) | Frequency |
|---|---|
| 0<t≤10 | 4 |
| 10<t≤20 | 10 |
| 20<t≤30 | 18 |
| 30<t≤40 | 16 |
| 40<t≤50 | 8 |
| 50<t≤60 | 4 |
(a) Complete the cumulative frequency table. [1]
(b) Draw a cumulative frequency graph and use it to estimate the median and the interquartile range. [3]
(c) Draw a box plot. [2]
Mark-scheme solution
(a) CF: 4, 14, 32, 48, 56, 60 (B1)
(b) Points to plot at upper class boundaries: (10, 4), (20, 14), (30, 32), (40, 48), (50, 56), (60, 60), joined with a smooth curve (M1).
- Median at CF = 30 (since n/2 = 30): read across → about 29 minutes (A1)
- Q1 at CF = 15 (n/4): about 21 minutes; Q3 at CF = 45 (3n/4): about 38 minutes
- IQR = 38 − 21 = 17 minutes (A1)
(c) Box plot on a number line from 0 to 60. Box from 21 to 38, median line at 29, whiskers extending to 0 and 60. The exact shortest and longest times are not given, so the class limits 0 and 60 are taken as the minimum and maximum. (B2: B1 for correct box, B1 for correct whiskers and median.)
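The cumulative frequencies and the graph readings can be checked numerically. This sketch assumes straight-line interpolation between the plotted points (the same as joining them with line segments), so its estimates of about 28.9, 20.6 and 38.1 minutes differ slightly from readings taken off a smooth curve; small differences like these are normal when reading from a graph. The helper name `read_time` is hypothetical.

```python
# Sketch: cumulative frequencies, then the median and quartiles estimated by
# linear interpolation within each class (a check on your hand-drawn graph).
from itertools import accumulate

upper = [10, 20, 30, 40, 50, 60]  # upper class boundaries (minutes)
freq = [4, 10, 18, 16, 8, 4]
lower_start = 0                   # lower boundary of the first class

cf = list(accumulate(freq))       # running totals: [4, 14, 32, 48, 56, 60]

def read_time(target_cf):
    """Time at which the cumulative frequency reaches target_cf."""
    prev_t, prev_cf = lower_start, 0
    for t, c in zip(upper, cf):
        if c >= target_cf:
            # Interpolate along the straight segment from (prev_t, prev_cf) to (t, c).
            return prev_t + (target_cf - prev_cf) / (c - prev_cf) * (t - prev_t)
        prev_t, prev_cf = t, c
    raise ValueError("target is above the total frequency")

n = cf[-1]
median = read_time(n / 2)   # CF = 30
q1 = read_time(n / 4)       # CF = 15
q3 = read_time(3 * n / 4)   # CF = 45
print(cf)
print(round(median, 1), round(q1, 1), round(q3, 1), round(q3 - q1, 1))
# 28.9 20.6 38.1 17.6
```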
Examiner's note
- The #1 source of lost marks: plotting CF at the midpoint instead of the upper class boundary. Always use the upper boundary.
- For the quartile positions on a CF curve, use n/2, n/4, 3n/4 — NOT (n+1)/2 etc. The (n+1) formulas are for listed data, not grouped.
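- Join the plotted points with a smooth S-shaped curve or with straight line segments; Edexcel mark schemes usually accept either, provided every point is plotted at the correct upper class boundary.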
- For the box plot, the median line must be vertical inside the box — not at the end.
Question 4: Scatter Graph and Correlation [5 marks]
A science teacher measures the daily temperature (°C) and records the number of ice lollies sold at the school tuck shop over 10 days.
| Temp (°C) | 12 | 15 | 18 | 20 | 22 | 24 | 25 | 27 | 28 | 30 |
|---|---|---|---|---|---|---|---|---|---|---|
| Lollies | 5 | 8 | 14 | 18 | 20 | 28 | 25 | 30 | 34 | 38 |
(a) Describe the correlation and interpret it in context. [2]
(b) The mean point is calculated as (22.1, 22). Use this to help draw a line of best fit, then estimate lolly sales at 23°C. [2]
(c) A student predicts 55 lollies would sell at 40°C. Comment on this estimate. [1]
Mark-scheme solution
(a) Strong positive correlation (B1). As the temperature increases, the number of lollies sold tends to increase (B1 for correct context).
(b) Plot the mean point (22.1, 22) and draw a straight line through it following the trend, with roughly equal points above and below. (M1)
- Reading at x = 23: approximately 24 lollies (A1 — accept any answer from a valid line of best fit, typically 22–26.)
(c) This is extrapolation — 40°C is outside the data range (12–30°C). The estimate is unreliable because we cannot assume the linear trend continues at such high temperatures (demand may saturate). (B1)
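The mean point and a line through it can be checked in Python. Note the swapped-in technique: this sketch fits a least-squares regression line, which is beyond GCSE (where the line is drawn by eye). It is used here only because a least-squares line always passes through the mean point. It gives about 24 lollies at 23°C and about 55 at 40°C, so the student's 55 in part (c) follows the trend; the problem is that 40°C lies outside the data.

```python
# Sketch: mean point and a least-squares line of best fit for the lolly data.
# Least squares is NOT required at GCSE; it is used only as a numerical check.
temps = [12, 15, 18, 20, 22, 24, 25, 27, 28, 30]
lollies = [5, 8, 14, 18, 20, 28, 25, 30, 34, 38]

n = len(temps)
x_bar = sum(temps) / n    # mean temperature
y_bar = sum(lollies) / n  # mean number of lollies

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, lollies))
sxx = sum((x - x_bar) ** 2 for x in temps)
gradient = sxy / sxx
intercept = y_bar - gradient * x_bar  # forces the line through (x_bar, y_bar)

def predict(t):
    return intercept + gradient * t

print(round(x_bar, 1), round(y_bar, 1))  # 22.1 22.0
print(round(predict(23)))  # 24  (interpolation: inside 12-30 °C)
print(round(predict(40)))  # 55  (extrapolation: outside the data range)
```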
Examiner's note
- "Describe the correlation" needs both the type (positive/negative/none) and an interpretation in context. Without context you lose a mark.
- When drawing a line of best fit, go through the mean point — don't just draw a line that looks roughly right. This is a free mark if done correctly.
- Learn the word extrapolation and use it. Saying "the temperature is too high" without explaining why the trend may not continue won't score full marks.
Question 5: Stratified Sampling and Bias [4 marks]
A gym has 180 male and 120 female members. The manager wants a stratified sample of 50 members.
(a) Calculate how many males and females should be in the sample. [2]
(b) Explain why a stratified sample is better than simply asking the first 50 members who arrive. [2]
Mark-scheme solution
(a)
- Males: (180 ÷ 300) × 50 = 30 (M1 for the correct method; A1 for the correct answer)
- Females: (120 ÷ 300) × 50 = 20
- Check: 30 + 20 = 50 ✓
(b) A stratified sample ensures both genders are represented in proportion to the population (B1). Asking the first 50 arrivals could be biased — if more males arrive early in the morning, they would be over-represented; the sample would not reflect the full membership (B1).
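The proportional calculation in part (a) generalises to any number of strata. In this sketch the function name `stratified_sizes` is hypothetical, and the results are left unrounded so you can round them sensibly yourself when they are not whole numbers.

```python
# Sketch: stratified sample sizes, proportional to each stratum's share
# of the whole population.
def stratified_sizes(strata, sample_size):
    total = sum(strata.values())  # the population total (300 here)
    return {name: count * sample_size / total for name, count in strata.items()}

sizes = stratified_sizes({"male": 180, "female": 120}, 50)
print(sizes)  # {'male': 30.0, 'female': 20.0}
```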
Examiner's note
- Always state the total population (300) clearly before using the fraction.
- Round stratum counts sensibly; here both come out as whole numbers so no rounding is needed.
- In (b), simply saying "the sample is biased" is not enough — you must say why and in what direction.
Question 6: Histogram with Unequal Class Widths [H] [6 marks]
The table shows the distances (d km) travelled by 100 delivery drivers in one day.
| Distance (d km) | Frequency |
|---|---|
| 0<d≤20 | 10 |
| 20<d≤30 | 15 |
| 30<d≤40 | 30 |
| 40<d≤60 | 28 |
| 60<d≤100 | 17 |
(a) Calculate the frequency density for each class. [3]
(b) The tallest bar on the histogram has frequency density 3.0. State which class this represents. [1]
(c) Estimate the number of drivers who travelled between 35 and 50 km. [2]
Mark-scheme solution
(a) (M1 for identifying class widths; M1 for dividing by class width; A1 for all five values correct.)
| Class | Width | Frequency density |
|---|---|---|
| 0<d≤20 | 20 | 10 ÷ 20 = 0.5 |
| 20<d≤30 | 10 | 15 ÷ 10 = 1.5 |
| 30<d≤40 | 10 | 30 ÷ 10 = 3.0 |
| 40<d≤60 | 20 | 28 ÷ 20 = 1.4 |
| 60<d≤100 | 40 | 17 ÷ 40 = 0.425 |
(b) Tallest bar (FD = 3.0) → 30<d≤40 (B1)
(c)
- 35 to 40 (half of the 30–40 class): (5 ÷ 10) × 30 = 15 (M1)
- 40 to 50 (half of the 40–60 class): (10 ÷ 20) × 28 = 14
- Total ≈ 15 + 14 = 29 drivers (A1)
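Both parts of this question can be sketched in Python. The `estimate_between` helper (a hypothetical name) works out how much of each class overlaps the requested range and takes that fraction of the class frequency, which relies on the assumption that values are spread evenly within each class.

```python
# Sketch: frequency densities, then an estimate of how many values lie in a
# range, assuming values are spread evenly within each class.
classes = [(0, 20), (20, 30), (30, 40), (40, 60), (60, 100)]  # km
freq = [10, 15, 30, 28, 17]

# Frequency density = frequency ÷ class width.
fd = [f / (hi - lo) for (lo, hi), f in zip(classes, freq)]

def estimate_between(a, b):
    total = 0.0
    for (lo, hi), f in zip(classes, freq):
        overlap = max(0, min(b, hi) - max(a, lo))  # part of this class inside [a, b]
        total += overlap / (hi - lo) * f
    return total

print(fd)                        # [0.5, 1.5, 3.0, 1.4, 0.425]
print(estimate_between(35, 50))  # 29.0
```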
Examiner's note
- Frequency density = frequency ÷ class width. Calculating frequency × class width instead loses all the marks for that part.
- For part (c), many students simply add 30 + 28 = 58, forgetting that only parts of each class fall inside 35–50. Always work out the proportion of each class you need.
- The assumption "data is evenly distributed within a class" should ideally be stated — it's often worth a mark.
Question 7: Comparing Box Plots [H] [4 marks]
Two supermarkets recorded the waiting times (minutes) at their checkouts. The box plots are summarised below.
| Statistic (minutes) | Supermarket A | Supermarket B |
|---|---|---|
| Minimum | 1.0 | 3.5 |
| Q1 | 3.0 | 5.8 |
| Median | 4.2 | 6.8 |
| Q3 | 5.1 | 7.3 |
| Maximum | 8.2 | 10.0 |
Compare the waiting times at the two supermarkets. [4]
Mark-scheme solution
- Supermarket A has a lower median (4.2 < 6.8) → customers wait less time on average at Supermarket A (B1 for median comparison, B1 for context).
- Supermarket A has a larger IQR (5.1 − 3.0 = 2.1 vs 7.3 − 5.8 = 1.5). Its waiting times are more spread out — so Supermarket B is more consistent in its service time, even though customers wait longer on average (B1 for IQR comparison, B1 for context).
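The measures of centre and spread can be read off the two five-number summaries with a short sketch. The dictionary layout and the `spread` helper are assumptions made for illustration; the numbers only support the comparison, and the marks still come from sentences written in context.

```python
# Sketch: centre and spread from the two five-number summaries in Question 7.
summaries = {
    "A": {"min": 1.0, "q1": 3.0, "median": 4.2, "q3": 5.1, "max": 8.2},
    "B": {"min": 3.5, "q1": 5.8, "median": 6.8, "q3": 7.3, "max": 10.0},
}

def spread(s):
    """Return (IQR, range) for one five-number summary."""
    return s["q3"] - s["q1"], s["max"] - s["min"]

for name, s in summaries.items():
    iqr, rng = spread(s)
    print(f"Supermarket {name}: median {s['median']}, IQR {iqr:.1f}, range {rng:.1f}")
# Supermarket A: median 4.2, IQR 2.1, range 7.2
# Supermarket B: median 6.8, IQR 1.5, range 6.5
```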
Examiner's note
- Always make two comparisons: one about centre (median) and one about spread (IQR or range). A second sentence about the median earns no extra marks.
- "In context" is compulsory. "Supermarket A has a larger IQR" earns the comparison mark but not the context mark; "Supermarket A's waiting times are more spread out, so its service is less consistent" earns both.
- When the comparisons pull in different directions (here Supermarket A is quicker on average but Supermarket B is more consistent), state the trade-off explicitly.