This lesson is Higher tier. When data is grouped, you cannot read exact quartiles from a table — but you can draw a cumulative frequency curve and read estimates of the median, quartiles and interquartile range straight off it. A box plot (box-and-whisker diagram) then summarises those five key values in a single, comparison-friendly picture. Both are core Higher-tier tools in the OCR GCSE Mathematics (J560) Statistics strand, and questions almost always end by asking you to compare two data sets using a median and a measure of spread.
The topic blends AO1 (plotting the curve and drawing the box plot accurately) with AO2 (reading off and interpreting the median, quartiles and IQR) and AO3 (comparing distributions and judging which is more consistent). OCR command words include "Draw", "Estimate", "Find", "Compare" and "Give a reason for your answer".
| Term | Meaning |
|---|---|
| Cumulative frequency | A running total of frequencies up to the top of each class |
| Cumulative frequency curve | A smooth S-shaped graph of cumulative frequency against the upper class boundary |
| Median (Q2) | The middle value; at cumulative frequency $\frac{n}{2}$ |
| Lower quartile (Q1) | The value a quarter of the way through; at cumulative frequency $\frac{n}{4}$ |
| Upper quartile (Q3) | The value three-quarters of the way through; at cumulative frequency $\frac{3n}{4}$ |
| Interquartile range (IQR) | Q3−Q1; the spread of the middle 50% |
| Box plot | A diagram showing minimum, Q1, median, Q3 and maximum |
Cumulative frequency is a running total: for each class you add its frequency to all those before it. You always plot the cumulative frequency against the upper boundary of the class, because by that point all of those values have been counted.
Complete the cumulative frequency column for the times, t minutes, taken by 60 students to finish a quiz.
| Time t (min) | Frequency | Cumulative frequency |
|---|---|---|
| 0<t≤10 | 4 | 4 |
| 10<t≤20 | 8 | 12 |
| 20<t≤30 | 18 | 30 |
| 30<t≤40 | 18 | 48 |
| 40<t≤50 | 9 | 57 |
| 50<t≤60 | 3 | 60 |
Solution: Add each frequency to the running total: 4; 4+8=12; 12+18=30; 30+18=48; 48+9=57; 57+3=60. The final cumulative frequency equals n=60, a useful check.
Common error: plotting cumulative frequency against the class midpoint. It must be plotted against the upper boundary (10, 20, 30, …).
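The running total in the quiz table can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up for this example, and the data is copied from the table above.

```python
from itertools import accumulate

# Upper class boundaries and frequencies copied from the quiz table.
upper_bounds = [10, 20, 30, 40, 50, 60]
frequencies = [4, 8, 18, 18, 9, 3]

# Running total: each entry adds the current frequency to all those before it.
cumulative = list(accumulate(frequencies))

# Points to plot: (upper class boundary, cumulative frequency).
points = list(zip(upper_bounds, cumulative))

print(cumulative)  # [4, 12, 30, 48, 57, 60]
print(points[2])   # (30, 30)
```

The last cumulative frequency matches the total of 60, and each point pairs the cumulative frequency with the upper boundary, not the midpoint.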
Plot each cumulative frequency against the upper class boundary, then join the points with a smooth curve (not straight segments). The result is a characteristic S-shape. The curve for the quiz data above is shown below.
The dashed line shows how to read the median: go across from cumulative frequency $\frac{60}{2} = 30$ to the curve, then down to the time axis.
Use the curve (or the table) to estimate the median time.
Solution: The median is at cumulative frequency $\frac{60}{2} = 30$. Reading across to the curve and down gives a time of about 30 minutes (the curve passes through the point (30, 30)).
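On a cumulative frequency curve of n values:
- the lower quartile Q1 is read at cumulative frequency $\frac{n}{4}$
- the median is read at cumulative frequency $\frac{n}{2}$
- the upper quartile Q3 is read at cumulative frequency $\frac{3n}{4}$

Then IQR=Q3−Q1.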
For the quiz data (n=60), estimate Q1, Q3 and the IQR.
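Solution: Q1 is at cumulative frequency $\frac{60}{4} = 15$; reading across to the curve and down gives about 22 minutes. Q3 is at cumulative frequency $\frac{3 \times 60}{4} = 45$, which gives about 38 minutes. So the IQR is about 38−22=16 minutes.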
Common error: using $\frac{n+1}{4}$ on a cumulative frequency curve. For grouped data read from a curve, use $\frac{n}{4}$, $\frac{n}{2}$ and $\frac{3n}{4}$.
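Reading from a hand-drawn curve cannot be reproduced exactly in code, but a rough check is possible. The sketch below replaces the smooth curve with straight lines between the plotted points, which is an assumption and not the method you use in the exam. It also assumes the curve starts at (0, 0), the lower boundary of the first class. The function name `value_at` is invented for this example.

```python
def value_at(cf_target, boundaries, cum_freqs):
    """Estimate the data value at a given cumulative frequency.

    Uses straight-line interpolation between plotted points, which only
    approximates a smooth hand-drawn curve.
    """
    for i in range(1, len(boundaries)):
        lo_cf, hi_cf = cum_freqs[i - 1], cum_freqs[i]
        if hi_cf > lo_cf and lo_cf <= cf_target <= hi_cf:
            fraction = (cf_target - lo_cf) / (hi_cf - lo_cf)
            width = boundaries[i] - boundaries[i - 1]
            return boundaries[i - 1] + fraction * width
    raise ValueError("cumulative frequency outside the range of the data")


# Quiz data, with an assumed starting point (0, 0).
boundaries = [0, 10, 20, 30, 40, 50, 60]
cum_freqs = [0, 4, 12, 30, 48, 57, 60]
n = cum_freqs[-1]

q1 = value_at(n / 4, boundaries, cum_freqs)          # read at cf 15
median = value_at(n / 2, boundaries, cum_freqs)      # read at cf 30
q3 = value_at(3 * n / 4, boundaries, cum_freqs)      # read at cf 45
iqr = q3 - q1

print(round(q1, 1), round(median, 1), round(q3, 1), round(iqr, 1))
# 21.7 30.0 38.3 16.7
```

These straight-line estimates sit close to values read from the curve. Small differences are expected, because both methods only estimate quartiles from grouped data.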
Use the curve to estimate how many students took more than 35 minutes.
Solution: Read up from 35 minutes to the curve and across: the cumulative frequency is about 39. That is the number who took 35 minutes or less, so the number taking more than 35 minutes is about 60−39=21 students.
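The same "read up, then subtract from the total" step can be sketched in Python. As before, this assumes straight lines between the plotted points and a starting point of (0, 0), so it approximates the curve. The function name `cumulative_frequency_at` is hypothetical.

```python
def cumulative_frequency_at(x, boundaries, cum_freqs):
    """Estimate the cumulative frequency at value x by straight-line
    interpolation between plotted points (an approximation of the curve)."""
    for i in range(1, len(boundaries)):
        if boundaries[i - 1] <= x <= boundaries[i]:
            width = boundaries[i] - boundaries[i - 1]
            fraction = (x - boundaries[i - 1]) / width
            return cum_freqs[i - 1] + fraction * (cum_freqs[i] - cum_freqs[i - 1])
    raise ValueError("value outside the range of the data")


# Quiz data, with an assumed starting point (0, 0).
boundaries = [0, 10, 20, 30, 40, 50, 60]
cum_freqs = [0, 4, 12, 30, 48, 57, 60]
n = cum_freqs[-1]

at_most_35 = cumulative_frequency_at(35, boundaries, cum_freqs)
more_than_35 = n - at_most_35  # subtract from the total to get "more than"

print(at_most_35, more_than_35)  # 39.0 21.0
```

The key step is the subtraction: the curve gives the number at or below 35 minutes, so "more than 35 minutes" is the total minus that reading.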
A cumulative frequency curve for the test marks of 160 students gives Q1=42, median =55 and Q3=68. (a) Work out the interquartile range. (b) The pass mark is 50. Roughly what can you say about the proportion who passed?
Solution: (a) IQR =Q3−Q1=68−42=26 marks. (b) The median is 55, which is above the pass mark of 50, so more than half the students passed. (To be precise you would read the cumulative frequency at mark 50 from the curve and compare with 160.)
Explain why $\frac{n}{4}$, $\frac{n}{2}$ and $\frac{3n}{4}$ are used on a cumulative frequency curve rather than $\frac{n+1}{4}$ etc.
Solution: The $\frac{n+1}{4}$ formulae locate the position of a value in a small, listed data set. A cumulative frequency curve treats the data as a smooth, continuous quantity over the whole range, so we simply split the total frequency n into quarters and read the corresponding values. With large grouped data the difference between $\frac{n}{4}$ and $\frac{n+1}{4}$ is negligible, and the curve method uses $\frac{n}{4}$, $\frac{n}{2}$, $\frac{3n}{4}$ by convention.
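To see how small the difference is, the two lower-quartile positions can be compared for the data set sizes used in this lesson. This is a quick illustrative check only; the function name `quartile_positions` is made up for the example.

```python
def quartile_positions(n):
    """Return the lower-quartile position used on a curve (n/4)
    and the position used for a listed data set ((n + 1)/4)."""
    return n / 4, (n + 1) / 4


for n in (60, 160):
    curve_position, list_position = quartile_positions(n)
    gap = list_position - curve_position
    print(n, curve_position, list_position, gap)
# 60 15.0 15.25 0.25
# 160 40.0 40.25 0.25
```

The gap is always a quarter of one data value, which is far too small to show up when reading a hand-drawn curve.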
A box plot displays five numbers — the minimum, Q1, the median, Q3 and the maximum — on a number line. The box spans Q1 to Q3 (so its width is the IQR), a line inside marks the median, and "whiskers" reach out to the smallest and largest values.
The box plot below summarises the quiz times: minimum 8, Q1 22, median 30, Q3 38, maximum 56 minutes.
From a box plot you read: minimum 12, Q1 19, median 24, Q3 31, maximum 45. Find (a) the range and (b) the interquartile range.
Solution: (a) Range =45−12=33. (b) IQR =Q3−Q1=31−19=12.
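The two calculations from a box plot can be written as a small Python sketch. The class name `FiveNumberSummary` and its fields are invented for this example. The first set of values comes from the question above, and the second set is the quiz box plot described earlier in this lesson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiveNumberSummary:
    """The five values shown on a box plot."""
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    @property
    def range(self):
        # Whole spread: largest value minus smallest value.
        return self.maximum - self.minimum

    @property
    def iqr(self):
        # Spread of the middle 50%: the width of the box.
        return self.q3 - self.q1


example = FiveNumberSummary(minimum=12, q1=19, median=24, q3=31, maximum=45)
quiz = FiveNumberSummary(minimum=8, q1=22, median=30, q3=38, maximum=56)

print(example.range, example.iqr)  # 33 12
print(quiz.range, quiz.iqr)        # 48 16
```

The range uses the ends of the whiskers, while the IQR uses only the ends of the box, so one extreme value changes the range but not the IQR.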
Explain what the width of the box tells you in a box plot.
Solution: The box spans Q1 to Q3, so its width is the interquartile range — the spread of the middle 50% of the data. A narrow box means the central half of the data is tightly clustered (consistent); a wide box means it is more spread out.
Box plots are designed for comparison. Compare two data sets by their medians (which is higher on average) and their IQRs (which is more consistent), always in context.
Two classes sat a test. Class P: median 24, IQR 12. Class Q: median 28, IQR 5. Compare the two classes.