This lesson covers methods of presenting and interpreting data as required by the Edexcel A-Level Mathematics specification (9MA0), Paper 3 Section A -- Statistics. You must be able to construct and interpret box plots, cumulative frequency diagrams, histograms (including those with unequal class widths), stem-and-leaf diagrams, and compare distributions.
A box plot displays the five-number summary of a data set: the minimum, the lower quartile (Q1), the median (Q2), the upper quartile (Q3) and the maximum.
An outlier is typically defined as any value that lies below Q1 − 1.5×IQR or above Q3 + 1.5×IQR, where IQR = Q3 − Q1.
Exam Tip: Always state the outlier rule you are using. If the question says "an outlier is defined as a value more than 1.5 × IQR beyond the nearest quartile", use that definition exactly.
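The 1.5 × IQR rule can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names `outlier_fences` and `find_outliers` are invented here, the data are hypothetical, and the quartiles are taken as given rather than computed, because the convention for locating quartiles varies between sources.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) boundaries Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Return the values lying strictly outside the two boundaries."""
    lower, upper = outlier_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical test marks, with quartiles supplied (as an exam question would).
marks = [12, 35, 38, 40, 42, 45, 47, 50, 79]
lower, upper = outlier_fences(q1=38, q3=47)
print(lower, upper)                        # 24.5 60.5
print(find_outliers(marks, q1=38, q3=47))  # [12, 79]
```

If a question states a different rule, only the multiplier `k` (or the whole boundary formula) needs to change.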
A cumulative frequency diagram shows the running total of frequencies up to each class boundary.
| Weight (kg) | Frequency | Cumulative Frequency |
|---|---|---|
| 50 ≤ w < 55 | 8 | 8 |
| 55 ≤ w < 60 | 15 | 23 |
| 60 ≤ w < 65 | 22 | 45 |
| 65 ≤ w < 70 | 18 | 63 |
| 70 ≤ w < 75 | 12 | 75 |
| 75 ≤ w < 80 | 5 | 80 |
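Total n = 80. The median is at cumulative frequency n/2 = 40. This falls in the 60 ≤ w < 65 class, where the cumulative frequency rises from 23 to 45. Linear interpolation gives an estimate of 60 + (40 − 23)/22 × 5 ≈ 63.9 kg.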
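Reading the median off a cumulative frequency diagram is equivalent to linear interpolation within the class that contains it. The sketch below applies this to the weight table above; the helper name `interpolate_quantile` is an assumption made for illustration, and the calculation assumes the data are spread uniformly within each class.

```python
def interpolate_quantile(classes, target_cf):
    """Estimate the value at a given cumulative frequency.

    classes: list of (lower, upper, frequency) tuples in ascending order.
    Assumes values are spread uniformly within each class.
    """
    running = 0
    for lower, upper, freq in classes:
        if freq > 0 and running + freq >= target_cf:
            fraction = (target_cf - running) / freq
            return lower + fraction * (upper - lower)
        running += freq
    raise ValueError("target cumulative frequency exceeds the total")


# Weight table from the lesson: (lower bound, upper bound, frequency).
weights = [(50, 55, 8), (55, 60, 15), (60, 65, 22),
           (65, 70, 18), (70, 75, 12), (75, 80, 5)]

n = sum(freq for _, _, freq in weights)      # 80
median = interpolate_quantile(weights, n / 2)
print(round(median, 2))                      # 63.86
```

The same function estimates the quartiles by passing `n / 4` or `3 * n / 4` as the target.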
Exam Tip: Always plot cumulative frequency against the upper class boundary, never the midpoint.
A histogram is a bar chart where the area of each bar represents the frequency. This is particularly important when the classes have unequal widths.
The vertical axis of a histogram is frequency density, calculated by:
Frequency density = Frequency / Class width
If class widths are equal, you can use frequency directly. But when class widths are unequal, using frequency directly would be misleading because a wider class would appear to have more data simply because its bar is wider. By using frequency density, the area of each bar equals the frequency, making the chart fair.
| Time (minutes) | Frequency | Class Width | Frequency Density |
|---|---|---|---|
| 0 ≤ t < 5 | 10 | 5 | 10/5 = 2.0 |
| 5 ≤ t < 10 | 15 | 5 | 15/5 = 3.0 |
| 10 ≤ t < 20 | 30 | 10 | 30/10 = 3.0 |
| 20 ≤ t < 40 | 24 | 20 | 24/20 = 1.2 |
| 40 ≤ t < 60 | 6 | 20 | 6/20 = 0.3 |
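The frequency density column in the table above can be reproduced with a short Python sketch. The function name `frequency_density` and the tuple layout are choices made here for illustration only.

```python
import math


def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)


# Time table from the lesson: (lower bound, upper bound, frequency).
times = [(0, 5, 10), (5, 10, 15), (10, 20, 30), (20, 40, 24), (40, 60, 6)]

densities = [frequency_density(*row) for row in times]
print(densities)  # [2.0, 3.0, 3.0, 1.2, 0.3]

# Check that bar area (density x width) recovers each frequency.
for (lower, upper, freq), density in zip(times, densities):
    assert math.isclose(density * (upper - lower), freq)
```

Note that the 5 ≤ t < 10 and 10 ≤ t < 20 classes have the same bar height (3.0) even though the second holds twice as many values, because it is twice as wide.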
To find the frequency from a histogram: Frequency = Frequency density × Class width (i.e. the area of the bar).
Exam Tip: A very common exam question gives you a histogram and asks you to find the frequency for a particular class, or to complete a frequency table. Use Area = Frequency density × Class width.
A stem-and-leaf diagram displays data by splitting each value into a stem (the leading digit(s)) and a leaf (the final digit).
Data: 23, 25, 27, 31, 34, 34, 36, 38, 41, 43, 45, 52
| Stem | Leaf |
|---|---|
| 2 | 3 5 7 |
| 3 | 1 4 4 6 8 |
| 4 | 1 3 5 |
| 5 | 2 |
Key: 2 | 3 means 23.
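As a rough sketch, the diagram above can be built programmatically by splitting each value into tens (stem) and units (leaf). The function name `stem_and_leaf` is invented for this example, and it assumes non-negative whole numbers with a single-digit leaf. Stems with no leaves are simply omitted here, whereas a hand-drawn diagram would normally still show them.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group values by stem (tens) with leaves (units) in ascending order."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    return dict(rows)


data = [23, 25, 27, 31, 34, 34, 36, 38, 41, 43, 45, 52]
diagram = stem_and_leaf(data)

for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
# 2 | 3 5 7
# 3 | 1 4 4 6 8
# 4 | 1 3 5
# 5 | 2
```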
A back-to-back stem-and-leaf diagram is used to compare two data sets. One data set has leaves extending to the left of the shared stem; the other extends to the right. Leaves are still in ascending order, moving away from the stem in each direction.
When comparing two (or more) data sets, you should comment on location (average), spread and shape:
Compare the means or medians. State which data set has a higher/lower average and what this means in context.
Example: "The median score for Class A (67) is higher than the median score for Class B (52), suggesting students in Class A performed better on average."
Compare the ranges, interquartile ranges, or standard deviations. State which data set is more or less spread out.
Example: "The IQR for Class A (15) is smaller than the IQR for Class B (28), indicating that scores in Class A are more consistent."
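Comment on the shape of the distribution. A distribution may be symmetric, positively skewed (a longer tail towards higher values) or negatively skewed (a longer tail towards lower values).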
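When given two box plots side by side, compare the medians (location), the IQRs and ranges (spread), the position of the median within each box (skewness), and any outliers.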
Exam Tip: Always make comparisons in context. Do not just write "the median is higher". Write "the median height of boys (172 cm) is greater than the median height of girls (163 cm), suggesting boys tend to be taller."
Before analysing data, it is important to consider:
At A-Level, you are expected to identify anomalies and comment on their possible impact on statistical measures.
Edexcel 9MA0-03 specification, Paper 3 (Statistics and Mechanics), Section 2 (Data presentation and interpretation) covers: interpreting diagrams for single-variable data, including histograms with unequal class widths, frequency polygons, box-and-whisker plots and cumulative-frequency diagrams; identifying outliers from a data set using a stated rule; and comparing distributions using appropriate measures of central tendency and spread (refer to the official specification document for exact wording). This sub-strand is examined alongside Section 1 (Statistical sampling) and Section 3 (Probability), with cross-links into Section 4 (Statistical distributions) and Section 5 (Hypothesis testing). The Edexcel formula booklet does not list the outlier rule (below Q1 − 1.5×IQR or above Q3 + 1.5×IQR), so it must be memorised. The question stem will normally state the rule explicitly, because alternative rules (such as more than 2 standard deviations from the mean) are also acceptable in different contexts.
Question (8 marks):
The mass, m grams, of 80 apples from an orchard is summarised by a histogram, described here in place of the diagram. The class 60≤m<80 has frequency 10. The remaining classes have frequency densities 0.4 for 80≤m<100, 1.6 for 100≤m<110, 2.4 for 110≤m<120, 0.8 for 120≤m<140 and 0.1 for 140≤m<200.
(a) Estimate the number of apples with mass less than 115 g. (3)
(b) The lower quartile is Q1=105 g and the upper quartile is Q3=125 g. An outlier is defined as any value more than 1.5×IQR below Q1 or above Q3. Determine the boundaries beyond which an apple's mass would be classified as an outlier, and state, with reasoning, whether the heaviest class 140≤m<200 contains any potential outliers. (5)
Solution with mark scheme:
(a) Step 1 — recover frequencies from frequency densities.
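Frequency = frequency density × class width:
| Class (g) | Frequency density | Class width | Frequency |
|---|---|---|---|
| 80≤m<100 | 0.4 | 20 | 0.4×20 = 8 |
| 100≤m<110 | 1.6 | 10 | 1.6×10 = 16 |
| 110≤m<120 | 2.4 | 10 | 2.4×10 = 24 |
| 120≤m<140 | 0.8 | 20 | 0.8×20 = 16 |
| 140≤m<200 | 0.1 | 60 | 0.1×60 = 6 |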
Together with the stated class 60≤m<80 (frequency 10), the total is 10+8+16+24+16+6=80, matching the sample size.
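Step 1 can be checked with a short Python sketch that multiplies each frequency density by its class width and confirms the total of 80. The variable names are illustrative, and `round` is used only to tidy floating-point arithmetic.

```python
# Histogram classes from the question: (lower bound, upper bound, frequency density).
histogram = [(80, 100, 0.4), (100, 110, 1.6), (110, 120, 2.4),
             (120, 140, 0.8), (140, 200, 0.1)]

# Frequency = frequency density x class width (the area of each bar).
frequencies = [round(density * (upper - lower), 6)
               for lower, upper, density in histogram]
print(frequencies)  # [8.0, 16.0, 24.0, 16.0, 6.0]

# Add the class 60 <= m < 80, whose frequency (10) is given directly.
total = 10 + sum(frequencies)
print(total)        # 80.0
```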
M1 — multiplying frequency density by class width, not reading frequency directly off the vertical axis. The single most common error on histogram questions is treating frequency density as frequency.
Step 2 — use linear interpolation across the class containing 115.
The class 110≤m<120 contains 24 apples. Assuming uniform distribution within the class, the proportion below 115 is (115−110)/(120−110)=0.5, contributing 0.5×24=12 apples.
M1 — correct linear interpolation set-up.
Step 3 — sum cumulative frequency below 115.
10+8+16+12=46 apples.
A1 — fully correct cumulative count.
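Steps 2 and 3 amount to adding every complete class below 115 g to a proportional share of the class that contains 115 g. A minimal Python sketch follows; the helper `estimate_below` is a name invented here, and it relies on the same uniform-within-class assumption as the written solution.

```python
def estimate_below(classes, x):
    """Estimate how many values lie below x by linear interpolation.

    classes: list of (lower, upper, frequency) tuples.
    """
    total = 0.0
    for lower, upper, freq in classes:
        if x >= upper:
            total += freq                                    # whole class is below x
        elif x > lower:
            total += freq * (x - lower) / (upper - lower)    # partial class
    return total


# Apple masses: (lower bound, upper bound, frequency), frequencies as in Step 1.
apples = [(60, 80, 10), (80, 100, 8), (100, 110, 16),
          (110, 120, 24), (120, 140, 16), (140, 200, 6)]

print(estimate_below(apples, 115))  # 46.0
```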
(b) Step 1 — compute IQR.
IQR=Q3−Q1=125−105=20.
B1 — correct IQR.
Step 2 — compute outlier boundaries.
Lower boundary: Q1−1.5×IQR=105−30=75 g.
Upper boundary: Q3+1.5×IQR=125+30=155 g.
M1 A1 — M1 for the 1.5×IQR structure, A1 for both numerical boundaries.
Step 3 — interpret in context.
The class 140≤m<200 has 6 apples. Any apple with mass >155 g is a potential upper outlier. The class spans 140 to 200, so apples with mass between 155 and 200 are outliers. Assuming uniform distribution within this class, the expected number of outliers is (200−155)/(200−140)×6 = 45/60×6 = 4.5, so we estimate between 4 and 5 outliers in this class.
M1 A1 — M1 for comparing class boundaries against 155, A1 for stating that the class does contain outliers with a contextual estimate.
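The arithmetic in part (b) can be verified with the sketch below. It uses the quartiles given in the question and the frequency of 6 recovered in part (a). The variable names are illustrative, and the final estimate depends on the uniform-within-class assumption.

```python
q1, q3 = 105, 125                  # quartiles given in the question (grams)
iqr = q3 - q1                      # 20

lower_fence = q1 - 1.5 * iqr       # 75.0
upper_fence = q3 + 1.5 * iqr       # 155.0

# Heaviest class: 140 <= m < 200 with 6 apples.
lower, upper, freq = 140, 200, 6

# Share of the class above the upper boundary, assuming uniform spread.
expected_outliers = freq * (upper - upper_fence) / (upper - lower)

print(lower_fence, upper_fence, expected_outliers)  # 75.0 155.0 4.5
```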
Total: 8 marks (M4 A3 B1, split as shown).
Question (6 marks): A box-and-whisker plot summarises the daily rainfall (r mm) in two locations, A and B, over the same 60-day period.
(a) For each location, determine which (if any) data values would be classified as outliers using the rule 1.5×IQR beyond the quartiles. (3)
(b) Compare the two distributions, referring to both centre and spread. (3)
Mark scheme decomposition by AO:
(a)
(b)
Total: 6 marks split AO1 = 1, AO2 = 4, AO3 = 1. Note the AO2 dominance — comparing distributions is interpretive work, and Edexcel rewards explicit linking of statistical measures to contextual claims.
Connects to:
Section 1 — Measures of location and spread: medians, quartiles and IQR feed directly into both box plots and the outlier rule. The choice between mean/SD and median/IQR pairs depends on whether the distribution is symmetric or skewed — a decision that recurs throughout descriptive statistics and motivates robust estimation.
Section 4 — Correlation and regression: scatter plots and residual plots are the bivariate analogues of histograms and box plots. Identifying influential points in regression uses an outlier-style rule (residuals more than 2 standard deviations from 0), structurally identical to the 1.5×IQR rule for univariate data.