This lesson covers the selection and construction of appropriate statistical diagrams for presenting real data. At A-Level, you are expected not only to draw diagrams accurately but also to choose the most suitable type for the data and the question being asked, and to interpret diagrams in context.
The choice of diagram depends on the type of data and the purpose of the presentation.
| Diagram | Best for | Data type |
|---|---|---|
| Histogram | Showing the distribution of a continuous variable | Continuous, grouped |
| Box plot | Comparing distributions, showing median, quartiles, and outliers | Continuous |
| Cumulative frequency diagram | Estimating median, quartiles, and percentiles | Continuous, grouped |
| Scatter diagram | Showing the relationship between two variables | Bivariate, continuous |
| Stem-and-leaf diagram | Displaying raw data while showing the distribution | Small data sets, discrete or continuous |
| Bar chart | Comparing frequencies across categories | Categorical |
| Pie chart | Showing proportions of a whole | Categorical |
| Time series | Showing how a variable changes over time | Continuous, ordered by time |
A histogram displays the distribution of a continuous variable using bars whose area (not height) represents frequency.
$$\text{Frequency density} = \frac{\text{Frequency}}{\text{Class width}}$$
Use a histogram for grouped continuous data. When the class widths are unequal, frequency density must be plotted on the vertical axis. If the class widths are all equal, plotting frequency directly is acceptable, because frequency density is then just frequency divided by a constant.
| Daily rainfall (mm) | Frequency | Class width | Frequency density |
|---|---|---|---|
| 0 ≤ r < 2 | 12 | 2 | 6.0 |
| 2 ≤ r < 5 | 9 | 3 | 3.0 |
| 5 ≤ r < 10 | 10 | 5 | 2.0 |
| 10 ≤ r < 20 | 5 | 10 | 0.5 |
| 20 ≤ r < 50 | 3 | 30 | 0.1 |
Notice how the classes have unequal widths, so frequency density must be used. The total area of all bars equals the total frequency (39).
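The frequency density column can be checked with a few lines of Python. This is a minimal sketch using the rainfall table above; the names `classes` and `frequency_density` are chosen here for illustration only.

```python
# Rainfall classes from the table above: (lower bound, upper bound, frequency).
classes = [(0, 2, 12), (2, 5, 9), (5, 10, 10), (10, 20, 5), (20, 50, 3)]


def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)


densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]

# Each bar's area is class width x frequency density, which recovers the frequency,
# so the areas should add up to the total frequency.
total_area = sum((hi - lo) * d for (lo, hi, _), d in zip(classes, densities))

for (lo, hi, f), d in zip(classes, densities):
    print(f"{lo} <= r < {hi}: frequency {f}, frequency density {d:.1f}")
print(f"Total area of bars = {total_area:.0f}")
```

Multiplying each density back by its class width is a quick self-check before drawing: if the areas do not sum to the total frequency, one of the densities is wrong.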
A box plot displays the five-number summary of the data: the minimum, the lower quartile (Q1), the median, the upper quartile (Q3) and the maximum.
Outliers are plotted as individual points beyond the whiskers.
Box plots are particularly useful for comparing distributions — for example, comparing daily mean temperatures at two different weather stations or across two different months.
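When comparing box plots, comment on a measure of location (the medians) and a measure of spread (the IQRs or ranges), and interpret each comparison in context.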
Example: "The box plot for daily mean temperature at Heathrow in July shows a median of 19.5°C with an IQR of 3.2°C. The box plot for Leuchars shows a lower median of 14.8°C with a wider IQR of 4.1°C. This suggests temperatures at Heathrow are generally higher and less variable, likely due to its more southerly location and urban heat island effect."
A cumulative frequency diagram plots the running total of frequencies against the upper class boundary of each class.
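As a rough sketch of how the plotted points are built and read, the code below reuses the rainfall frequencies from the histogram table earlier in this lesson. It assumes the points are joined by straight lines (a hand-drawn smooth curve would give slightly different readings) and uses the grouped-data convention of reading off at n/4, n/2 and 3n/4. The helper name `read_off` is illustrative.

```python
from itertools import accumulate

# Rainfall table: upper class boundaries (mm) and class frequencies.
upper_boundaries = [2, 5, 10, 20, 50]
frequencies = [12, 9, 10, 5, 3]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Points to plot: (upper class boundary, cumulative frequency).
# The extra point (0, 0) anchors the graph at the lowest class boundary.
points = [(0, 0)] + list(zip(upper_boundaries, cumulative))


def read_off(target, points):
    """Estimate the value at a given cumulative frequency by linear
    interpolation between neighbouring plotted points (an assumption)."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the range of cumulative frequencies")


n = cumulative[-1]
q1 = read_off(n / 4, points)
median = read_off(n / 2, points)
q3 = read_off(3 * n / 4, points)

print(f"Q1 = {q1:.2f} mm, median = {median:.2f} mm, Q3 = {q3:.2f} mm")
print(f"IQR = {q3 - q1:.2f} mm")
```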
From a cumulative frequency diagram, you can estimate the median, the quartiles (and hence the interquartile range) and percentiles.
"From the cumulative frequency diagram for daily total sunshine at Hurn in August, the median is approximately 6.5 hours and the IQR is approximately 4.0 hours. This indicates that on a typical August day in Hurn, there are about 6.5 hours of sunshine, but the amount varies considerably from day to day."
A scatter diagram shows the relationship between two variables by plotting pairs of values as points on a graph.
| Pattern | Description |
|---|---|
| Points trend upwards from left to right | Positive correlation |
| Points trend downwards from left to right | Negative correlation |
| No clear pattern | No correlation |
| Points close to a straight line | Strong correlation |
| Points widely scattered around the line | Weak correlation |
"The scatter diagram for daily mean temperature against daily total sunshine at Camborne shows a moderate positive correlation. This suggests that days with more sunshine tend to be warmer, which is physically reasonable — sunshine heats the Earth's surface, increasing air temperature."
Just because two variables are correlated does not mean one causes the other. There may be a confounding variable or the relationship may be coincidental.
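The strength and direction read from a scatter diagram can be put on a numerical footing with the product moment correlation coefficient. The sketch below computes it from first principles; the sunshine and temperature values are invented for illustration and are not taken from the Large Data Set.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product moment correlation coefficient, r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired readings (NOT real LDS data).
sunshine_hours = [2.0, 4.5, 6.0, 7.5, 9.0, 11.0]
mean_temp_c = [13.0, 15.5, 14.5, 17.0, 16.5, 19.0]

r = pearson_r(sunshine_hours, mean_temp_c)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 indicates points close to a straight line; a value near 0 indicates no linear correlation. Whatever the value, r alone says nothing about causation.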
A time series plots a variable against time to show trends, seasonal patterns, and irregular fluctuations.
Use a time series for any variable that changes over time — daily mean temperature across a year, monthly rainfall totals, etc.
Exam Tip: When asked to draw a diagram, check whether the class widths are equal. If they are unequal, you must use frequency density for a histogram. A common source of lost marks is plotting frequency instead of frequency density.
The AQA 7357 specification, Paper 3 (Statistics), sub-strands N (data presentation and interpretation) and O (probability/statistical distributions in context), covers the following skills: interpreting diagrams for single-variable data, including understanding that area in a histogram represents frequency, and connecting this to probability distributions; interpreting scatter diagrams and regression lines for bivariate data; recognising and interpreting possible outliers in data sets and statistical diagrams; and selecting or critiquing data presentation techniques in the context of a statistical problem (refer to the official specification document for exact wording).

All AQA Paper 3 statistics questions are explicitly contextualised against the Large Data Set (LDS), a published collection of UK weather station observations. Candidates are expected to be familiar with its variables, units, location codes and the kinds of cleaning and missing-data conventions it uses, and to interpret diagrams in that context rather than abstractly. Diagram literacy is also assessed synoptically with summary statistics (mean, median, IQR, standard deviation), sampling methods and (in Year 2) regression and correlation.
Question (8 marks):
A subset of n=30 daily maximum gust speed readings (knots) is taken from a single LDS weather station for one month. The data are summarised below.
Min = 8, Q1 = 17, Median = 23, Q3 = 31, Max = 58
(a) Using the rule that an outlier is any value more than 1.5×IQR beyond the nearer quartile, identify the boundary values for outliers and state, with reasoning, whether the maximum value of 58 knots is an outlier. (4)
(b) Sketch a labelled box plot for these data and comment briefly on the skewness, in the context of gust-speed measurements. (4)
Solution with mark scheme:
(a) Step 1 — compute IQR.
IQR=Q3−Q1=31−17=14
M1 — correct IQR formula and substitution. Common slip: writing Q3+Q1 or using max minus min (range) instead.
Step 2 — compute outlier fences.
Lower fence: Q1−1.5×IQR=17−21=−4 knots. Upper fence: Q3+1.5×IQR=31+21=52 knots.
M1 — applying the 1.5×IQR rule to both quartiles.
A1 — both fences correctly stated. Examiners expect both bounds even if only one is "interesting" — quoting only the upper fence loses this A1.
Step 3 — classify 58.
Since 58>52, the maximum value is above the upper fence and is therefore classified as an outlier under the 1.5×IQR rule.
A1 — explicit comparison plus contextual statement. The phrasing examiners reward: "58 knots exceeds the upper fence of 52 knots by 6, so it is an outlier under the stated rule." Saying simply "yes" with no comparison is half a mark.
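The fence calculation in part (a) can be sketched in a few lines. The function name `outlier_fences` and the parameter `k` are illustrative choices; the 1.5×IQR multiplier is the rule stated in the question.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) under the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Summary values from the question.
minimum, q1, q3, maximum = 8, 17, 31, 58

lower_fence, upper_fence = outlier_fences(q1, q3)

print(f"Lower fence = {lower_fence}, upper fence = {upper_fence}")
print(f"Maximum {maximum} is an outlier: {maximum > upper_fence}")
print(f"Minimum {minimum} is an outlier: {minimum < lower_fence}")
```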
(b) Step 1 — sketch.
Draw a correctly proportioned box plot on a horizontal scale clearly labelled "gust speed / knots". The box runs from 17 to 31, with the median line at 23. The lower whisker extends to 8, the minimum, which is not an outlier because 8>−4. The upper whisker ends at the largest value that is not an outlier, that is, the largest value no greater than 52. Without the raw data, assume the whisker stops at the fence value of 52, and plot the outlier 58 as a separate point.
M1 — box drawn between Q1 and Q3 with median marked.
A1 — whiskers drawn correctly, with the outlier 58 plotted as a separate cross or dot outside the upper whisker (the standard convention).
Step 2 — skewness comment.
Compare Q3−Median=31−23=8 with Median−Q1=23−17=6. The upper half of the box is wider, so the distribution is positively skewed (right-skewed).
M1 — quantitative comparison of the two halves.
A1 — correct skew direction stated in context: "Gust speeds are positively skewed; most days have moderate gusts but a few days produce much higher values, which is physically plausible for storm-driven extremes." A naked "positively skewed" without context can lose the contextual A1.
Total: 8 marks.
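The skewness comparison in part (b) amounts to comparing the two halves of the box. A minimal sketch, with the hypothetical helper name `box_skew`:

```python
def box_skew(q1, median, q3):
    """Classify skew by comparing the upper and lower halves of the box."""
    upper_half = q3 - median
    lower_half = median - q1
    if upper_half > lower_half:
        return "positive"
    if upper_half < lower_half:
        return "negative"
    return "symmetric"


# Quartiles from the gust-speed question.
skew = box_skew(q1=17, median=23, q3=31)
print(f"Q3 - median = {31 - 23}, median - Q1 = {23 - 17}, skew: {skew}")
```

This only looks at the central 50% of the data. In an exam answer, the long upper whisker and the outlier at 58 knots support the same conclusion and are worth mentioning alongside it.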
Question (6 marks): A student uses the LDS to construct a histogram of daily mean temperatures for one station over one summer month. The frequency densities for five class intervals are recorded.
| Class (°C) | 10≤t<14 | 14≤t<16 | 16≤t<18 | 18≤t<22 | 22≤t<30 |
|---|---|---|---|---|---|
| Frequency density | 1.5 | 4.0 | 5.5 | 2.5 | 0.5 |
(a) Find the frequency in the class 14≤t<16. (1)
(b) Estimate the total number of days in the sample. (2)
(c) Estimate the proportion of days with mean temperature at least 17°C. (3)
Mark scheme decomposition by AO:
(a)
(b)
(c)
Total: 6 marks split AO1 = 4, AO2 = 2. This is a typical AQA balance — the procedural marks dominate, but the AO2 marks are awarded for the modelling assumption ("uniform within class") that justifies splitting an interval.
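The arithmetic behind parts (a) to (c) can be checked with the sketch below. It is not the mark scheme; it simply applies frequency = frequency density × class width, and for part (c) relies on the modelling assumption that values are spread uniformly within each class. The function name `frequency_in_range` is illustrative.

```python
# (lower bound, upper bound, frequency density) from the question's table.
classes = [
    (10, 14, 1.5),
    (14, 16, 4.0),
    (16, 18, 5.5),
    (18, 22, 2.5),
    (22, 30, 0.5),
]


def frequency_in_range(classes, low, high):
    """Estimated frequency between low and high: the histogram area over that
    interval, assuming a uniform spread of values within each class."""
    total = 0.0
    for lo, hi, density in classes:
        overlap = max(0.0, min(hi, high) - max(lo, low))
        total += density * overlap
    return total


part_a = frequency_in_range(classes, 14, 16)       # one whole class
total_days = frequency_in_range(classes, 10, 30)   # all five classes
at_least_17 = frequency_in_range(classes, 17, 30)  # splits the 16 to 18 class
proportion = at_least_17 / total_days

print(f"(a) frequency in 14 <= t < 16: {part_a:.0f}")
print(f"(b) total frequency: {total_days:.0f}")
print(f"(c) estimated proportion with t >= 17: {proportion:.2f}")
```

Only half of the 16≤t<18 class lies at or above 17°C, so only half of that bar's area is counted in part (c).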
Connects to:
Sub-strand M — Sampling: the LDS is itself a sample. Any histogram, box plot or scatter diagram you draw is an estimate of the underlying distribution. Stratified or systematic sampling decisions affect what your visualisation can reliably claim. A box plot of 30 days from one station does not generalise to "UK summer gust speeds".
Sub-strand N — Summary statistics: Q1, median, Q3 underpin box plots; mean and standard deviation underpin many comparisons. Choosing between (median, IQR) and (mean, SD) as your summary pair depends on skewness — visualisations and summaries are decided together, not separately.
Sub-strand O — Probability distributions: histograms of continuous LDS variables (e.g. temperature) often look approximately Normal; this motivates fitting N(μ,σ2) and using it for probability calculations in Year 2. The histogram is the bridge between data and model.
Sub-strand P — Correlation and regression (Year 2): scatter diagrams with a fitted regression line are the bivariate analogue of the histogram. Interpreting the gradient in context ("for each 1°C rise in mean temperature, daily mean wind speed falls by about 0.3 knots, on average") is the high-tariff AO2/AO3 skill.
Sub-strand R — Hypothesis testing: any claim that "Hurn is windier than Heathrow" eventually becomes a hypothesis test on a difference of means (or a correlation). The visualisation comes first, but it must lead to a statement testable under a model.
LDS-context diagram questions split AO marks more evenly than pure-procedural topics: