Scatter graphs show the relationship between two variables. They are one of the most commonly tested statistics topics on all three Edexcel GCSE papers, with questions ranging from plotting points to interpreting lines of best fit.
| Term | Definition |
|---|---|
| Scatter graph (scatter diagram) | A graph plotting pairs of data as coordinates |
| Correlation | The relationship between two variables |
| Positive correlation | As one variable increases, the other increases |
| Negative correlation | As one variable increases, the other decreases |
| No correlation | No obvious relationship between the variables |
| Line of best fit | A straight line that best represents the trend of the data |
| Interpolation | Estimating a value within the range of the data |
| Extrapolation | Estimating a value outside the range of the data |
| Outlier | A point that does not fit the general pattern |
graph TD
A["Scatter pattern"] --> B["Positive<br/>bottom-left to top-right"]
A --> C["Negative<br/>top-left to bottom-right"]
A --> D["None<br/>no clear pattern"]
B --> E["Strong: tightly clustered<br/>Weak: more scattered"]
C --> F["Strong: tightly clustered<br/>Weak: more scattered"]
style B fill:#27ae60,color:#fff
style C fill:#e74c3c,color:#fff
style D fill:#7f8c8d,color:#fff
Positive correlation: Points go from bottom-left to top-right.
Negative correlation: Points go from top-left to bottom-right.
No correlation: Points are scattered randomly with no pattern.
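Strength of correlation: the correlation is strong when the points are tightly clustered around a line, and weak when they are more scattered.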
A teacher records the number of hours of revision and the test score for 8 students.
| Student | Hours (x) | Score (y) |
|---|---|---|
| A | 2 | 35 |
| B | 5 | 58 |
| C | 3 | 42 |
| D | 8 | 76 |
| E | 6 | 65 |
| F | 1 | 28 |
| G | 7 | 70 |
| H | 4 | 50 |
(a) Plot the scatter graph. (b) Describe the correlation.
Solution:
(a) Draw axes: Hours on x-axis (0–10), Score on y-axis (0–80). Plot each pair of values as a cross (×). Do NOT join the points.
(b) Strong positive correlation: as hours of revision increase, test scores tend to increase.
A line of best fit is a straight line drawn through the data that best represents the trend of the data.
For the revision data above, calculate the mean point and describe how to draw the line of best fit.
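Solution: The mean number of hours is 36 ÷ 8 = 4.5 and the mean score is 424 ÷ 8 = 53, so the mean point is (4.5, 53). Plot this point, then draw a straight line through it that follows the trend of the data, with roughly the same number of points on each side of the line.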
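The mean point for the revision data can be checked with a few lines of Python. This is only a sketch: the name `mean_point` is made up for illustration, and the data is copied from the table of eight students.

```python
# Revision data from the table: (hours, score) for students A to H.
data = [(2, 35), (5, 58), (3, 42), (8, 76), (6, 65), (1, 28), (7, 70), (4, 50)]


def mean_point(points):
    """Return (mean of the x-values, mean of the y-values)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y


print(mean_point(data))  # (4.5, 53.0)
```

A line of best fit drawn by eye is usually made to pass through this point.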
Using the line of best fit from Worked Example 2, estimate the score for a student who revised for 5.5 hours.
Solution: This is interpolation (5.5 is within the data range 1–8). Read up from x = 5.5 on the line: approximately 61 marks. The estimate is reliable because it is within the range of the data.
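As a rough check on a reading like this, the sketch below replaces the hand-drawn line with a least-squares regression line. That is an assumption: it is not the by-eye method used at GCSE, and a line drawn by eye will give a slightly different answer. The function name `least_squares_line` is hypothetical.

```python
# Revision data from the table of eight students.
hours = [2, 5, 3, 8, 6, 1, 7, 4]
scores = [35, 58, 42, 76, 65, 28, 70, 50]


def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    # The least-squares line always passes through the mean point.
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


gradient, intercept = least_squares_line(hours, scores)
estimate = gradient * 5.5 + intercept  # interpolate at 5.5 hours
print(gradient, intercept, estimate)  # 7.0 21.5 60.0
```

The computed estimate of 60 marks is close to the value of about 61 read from the hand-drawn line, which is the level of agreement to expect between the two methods.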
A student uses the same line to estimate the score for someone who revised for 12 hours, getting "95 marks." Comment on this estimate.
Solution: This is extrapolation, because 12 hours is outside the data range (1–8 hours). The estimate is unreliable because the trend may not continue beyond that range.
Edexcel exam tip: For interpolation, the answer is usually "yes, reliable, within the range of the data." For extrapolation, the answer is "no, unreliable, outside the range — the trend may not continue."
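The interpolation/extrapolation decision is just a range check on the x-values. A minimal sketch, using the hours from the revision data and a hypothetical helper named `classify_estimate`:

```python
# Hours of revision from the table of eight students.
hours = [2, 5, 3, 8, 6, 1, 7, 4]


def classify_estimate(x, xs):
    """Label an estimate at x as interpolation or extrapolation for the data xs."""
    if min(xs) <= x <= max(xs):
        return "interpolation"  # inside the data range: generally reliable
    return "extrapolation"      # outside the data range: the trend may not continue


print(classify_estimate(5.5, hours))  # interpolation
print(classify_estimate(12, hours))   # extrapolation
```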
Important: Correlation does NOT mean causation.
There is a strong positive correlation between ice cream sales and the number of drownings each month. Does ice cream cause drowning?
Solution: No. Both variables are affected by a third factor (a confounding variable): hot weather. In hot weather more people buy ice cream, AND more people swim in the sea or in pools, so more drownings occur. The correlation is real but it is not a causal link.
When writing up a scatter graph interpretation, use phrases like:
A scatter graph shows the relationship between outside temperature (°C) and the number of hot drinks sold at a café each day. The line of best fit has been drawn.
(a) Describe the correlation and interpret it in context.
(b) Use the line to estimate the number of hot drinks sold when the temperature is 15°C. (The line passes through the point (15, 80).)
(c) On one day the temperature was 30°C. A student uses the line of best fit to estimate sales. Comment on this estimate.
(d) One point (18°C, 120 drinks) is marked. It is far above the line. Suggest a reason.
Solution:
(a) There is a negative correlation. As the temperature increases, the number of hot drinks sold tends to decrease. In context: people prefer cold drinks in warm weather.
(b) Read from the graph at x = 15: approximately 80 hot drinks. This is interpolation (reliable).
(c) Extrapolation — 30°C is outside the data range. The estimate is unreliable: the trend may not continue, sales cannot go below zero, and at very high temperatures the relationship may become non-linear.
(d) The point is an outlier. A possible reason: a sudden cold snap that day, a special event at the café, or a promotion on hot drinks that caused unusually high sales.
Two studies report correlation strengths using a correlation coefficient r (where −1 ≤ r ≤ 1).
Interpret each.
Solution:
(Note: calculating r from scratch is beyond GCSE, but interpreting values is part of Higher-tier analytical questions.)
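For readers curious about where a value of r comes from, the sketch below computes it for the revision data from the earlier worked example. It assumes r means Pearson's product-moment correlation coefficient, which the lesson does not state explicitly, and the function name `pearson_r` is hypothetical. None of this calculation is required at GCSE.

```python
from math import sqrt

# Revision data from the table of eight students.
hours = [2, 5, 3, 8, 6, 1, 7, 4]
scores = [35, 58, 42, 76, 65, 28, 70, 50]


def pearson_r(xs, ys):
    """Return Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(hours, scores)
print(round(r, 3))  # 0.998
```

A value this close to 1 matches the description of the revision data as a strong positive correlation.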
An outlier is a data point that does not follow the general pattern. On a scatter graph it appears far from the line of best fit.
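One way to make "far from the line" concrete is to measure each point's vertical distance from the line and flag any distance above a chosen cut-off. The sketch below applies this to the revision data from the earlier worked example. It rests on several assumptions: a least-squares line stands in for the hand-drawn one, the distance is measured vertically, and the cut-off of 10 marks is an arbitrary illustrative choice. The function names are hypothetical.

```python
# Revision data from the table of eight students.
hours = [2, 5, 3, 8, 6, 1, 7, 4]
scores = [35, 58, 42, 76, 65, 28, 70, 50]


def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    return gradient, mean_y - gradient * mean_x


def vertical_distances(xs, ys):
    """Return each point's vertical distance (actual y minus y on the line)."""
    gradient, intercept = least_squares_line(xs, ys)
    return [y - (gradient * x + intercept) for x, y in zip(xs, ys)]


CUT_OFF = 10  # assumed threshold in marks, chosen only for illustration

distances = vertical_distances(hours, scores)
flagged = [(x, y) for x, y, d in zip(hours, scores, distances) if abs(d) > CUT_OFF]
largest = max(abs(d) for d in distances)
print(flagged, largest)  # [] 1.5
```

No point in the revision data is more than 1.5 marks from the line, so nothing is flagged. In an exam, outliers are identified by eye rather than by a numerical rule like this.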