A scatter graph plots pairs of values to reveal whether two quantities are related — for example, whether students who revise more tend to score higher. If a relationship exists, we describe it as correlation, draw a line of best fit, and use that line to make predictions. The OCR GCSE Mathematics (J560) Statistics strand expects you to plot points, describe the type and strength of correlation, draw a sensible line of best fit, and use it to estimate values — while knowing the difference between reliable interpolation and risky extrapolation.
This topic combines AO1 (plotting points and drawing a line of best fit) with AO2 (describing correlation and the relationship in context) and AO3 (judging the reliability of a prediction). OCR command words you will meet include "Plot", "Draw", "Describe", "Estimate" and "Give a reason for your answer". A common trap is confusing correlation with causation, so be ready to comment carefully.
| Term | Meaning |
|---|---|
| Scatter graph | A plot of paired data, one variable on each axis |
| Correlation | A relationship between two variables shown by the trend of the points |
| Positive correlation | As one variable increases, so does the other |
| Negative correlation | As one variable increases, the other decreases |
| No correlation | No clear relationship between the variables |
| Line of best fit | A straight line drawn through the trend of the data |
| Interpolation | Estimating within the range of the data (reliable) |
| Extrapolation | Estimating beyond the range of the data (unreliable) |
| Outlier | A point that does not fit the general trend |
Correlation describes how the points are arranged.
We also judge strength: if the points lie close to a straight line the correlation is strong; if they are loosely spread the correlation is weak.
The scatter graph below shows the revision time (hours) and test score (%) for ten students. The points rise to the right, showing positive correlation, with one student (3 hours, 20%) lying away from the trend.
Describe the correlation shown by each scenario: (a) the age of a car and its value; (b) a person's height and their house number; (c) the temperature outside and sales of ice cream.
Solution: (a) Negative correlation — older cars are worth less. (b) No correlation — these are unrelated. (c) Positive correlation — hotter weather goes with higher ice-cream sales.
A scatter graph of "hours of sunshine" against "rainfall" shows points falling from top-left to bottom-right, lying fairly close to a line. Describe the correlation fully.
Solution: It is a strong negative correlation: as hours of sunshine increase, rainfall tends to decrease, and the points lie close to a straight line so the relationship is strong.
Common error: writing only "negative" without commenting on strength, or describing the trend without putting it in context.
A line of best fit is a single straight line that follows the trend of the points. Guidelines:
Once drawn, the line lets you estimate a value: go up from a known x to the line, then across to read the y (or the reverse).
Using the revision scatter graph, a student revised for 5 hours. Use the line of best fit to estimate their score, and state whether this is interpolation or extrapolation.
Solution: Reading up from 5 hours to the line and across gives a score of about 63% (your reading should be close to this). Because 5 hours lies within the data range (1–9 hours), this is interpolation, which is reliable.
For the same graph, why would estimating the score for 15 hours of revision be unreliable?
Solution: 15 hours is beyond the range of the data (the most anyone revised was 9 hours). Predicting there is extrapolation, and the trend may not continue — scores cannot exceed 100% and may level off — so the estimate is unreliable.
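The interpolation/extrapolation decision in the two questions above comes down to one comparison: is the x-value inside the range of the data? A minimal Python sketch of that check follows. The function name `classify_estimate` is hypothetical (it is not part of the lesson), and the range of 1 to 9 hours is taken from the worked example.

```python
def classify_estimate(x, data_min, data_max):
    """Label an estimate at x as interpolation or extrapolation.

    Hypothetical helper for illustration: an estimate is interpolation
    when x lies within the range of the plotted data, and extrapolation
    when it lies outside that range.
    """
    if data_min <= x <= data_max:
        return "interpolation"
    return "extrapolation"


# Revision data from the worked example: hours ranged from 1 to 9.
print(classify_estimate(5, 1, 9))   # interpolation
print(classify_estimate(15, 1, 9))  # extrapolation
```

The check only tells you which kind of estimate you are making. You still need to explain in context why an extrapolated value may be unreliable, for example that scores cannot exceed 100%.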
A correlation shows the two variables move together; it does not prove that one causes the other. There may be a third factor, or the link may be coincidence.
Ice-cream sales and the number of people with sunburn are positively correlated. Does eating ice cream cause sunburn? Explain.
Solution: No. Both rise in hot, sunny weather, so the weather is the common cause. The correlation between ice cream and sunburn is real but neither causes the other — a classic example of correlation without causation.
A report notes a positive correlation between the number of firefighters sent to a fire and the amount of damage caused. Explain why "sending more firefighters causes more damage" is a flawed conclusion.
Solution: Bigger fires both cause more damage and require more firefighters, so the size of the fire is the underlying cause. The firefighters do not cause the damage; the conclusion confuses correlation with causation.
Once a line of best fit is drawn, you can use it in two directions: given an x, read up to the line and across for y; given a y, read across to the line and down for x. You can also describe the line's steepness to talk about the rate of change — how much y changes for each unit of x. This is exactly the gradient idea from straight-line graphs, applied to data.
A line of best fit for "weekly hours of part-time work" (x) against "weekly study hours" (y) passes through the points (0, 30) and (20, 10). (a) For each extra hour of work, by how much do study hours change on average? (b) Estimate the study hours for a student who works 12 hours.
Solution: (a) Study hours fall from 30 to 10 as work rises from 0 to 20, a change of −20 study hours over 20 work hours, i.e. −1 study hour per work hour. So each extra hour of work is associated with about one fewer hour of study. (b) Starting from 30 at x = 0 and falling 1 per work hour: 30 − 12 × 1 = 18 study hours. Because 12 lies between the two given points on the line, and so within the range of the data if the data spans those points, this is reliable interpolation.
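The arithmetic in this solution can be checked with a short Python sketch. It assumes the line of best fit is fully described by the two points given in the question; the function names `gradient` and `estimate_y` are hypothetical, chosen only for this illustration.

```python
def gradient(p, q):
    """Rate of change of y per unit of x along the line through p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y2 - y1) / (x2 - x1)


def estimate_y(p, q, x):
    """Estimate y at a given x: start at point p and move along the line."""
    m = gradient(p, q)
    x1, y1 = p
    return y1 + m * (x - x1)


# Points on the line of best fit from the question: (work hours, study hours).
p, q = (0, 30), (20, 10)

print(gradient(p, q))        # -1.0 study hours per extra hour of work
print(estimate_y(p, q, 12))  # 18.0 study hours
```

On paper you would read these values from the graph. The code simply repeats the same "change in y divided by change in x" step used in part (a).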
The line of best fit on a scatter of "temperature (°C)" against "units of gas used" passes through (4, 90) and (16, 30). Estimate the gas used at 10 °C.
Solution: Gas falls from 90 to 30 as temperature rises from 4 to 16 — a drop of 60 units over 12 °C, i.e. 5 units per °C. At 10 °C (which is 6 °C above 4 °C): 90−6×5=90−30=60 units. The negative trend matches common sense: warmer weather means less heating.
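A line of best fit can also be read in the reverse direction: given a y, read across to the line and down for x. The sketch below applies that idea to the gas line from this question, again assuming the line is defined by the two stated points. The function name `estimate_x` is hypothetical.

```python
def estimate_x(p, q, y):
    """Estimate x for a given y on the line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    m = (y2 - y1) / (x2 - x1)  # gradient: change in y per unit of x
    if m == 0:
        # A horizontal line gives the same y for every x.
        raise ValueError("a horizontal line cannot be read in reverse")
    return x1 + (y - y1) / m


# Points on the line of best fit: (temperature in °C, units of gas used).
p, q = (4, 90), (16, 30)

print(estimate_x(p, q, 60))  # 10.0 °C, matching the forward estimate above
```

Reading 60 units back to 10 °C is a quick consistency check on the forward estimate: both directions use the same gradient of −5 units per °C.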