Correlation asks how strongly two variables move together; regression asks what line best summarises that relationship and lets us predict. This lesson takes both beyond GCSE/AS: it derives and interprets Pearson's product-moment correlation coefficient (PMCC) $r$, introduces Spearman's rank correlation coefficient $r_s$ for monotonic (not necessarily linear) association, tests each against the population hypothesis $\rho = 0$ using critical-value tables, and builds the least-squares regression line from the bivariate sums, with full attention to when prediction is valid.
This is Paper 3 optional content: Statistics (7367/3S), chosen alongside Mechanics (7367/3M) or Discrete (7367/3D). Paper 3 is 2 hours, 100 marks, AO1 40% / AO2 25% / AO3 35%. The mechanics of computing $r$, $r_s$ and the regression coefficients are AO1; choosing the right coefficient for the data, interpreting $r^2$, and warning against extrapolation are AO2; a multi-step worded test is AO3. It builds on A-Level Maths bivariate data (scatter diagrams, the PMCC, the $y$-on-$x$ regression line) and on the hypothesis-testing framework from earlier in this option.
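For paired data $(x_i, y_i)$, define the bivariate sums

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \qquad S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n},$$

$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}.$$

The right-hand "computational" forms are the ones to use in practice, because they avoid subtracting the mean from every value. The PMCC is

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \qquad -1 \le r \le 1.$$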
It measures the strength and direction of a linear relationship: it is the covariance of $x$ and $y$ divided by the product of their standard deviations, so it is dimensionless and unchanged by any linear rescaling of either variable with a positive scale factor (changing units from cm to m leaves $r$ fixed; multiplying one variable by a negative constant reverses the sign of $r$). The bound $|r| \le 1$ is the Cauchy–Schwarz inequality applied to the centred data.
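The computational forms of the bivariate sums follow from expanding the brackets. A short derivation, using only $\sum x_i = n\bar{x}$ and $\sum y_i = n\bar{y}$, is sketched here for reference:

$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 = \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 = \sum x_i^2 - n\bar{x}^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n},$$

$$\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\bar{x}\bar{y} = \sum x_i y_i - n\bar{x}\bar{y} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}.$$

The result for $S_{yy}$ is the first line with $y$ in place of $x$.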
Geometrically, $r$ is the cosine of the angle between the two centred data vectors $(x_i - \bar{x})$ and $(y_i - \bar{y})$: when they point the same way $r = 1$, when opposite $r = -1$, and when orthogonal $r = 0$. This is why $r$ detects only the linear component of a relationship: it is blind to any structure perpendicular to a straight-line trend. A small $|r|$ therefore indicates little or no linear association but says nothing about curved or other non-linear patterns, a point we return to under misconceptions.
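The cosine interpretation can be checked numerically. The sketch below is illustrative only: it uses the five raw pairs that this lesson tabulates for Worked Example 1, and the helper names `centred` and `cosine` are hypothetical, chosen here for readability.

```python
import math

# Five raw pairs from the lesson's tabulation for Worked Example 1.
xs = [2, 4, 6, 8, 10]
ys = [3, 7, 8, 12, 15]

def centred(values):
    """Subtract the mean from every value (the 'centred data vector')."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def cosine(u, v):
    """Cosine of the angle between two vectors: dot product over product of lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    length_u = math.sqrt(sum(a * a for a in u))
    length_v = math.sqrt(sum(b * b for b in v))
    return dot / (length_u * length_v)

# r is the cosine between the centred x and centred y vectors.
r = cosine(centred(xs), centred(ys))

# Rescaling x by a positive factor (e.g. cm -> m) leaves r unchanged.
r_rescaled = cosine(centred([x / 100 for x in xs]), centred(ys))

# Multiplying x by -1 reverses the direction of its centred vector, so r changes sign.
r_flipped = cosine(centred([-x for x in xs]), centred(ys))

print(round(r, 3), round(r_rescaled, 3), round(r_flipped, 3))
```

Here the dot product of the centred vectors is $S_{xy}$ and the squared lengths are $S_{xx}$ and $S_{yy}$, so `cosine` is the PMCC formula under another name.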
| Value of r | Interpretation |
|---|---|
| r=1 | perfect positive linear correlation (all points on a line of positive gradient) |
| r=−1 | perfect negative linear correlation |
| r=0 | no linear correlation (but a curved relationship may still exist) |
Five pairs of readings give $\sum x = 30$, $\sum y = 45$, $\sum x^2 = 220$, $\sum y^2 = 491$, $\sum xy = 328$, with $n = 5$. Find $r$, and the equation of the regression line of $y$ on $x$.
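Bivariate sums.

$$S_{xx} = 220 - \frac{30^2}{5} = 220 - 180 = 40, \qquad S_{yy} = 491 - \frac{45^2}{5} = 491 - 405 = 86. \quad \text{(M1; A1)}$$

$$S_{xy} = 328 - \frac{30 \times 45}{5} = 328 - 270 = 58. \quad \text{(A1)}$$

PMCC.

$$r = \frac{58}{\sqrt{40 \times 86}} = \frac{58}{\sqrt{3440}} = \frac{58}{58.65} = 0.989. \quad \text{(M1 formula; A1 to 3 s.f.)}$$

Regression line. With $\bar{x} = 30/5 = 6$ and $\bar{y} = 45/5 = 9$,

$$b = \frac{S_{xy}}{S_{xx}} = \frac{58}{40} = 1.45, \qquad a = \bar{y} - b\bar{x} = 9 - 1.45(6) = 0.3. \quad \text{(M1 for } b\text{; A1 for } a\text{)}$$

$$\therefore\ y = 0.3 + 1.45x \quad \text{equivalently} \quad y - 9 = 1.45(x - 6). \quad \text{(A1)}$$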
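(M1/A1 for the sums; M1/A1 for $r$; M1/A1/A1 for $b$, $a$ and the equation. The very high $r = 0.989$ signals a strong positive linear relationship. Its sign agrees with the positive gradient $b = 1.45$, although the size of the gradient on its own says nothing about the strength of the correlation.)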
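The arithmetic of Worked Example 1 can be reproduced from the summary statistics alone. The following is a minimal sketch, assuming the six totals are already known; the function name `summary_to_fit` and the dictionary keys are hypothetical choices for this illustration.

```python
import math

def summary_to_fit(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Compute Sxx, Syy, Sxy, the PMCC r, and the y-on-x line y = a + b x."""
    sxx = sum_x2 - sum_x ** 2 / n          # computational form of Sxx
    syy = sum_y2 - sum_y ** 2 / n          # computational form of Syy
    sxy = sum_xy - sum_x * sum_y / n       # computational form of Sxy
    r = sxy / math.sqrt(sxx * syy)         # PMCC
    b = sxy / sxx                          # gradient of the regression line
    a = sum_y / n - b * sum_x / n          # intercept: a = y_bar - b * x_bar
    return {"sxx": sxx, "syy": syy, "sxy": sxy, "r": r, "a": a, "b": b}

# Summary statistics quoted in Worked Example 1.
fit = summary_to_fit(n=5, sum_x=30, sum_y=45, sum_x2=220, sum_y2=491, sum_xy=328)
print({key: round(value, 3) for key, value in fit.items()})
```

Keeping the sums unrounded until the final step mirrors the exam advice: only `r` is rounded for display.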
To see where those summary statistics come from, consider the raw pairs (2,3),(4,7),(6,8),(8,12),(10,15). Tabulating the products:
| $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
|---|---|---|---|---|
| 2 | 3 | 4 | 9 | 6 |
| 4 | 7 | 16 | 49 | 28 |
| 6 | 8 | 36 | 64 | 48 |
| 8 | 12 | 64 | 144 | 96 |
| 10 | 15 | 100 | 225 | 150 |
| $\sum x = 30$ | $\sum y = 45$ | $\sum x^2 = 220$ | $\sum y^2 = 491$ | $\sum xy = 328$ |
The column totals are exactly the summary statistics used in Worked Example 1: $\sum x = 30$, $\sum y = 45$, $\sum x^2 = 220$, $\sum y^2 = 491$, $\sum xy = 328$. In an exam you would build this table first, then substitute into the $S_{xx}$, $S_{yy}$, $S_{xy}$ formulae. Laying out the table earns the AO1 method marks even if a single arithmetic slip costs an accuracy mark. Always keep the totals to full accuracy; round only the final $r$.
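The tabulation step can be mirrored in a few lines of code. This is only a sketch of the same bookkeeping, using the raw pairs listed above; the variable names `rows` and `totals` are illustrative.

```python
# Raw pairs from the lesson.
pairs = [(2, 3), (4, 7), (6, 8), (8, 12), (10, 15)]

# One row per pair, with the same columns as the table: x, y, x^2, y^2, xy.
rows = [(x, y, x * x, y * y, x * y) for x, y in pairs]

# Column totals: sum each column of the table.
totals = tuple(sum(column) for column in zip(*rows))

for row in rows:
    print(row)
print("totals:", totals)
```

The tuple `totals` holds $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$ and $\sum xy$ in that order, ready to substitute into the bivariate-sum formulae.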
A sample r is only an estimate of the population correlation ρ. To test whether there is genuine linear correlation in the population:
$$H_0: \rho = 0 \ \text{(no linear correlation)}, \qquad H_1: \rho \ne 0 \ \text{(two-tailed)} \ \text{or} \ \rho > 0,\ \rho < 0 \ \text{(one-tailed)}.$$
Compare the sample r with the PMCC critical value read from tables for the given n and significance level. Reject H0 if ∣r∣ exceeds the critical value (two-tailed) or if r is beyond the one-tailed critical value in the stated direction.
The choice of tail must come from the context, set before seeing the data. If the question asks whether there is any association ("is there correlation?"), use a two-tailed test ($H_1: \rho \ne 0$); if it predicts a direction ("do taller people weigh more?"), use a one-tailed test ($H_1: \rho > 0$ or $\rho < 0$). The one-tailed critical value is smaller (easier to reach) because the whole significance level sits in one tail, so choosing the tail after seeing the data would inflate the true Type I error rate, which is exactly the malpractice flagged in the hypothesis-testing lesson. As in any hypothesis test, the population parameter $\rho$ appears in the hypotheses; the sample $r$ is only the evidence weighed against the table.
For n=10 pairs the sample PMCC is r=0.65. Test at the 5% level whether there is positive correlation in the population.
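$$H_0: \rho = 0, \qquad H_1: \rho > 0 \ \text{(one-tailed)}. \quad \text{(B1 hypotheses in terms of } \rho\text{)}$$

From PMCC tables, the one-tailed 5% critical value for $n = 10$ is 0.5494.

$$r = 0.65 > 0.5494 = \text{critical value} \ \Rightarrow\ \text{reject } H_0. \quad \text{(M1 compare; A1 reject)}$$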
There is evidence at the 5% level of positive linear correlation in the population. (B1 for hypotheses stated with the population parameter ρ — not r; M1 for the correct one-tailed comparison; A1 for a contextual conclusion. A two-tailed test would compare against 0.6319; 0.65>0.6319 still rejects.)
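The decision rule in this test is just a comparison against the tabulated value. The sketch below encodes that rule under stated assumptions: the critical value must still be read from the PMCC tables for the given $n$ and significance level (the code does not compute it), and the function name `reject_h0` and the labels `"two-tailed"`, `"positive"` and `"negative"` are hypothetical.

```python
def reject_h0(r, critical_value, alternative):
    """Return True if the sample r lies in the critical region.

    critical_value is the positive table value for the chosen n, level and tail.
    alternative is 'two-tailed' (H1: rho != 0), 'positive' (H1: rho > 0)
    or 'negative' (H1: rho < 0).
    """
    if alternative == "two-tailed":
        return abs(r) > critical_value
    if alternative == "positive":
        return r > critical_value
    if alternative == "negative":
        return r < -critical_value
    raise ValueError("alternative must be 'two-tailed', 'positive' or 'negative'")

# Values from the worked example: n = 10, r = 0.65, 5% level.
one_tailed = reject_h0(0.65, 0.5494, "positive")      # table value for one tail
two_tailed = reject_h0(0.65, 0.6319, "two-tailed")    # table value for two tails
print(one_tailed, two_tailed)
```

The tail is an input fixed before the data are seen, which is why it appears as an argument and is not inferred from the sign of `r`.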
Spearman's coefficient $r_s$ measures monotonic association: whether $y$ tends to increase (or decrease) as $x$ increases, even if the trend is curved. It is simply the PMCC computed on the ranks of the data. When there are no ties, this reduces to the convenient formula

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}, \qquad d_i = \operatorname{rank}(x_i) - \operatorname{rank}(y_i).$$

Procedure: rank each variable separately (rank 1 = smallest, say); for tied values assign the average of the positions they share; compute each $d_i$ and $d_i^2$; then apply the formula.
Seven products are each scored by two judges, and each judge's scores are ranked with rank 1 given to the highest score. The data and ranks:
| Pair | $x$ | $y$ | Rank $x$ | Rank $y$ | $d$ | $d^2$ |
|---|---|---|---|---|---|---|
| 1 | 56 | 44 | 6 | 6 | 0 | 0 |
| 2 | 75 | 70 | 2 | 2 | 0 | 0 |
| 3 | 45 | 52 | 7 | 5 | 2 | 4 |
| 4 | 71 | 58 | 3 | 4 | -1 | 1 |
| 5 | 62 | 67 | 4 | 3 | 1 | 1 |
| 6 | 80 | 82 | 1 | 1 | 0 | 0 |
| 7 | 58 | 41 | 5 | 7 | -2 | 4 |
$$\sum d^2 = 0 + 0 + 4 + 1 + 1 + 0 + 4 = 10. \quad \text{(M1 ranking; A1 for } \textstyle\sum d^2\text{)}$$

$$r_s = 1 - \frac{6 \times 10}{7(7^2 - 1)} = 1 - \frac{60}{336} = 1 - 0.1786 = 0.821. \quad \text{(M1 formula; A1)}$$

Test $H_0$: no association, $H_1$: positive association, at 5%. The one-tailed critical value ($n = 7$) is 0.7143:

$$r_s = 0.821 > 0.7143 \ \Rightarrow\ \text{reject } H_0; \ \text{evidence of positive monotonic association.} \quad \text{(A1 conclusion)}$$

(M1 for ranking both variables consistently; A1 for $\sum d^2 = 10$; M1/A1 for the coefficient; A1 for a contextual conclusion against the table value. A common slip is ranking $x$ ascending but $y$ descending; always rank both the same way.)
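The ranking and the $\sum d^2$ calculation above can be reproduced as follows. This is a minimal sketch that assumes no tied values (as in this data set) and ranks the largest score as 1, matching the table; the helper name `ranks_descending` is hypothetical.

```python
def ranks_descending(values):
    """Rank 1 = largest value. Assumes all values are distinct (no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

# Scores from the table above.
x = [56, 75, 45, 71, 62, 80, 58]
y = [44, 70, 52, 58, 67, 82, 41]

rank_x = ranks_descending(x)
rank_y = ranks_descending(y)

# Differences in rank and their squares.
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
sum_d2 = sum(di * di for di in d)

# No-ties shortcut formula for Spearman's coefficient.
n = len(x)
r_s = 1 - 6 * sum_d2 / (n * (n * n - 1))

print(rank_x, rank_y, sum_d2, round(r_s, 3))
```

Using one ranking function for both variables guards against the slip of ranking one ascending and the other descending.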
| Feature | Pearson's $r$ | Spearman's $r_s$ |
|---|---|---|
| Measures | linear correlation | monotonic correlation |
| Data type | continuous (ideally bivariate normal) | ordinal, or non-normal continuous |
| Sensitivity to outliers | high | low (uses ranks) |
| Curved monotonic trend | underestimates strength | captures it (can be ±1) |
| Tied values | not an issue | need average ranks |
Exam Tip: Choose Spearman's when the data are already ranks, when the relationship is monotonic but visibly non-linear, or when an outlier would distort Pearson's r. Choose Pearson's when a linear model is appropriate and you also want the regression line.
When two or more values are equal, assign each the average of the ranks they would have occupied. For example the data 10, 15, 15, 20 receive ranks 1, 2.5, 2.5, 4 (the two 15s share positions 2 and 3, averaging to 2.5). With ties present, the shortcut $r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$ is only an approximation; for accuracy with several ties, compute the PMCC of the ranks directly using the $\dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$ formula.
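The effect of ties can be seen on a small example. In the sketch below the $x$ values are the lesson's 10, 15, 15, 20; the $y$ values are hypothetical, invented purely to illustrate the gap between the shortcut and the PMCC of the ranks. The helper names `average_ranks` and `pmcc` are also illustrative.

```python
import math

def average_ranks(values):
    """Rank 1 = smallest; tied values share the average of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first_position = ordered.index(v) + 1   # first position the value occupies
        count = ordered.count(v)                # how many values tie with it
        ranks.append(first_position + (count - 1) / 2)
    return ranks

def pmcc(u, v):
    """Pearson's r via Sxy / sqrt(Sxx * Syy), using the centred definitions."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    return suv / math.sqrt(suu * svv)

x = [10, 15, 15, 20]      # from the lesson: contains one tie
y = [30, 10, 20, 40]      # hypothetical values, no ties

rank_x = average_ranks(x)
rank_y = average_ranks(y)

n = len(x)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
shortcut = 1 - 6 * sum_d2 / (n * (n * n - 1))   # approximation when ties exist
exact = pmcc(rank_x, rank_y)                    # PMCC of the ranks

print(rank_x, round(shortcut, 4), round(exact, 4))
```

For these values the shortcut and the PMCC of the ranks differ in the second decimal place, which is the approximation error the text warns about.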
For the linear model, $r^2$ (the square of Pearson's $r$) is the proportion of the variation in $y$ explained by the linear relationship with $x$:

$$r^2 = \frac{\text{explained variation}}{\text{total variation}} = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{S_{yy}}.$$

For Worked Example 1, $r = 0.989$ gives $r^2 = 0.978$: about 97.8% of the variation in $y$ is explained by the linear fit, which makes it an excellent model. A value $r = 0.8$ gives $r^2 = 0.64$, i.e. 64% explained, leaving 36% to other factors or noise.
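The two expressions for $r^2$ can be compared directly on the Worked Example 1 data. The following is a sketch under the assumption that the fitted values $\hat{y}_i$ come from the least-squares line $y = a + bx$ found earlier; variable names are illustrative.

```python
# Raw pairs behind Worked Example 1.
xs = [2, 4, 6, 8, 10]
ys = [3, 7, 8, 12, 15]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least-squares line y = a + b x.
b = sxy / sxx
a = y_bar - b * x_bar

# Unexplained (residual) variation: sum of squared vertical distances to the line.
residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared_from_residuals = 1 - residual_ss / syy
r_squared_from_r = sxy ** 2 / (sxx * syy)

print(round(r_squared_from_residuals, 3), round(r_squared_from_r, 3))
```

Both routes give the same proportion for the least-squares line, so the residual sum of squares is the share of $S_{yy}$ that the line leaves unexplained.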