Correlation and regression are tools for exploring and modelling the relationship between two (or more) quantitative variables. Correlation measures the strength and direction of a linear relationship, while regression provides a predictive equation.
The first step in studying the relationship between two variables is to create a scatter plot. Each point represents one observation plotted as (x, y).
Patterns to look for:
The Pearson correlation coefficient measures the strength and direction of a linear relationship:
r = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / √[Σ(xᵢ − x̄)² × Σ(yᵢ − ȳ)²]
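The formula can be translated almost term by term into code. The following is a minimal sketch in plain Python; the function name `pearson_r` and the hours/scores data are made up for illustration and are not part of the lesson.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from the deviation-product formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
r = pearson_r(hours, scores)
print(round(r, 4))
```

For this invented data set the numerator is 56 and the denominator is √(10 × 320), so r comes out just under 0.99, a strong positive correlation.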
| Value of r | Interpretation |
|---|---|
| r = +1 | Perfect positive linear relationship |
| 0.7 ≤ r < 1 | Strong positive |
| 0.3 ≤ r < 0.7 | Moderate positive |
| 0 < r < 0.3 | Weak positive |
| r = 0 | No linear relationship |
| −1 < r < 0 | Negative (same scale, reversed) |
| r = −1 | Perfect negative linear relationship |
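As a rough illustration, the table can be expressed as a small lookup function. This is only a sketch: `interpret_r` is a hypothetical helper name, and the 0.3 and 0.7 cut-offs are the rule-of-thumb thresholds from the table, not universal standards.

```python
def interpret_r(r):
    """Map a correlation coefficient to the verbal labels used in the table."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # negative values use the same scale, reversed
    if size == 1:
        return f"perfect {direction} linear relationship"
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

print(interpret_r(0.85))
print(interpret_r(-0.45))
```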
Warning: r measures only linear relationships. Two variables can have a strong non-linear relationship with r near 0.
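A small example makes the warning concrete. In the sketch below (data chosen for illustration), y is determined exactly by x through y = x², yet r is 0 because the positive and negative deviation products cancel out.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from the deviation-product formula."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A perfect but non-linear (quadratic) relationship, symmetric about x = 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_quadratic = pearson_r(xs, ys)
print(r_quadratic)
```

A scatter plot of these points would show a clear U-shape, which is why the plot should be inspected before relying on r.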
A strong correlation between X and Y does not mean X causes Y. Possible explanations include:
Simple linear regression fits a straight line to the data so that y can be predicted from x.
ŷ = b₀ + b₁x
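Where ŷ is the predicted value of y, b₀ is the intercept (the predicted y when x = 0), and b₁ is the slope (the change in ŷ for a one-unit increase in x).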
The line is chosen to minimise the sum of squared residuals:
SSE = Σ(yᵢ − ŷᵢ)²
The slope and intercept formulas:
b₁ = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / Σ(xᵢ − x̄)²
b₀ = ȳ − b₁x̄
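The two formulas can be applied directly. The following is a minimal sketch in plain Python; `fit_line` and the hours/scores data are hypothetical and used only for illustration.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

b0, b1 = fit_line(hours, scores)
predicted_for_6 = b0 + b1 * 6  # prediction slightly outside the observed range
print(round(b0, 2), round(b1, 2), round(predicted_for_6, 2))
```

Here x̄ = 3 and ȳ = 63, so b₁ = 56 / 10 = 5.6 and b₀ = 63 − 5.6 × 3 = 46.2: each extra hour is associated with a predicted increase of 5.6 points.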
R² measures the proportion of variance in Y explained by X:
R² = 1 − (SSE / SST)
where SST = Σ(yᵢ − ȳ)² is the total sum of squares of y about its mean.
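As a sketch of how R² might be computed, the example below fits the least-squares line to a small invented data set and then compares the residual sum of squares (SSE) with the total sum of squares about the mean (SST). The function name `r_squared` and the data are assumptions made for illustration.

```python
def r_squared(xs, ys):
    """R^2 = 1 - SSE / SST for the least-squares line through (xs, ys)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # SSE: squared residuals around the fitted line
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # SST: squared deviations of y around its mean
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

r2 = r_squared(hours, scores)
print(round(r2, 4))
```

For this data SSE = 6.4 and SST = 320, giving R² = 0.98. In simple linear regression with one predictor, R² equals the square of the Pearson correlation coefficient r.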