Statistics is a powerful tool in geographical investigation. Rather than relying on subjective observation ("the pebbles seem to get smaller downstream"), statistical techniques allow you to measure, describe and test relationships in your data with mathematical precision. In the Edexcel B exam, you need to know how to calculate and interpret several key statistical measures, including Spearman's rank correlation coefficient.
This lesson covers the statistical techniques required for the specification, from basic measures of central tendency through to significance testing.
Central tendency tells you the typical or average value in a dataset. There are three measures you need to know:
The mean is the arithmetic average. Add up all the values and divide by the number of values.
Formula: Mean = Sum of all values / Number of values
Example: River velocities at a site: 0.3, 0.4, 0.5, 0.4, 0.6 m/s
Mean = (0.3 + 0.4 + 0.5 + 0.4 + 0.6) / 5 = 2.2 / 5 = 0.44 m/s
Advantages: Uses all the data; gives a precise single value. Limitations: Can be distorted by extreme values (outliers). If the 0.5 m/s reading had been recorded as 2.5 m/s (an error), the mean would jump to 4.2 / 5 = 0.84 m/s, which is unrepresentative.
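The mean calculation and the effect of an outlier can be checked with a short Python sketch. The function name `mean` is illustrative, and the outlier case assumes it is the 0.5 m/s reading that was mis-recorded as 2.5 m/s, which reproduces the 0.84 m/s figure above.

```python
# River velocities (m/s) from the worked example.
velocities = [0.3, 0.4, 0.5, 0.4, 0.6]


def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)


site_mean = mean(velocities)

# Assumption for illustration: the 0.5 reading is entered as 2.5 by mistake.
with_outlier = [0.3, 0.4, 2.5, 0.4, 0.6]
distorted_mean = mean(with_outlier)

print(round(site_mean, 2))       # 0.44
print(round(distorted_mean, 2))  # 0.84
```

A single faulty reading almost doubles the mean, which is why you should check for outliers before quoting it.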
The median is the middle value when all data is arranged in order from smallest to largest.
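Method: Arrange the values in order from smallest to largest, then pick the middle value. If there is an even number of values, take the mean of the two middle values.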
Example: Pebble sizes (mm): 12, 18, 24, 31, 45, 52, 88
Median = 31 mm (the 4th value in a set of 7)
Advantages: Not affected by extreme values; easy to find. Limitations: Does not use all the data; can be unrepresentative if data is clustered at one end.
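As a rough sketch, the median can be found in Python by sorting the values and picking the middle one. The function name `median` is illustrative, and the even-sized dataset (the same pebbles without the 88 mm value) is a made-up case added only to show the "two middle values" rule.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Pebble sizes (mm) from the worked example: 7 values, so the 4th is the median.
pebbles = [12, 18, 24, 31, 45, 52, 88]
print(median(pebbles))  # 31

# Hypothetical even-sized set: the middle values are 24 and 31.
six_pebbles = [12, 18, 24, 31, 45, 52]
print(median(six_pebbles))  # 27.5
```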
The mode is the most frequently occurring value in a dataset.
Example: Environmental quality scores: 3, 4, 4, 4, 5, 5, 6, 7
Mode = 4 (appears 3 times)
Advantages: Shows the most common value; useful for categorical data. Limitations: There may be no mode (all values different) or multiple modes; does not use all the data.
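One way to find the mode in Python is to count how often each value occurs. This sketch uses `collections.Counter` from the standard library; the helper name `modes` is illustrative, and it returns a list so that the "no mode" and "multiple modes" cases mentioned above are both handled. It assumes the dataset is not empty.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often; empty list if all values differ."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value appears once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


# Environmental quality scores from the worked example.
scores = [3, 4, 4, 4, 5, 5, 6, 7]
print(modes(scores))  # [4]
```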
| Measure | Best For | Limitation |
|---|---|---|
| Mean | Normally distributed data without outliers | Distorted by extreme values |
| Median | Skewed data or data with outliers | Does not use all values |
| Mode | Categorical data or identifying the most common value | May not exist or may be misleading |
Exam Tip: If asked which measure of central tendency is most appropriate, consider whether there are outliers in the data. If there are, the median is usually better than the mean because it is not distorted by extreme values. Always explain your choice.
Dispersion (or spread) tells you how spread out the data is around the central value.
The range is the simplest measure of spread.
Range = highest value - lowest value
Example: Pebble sizes: 12, 18, 24, 31, 45, 52, 88 mm
Range = 88 - 12 = 76 mm
Advantages: Very easy to calculate. Limitations: Only uses the two most extreme values, so a single outlier can make the range misleadingly large.
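The sensitivity of the range to one extreme value can be seen in a short Python sketch. The function name `value_range` is illustrative, and removing the 88 mm pebble is a hypothetical comparison, not part of the original example.

```python
def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)


# Pebble sizes (mm) from the worked example.
pebbles = [12, 18, 24, 31, 45, 52, 88]
print(value_range(pebbles))  # 76

# Hypothetical comparison: the same data without the largest pebble.
without_largest = [v for v in pebbles if v != 88]
print(value_range(without_largest))  # 40
```

Dropping one pebble cuts the range from 76 mm to 40 mm, even though the other six values are unchanged.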
The interquartile range measures the spread of the middle 50% of the data, removing the influence of extreme values.
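Method: Arrange the data in order. Find the lower quartile (Q1), the value one quarter of the way through the data, and the upper quartile (Q3), the value three quarters of the way through. Then IQR = Q3 - Q1.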
Example: Data set (already ordered): 5, 8, 12, 15, 18, 22, 25, 30, 35, 42, 48, 55
Advantages: Not affected by extreme values; more representative than the range. Limitations: More complex to calculate; ignores data outside the middle 50%.
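The working for the example dataset is not shown above, so the Python sketch below is only one possible approach. It assumes the "median of each half" convention: split the ordered data into a lower and an upper half, then take the median of each as Q1 and Q3. Other quartile conventions, such as using the (n + 1) / 4 position, can give slightly different values, so check which method your teacher or mark scheme expects. The function names are illustrative.

```python
def median(ordered):
    """Median of data that is already sorted."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd number of values, the overall median is left out of both halves.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(upper)


data = [5, 8, 12, 15, 18, 22, 25, 30, 35, 42, 48, 55]
q1, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q3, iqr)  # 13.5 38.5 25.0
```

Under this convention the middle 50% of the data spans 25 units, compared with a range of 55 - 5 = 50.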
| Measure | Uses | Sensitivity to Outliers |
|---|---|---|
| Range | Quick overview of spread | Very sensitive — one outlier changes it dramatically |
| IQR | Robust measure of spread | Not sensitive — ignores the top and bottom 25% |
Spearman's rank correlation coefficient (rs) is the most important statistical technique for GCSE Geography. It measures the strength and direction of a relationship between two variables using ranked data.
Use Spearman's rank when:
rs = 1 - (6 x Σd²) / (n³ - n)
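Where: d is the difference between the two ranks for each pair of values, Σd² is the sum of the squared differences, and n is the number of paired observations (here, the number of sites).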
Example: Testing whether pebble size decreases with distance downstream
| Site | Distance downstream (km) | Mean pebble size (mm) | Rank distance | Rank pebble | d | d² |
|---|---|---|---|---|---|---|
| A | 0.5 | 85 | 1 | 8 | -7 | 49 |
| B | 1.0 | 72 | 2 | 7 | -5 | 25 |
| C | 1.5 | 58 | 3 | 6 | -3 | 9 |
| D | 2.0 | 45 | 4 | 5 | -1 | 1 |
| E | 2.5 | 38 | 5 | 4 | 1 | 1 |
| F | 3.0 | 30 | 6 | 3 | 3 | 9 |
| G | 3.5 | 25 | 7 | 2 | 5 | 25 |
| H | 4.0 | 18 | 8 | 1 | 7 | 49 |
Step 1: Rank both variables (1 = smallest/lowest). If values are tied, give each the average rank they would have occupied.
Step 2: Calculate d (difference between ranks) for each pair.
Step 3: Square each d value to get d².
Step 4: Sum all d² values: Σd² = 49 + 25 + 9 + 1 + 1 + 9 + 25 + 49 = 168
Step 5: Substitute into the formula:
rs = 1 - (6 x 168) / (8³ - 8) = 1 - 1008 / 504 = 1 - 2 = -1.0
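The five steps can be reproduced with a minimal Python sketch, using the site data from the table. The function names `average_ranks` and `spearman_rs` are illustrative. Ranking uses 1 = smallest, and tied values share the average rank as described in Step 1, although this dataset has no ties.

```python
def average_ranks(values):
    """Rank values with 1 = smallest; tied values share the average of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # first position v occupies (1-based)
        count = ordered.count(v)       # how many times v appears
        ranks.append(first + (count - 1) / 2)
    return ranks


def spearman_rs(x, y):
    """Return (sum of d squared, rs) using rs = 1 - 6*sum(d^2) / (n^3 - n)."""
    n = len(x)
    rank_x = average_ranks(x)                               # Step 1
    rank_y = average_ranks(y)
    d = [rx - ry for rx, ry in zip(rank_x, rank_y)]         # Step 2
    sum_d2 = sum(diff ** 2 for diff in d)                   # Steps 3 and 4
    rs = 1 - (6 * sum_d2) / (n ** 3 - n)                    # Step 5
    return sum_d2, rs


# Sites A to H from the table.
distance_km = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
pebble_mm = [85, 72, 58, 45, 38, 30, 25, 18]

sum_d2, rs = spearman_rs(distance_km, pebble_mm)
print(sum_d2, rs)  # 168.0 -1.0
```

You still need to be able to do this by hand in the exam, but a script like this is a quick way to check your fieldwork calculations.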
The value of rs always falls between -1 and +1:
| rs Value | Interpretation |
|---|---|
| +1.0 | Perfect positive correlation |
| +0.7 to +0.99 | Strong positive correlation |
| +0.4 to +0.69 | Moderate positive correlation |
| +0.1 to +0.39 | Weak positive correlation |
| 0 | No correlation |
| -0.1 to -0.39 | Weak negative correlation |
| -0.4 to -0.69 | Moderate negative correlation |
| -0.7 to -0.99 | Strong negative correlation |
| -1.0 | Perfect negative correlation |
In our example, rs = -1.0, indicating a perfect negative correlation — pebble size decreases consistently as distance downstream increases. This supports the hypothesis that pebbles get smaller downstream due to attrition and abrasion.
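The interpretation table can be turned into a small lookup, sketched here in Python. The function name `describe` is illustrative. The table leaves small gaps between its bands (for example between 0 and 0.1), so this sketch assumes that anything weaker than 0.1 counts as "No correlation" and that each band runs up to the start of the next.

```python
def describe(rs):
    """Map an rs value to the wording used in the interpretation table."""
    if not -1 <= rs <= 1:
        raise ValueError("rs must lie between -1 and +1")
    size = abs(rs)
    if size == 1:
        strength = "Perfect"
    elif size >= 0.7:
        strength = "Strong"
    elif size >= 0.4:
        strength = "Moderate"
    elif size >= 0.1:
        strength = "Weak"
    else:
        return "No correlation"  # assumption: values below 0.1 are treated as none
    direction = "positive" if rs > 0 else "negative"
    return f"{strength} {direction} correlation"


print(describe(-1.0))  # Perfect negative correlation
print(describe(0.55))  # Moderate positive correlation
```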
Exam Tip: You must be able to carry out a Spearman's rank calculation from start to finish in the exam. Practise the steps until you can do them quickly and accurately. The most common errors are: (1) ranking the wrong way (always check whether 1 = lowest or highest — be consistent), (2) forgetting to square the d values, and (3) arithmetic mistakes. Use a calculator and double-check each step.
Once you have calculated rs, you need to determine whether your result is statistically significant — that is, whether the correlation is likely to be real or could have occurred by chance.