Statistical Techniques

Statistics is a powerful tool in geographical investigation. Rather than relying on subjective observation ("the pebbles seem to get smaller downstream"), statistical techniques allow you to measure, describe and test relationships in your data with mathematical precision. In the Edexcel B exam, you need to know how to calculate and interpret several key statistical measures, including Spearman's rank correlation coefficient.

This lesson covers the statistical techniques required for the specification, from basic measures of central tendency through to significance testing.

Measures of Central Tendency

Central tendency tells you the typical or average value in a dataset. There are three measures you need to know:

Mean

The mean is the arithmetic average. Add up all the values and divide by the number of values.

Formula: Mean = Sum of all values / Number of values

Example: River velocities at a site: 0.3, 0.4, 0.5, 0.4, 0.6 m/s

Mean = (0.3 + 0.4 + 0.5 + 0.4 + 0.6) / 5 = 2.2 / 5 = 0.44 m/s

Advantages: Uses all the data; gives a precise single value. Limitations: Can be distorted by extreme values (outliers). If the 0.5 m/s reading had been misrecorded as 2.5 m/s, the mean would jump to (0.3 + 0.4 + 2.5 + 0.4 + 0.6) / 5 = 0.84 m/s, which is unrepresentative of the site.
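If you want to check a mean calculation on a computer, the short Python sketch below reproduces the river velocity example. It assumes the erroneous 2.5 m/s reading replaces the 0.5 m/s reading, which is the case that gives 0.84 m/s; the variable names are illustrative only.

```python
from statistics import mean

# River velocities from the example (m/s)
velocities = [0.3, 0.4, 0.5, 0.4, 0.6]
site_mean = mean(velocities)

# Assumption for illustration: the 0.5 reading was misrecorded as 2.5
with_outlier = [0.3, 0.4, 2.5, 0.4, 0.6]
outlier_mean = mean(with_outlier)

print(round(site_mean, 2))     # 0.44
print(round(outlier_mean, 2))  # 0.84
```

A single wrong reading almost doubles the mean, which is why outliers should be checked before the mean is reported.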

Median

The median is the middle value when all data is arranged in order from smallest to largest.

Method:

Arrange values in ascending order
If there is an odd number of values, the median is the middle value
If there is an even number of values, the median is the mean of the two middle values

Example: Pebble sizes (mm): 12, 18, 24, 31, 45, 52, 88

Median = 31 mm (the 4th value in a set of 7)

Advantages: Not affected by extreme values; easy to find. Limitations: Does not use all the data; can be unrepresentative if data is clustered at one end.
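The method above can be sketched in Python using the standard library's median function, which sorts the data for you. The first list is the pebble example entered in a jumbled order; the second list (the same data with the largest pebble removed) is a hypothetical case added only to show the even-number rule.

```python
from statistics import median

# Pebble sizes from the example (mm), deliberately unsorted
pebbles_mm = [45, 12, 88, 31, 18, 52, 24]
odd_median = median(pebbles_mm)

# Hypothetical even-sized set: the same data without the 88 mm pebble
even_set = [12, 18, 24, 31, 45, 52]
even_median = median(even_set)  # mean of the two middle values, 24 and 31

print(odd_median)   # 31
print(even_median)  # 27.5
```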

Mode

The mode is the most frequently occurring value in a dataset.

Example: Environmental quality scores: 3, 4, 4, 4, 5, 5, 6, 7

Mode = 4 (appears 3 times)

Advantages: Shows the most common value; useful for categorical data. Limitations: There may be no mode (all values different) or multiple modes; does not use all the data.
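As a rough sketch, the mode can be found by counting how often each value appears. The helper below, modes, is a hypothetical name chosen for this illustration; it returns an empty list when no value repeats and every tied value when there are several modes.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # all values different, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

scores = [3, 4, 4, 4, 5, 5, 6, 7]   # environmental quality scores from the example
print(modes(scores))                # [4]
print(modes([1, 2, 3]))             # []
print(modes([2, 2, 5, 5, 9]))       # [2, 5]
```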

Comparing Central Tendency Measures

Measure	Best For	Limitation
Mean	Normally distributed data without outliers	Distorted by extreme values
Median	Skewed data or data with outliers	Does not use all values
Mode	Categorical data or identifying the most common value	May not exist or may be misleading

Exam Tip: If asked which measure of central tendency is most appropriate, consider whether there are outliers in the data. If there are, the median is usually better than the mean because it is not distorted by extreme values. Always explain your choice.

Measures of Dispersion

Dispersion (or spread) tells you how spread out the data is around the central value.

Range

The range is the simplest measure of spread.

Range = highest value - lowest value

Example: Pebble sizes: 12, 18, 24, 31, 45, 52, 88 mm

Range = 88 - 12 = 76 mm

Advantages: Very easy to calculate. Limitations: Only uses the two most extreme values, so a single outlier can make the range misleadingly large.
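A minimal Python sketch of the range calculation is shown below. The second calculation removes the 88 mm pebble purely as an illustration of how much one extreme value contributes to the range.

```python
# Pebble sizes from the example (mm), already in ascending order
pebbles_mm = [12, 18, 24, 31, 45, 52, 88]
data_range = max(pebbles_mm) - min(pebbles_mm)

# Illustration only: the same data without the largest pebble
without_largest = pebbles_mm[:-1]
trimmed_range = max(without_largest) - min(without_largest)

print(data_range)     # 76
print(trimmed_range)  # 40
```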

Interquartile Range (IQR)

The interquartile range measures the spread of the middle 50% of the data, removing the influence of extreme values.

Method:

Arrange data in ascending order
Find the lower quartile (Q1) — the median of the lower half of the data
Find the upper quartile (Q3) — the median of the upper half of the data
IQR = Q3 - Q1

Example: Data set (already ordered): 5, 8, 12, 15, 18, 22, 25, 30, 35, 42, 48, 55

Q1 = median of lower half (5, 8, 12, 15, 18, 22) = (12 + 15) / 2 = 13.5
Q3 = median of upper half (25, 30, 35, 42, 48, 55) = (35 + 42) / 2 = 38.5
IQR = 38.5 - 13.5 = 25

Advantages: Not affected by extreme values; more representative than the range. Limitations: More complex to calculate; ignores data outside the middle 50%.
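The median-of-halves method above can be sketched in Python as follows. The function name iqr_median_of_halves is hypothetical, and for an odd number of values this sketch assumes the middle value is left out of both halves. Spreadsheet and library quartile functions often use different rules, so their answers may not match a hand calculation exactly.

```python
from statistics import median

def iqr_median_of_halves(values):
    """Return (Q1, Q3, IQR) using the median of each half of the ordered data."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # lower half of the data
    upper = ordered[-half:]   # upper half (middle value excluded when n is odd)
    q1 = median(lower)
    q3 = median(upper)
    return q1, q3, q3 - q1

data = [5, 8, 12, 15, 18, 22, 25, 30, 35, 42, 48, 55]
q1, q3, iqr = iqr_median_of_halves(data)
print(q1, q3, iqr)  # 13.5 38.5 25.0
```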

Measure	Uses	Sensitivity to Outliers
Range	Quick overview of spread	Very sensitive — one outlier changes it dramatically
IQR	Robust measure of spread	Not sensitive — ignores the top and bottom 25%

Spearman's Rank Correlation Coefficient

Spearman's rank correlation coefficient (rs) is the most important statistical test for GCSE Geography. It measures the strength and direction of a relationship between two variables using ranked data.

When to Use Spearman's Rank

Use Spearman's rank when:

You have paired data (two variables measured at the same locations or for the same items)
You want to test whether there is a correlation between them
You have at least 10 pairs of data (ideally more)

The Formula

rs = 1 - (6 x Σd²) / (n³ - n)

Where:

rs = Spearman's rank correlation coefficient
d = the difference between the ranks of each pair
Σd² = the sum of all the squared differences
n = the number of data pairs

Step-by-Step Calculation

Example: Testing whether pebble size decreases with distance downstream

Site	Distance downstream (km)	Mean pebble size (mm)	Rank distance	Rank pebble	d	d²
A	0.5	85	1	8	-7	49
B	1.0	72	2	7	-5	25
C	1.5	58	3	6	-3	9
D	2.0	45	4	5	-1	1
E	2.5	38	5	4	1	1
F	3.0	30	6	3	3	9
G	3.5	25	7	2	5	25
H	4.0	18	8	1	7	49

Step 1: Rank both variables (1 = smallest/lowest). If values are tied, give each the average rank they would have occupied.

Step 2: Calculate d (difference between ranks) for each pair.

Step 3: Square each d value to get d².

Step 4: Sum all d² values: Σd² = 49 + 25 + 9 + 1 + 1 + 9 + 25 + 49 = 168

Step 5: Substitute into the formula:

rs = 1 - (6 x 168) / (8³ - 8) = 1 - 1008 / 504 = 1 - 2 = -1.0
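The five steps can also be checked with a short Python sketch. The function names average_ranks and spearman_rs are hypothetical, chosen for this illustration. Ranking uses 1 = smallest, and tied values share the average of the ranks they would have occupied. Note that when there are tied ranks, this formula gives only an approximate value.

```python
def average_ranks(values):
    """Rank values with 1 = smallest; tied values share the average rank."""
    ordered = sorted(values)
    ranks = {}
    for v in set(values):
        first = ordered.index(v) + 1   # first position this value occupies
        count = ordered.count(v)       # how many times it is tied
        ranks[v] = first + (count - 1) / 2
    return [ranks[v] for v in values]

def spearman_rs(x, y):
    """Apply rs = 1 - (6 x sum of d squared) / (n cubed - n) to paired data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n ** 3 - n)

distance_km = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]   # sites A to H
pebble_mm = [85, 72, 58, 45, 38, 30, 25, 18]

rs = spearman_rs(distance_km, pebble_mm)
print(rs)  # -1.0
```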

Interpreting the Result

The value of rs always falls between -1 and +1:

rs Value	Interpretation
+1.0	Perfect positive correlation
+0.7 to +0.99	Strong positive correlation
+0.4 to +0.69	Moderate positive correlation
+0.1 to +0.39	Weak positive correlation
0	No correlation
-0.1 to -0.39	Weak negative correlation
-0.4 to -0.69	Moderate negative correlation
-0.7 to -0.99	Strong negative correlation
-1.0	Perfect negative correlation

In our example, rs = -1.0, indicating a perfect negative correlation — pebble size decreases consistently as distance downstream increases. This supports the hypothesis that pebbles get smaller downstream due to attrition and abrasion.
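The interpretation table can be turned into a simple lookup, sketched below in Python. The function name describe_rs is hypothetical. The table leaves small gaps between bands (for example between 0 and 0.1), so this sketch assumes that any value below 0.1 in size counts as "no correlation" and that each band starts at its lower threshold.

```python
def describe_rs(rs):
    """Turn an rs value into the wording used in the interpretation table."""
    if not -1 <= rs <= 1:
        raise ValueError("rs must lie between -1 and +1")
    size = abs(rs)
    if size < 0.1:
        return "no correlation"   # assumption: covers the gap between 0 and 0.1
    if size == 1:
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if rs > 0 else "negative"
    return f"{strength} {direction} correlation"

print(describe_rs(-1.0))  # perfect negative correlation
print(describe_rs(0.55))  # moderate positive correlation
```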

Exam Tip: You must be able to carry out a Spearman's rank calculation from start to finish in the exam. Practise the steps until you can do them quickly and accurately. The most common errors are: (1) ranking the wrong way (always check whether 1 = lowest or highest — be consistent), (2) forgetting to square the d values, and (3) arithmetic mistakes. Use a calculator and double-check each step.

Significance Testing

Once you have calculated rs, you need to determine whether your result is statistically significant — that is, whether the correlation is likely to be real or could have occurred by chance.
