Measures of Location and Spread

This lesson covers measures of central tendency (location) and measures of spread (dispersion) for the Edexcel A-Level Mathematics specification (9MA0), Paper 3 Section A -- Statistics. You must know how to calculate the mean, median, mode, range, IQR, variance, and standard deviation from raw data and from frequency tables, including how to work with coded data.

Measures of Location (Central Tendency)

Mean

The mean (or arithmetic mean) is the sum of all the values divided by the number of values.

For raw data with n values x1, x2, ..., xn:

Mean (x-bar) = (x1 + x2 + ... + xn) / n

Using sigma notation: x-bar = (1/n) x Sigma(xi)

Mean from a frequency table

When data is presented in a frequency table:

Mean = Sigma(f x x) / Sigma(f)

where f is the frequency and x is the data value (or the midpoint of the class for grouped data).

Example (grouped data)

Height (cm)	Frequency (f)	Midpoint (x)	f x x
150 ≤ h < 160	5	155	775
160 ≤ h < 170	12	165	1980
170 ≤ h < 180	18	175	3150
180 ≤ h < 190	10	185	1850
190 ≤ h < 200	5	195	975
Total	50		8730

Mean = 8730 / 50 = 174.6 cm

Exam Tip: For grouped data, the mean is an estimate because we use class midpoints rather than the actual data values.
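Here is a minimal Python sketch of the midpoint method, applied to the height table above. The helper name `grouped_mean` and the (lower, upper, frequency) tuple layout are illustrative choices and not part of the specification.

```python
def grouped_mean(classes):
    """Estimate the mean of grouped data.

    `classes` is a list of (lower, upper, frequency) tuples. Each class is
    represented by its midpoint, so the result is only an estimate.
    """
    sum_f = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return sum_fx / sum_f


# Height table from the example: (lower boundary, upper boundary, frequency)
heights = [(150, 160, 5), (160, 170, 12), (170, 180, 18), (180, 190, 10), (190, 200, 5)]
print(grouped_mean(heights))  # → 174.6
```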

Median

The median is the middle value when all values are arranged in order of size.

If n is odd: the median is the ((n + 1) / 2)-th value.
If n is even: the median is the average of the (n/2)-th and (n/2 + 1)-th values.

Median from a frequency table

For grouped data, the median class is the class containing the (n/2)-th value. Use linear interpolation to find an estimate of the median within that class:

Median = L + ((n/2 - F) / f) x w

where L = lower class boundary, F = cumulative frequency before the median class, f = frequency of the median class, w = class width.
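Take the height table from the mean example. There, n/2 = 25 and the cumulative frequencies run 5, 17, 35, so the median class is 170 ≤ h < 180, with L = 170, F = 17, f = 18 and w = 10. That gives Median = 170 + (8/18) x 10 ≈ 174.44 cm. The sketch below automates the search for the median class. `grouped_median` is a hypothetical helper, and it assumes the classes are listed in increasing order with no gaps between them.

```python
def grouped_median(classes):
    """Estimate the median of grouped data by linear interpolation.

    `classes` is a list of (lower, upper, frequency) tuples in increasing order.
    Implements Median = L + ((n/2 - F) / f) * w.
    """
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= half:
            # Median class found: interpolate within it
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("need at least one class with a positive frequency")


heights = [(150, 160, 5), (160, 170, 12), (170, 180, 18), (180, 190, 10), (190, 200, 5)]
print(round(grouped_median(heights), 2))  # → 174.44
```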

Mode

The mode is the value (or class) that occurs most frequently. A data set can be:

Unimodal -- one mode.
Bimodal -- two modes.
Multimodal -- more than two modes.
No mode -- all values occur with equal frequency.

For grouped data, the modal class is the class with the highest frequency.
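The classification above can be written as a short function. This sketch uses the hypothetical name `modes` and returns every most-frequent value. It returns an empty list when all distinct values occur equally often, following the "no mode" convention above. A data set with only one distinct value is treated as having that value as its mode.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes of `values`.

    Returns an empty list when there is no mode, i.e. when two or more
    distinct values all occur with the same frequency.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    most_frequent = sorted(v for v, c in counts.items() if c == highest)
    if len(counts) > 1 and len(most_frequent) == len(counts):
        return []  # every distinct value is equally frequent: no mode
    return most_frequent


print(modes([2, 3, 3, 5, 7]))  # → [3] (unimodal)
print(modes([1, 1, 4, 4, 6]))  # → [1, 4] (bimodal)
print(modes([1, 2, 3, 4]))     # → [] (no mode)
```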

Measures of Spread (Dispersion)

Range

Range = Maximum value - Minimum value

The range is simple to calculate but is heavily influenced by extreme values (outliers).

Interquartile Range (IQR)

IQR = Q3 - Q1

The IQR measures the spread of the middle 50% of the data. It is not affected by extreme values, making it a more robust measure of spread than the range.
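The difference in sensitivity shows up clearly when a single value is replaced by an outlier. The data below are hypothetical. The quartiles come from Python's `statistics.quantiles` with its default 'exclusive' method. Quartile conventions differ between textbooks, exam boards and software, so other methods may give slightly different Q1 and Q3 for some data sets.

```python
import statistics


def range_and_iqr(data):
    """Return (range, IQR) for raw data.

    Quartiles use statistics.quantiles with its default 'exclusive' method.
    """
    q1, _, q3 = statistics.quantiles(data, n=4)
    return max(data) - min(data), q3 - q1


data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15]
with_outlier = [2, 4, 5, 6, 7, 8, 9, 10, 11, 13, 95]  # largest value replaced

print(range_and_iqr(data))          # → (13, 6.0)
print(range_and_iqr(with_outlier))  # → (93, 6.0)
```

The range jumps from 13 to 93, while the IQR stays at 6 because Q1 and Q3 depend only on the middle of the ordered data.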

Percentiles

The p-th percentile is the value below which p% of the data falls.

Q1 = 25th percentile
Q2 (median) = 50th percentile
Q3 = 75th percentile

Inter-decile range

The interdecile range is P90 - P10, the difference between the 90th and 10th percentiles. It measures the spread of the central 80% of the data.

Variance and Standard Deviation

Variance

The variance measures the average squared deviation from the mean.

For a data set of n values, the population variance is:

Variance = Sigma((xi - x-bar)²) / n

There is an equivalent (and usually easier to use) formula:

Variance = (Sigma(xi²) / n) - (Sigma(xi) / n)²

This is often written as:

Variance = Sigma(x²)/n - (x-bar)² = mean of squares - square of mean

You can also write this using Sxx notation:

Sxx = Sigma(xi²) - (Sigma(xi))²/n = Sigma((xi - x-bar)²)

Then Variance = Sxx / n and Standard deviation = sqrt(Sxx / n).

Standard Deviation

The standard deviation is the square root of the variance:

Standard deviation = sqrt(Variance)

Calculating variance from raw data

Given data values: 3, 5, 7, 8, 12

Step 1: Find the mean. x-bar = (3 + 5 + 7 + 8 + 12) / 5 = 35 / 5 = 7

Step 2: Find Sigma(xi²). 3² + 5² + 7² + 8² + 12² = 9 + 25 + 49 + 64 + 144 = 291

Step 3: Apply the formula. Variance = 291/5 - 7² = 58.2 - 49 = 9.2

Step 4: Standard deviation = sqrt(9.2) = 3.033 (to 3 d.p.)
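To check the worked example, the sketch below computes the variance of 3, 5, 7, 8, 12 in two ways: from the definition Sigma((xi - x-bar)²)/n and from the shortcut Sigma(xi²)/n - (x-bar)². `variance_two_ways` is a hypothetical helper name. The results are rounded because the shortcut can pick up tiny floating-point errors.

```python
import math


def variance_two_ways(values):
    """Population variance of `values` by the definition and by the shortcut."""
    n = len(values)
    mean = sum(values) / n
    # Definition: average squared deviation from the mean (Sxx / n)
    by_definition = sum((x - mean) ** 2 for x in values) / n
    # Shortcut: mean of squares minus square of mean
    by_shortcut = sum(x * x for x in values) / n - mean ** 2
    return by_definition, by_shortcut


definition, shortcut = variance_two_ways([3, 5, 7, 8, 12])
print(round(definition, 6), round(shortcut, 6))  # → 9.2 9.2
print(round(math.sqrt(shortcut), 3))             # → 3.033
```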

Calculating variance from a frequency table

When data is in a frequency table, use:

Variance = (Sigma(f x x²) / Sigma(f)) - (Sigma(f x x) / Sigma(f))²

Example

Score (x)	Frequency (f)	f x x	f x x²
1	3	3	3
2	7	14	28
3	12	36	108
4	8	32	128
5	5	25	125
Total	35	110	392

Mean = 110 / 35 = 3.143 (to 3 d.p.)

Variance = 392/35 - (110/35)² = 11.2 - 9.878 = 1.322 (to 3 d.p.)

Standard deviation = sqrt(1.322) = 1.150 (to 3 d.p.)
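The same calculation for the score table can be sketched as follows. `freq_table_stats` is a hypothetical helper that takes (value, frequency) pairs and uses the divisor-Sigma(f) formula above.

```python
import math


def freq_table_stats(table):
    """Mean, variance and standard deviation from (value, frequency) pairs.

    Variance = Sigma(f*x^2)/Sigma(f) - (Sigma(f*x)/Sigma(f))^2
    """
    sum_f = sum(f for _, f in table)
    sum_fx = sum(f * x for x, f in table)
    sum_fx2 = sum(f * x * x for x, f in table)
    mean = sum_fx / sum_f
    variance = sum_fx2 / sum_f - mean ** 2
    return mean, variance, math.sqrt(variance)


scores = [(1, 3), (2, 7), (3, 12), (4, 8), (5, 5)]
mean, variance, sd = freq_table_stats(scores)
print(round(mean, 3), round(variance, 3), round(sd, 3))  # → 3.143 1.322 1.15
```

Python drops the trailing zero when it prints 1.15. To 3 d.p. the standard deviation is still written 1.150.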

Interpreting Variance and Standard Deviation

A large standard deviation means the data is widely spread from the mean -- there is high variability.
A small standard deviation means the data is clustered closely around the mean -- there is low variability.
The standard deviation is in the same units as the data, whereas the variance is in squared units. This makes the standard deviation easier to interpret.

Exam Tip: When comparing two data sets, a smaller standard deviation means the data is more consistent. Always comment in context: "The standard deviation of reaction times for Group A (0.12 s) is lower than for Group B (0.25 s), indicating Group A's reaction times are more consistent."

Coded Data (Linear Coding)

Coding (or linear transformation) simplifies calculations by transforming data values before calculating summary statistics.

If the coding is: y = (x - a) / b

Then:

Mean of x = a + b x (mean of y)
Standard deviation of x = |b| x (standard deviation of y)
Variance of x = b² x (variance of y)

Why coding works

Subtracting a constant (a) shifts all values but does not change the spread. So the standard deviation is unchanged by subtraction alone.
Dividing by a constant (b) scales every value by a factor of 1/b, so the standard deviation is scaled by 1/|b| and the variance by 1/b².
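Both facts can be checked numerically. The data below are hypothetical, chosen only for illustration. The sketch uses `statistics.pstdev` because it divides by n, which matches the variance formula in this lesson.

```python
import math
import statistics

# Hypothetical raw data, chosen only to illustrate the coding rules
x = [52, 61, 74, 80, 93]
a, b = 50, 10

shifted = [xi - a for xi in x]        # subtraction only
coded = [(xi - a) / b for xi in x]    # full coding y = (x - a) / b

# pstdev is the population standard deviation (divisor n)
sd_x = statistics.pstdev(x)

# Shifting leaves the spread unchanged
print(math.isclose(statistics.pstdev(shifted), sd_x))  # → True
# Dividing by b scales the standard deviation by 1/|b|
print(math.isclose(statistics.pstdev(coded), sd_x / abs(b)))  # → True
# Decoding the mean: mean of x = a + b * (mean of y)
print(math.isclose(statistics.fmean(x), a + b * statistics.fmean(coded)))  # → True
```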

Example

Coded data: y = (x - 50) / 10

Given: mean of y = 2.4, standard deviation of y = 1.3

Mean of x = 50 + 10 x 2.4 = 50 + 24 = 74
Standard deviation of x = 10 x 1.3 = 13
Variance of x = 10² x 1.3² = 100 x 1.69 = 169
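The decoding step is mechanical, so it fits in a small function. `decode` is a hypothetical name. It multiplies the standard deviation by |b| so that it also works if the scale factor is negative.

```python
def decode(mean_y, sd_y, a, b):
    """Recover the mean, SD and variance of x from coded y = (x - a) / b."""
    mean_x = a + b * mean_y
    sd_x = abs(b) * sd_y
    return mean_x, sd_x, sd_x ** 2


mean_x, sd_x, var_x = decode(mean_y=2.4, sd_y=1.3, a=50, b=10)
print(round(mean_x, 6), round(sd_x, 6), round(var_x, 6))  # → 74.0 13.0 169.0
```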

Exam Tip: Coding questions are very common. Remember: adding/subtracting changes the mean but NOT the standard deviation. Multiplying/dividing changes BOTH the mean and the standard deviation.

Choosing Appropriate Measures

Situation	Best measure of location	Best measure of spread
Symmetrical data, no outliers	Mean	Standard deviation
Skewed data or outliers present	Median	IQR
Categorical data (most common value)	Mode	Not applicable

The mean and standard deviation use all data values, making them sensitive to extreme values.
The median and IQR are resistant to outliers, making them more suitable for skewed data.

Summary

Mean = sum of values / number of values. For grouped data, use midpoints (this gives an estimate).
Median = middle value when data is in order. For grouped data, use interpolation.
Mode = most frequent value or class.
Range = max - min (affected by outliers).
IQR = Q3 - Q1 (not affected by outliers).
Variance = Sxx/n = (Sigma(x²)/n) - (x-bar)². Standard deviation = sqrt(variance).
Coding: if y = (x - a)/b, then mean of x = a + b(mean of y) and SD of x = |b| x (SD of y).
Choose the mean and SD for symmetrical data; choose the median and IQR for skewed data or data with outliers.

A-Level Deep Dive: Measures of Location and Spread

Spec mapping

Edexcel 9MA0-03 specification, Paper 3 (Statistics and Mechanics), Section 3 (Measures of location and spread): interpret measures of central tendency and variation, extending to standard deviation; be able to calculate standard deviation, including from summary statistics; use coding to find the mean and standard deviation of a data set (refer to the official specification document for exact wording). The topic is examined directly in 9MA0-03 (typically Q1–Q3 alongside data presentation) and implicitly throughout Section 4 (Probability and statistical distributions: the normal model is parametrised by $\mu$ and $\sigma$) and Section 6 (Statistical hypothesis testing: test statistics involve sample means and standard errors). The Edexcel formula booklet supplies $\sigma^2 = \frac{\sum f x^2}{\sum f} - \bar{x}^2$; the coding rules are not listed and must be recalled.

Worked example with full mark scheme

Question (8 marks): A garden centre records the heights, $h$ cm, of 50 saplings. The grouped data are summarised below.

Height $h$ (cm)	Frequency $f$
$60 \le h < 80$	6
$80 \le h < 100$	14
$100 \le h < 120$	18
$120 \le h < 140$	9
$140 \le h < 160$	3

(a) Using the coding $y = \dfrac{x - 110}{20}$, where $x$ is the mid-interval height, find $\bar{y}$ and $s_y$, the standard deviation of $y$ (use divisor $n = \sum f$). (5)

(b) Hence find $\bar{x}$ and $s_x$ for the heights. (3)

Solution with mark scheme:

(a) Step 1 — mid-interval values and coded values. Mid-points $x$: 70, 90, 110, 130, 150. Coded $y = (x - 110)/20$: $-2, -1, 0, 1, 2$.

M1 — correct mid-interval values and the coded $y$ values aligned to each class. Common error: coding about a different origin, such as the lowest mid-point (70), instead of the 110 given in the question. The resulting coded statistics no longer answer part (a). The coded values are also no longer symmetric about 0, so the arithmetic is heavier and slips become more likely.

Step 2 — compute $\sum fy$ and $\sum fy^2$.

$\sum fy = 6(-2) + 14(-1) + 18(0) + 9(1) + 3(2) = -12 - 14 + 0 + 9 + 6 = -11$

$\sum fy^2 = 6(4) + 14(1) + 18(0) + 9(1) + 3(4) = 24 + 14 + 0 + 9 + 12 = 59$

M1 — at least one of $\sum fy$ or $\sum fy^2$ correct.

A1 — both correct.

Step 3 — coded mean and variance. With $n = \sum f = 50$:

$\bar{y} = \dfrac{\sum fy}{n} = \dfrac{-11}{50} = -0.22$

$s_y^2 = \dfrac{\sum fy^2}{n} - \bar{y}^2 = \dfrac{59}{50} - (-0.22)^2 = 1.18 - 0.0484 = 1.1316$

$s_y = \sqrt{1.1316} \approx 1.0638$

M1 — correct application of the variance formula $\sigma^2 = \frac{\sum fy^2}{n} - \bar{y}^2$.

A1 — $\bar{y} = -0.22$ and $s_y \approx 1.064$ (3 s.f. acceptable).

(b) Step 4 — decode the mean. Since $y = (x - 110)/20$, equivalently $x = 20y + 110$. By the linear-coding rules:

$\bar{x} = 20\bar{y} + 110 = 20(-0.22) + 110 = -4.4 + 110 = 105.6 \text{ cm}$

M1 (AO1.1b) — correct decoding of the mean.

Step 5 — decode the standard deviation. The standard deviation is multiplied by the scale factor 20 (in general $|b|$ for $y = (x - a)/b$) and is unaffected by the additive constant 110:

$s_x = 20 \times s_y = 20 \times 1.0638 \approx 21.28 \text{ cm}$

A1 (AO1.1b) — correct $s_x$.

A1 (AO2.5) — final answers presented with units (cm) and to a sensible number of significant figures, with the additive 110 explicitly not applied to the standard deviation.

Total: 8 marks (M4 A4): 5 marks in part (a) and 3 marks in part (b).
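The arithmetic in this worked example can be reproduced in a few lines. The sketch also confirms that decoding gives the same answers as working directly with the uncoded mid-points. It is a checking aid, not a substitute for showing working in the exam, and the variable names are illustrative.

```python
import math

mid_x = [70, 90, 110, 130, 150]
freq = [6, 14, 18, 9, 3]
a, b = 110, 20

y = [(x - a) / b for x in mid_x]              # coded values: -2, -1, 0, 1, 2
n = sum(freq)
sum_fy = sum(f * yi for f, yi in zip(freq, y))
sum_fy2 = sum(f * yi ** 2 for f, yi in zip(freq, y))

mean_y = sum_fy / n
sd_y = math.sqrt(sum_fy2 / n - mean_y ** 2)   # divisor n, as in the mark scheme

# Decode using the linear-coding rules
mean_x = a + b * mean_y
sd_x = abs(b) * sd_y

# Cross-check by working directly with the uncoded mid-points
direct_mean = sum(f * x for f, x in zip(freq, mid_x)) / n
direct_sd = math.sqrt(sum(f * x * x for f, x in zip(freq, mid_x)) / n - direct_mean ** 2)

print(sum_fy, sum_fy2)                             # → -11.0 59.0
print(round(mean_y, 4), round(sd_y, 4))            # → -0.22 1.0638
print(round(mean_x, 2), round(sd_x, 2))            # → 105.6 21.28
print(round(direct_mean, 2), round(direct_sd, 2))  # → 105.6 21.28
```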

Specimen question modelled on the Edexcel 9MA0 Paper 3 format

Question (6 marks): The times, $t$ minutes, taken by 30 students to complete a problem set have summary statistics $\sum t = 1410$ and $\sum t^2 = 68\,820$.

(a) Find the mean and standard deviation of $t$ . (3)

(b) An additional student completes the problem set in 55 minutes and is added to the data set. Without recomputing from raw data, find the new mean. State, with a reason, whether the new standard deviation is larger or smaller than the original. (3)

Mark scheme decomposition by AO:

(a)

M1 (AO1.1a) — $\bar{t} = 1410/30 = 47$.
M1 (AO1.1b) — $\sigma_t^2 = 68820/30 - 47^2 = 2294 - 2209 = 85$.
A1 (AO1.1b) — $\sigma_t = \sqrt{85} \approx 9.22$ minutes.

(b)

M1 (AO1.1b) — new $\sum t = 1410 + 55 = 1465$, new $n = 31$, so new $\bar{t} = 1465/31 \approx 47.26$.
M1 (AO2.4) — comparing the new value (55) with the original mean (47): its deviation $|55 - 47| = 8$ is less than the original $\sigma \approx 9.22$.
A1 (AO2.4) — the new value lies within one standard deviation of the mean, so adding it reduces the average squared deviation and the standard deviation decreases.

Total: 6 marks split AO1 = 4, AO2 = 2. AO2 marks reward the qualitative comparison — Edexcel rewards candidates who can reason about how summary statistics respond to data perturbations without grinding through fresh arithmetic.
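Part (b) can be checked by updating the summary statistics and recomputing the standard deviation, even though the question expects the qualitative argument. With divisor $n$, adding a value $x$ to a data set with mean $\bar{x}$ and standard deviation $\sigma$ lowers the standard deviation whenever $|x - \bar{x}| < \sigma$, so the mark-scheme reasoning is a sufficient condition. The sketch below uses illustrative variable names and confirms the direction of change.

```python
import math

n, sum_t, sum_t2 = 30, 1410, 68820

mean = sum_t / n
sd = math.sqrt(sum_t2 / n - mean ** 2)  # divisor n

# Part (b): add one observation by updating the sums, not the raw data
new_value = 55
n2 = n + 1
sum_t_new = sum_t + new_value
sum_t2_new = sum_t2 + new_value ** 2
new_mean = sum_t_new / n2
new_sd = math.sqrt(sum_t2_new / n2 - new_mean ** 2)

print(mean, round(sd, 2))                       # → 47.0 9.22
print(round(new_mean, 2), round(new_sd, 2))     # → 47.26 9.18
print(abs(new_value - mean) < sd, new_sd < sd)  # → True True
```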

Synoptic links

Connects to:

Section 2 — Data presentation and interpretation: box plots, histograms and cumulative-frequency curves all visualise the same location and spread information that mean/median/IQR/SD encode numerically. A skewed histogram immediately predicts $\bar{x} \neq \text{median}$, and the IQR is read directly off the cumulative-frequency curve at the lower and upper quartiles.
Section 4 — The normal distribution $N(\mu, \sigma^2)$: the normal is parametrised entirely by its location ($\mu$) and spread ($\sigma$). Standardising via $Z = (X - \mu)/\sigma$ is exactly the coding $y = (x - a)/b$ with $a = \mu$ and $b = \sigma$, so the coding rules immediately give $Z$ a mean of $(\mu - \mu)/\sigma = 0$ and a standard deviation of $\sigma/\sigma = 1$.
Section 5 — Regression and correlation (Year 2): the least-squares regression line passes through $(\bar{x}, \bar{y})$, and Pearson's correlation coefficient is built from $S_{xx}$, $S_{yy}$ and $S_{xy}$, which are sums of squares and products of deviations from the means.
Section 6 — Statistical hypothesis testing: the one-sample $z$-test statistic $z = (\bar{X} - \mu_0)/(\sigma/\sqrt{n})$ is built directly from the sample mean and the population standard deviation. The standard error $\sigma/\sqrt{n}$ is the spread of the sampling distribution of the mean.
