Edexcel A-Level Maths: Statistics — Complete Revision Guide (9MA0 Paper 3)
Statistics is one of the two applied strands of Edexcel A-Level Maths (9MA0), and it lives entirely on Paper 3. Section A of Paper 3 is statistics; Section B is mechanics. Each section accounts for roughly 50 of the 100 marks on that paper, so statistics alone is worth about a sixth of the whole A-Level. That is more than most candidates realise when they decide, late in the year, that they will "focus on the pure papers and pick up what they can on Paper 3." Statistics is methodical, formula-heavy work, and almost every mark on it is recoverable with disciplined revision.
This guide is a topic-by-topic walkthrough of the statistics content in the 9MA0 specification. It covers everything Edexcel can examine in Section A of Paper 3: sampling methods, data presentation and interpretation, measures of location and spread, correlation and regression, probability, the binomial distribution, the normal distribution, the normal approximation to the binomial, and hypothesis testing for both the binomial proportion and the normal mean. For each topic you will see the core skills, the typical pitfalls, a short worked example, and a link to the full lesson on the LearningBro course.
The aim is not to replace working through past-paper questions. Statistics rewards repetition more than almost any other section of A-Level Maths — there are a handful of standard question shapes, and if you have done each of them ten times you will recognise them on sight. The aim of this guide is to give you a clear map of what is on the spec, in the order Edexcel teaches it, so your revision is targeted rather than scattered. Use it as a checklist, a refresher, and a launchpad into focused practice.
What the Edexcel 9MA0 Specification Covers
The Edexcel A-Level Maths qualification (9MA0) is assessed through three two-hour papers, each worth 100 marks. Paper 3 splits down the middle: Section A is statistics, Section B is mechanics. There is no choice of questions and no coursework, so every mark must be earned in the exam.
Statistics on 9MA0 is built around the Edexcel Large Data Set — a published dataset of weather records that Edexcel expects you to have worked with throughout the course. Questions on Paper 3 will refer to it directly, asking you to reason about a sample, interpret a summary statistic, or recognise an unrealistic value. You do not need to memorise the dataset, but you do need to be familiar enough with it to spot when something is plausible and when something is not.
The table below shows the sub-topics of Section A, the part of the specification they sit under, and a realistic estimate of how many marks across a Paper 3 sitting come from each.
| Topic | Spec Section | Typical Paper 3 marks weight |
|---|---|---|
| Sampling and the Large Data Set | 1.1 | 4-6 marks |
| Data presentation and interpretation | 2.1-2.3 | 6-10 marks |
| Measures of location and spread | 2.2 | 4-6 marks |
| Correlation and regression | 2.4 | 4-6 marks |
| Probability | 3.1-3.2 | 4-6 marks |
| Binomial distribution | 4.1-4.2 | 4-6 marks |
| Normal distribution | 4.3 | 4-6 marks |
| Normal approximation to the binomial | 4.4 | 2-4 marks |
| Hypothesis testing for the binomial | 5.1-5.2 | 4-6 marks |
| Hypothesis testing for the normal mean | 5.3 | 4-6 marks |
These weights are estimates based on the spread of typical 9MA0 papers — not guarantees for any single year. What is reliable is that hypothesis testing and the binomial/normal distributions account for roughly half of Section A between them, with the data-handling topics (presentation, location and spread, correlation) making up most of the rest. Sampling tends to appear as a single short question, often referencing the Large Data Set.
Sampling Methods and the Large Data Set
Sampling is the first topic in the specification because every statistical calculation that follows depends on a clear understanding of how the data was collected. A population is the entire group of interest; a sample is a subset chosen to represent it. A good sample is representative, meaning the distribution of relevant characteristics in the sample matches that in the population.
The five sampling methods you must know are: simple random sampling (every member of the population has an equal probability of selection), systematic sampling (selecting every kth member from a list after a random start), stratified sampling (dividing the population into strata and sampling from each in proportion to its size), opportunity (convenience) sampling (using whoever is easiest to reach), and quota sampling (selecting until pre-set quotas in each subgroup are filled). Edexcel expects you to identify each method, describe how to carry it out, and discuss its advantages and disadvantages.
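The mechanics of systematic sampling can be sketched in a few lines of Python. This is only an illustration of the procedure you would describe in words in an exam answer; the population here is just a numbered list, and taking the interval as k = N ÷ n is the usual convention when N is a multiple of n:

```python
def systematic_sample(population, n, start):
    """Systematic sample of size n: fix the interval k = N // n, then
    select every kth member beginning at `start` (1-indexed, 1 <= start <= k)."""
    k = len(population) // n          # sampling interval
    if not 1 <= start <= k:
        raise ValueError("start must lie in the first interval")
    # 0-indexed positions: start-1, start-1+k, start-1+2k, ...
    return [population[start - 1 + i * k] for i in range(n)]

# A population of 100 numbered members, sample size 10, random start 3:
members = list(range(1, 101))
print(systematic_sample(members, n=10, start=3))
# selects members 3, 13, 23, ..., 93
```

Note that only the start is random — everything after it is fixed, which is exactly why systematic sampling is not simple random sampling.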
The Large Data Set is a fixed dataset of UK and overseas weather records used throughout the qualification. Paper 3 questions often refer to it — for instance, asking you to comment on whether a calculated mean rainfall is plausible for Camborne in May, or to identify why a particular sampling method might bias a result. You should know which locations are in the dataset, the range of variables covered (rainfall, temperature, wind speed and direction, sunshine hours, pressure), and the time periods sampled.
A common pitfall is confusing systematic with random sampling. Systematic sampling is not random: once the starting point is chosen, every other selection is fixed. Another is forgetting that stratified sampling requires you to know the strata sizes in advance, which is often impractical. With opportunity samples, the standard exam answer is "the sample is unlikely to be representative of the population," not "the sample is wrong" — convenience sampling is valid; it just has known weaknesses.
For full coverage of all five methods, Large Data Set practice and worked exam questions, see the Sampling Methods lesson.
Data Presentation and Interpretation
Data presentation covers the diagrams Edexcel can ask you to read, draw, or critique: histograms (with frequency density on the y-axis), box plots, cumulative frequency curves, stem-and-leaf diagrams, and scatter diagrams. The skill is partly graphical and partly interpretative — you must be able to extract numerical answers from a diagram and then comment on what they mean in context.
Histograms are the highest-leverage topic here. The y-axis is frequency density, defined as frequency density = frequency ÷ class width. The frequency in any class is the area of the bar, not its height. So if a class has width 5 and frequency density 4, the frequency is 5 × 4 = 20. Histogram questions often give you partial information — a few frequencies, a few class widths — and ask you to fill in the rest of the table.
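Since frequency = frequency density × class width, completing a histogram table is one multiplication per class. A minimal sketch (the class boundaries and densities here are invented for illustration):

```python
# Each class is (lower boundary, upper boundary, frequency density)
classes = [(0, 5, 4.0), (5, 10, 6.2), (10, 20, 1.5)]

for lower, upper, density in classes:
    width = upper - lower
    frequency = density * width   # frequency is the AREA of the bar
    print(f"[{lower}, {upper}): width {width}, frequency {frequency}")
# The class of width 5 with density 4 has frequency 5 * 4 = 20
```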
Outliers appear regularly. Edexcel uses two definitions, and the question will tell you which to use: the 1.5×IQR rule (any value below Q1−1.5(Q3−Q1) or above Q3+1.5(Q3−Q1)) or the two-standard-deviation rule (any value more than two standard deviations from the mean). On a box plot, outliers are marked separately with crosses. Cleaning an outlier means deciding, with reasoning, whether to remove it before further analysis — and whether removing it is justified.
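The 1.5×IQR fences are mechanical once Q1 and Q3 are known. Quartile conventions vary between calculators and textbooks, so this sketch takes the quartiles as given rather than computing them; the example values are invented:

```python
def iqr_fences(q1, q3):
    """Return the (lower, upper) outlier fences under the 1.5*IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data, q1, q3):
    """Values lying outside the fences."""
    lo, hi = iqr_fences(q1, q3)
    return [x for x in data if x < lo or x > hi]

# With Q1 = 45 and Q3 = 55, IQR = 10, so the fences are 30 and 70:
print(iqr_fences(45, 55))                            # (30.0, 70.0)
print(outliers([28, 44, 50, 66, 73], q1=45, q3=55))  # [28, 73]
```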
A common pitfall is treating frequency density as if it were frequency. Another is reading a box plot symmetrically by default — many real distributions are skewed, and the median can sit anywhere within the interquartile box. Always check the position of the median before commenting on skew.
For practice on every diagram type and a clean histogram workflow, see the Data Presentation lesson.
Measures of Location and Spread
The standard measures of location are the mean, median and mode. The standard measures of spread are the range, interquartile range (IQR), variance and standard deviation. Edexcel expects you to calculate each, choose the most appropriate measure for a given context, and reason about how each one is affected by outliers and skew.
For raw data x1, x2, …, xn, the mean is x̄ = (∑xi)/n and the variance is s² = (1/n)∑(xi − x̄)². The computational formula, which Edexcel uses on the exam, is:
s² = (∑xi²)/n − x̄²
The standard deviation is s = √(s²). For frequency data, replace ∑xi with ∑fixi and n with ∑fi. For grouped data, use the midpoints of each class as the xi values; this introduces a small approximation error, which is unavoidable with grouped data and is not penalised.
Choosing between the measures is partly an exam-question skill. The mean uses every data point but is sensitive to outliers. The median is robust to outliers but ignores the magnitudes of values. The mode is the only sensible measure for categorical data. The standard deviation uses every data point and is comparable across datasets; the IQR is robust to outliers and matches the median. A common exam command is "give a reason why the median is more appropriate than the mean," for which the standard answer is "because the data contains outliers" or "because the data is skewed."
A short worked example. For the data {2, 4, 4, 5, 7, 8}, n = 6 and ∑x = 30, so x̄ = 5. Then ∑x² = 4 + 16 + 16 + 25 + 49 + 64 = 174, so s² = 174/6 − 25 = 29 − 25 = 4, and s = 2.
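The same calculation can be checked in a few lines of Python, using the n-divisor convention Edexcel expects:

```python
data = [2, 4, 4, 5, 7, 8]
n = len(data)

mean = sum(data) / n                  # 30 / 6 = 5.0
sum_sq = sum(x * x for x in data)     # 174
variance = sum_sq / n - mean ** 2     # 174/6 - 25 = 4.0   (divisor n, not n-1)
sd = variance ** 0.5                  # 2.0

print(mean, sum_sq, variance, sd)     # 5.0 174 4.0 2.0
```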
A common pitfall is subtracting x̄ rather than x̄² at the end of the variance formula. Another is computing the sample standard deviation (divisor n−1) when Edexcel expects the population standard deviation (divisor n). For 9MA0, the divisor is n.
For step-by-step practice with raw, frequency and grouped data, see the Measures of Location and Spread lesson.
Correlation and Regression
Correlation measures the strength and direction of a linear relationship between two variables. Regression finds the line of best fit through a scatter of (x,y) points. Edexcel treats these as two halves of the same topic, and exam questions often combine them.
The product moment correlation coefficient r takes values between −1 and 1. Values close to 1 indicate strong positive linear correlation; values close to −1 indicate strong negative linear correlation; values close to 0 indicate no linear relationship. You are not expected to compute r from raw data on the exam — your calculator will do that — but you must interpret the value you get and comment on whether a linear model is appropriate.
The regression line of y on x has equation y=a+bx, where b is the slope and a the intercept. The line is computed by the calculator using the least-squares criterion: it minimises the sum of squared vertical distances from the data points to the line. Edexcel expects you to interpret a and b in context — for instance, "the slope b=0.4 means that for every additional hour of revision, the predicted exam mark increases by 0.4 percentage points."
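Although the calculator produces a and b for you, the least-squares formulas behind it (b = Sxy/Sxx, a = ȳ − b·x̄) are short enough to sketch. This is background, not an exam requirement, and the data here is invented:

```python
def least_squares(xs, ys):
    """Fit the regression line y = a + b*x of y on x by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx            # slope: change in predicted y per unit x
    a = y_bar - b * x_bar      # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Perfectly linear data recovers the underlying line exactly:
a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])   # points on y = 1 + 2x
print(a, b)   # 1.0 2.0
```

The fact that the line always passes through (x̄, ȳ) is also why the y-on-x and x-on-y lines generally differ — they minimise different distances.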
A central skill is distinguishing interpolation from extrapolation. Predictions made with x-values inside the range of the data (interpolation) are usually reliable. Predictions made outside the range (extrapolation) are not, because there is no evidence the linear relationship continues. A standard exam question gives you a regression line and asks for a prediction; the right answer often includes a comment that the prediction is unreliable because it is outside the data range.
A common pitfall is confusing correlation with causation. A strong r value does not establish that one variable causes the other. Another is using the regression line of y on x to predict x from y — that requires the regression line of x on y, which is a different line.
For full coverage of correlation, regression and interpretation in context, see the Correlation and Regression lesson.
Probability
Probability at A-Level extends the GCSE basics into more careful reasoning about events, conditional probability and independence. The notation is precise and the diagrams (Venn diagrams, tree diagrams, two-way tables) are essential aids.
The four facts you build everything on are: P(A)+P(A′)=1 (the complement rule), P(A∪B)=P(A)+P(B)−P(A∩B) (the addition rule), P(A∩B)=P(A)×P(B∣A) (the multiplication rule with conditional probability), and the definition of conditional probability:
P(A∣B) = P(A∩B)/P(B)
Two events are independent if P(A∩B)=P(A)×P(B), equivalently if P(A∣B)=P(A). Two events are mutually exclusive if P(A∩B)=0. Independence and mutual exclusivity are different concepts and Edexcel will test the difference — mutually exclusive events with non-zero probabilities cannot be independent.
A short worked example. A bag contains 5 red and 3 blue counters. Two are drawn without replacement. The probability of two reds is 5/8 × 4/7 = 20/56 = 5/14. The probability of one red and one blue, in either order, is 5/8 × 3/7 + 3/8 × 5/7 = 30/56 = 15/28. A tree diagram makes the calculation visible and reduces sign and arithmetic errors.
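Exact fractions avoid rounding errors in without-replacement problems, and Python's `fractions` module mirrors the tree-diagram arithmetic branch by branch:

```python
from fractions import Fraction as F

# 5 red and 3 blue counters, two drawn without replacement.
p_rr = F(5, 8) * F(4, 7)                           # both red: 5/14
p_mixed = F(5, 8) * F(3, 7) + F(3, 8) * F(5, 7)    # one of each, either order: 15/28

print(p_rr)      # 5/14
print(p_mixed)   # 15/28
```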
A common pitfall is using the wrong denominator in conditional probability — for instance, computing P(A∣B) using P(A) in the denominator instead of P(B). Another is treating selections without replacement as independent, when by definition they are not. Always read the question for "with replacement" or "without replacement" before starting.
For Venn-diagram and tree-diagram practice and a complete walkthrough of conditional probability, see the Probability lesson.
Binomial Distribution
The binomial distribution B(n,p) models the number of successes in n independent trials, each with probability p of success. The four conditions for a binomial model are: a fixed number of trials, two outcomes per trial (success or failure), constant probability of success, and independence between trials. If any of these fails, the binomial model is inappropriate.
If X∼B(n,p), then:
P(X = r) = ⁿCᵣ p^r (1 − p)^(n−r)
for r=0,1,…,n. The mean is E(X)=np and the variance is Var(X)=np(1−p). Your calculator computes P(X=r), P(X≤r) and the inverse — you should know exactly which buttons to press for each.
The single most important skill is rephrasing exam questions in the right form. "At least 4" means P(X≥4)=1−P(X≤3). "More than 4" means P(X>4)=1−P(X≤4). "Fewer than 4" means P(X<4)=P(X≤3). Edexcel mark schemes are strict on inequalities — getting the boundary wrong by one is the most common reason for losing easy marks here.
A short worked example. A coin biased so P(heads)=0.4 is tossed 10 times. Find the probability of at least 5 heads. With X∼B(10,0.4), P(X≥5)=1−P(X≤4)=1−0.6331=0.3669 (calculator value rounded to four decimal places).
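The worked value can be reproduced from the formula directly with `math.comb` — no statistics library needed, just the term-by-term sum:

```python
from math import comb

def binom_cdf(n, p, r):
    """P(X <= r) for X ~ B(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r + 1))

n, p = 10, 0.4
p_at_least_5 = 1 - binom_cdf(n, p, 4)   # "at least 5" = 1 - P(X <= 4)
print(round(p_at_least_5, 4))           # 0.3669
```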
A common pitfall is checking only some of the binomial conditions when justifying the model. Another is forgetting to convert "at least" or "more than" into the right cumulative form before reaching for the calculator.
For a clean conditions checklist and worked exam questions, see the Binomial Distribution lesson.
Normal Distribution
The normal distribution N(μ, σ²) is the bell-shaped continuous distribution determined by its mean μ and standard deviation σ. (Edexcel uses the convention that the second parameter is the variance σ², not the standard deviation — read carefully.) The total area under the curve is 1, the curve is symmetric about μ, and roughly 68% of the distribution lies within one standard deviation of the mean, 95% within two, and 99.7% within three.
If X∼N(μ, σ²), your calculator computes P(X<x), P(X>x) and P(a<X<b) directly — and the inverse, finding x for a given probability. You should not need standard normal tables on 9MA0; the calculator handles them. What you do need is the standardisation transformation:
Z = (X − μ)/σ
where Z∼N(0,1) is the standard normal distribution. Standardisation is essential for problems where μ or σ is unknown and must be found from a given probability. Sketching the distribution with the relevant region shaded is the single most useful diagnostic before reaching for the calculator.
A short worked example. The heights of adult men in a population are X ∼ N(175, 8²) centimetres. Find P(X > 185). Standardise: z = (185 − 175)/8 = 1.25. Then P(X > 185) = P(Z > 1.25) = 0.1056 (calculator value).
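The standard normal CDF can be written with `math.erf` — a standard mathematical identity, not an Edexcel requirement — which is enough to check calculator answers like the one above:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z < z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 175, 8
z = (185 - mu) / sigma     # 1.25
p = 1 - phi(z)             # P(X > 185) = P(Z > 1.25)
print(round(p, 4))         # 0.1056
```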
A common pitfall is confusing the variance and standard deviation. If a question gives the standard deviation as 8, the distribution is N(μ,64), not N(μ,8). Another is using the calculator with the wrong inequality direction — always sketch the curve, shade the region, and check the answer is on the right side of the mean.
For standardisation drills and inverse-normal practice, see the Normal Distribution lesson.
Normal Approximation to the Binomial
When n is large and p is not too close to 0 or 1, the binomial distribution B(n,p) is approximately normal. Specifically, if X∼B(n,p) with n large and np>5 and n(1−p)>5, then X can be approximated by:
Y∼N(np,np(1−p))
The mean and variance of the normal approximation match the mean and variance of the binomial. The conditions np>5 and n(1−p)>5 are the standard Edexcel rule for when the approximation is appropriate.
The wrinkle is the continuity correction. The binomial is discrete; the normal is continuous. To approximate P(X≤7) with the binomial, you compute P(Y<7.5) with the normal — the discrete value 7 corresponds to the continuous interval [6.5,7.5). Similarly, P(X≥7)≈P(Y>6.5). Forgetting the continuity correction is the most common error on this topic.
A short worked example. X ∼ B(100, 0.4), so np = 40 and np(1−p) = 24. Approximate X by Y ∼ N(40, 24). Find P(X≤35). Apply the continuity correction: P(X≤35) ≈ P(Y < 35.5). Standardise: z = (35.5 − 40)/√24 = −4.5/4.899 = −0.918. Then P(Y < 35.5) = 0.179 (calculator value rounded).
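Both sides of this approximation can be checked numerically — the exact binomial tail against the continuity-corrected normal value. This is a check on the method, not something the exam asks for:

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 100, 0.4
mu, var = n * p, n * p * (1 - p)   # 40 and 24

# Exact: P(X <= 35) under B(100, 0.4)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(36))

# Approximate: P(Y < 35.5) under N(40, 24), with the continuity correction
approx = phi((35.5 - mu) / sqrt(var))

print(round(exact, 4), round(approx, 4))   # the two values agree to about 0.005
```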
A common pitfall is using the wrong direction for the continuity correction. The rule is: shift the boundary outward to include the discrete value if the inequality is non-strict, and inward to exclude it if strict. Sketch the discrete bar and the continuous curve together and the right shift becomes obvious.
For continuity-correction practice and conditions checklists, see the Normal Approximation lesson.
Hypothesis Testing for the Binomial
A hypothesis test is a structured procedure for deciding, on the basis of sample evidence, whether to reject a claim about a population. For the binomial, the claim is about a population proportion p. The structure of every test is the same:
- State the null hypothesis H0: p = p0 and the alternative hypothesis H1 (one-tailed: p < p0 or p > p0; two-tailed: p ≠ p0).
- Identify the test statistic X∼B(n,p0) under H0.
- Calculate the p-value — the probability under H0 of observing the sample value or one more extreme.
- Compare with the significance level α (usually 5% or 1%).
- Reject H0 if p-value<α; otherwise do not reject H0.
- Write a contextual conclusion using "there is/is not sufficient evidence at the α level to suggest..."
For two-tailed tests, the significance level is split: a 5% two-tailed test compares the p-value with 2.5% in whichever tail the observed value sits.
The critical-value approach is an alternative formulation. The critical region is the set of values of X for which H0 would be rejected. For a one-tailed test of H1:p>p0 at the 5% level, the critical region is the smallest set {c,c+1,…,n} for which P(X≥c∣H0)≤0.05. The actual significance level is then P(X≥c∣H0), which is at most 5% but usually less because the binomial is discrete.
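The search for the critical value c described above can be automated. Here it is run for an upper-tail test on B(20, 0.3) at the 5% level — the parameters are illustrative, not from a specific paper:

```python
from math import comb

def upper_critical_region(n, p0, alpha):
    """Smallest c with P(X >= c | H0) <= alpha for X ~ B(n, p0).
    Returns (c, actual significance level)."""
    def upper_tail(c):
        return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(c, n + 1))
    for c in range(n + 1):
        if upper_tail(c) <= alpha:
            return c, upper_tail(c)
    return n + 1, 0.0   # alpha so small that no observation is ever rejected

c, actual = upper_critical_region(20, 0.3, 0.05)
print(c, round(actual, 4))   # critical region is X >= 10; actual level about 0.048
```

The actual significance level (about 4.8% here, not 5%) is exactly the discreteness effect the paragraph above describes.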
A common pitfall is concluding "H1 is true" or "we accept H0." Neither phrase is correct. The structured conclusion is "there is sufficient evidence to reject H0" or "there is insufficient evidence to reject H0." Another is forgetting to halve the significance level for two-tailed tests.
For the full six-step structure with worked one- and two-tailed examples, see the Binomial Hypothesis Testing lesson.
Hypothesis Testing for the Normal Mean
The second hypothesis-testing topic is testing a claim about the mean of a normal distribution when the population standard deviation σ is known. The structure mirrors the binomial test, but the test statistic is different.
If a sample of size n is drawn from X ∼ N(μ, σ²), then the sample mean X̄ has distribution:
X̄ ∼ N(μ, σ²/n)
This is the distribution of the sample mean, and it is the engine of normal hypothesis testing. The variance of X̄ is σ²/n, smaller than the variance of X — averaging reduces variability. The standard deviation of X̄ is σ/√n, sometimes called the standard error.
The test then proceeds as for the binomial: state H0: μ = μ0 and H1 (one- or two-tailed), compute the test statistic z = (x̄ − μ0)/(σ/√n), find the p-value from the standard normal, compare with α, and conclude in context. The arithmetic is cleaner than in the binomial case because the normal is continuous, but the structure is identical.
A short worked example. A machine fills cartons with mean 500ml and standard deviation 5ml. A sample of 25 cartons has mean 498ml. Test at the 5% level whether the machine is under-filling. H0: μ = 500, H1: μ < 500. Under H0, X̄ ∼ N(500, 25/25) = N(500, 1). The test statistic is z = (498 − 500)/1 = −2. The p-value is P(Z < −2) = 0.0228. Since 0.0228 < 0.05, reject H0: there is sufficient evidence at the 5% level to suggest the machine is under-filling.
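The same test written as a short script (the erf-based CDF is a standard identity used here only to check the calculator value):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu0, sigma, n = 500, 5, 25
x_bar = 498

se = sigma / sqrt(n)          # standard error = 5 / 5 = 1.0
z = (x_bar - mu0) / se        # -2.0
p_value = phi(z)              # lower tail, since H1: mu < 500

print(z, round(p_value, 4))   # -2.0 0.0228
reject = p_value < 0.05       # True: sufficient evidence of under-filling
```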
A common pitfall is using σ instead of σ/√n in the denominator of the test statistic. Another is forgetting the direction of the alternative hypothesis when reading the p-value off the calculator.
For full structured worked examples and contextual conclusion practice, see the Normal Hypothesis Testing lesson.
Common Mark-Loss Patterns Across Statistics
Across the whole statistics section, a small set of habits accounts for a disproportionate share of lost marks.
- Confusing variance and standard deviation when reading the parameters of a normal distribution. N(μ, σ²) takes the variance.
- Boundary errors with binomial inequalities. "At least 4" is P(X≥4)=1−P(X≤3), not 1−P(X≤4).
- Forgetting the continuity correction when approximating the binomial with the normal.
- Using σ instead of σ/√n in the standard error for hypothesis tests on the sample mean.
- Wrong tail in two-tailed tests. Halve the significance level before comparing with the p-value.
- Concluding "H0 is true" or "H1 is true." The correct phrasing is "there is/is not sufficient evidence at the α level to reject H0."
- Computing the wrong regression line. y on x is for predicting y from x.
- Not showing enough working. State the distribution, state the inequality, then compute.
Exam Strategy for Paper 3 Section A
Section A of Paper 3 is sat under time pressure: roughly 50 marks in half of a two-hour paper, which means about 1.2 minutes per mark — already tight, and the mechanics in Section B is no easier. The strategy that recovers marks is structural rather than content-driven.
Read every question with the Large Data Set in mind. Even if a question does not explicitly cite the dataset, Edexcel often draws plausibility checks from it. A mean rainfall of 3mm for a Camborne May day is plausible; 30mm probably is not. If you find yourself contradicting your prior intuition about the Large Data Set, double-check the arithmetic.
For hypothesis tests, write the structure out every time: H0, H1, distribution under H0, test statistic or p-value, comparison with α, conclusion in context. Method marks are awarded for the structure, not just the answer. On a five-mark hypothesis-testing question, two of the marks are typically for structure even if the final conclusion is wrong.
For binomial and normal calculations, state the distribution before reaching for the calculator. "X∼B(20,0.3)" written down is one method mark on its own. Then state the probability you are computing, in inequality form, before pressing buttons: "P(X≥4)=1−P(X≤3)." This habit also catches half of the boundary errors that lose easy marks.
Finally, on data-presentation questions, read the units and scales carefully before drawing conclusions. A histogram with frequency density on the y-axis is different from one with frequency on the y-axis, and Edexcel will sometimes give you the latter as a distractor. Always check the axis label.
How LearningBro's Edexcel A-Level Maths Statistics Course Helps
LearningBro's Edexcel A-Level Maths: Statistics course is built around the structure of this guide. Each of the ten lessons covers one section of the 9MA0 statistics specification, in the order Edexcel teaches it, with worked examples, practice questions and full mark-scheme-style solutions. Lessons end with a short review and quick-recall questions designed for spaced revisits.
The course is designed to be used in two ways. As a first pass, you can work through the lessons in order, building each topic on the last. As a revision tool, you can drop into any lesson and work the practice independently — for example, drilling hypothesis testing for a week before mocks. The AI tutor is available throughout to give targeted hints when you get stuck, without giving away full solutions, and to mark your written working with structured feedback.
If you want one place to revise this section of the spec well, with realistic practice and clean explanations of every topic, the full course is the right next step. Start with the Edexcel A-Level Maths: Statistics course.