AQA A-Level Maths: Statistics — Complete Revision Guide (7357)
Statistics is one of two applied strands in AQA A-Level Maths (7357), and for many candidates it is the highest-yield section in the final weeks of revision. The methods are formula-driven, the question types are predictable, and the marks are concentrated in a small number of techniques. If you understand the AQA Large Data Set, the standard distributions, and the structure of a hypothesis test, you can pick up a substantial chunk of Paper 3 marks even on a tough year.
This guide is a topic-by-topic walkthrough of the statistics content in the 7357 specification: statistical sampling, data presentation and interpretation, probability, statistical distributions, the binomial distribution, the normal distribution, hypothesis testing, correlation and regression, conditional probability, and the Year 2 hypothesis tests on means and on the product moment correlation coefficient. For each topic you will see the core skills, the typical pitfalls, a short worked example or table, and a link to the full lesson on the LearningBro course.
One important structural point. AQA puts statistics on Paper 3 alongside pure mathematics, not alongside mechanics. This is different from Edexcel, where Paper 3 carries both statistics and mechanics together. On AQA Paper 3 you sit a mix of pure and statistics questions; mechanics is on Paper 2. Your Paper 3 timing budget is therefore split between pure and statistics, and the statistics section is more concentrated than on the equivalent Edexcel paper. Familiarisation with the AQA Large Data Set is critical — every series has at least one question that assumes you have actually worked with the data. Conditional probability and the Year 2 hypothesis tests (on means and correlation) are the higher-yield Year 2 additions and deserve disproportionate revision time.
What the AQA 7357 Specification Covers
The AQA A-Level Maths qualification (7357) is assessed through three two-hour papers, each worth 100 marks. Paper 1 is pure. Paper 2 is pure and mechanics. Paper 3 is pure and statistics. There is no choice of questions and no coursework, so every mark must be earned in the exam. The statistics content sits in Sections K to O of the specification and is examined exclusively on Paper 3.
Statistics is one of the most predictable parts of the course in terms of question style. Most series carry a sampling or data-presentation question grounded in the Large Data Set, a probability question mixing tree diagrams with conditional probability, a binomial or normal distribution problem, and a full hypothesis test. The table below shows the sub-topics, the part of the specification they sit under, and a realistic estimate of how many marks across a Paper 3 sitting come from each.
| Topic | Spec Section | Typical Paper 3 marks weight |
|---|---|---|
| Statistical sampling | K | 4-6 marks |
| Data presentation and interpretation | L | 6-10 marks |
| Probability | M | 4-6 marks |
| Statistical distributions | N | 2-4 marks |
| Binomial distribution | N | 6-8 marks |
| Normal distribution | N | 6-10 marks |
| Hypothesis testing (binomial) | O | 6-8 marks |
| Correlation and regression | L, O | 4-6 marks |
| Conditional probability | M | 4-6 marks |
| Further hypothesis testing (means, PMCC) | O | 6-10 marks |
These weights are estimates based on the spread of typical 7357 papers — not guarantees for any single year. What is reliable is that the normal distribution and hypothesis testing together account for roughly a quarter of the statistics marks, and that the Large Data Set sits in the background of nearly every sampling, data-presentation, and correlation question. Mastering these higher-weight areas is the single most efficient revision investment for Paper 3.
Statistical Sampling
Statistical sampling is the first topic in the specification, and AQA examines it both on its own and as the framing for Large Data Set questions. The core idea is that you almost never measure a whole population — instead you take a sample, and the way you take that sample determines whether your conclusions are valid.
You need to know the standard sampling methods and when each is appropriate. Simple random sampling gives every member of the population an equal probability of selection; it is the gold standard but requires a complete sampling frame. Systematic sampling picks every kth element from an ordered list; it is fast but vulnerable to periodic patterns in the data. Stratified sampling divides the population into strata and samples in proportion from each; it is the right choice when the strata differ meaningfully (for example, sampling students in proportion across year groups). Quota sampling sets target numbers per category but lets the sampler choose individuals; it is non-random and biased. Opportunity (convenience) sampling uses whoever is available; it is the weakest method and almost always biased.
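The three probability-based methods can be sketched in a few lines of Python. This is an illustration only, not something AQA asks you to write: the frame of 100 numbered students, the 60/40 year-group split, and the function names are all hypothetical.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every member of the sampling frame has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Pick a random start, then take every k-th member of the ordered list."""
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n, rng):
    """Sample from each stratum in proportion to its size.

    Rounding the shares can make the total differ from n for awkward
    stratum sizes; the hypothetical sizes below divide exactly.
    """
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(n * len(members) / total)
        sample.extend(rng.sample(members, share))
    return sample

rng = random.Random(1)                    # fixed seed so the sketch is repeatable
students = list(range(1, 101))            # hypothetical sampling frame
strata = {"Year 12": students[:60], "Year 13": students[60:]}

srs = simple_random_sample(students, 10, rng)
systematic = systematic_sample(students, 10, rng)
stratified = stratified_sample(strata, 10, rng)
```

Note that the systematic sample is completely determined by its random start, which is why a periodic pattern in the list can bias it.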
The other half of this topic is bias. AQA loves questions that hand you a sampling scheme and ask you to identify why it is biased and how to improve it. Common sources of bias include sampling frames that miss a section of the population, self-selection (only motivated respondents reply), and time-of-day or location effects. The expected answer mentions the specific bias by name and proposes a concrete fix — "use a stratified sample by year group instead of an opportunity sample at the school gate" — rather than a vague gesture toward "more random."
A common pitfall is confusing systematic sampling with simple random sampling. They are not the same: systematic gives you a sample that is approximately random only if the underlying list has no periodic structure. Another is reaching for stratified sampling whenever the population has subgroups, even when there is no reason to think the subgroups behave differently.
For full coverage with Large Data Set context and worked sampling-design questions, see the Statistical Sampling lesson.
Data Presentation and Interpretation
Data presentation is the second statistics topic, and on AQA it is heavily tied to the Large Data Set. AQA publishes a specific data set (at the time of writing, vehicle data drawn from the Department for Transport's Stock Vehicle Database; confirm the version for your exam series with AQA) and assumes that you have spent time exploring it. Questions can ask you to interpret a histogram or box plot grounded in the Large Data Set, to identify outliers, to compare distributions across subgroups of the data, or to spot which subgroup a summary statistic likely came from.
The core summary statistics you need to compute and interpret are: mean $\bar{x} = \frac{\sum x}{n}$, median, mode, range, interquartile range (IQR), variance $s^2 = \frac{\sum x^2}{n} - \bar{x}^2$, and standard deviation $s = \sqrt{s^2}$. For grouped data, use the midpoint of each class and weight by frequency. A standard exam pattern is to compute summary statistics from a frequency table and then comment on the shape of the distribution.
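As a rough sketch of the grouped-data calculation, the Python below estimates the mean, variance and standard deviation from class midpoints. The frequency table is hypothetical, and the variance uses divisor $n$, the same form as the guide's variance formula.

```python
import math

# Hypothetical grouped frequency table: (lower bound, upper bound, frequency)
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 5), (30, 40, 1)]

n = sum(f for _, _, f in classes)

# Use the midpoint of each class as the representative value, weighted by frequency
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)

mean = sum_fx / n
variance = sum_fx2 / n - mean ** 2    # divisor n (population form)
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 2))  # 16.5 62.75 7.92
```

These are estimates, because the midpoints stand in for the unknown individual values within each class.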
Outliers are formally defined for AQA either as values more than 1.5×IQR from the nearest quartile (the box plot definition), or as values more than two standard deviations from the mean. The question will tell you which definition to use. Identifying outliers and deciding whether to retain or remove them is a recurring exam skill.
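Both outlier rules are easy to check mechanically. The sketch below assumes the quartiles are given in the question (as they usually are) and uses a small hypothetical data set; the function names are illustrative.

```python
import statistics

def iqr_outliers(data, q1, q3):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    fence = 1.5 * (q3 - q1)
    return [x for x in data if x < q1 - fence or x > q3 + fence]

def two_sd_outliers(data):
    """Values more than two standard deviations from the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)      # population form (divisor n)
    return [x for x in data if abs(x - mean) > 2 * sd]

data = [12, 14, 15, 15, 16, 17, 18, 40]          # hypothetical observations
print(iqr_outliers(data, q1=14.5, q3=17.5))      # [40]
print(two_sd_outliers(data))                     # [40]
```

The two rules agree here, but they need not in general, which is why the question specifies the definition.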
The standard chart types are histograms (with frequency density on the y-axis when class widths are unequal), cumulative frequency diagrams (used to read off the median, quartiles, and percentiles), box-and-whisker plots (for comparing distributions side by side), and scatter diagrams (for bivariate data, leading into correlation). For each, you must be able to read values off the diagram accurately and to comment on what the diagram tells you about the underlying distribution.
A common pitfall is using frequency on the histogram y-axis when class widths are unequal — for AQA, frequency density is required and frequency is wrong. Another is computing the standard deviation using the wrong formula on a calculator (sample versus population) and being inconsistent with the variance formula stated above.
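The frequency density calculation itself is a single division per class. A minimal sketch, using hypothetical classes with unequal widths:

```python
# Hypothetical classes with unequal widths: (lower, upper, frequency)
classes = [(0, 5, 10), (5, 10, 15), (10, 20, 20), (20, 40, 10)]

def frequency_density(lower, upper, frequency):
    """Bar height = frequency / class width, so that bar AREA represents frequency."""
    return frequency / (upper - lower)

densities = [frequency_density(*c) for c in classes]
print(densities)  # [2.0, 3.0, 2.0, 0.5]
```

The class 10 to 20 has the highest frequency but not the tallest bar, which is exactly the distinction a frequency-on-the-axis histogram gets wrong.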
For Large Data Set practice and worked summary-statistic questions, see the Data Presentation and Interpretation lesson.
Probability
Probability at A-Level extends GCSE work into a more formal framework based on set notation, Venn diagrams, and the formal definitions of independence and mutual exclusivity. The basic rules look familiar but the level of precision AQA expects is higher.
The two core formulas are the addition rule P(A∪B)=P(A)+P(B)−P(A∩B) and the multiplication rule P(A∩B)=P(A)×P(B∣A). From these everything else follows. Two events are mutually exclusive when P(A∩B)=0, in which case the addition rule simplifies to P(A∪B)=P(A)+P(B). Two events are independent when P(A∩B)=P(A)×P(B), which is what the multiplication rule reduces to when P(B∣A)=P(B).
Venn diagrams are the workhorse for probability problems involving two or three events. Sketch the diagram, fill in the intersection first, then work outward, ensuring the totals add to 1 (or to the population size, if working with frequencies rather than probabilities). Tree diagrams are the workhorse for sequential events, especially when the conditional probabilities change at each stage (sampling without replacement is the classic example).
A common pitfall is conflating mutual exclusivity with independence. Mutually exclusive events with non-zero probability are never independent — if A happens, B cannot, so knowing A has happened changes the probability of B from P(B) to 0. Another is forgetting to subtract the intersection in the addition rule, which double-counts the overlap.
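The rules and the two definitions can be checked numerically. The sketch below uses hypothetical probabilities held as exact fractions (to avoid floating-point rounding in the equality tests); the function names are illustrative.

```python
from fractions import Fraction

def union(p_a, p_b, p_a_and_b):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

def are_independent(p_a, p_b, p_a_and_b):
    """Independent exactly when P(A and B) = P(A) x P(B)."""
    return p_a_and_b == p_a * p_b

def are_mutually_exclusive(p_a_and_b):
    """Mutually exclusive exactly when P(A and B) = 0."""
    return p_a_and_b == 0

# Hypothetical values: P(A) = 1/2, P(B) = 2/5, P(A and B) = 1/5
p_a, p_b, p_both = Fraction(1, 2), Fraction(2, 5), Fraction(1, 5)

print(union(p_a, p_b, p_both))            # 7/10
print(are_independent(p_a, p_b, p_both))  # True
print(are_mutually_exclusive(p_both))     # False
```

Setting the intersection to 0 with the same P(A) and P(B) makes the events mutually exclusive and the independence check fail, which is the pitfall described above.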
For Venn-diagram practice and clean worked examples on independence, see the Probability lesson.
Statistical Distributions
This section is short on the AQA specification but lays the groundwork for the binomial and normal distributions. A discrete random variable X takes a finite (or countable) set of values, each with an associated probability. A probability distribution is fully specified by listing the values and their probabilities, with the constraint that the probabilities sum to 1.
The standard summary statistics for a discrete distribution are the expected value $E(X) = \sum x\,P(X=x)$ and the variance $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$, where $E(X^2) = \sum x^2\,P(X=x)$. For a uniform discrete distribution on $\{1, 2, \ldots, n\}$, $E(X) = \frac{n+1}{2}$. AQA also expects you to recognise standard distributions by shape: uniform (flat), binomial (peaked, often skewed), and continuous distributions like the normal.
A common pitfall is forgetting that probabilities must sum to 1. A standard exam question gives a probability table with one unknown probability and asks you to find it; the answer is whatever makes the total equal 1. Another is computing $\mathrm{Var}(X)$ as $[E(X)]^2 - E(X^2)$ rather than $E(X^2) - [E(X)]^2$ and getting a negative variance, which is impossible.
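A short sketch of the "find the unknown probability, then compute E(X) and Var(X)" pattern. The probability table is hypothetical, and exact fractions are used so the totals come out cleanly.

```python
from fractions import Fraction

# Hypothetical table: X takes values 1..4, with P(X = 3) = k unknown
values = [1, 2, 3, 4]
known = {1: Fraction(1, 10), 2: Fraction(1, 5), 4: Fraction(3, 10)}

# Probabilities must sum to 1, which fixes k
k = 1 - sum(known.values())
probs = {**known, 3: k}

expected = sum(x * probs[x] for x in values)          # E(X)
expected_sq = sum(x ** 2 * probs[x] for x in values)  # E(X^2)
variance = expected_sq - expected ** 2                # E(X^2) - [E(X)]^2

print(k, expected, variance)  # 2/5 29/10 89/100
```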
For worked examples on expected values and variance, see the Statistical Distributions lesson.
Binomial Distribution
The binomial distribution $X \sim B(n, p)$ describes the number of successes in $n$ independent trials, each with probability $p$ of success. The probability of exactly $k$ successes is $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$, and you should know the mean $E(X) = np$ and variance $\mathrm{Var}(X) = np(1-p)$.
The conditions for binomial modelling are critical and AQA examines them directly. There must be a fixed number of trials n, each trial must have two outcomes (success and failure), the trials must be independent, and the probability of success p must be constant across trials. If any one fails, the binomial model is wrong. A standard exam question describes a real-world situation and asks you to state which assumption is most questionable — the answer is usually independence (samples without replacement) or constant probability (changing conditions over time).
In practice, you will compute binomial probabilities on a calculator using binomPdf for P(X=k) and binomCdf for P(X≤k). The skills examined are translating the question into the right calculation and getting the inequalities right. "At most 3 successes" is P(X≤3); "at least 3 successes" is P(X≥3)=1−P(X≤2). The off-by-one on P(X≥3) is the most common error in this topic.
A worked example. A factory has a 5% defect rate. In a sample of 20, with $X \sim B(20, 0.05)$: $P(X=2) = \binom{20}{2}(0.05)^2(0.95)^{18} \approx 0.189$; $P(X \le 2) \approx 0.925$; $P(X \ge 1) = 1 - (0.95)^{20} \approx 0.642$.
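The factory example can be reproduced directly from the binomial formula. The helper names below are illustrative stand-ins for the calculator's pdf and cdf functions, not a real calculator API.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 20, 0.05
print(round(binom_pmf(2, n, p), 3))       # 0.189  P(X = 2)
print(round(binom_cdf(2, n, p), 3))       # 0.925  P(X <= 2)
print(round(1 - binom_cdf(0, n, p), 3))   # 0.642  P(X >= 1) = 1 - P(X <= 0)
```

The last line shows the inequality handling: "at least 1" is one minus the cumulative probability up to 0, not up to 1.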
For full conditions practice and probability calculations, see the Binomial Distribution lesson.
Normal Distribution
The normal distribution X∼N(μ,σ2) is the continuous bell-shaped distribution parameterised by mean μ and variance σ2 (so σ is the standard deviation). It is the highest-yield distribution on the AQA specification — partly because it appears directly, and partly because it is the limiting distribution behind hypothesis testing on means.
The core skills are: computing $P(X<a)$, $P(X>a)$, and $P(a<X<b)$ for given values of $\mu$ and $\sigma$ using the calculator's normalCdf function; computing inverse-normal problems where you are given a probability and asked to find the corresponding $X$ value using invNorm; and using standardisation $Z = \frac{X-\mu}{\sigma}$ to convert any normal variable to the standard normal $Z \sim N(0,1)$. Standardisation is essential when you are asked to find $\mu$ or $\sigma$ from a probability statement, because the unknown sits inside the standardisation formula rather than inside the calculator.
A standard pattern. Component lengths are $N(\mu, 2^2)$ with 10% exceeding 50 mm. Find $\mu$. Standardise: $\frac{50-\mu}{2} = \text{invNorm}(0.9) \approx 1.282$, giving $\mu = 50 - 2(1.282) \approx 47.44$ mm.
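The same calculation can be checked with Python's standard library, where `NormalDist().inv_cdf` plays the role of the calculator's invNorm. Treat this as a sketch for checking your working rather than exam method.

```python
from statistics import NormalDist

sigma = 2          # known standard deviation (mm)
upper_tail = 0.10  # 10% of components exceed the threshold
threshold = 50     # mm

# z-value with 90% of the standard normal below it
z = NormalDist().inv_cdf(1 - upper_tail)

# Rearranged standardisation: (threshold - mu) / sigma = z
mu = threshold - z * sigma

# Sanity check: the upper-tail probability at 50 mm should come back as 0.10
check = 1 - NormalDist(mu, sigma).cdf(threshold)

print(round(z, 3), round(mu, 2))  # 1.282 47.44
```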
The normal approximation to the binomial is on the AQA specification. When n is large and p is not too close to 0 or 1, X∼B(n,p) can be approximated by Y∼N(np,np(1−p)). A continuity correction of ±0.5 is required because you are approximating a discrete distribution with a continuous one. So P(X≤30) becomes P(Y<30.5), and P(X≥30) becomes P(Y>29.5).
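To see what the continuity correction buys you, the sketch below compares the exact binomial probability with the normal approximation, with and without the correction. The parameters $n = 100$ and $p = 0.3$ are hypothetical, chosen only for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.3   # hypothetical parameters

# Exact P(X <= 30) for X ~ B(100, 0.3)
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(31))

# Approximating distribution Y ~ N(np, np(1 - p)); NormalDist takes sigma, not sigma^2
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

with_correction = approx_dist.cdf(30.5)    # P(X <= 30) becomes P(Y < 30.5)
without_correction = approx_dist.cdf(30)   # ignores half of the bar at 30

print(round(exact, 3), round(with_correction, 3), round(without_correction, 3))
```

Running it shows the corrected value sitting noticeably closer to the exact probability than the uncorrected one.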
A common pitfall is using σ2 where σ is required (or vice versa) when entering values into the calculator — the calculator usually wants σ, but the distribution is parameterised by σ2. Another is forgetting the continuity correction when using the normal approximation, which gives answers that are subtly wrong.
For inverse-normal practice and continuity-correction worked examples, see the Normal Distribution lesson.
Hypothesis Testing
A hypothesis test is a structured argument for or against a claim about a population parameter, based on sample data. The Year 1 hypothesis test on the AQA specification is for the binomial proportion p, and AQA expects a specific structure that you must follow exactly to score full marks.
The five steps are:
- State the hypotheses. $H_0$: $p = p_0$ (the null, the status-quo claim). $H_1$: $p \neq p_0$ (two-tailed), or $p > p_0$ / $p < p_0$ (one-tailed). The choice of one-tailed versus two-tailed is set by the question.
- State the test statistic. Under H0, X∼B(n,p0).
- State the significance level (usually given, often 5% or 1%).
- Compute the p-value or critical region. For a one-tailed upper test with observed value x, the p-value is P(X≥x∣H0). Compare to the significance level.
- State the conclusion in context. "There is sufficient evidence at the 5% level to reject H0 and conclude that the proportion of defective components has increased" — not just "reject H0."
The two most important traps are the direction of the inequality in the p-value calculation, and the conclusion in context. For an upper-tailed test, P(X≥x), not P(X>x) — the observed value itself is included. For the conclusion, mark schemes reward statements that are tied to the original claim and acknowledge the uncertainty ("there is evidence to suggest"), and penalise statements that overclaim ("we have proven").
A worked example. A coin is suspected of being biased toward heads. In 20 tosses, 14 heads are observed. Test at the 5% level. H0: p=0.5, H1: p>0.5. Under H0, X∼B(20,0.5). p-value =P(X≥14)=1−P(X≤13)≈0.058. Since 0.058>0.05, do not reject H0. There is insufficient evidence at the 5% level to conclude that the coin is biased toward heads.
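The coin example can be reproduced in a few lines. This sketch follows the p-value route for an upper-tailed test; the variable names are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p0 = 20, 0.5      # under H0, X ~ B(20, 0.5)
observed = 14        # heads observed
alpha = 0.05         # significance level

# Upper-tailed p-value P(X >= 14): the observed value itself is included
p_value = sum(binom_pmf(k, n, p0) for k in range(observed, n + 1))
reject_h0 = p_value < alpha

print(round(p_value, 3), reject_h0)  # 0.058 False
```

The code only delivers the comparison; the conclusion in context still has to be written out in words.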
For the full five-step structure and worked one- and two-tailed tests, see the Hypothesis Testing lesson.
Correlation and Regression
Correlation measures the strength and direction of a linear relationship between two variables. The headline statistic is the product moment correlation coefficient (PMCC), denoted r, which lies between −1 and +1. Values near +1 indicate strong positive linear correlation; values near −1 indicate strong negative; values near 0 indicate no linear relationship. You will compute r on the calculator from a list of paired data, not by hand.
Regression fits a straight line to bivariate data using the least squares method, producing a line of the form y=a+bx. The line is used for prediction within the range of the data (interpolation), but predictions outside the data range (extrapolation) are unreliable and usually wrong. AQA examines this distinction directly — a prediction-from-regression question almost always has a follow-up asking whether the prediction is reliable, and the right answer references whether the value is inside or outside the observed range.
A subtle but common AQA point is that the regression equation y on x is appropriate when x is the controlled variable and y is being predicted from x. If x has measurement error too, the simple regression is not the right tool — but the specification rarely pushes that hard. What it does push is interpretation: a positive gradient means y tends to increase with x, and the intercept means the predicted y when x=0 (which may or may not be physically meaningful).
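For intuition about what the calculator is doing, the sketch below computes the least squares line $y = a + bx$ and the PMCC from the standard sums of squares, and flags whether a prediction is an interpolation. The five data pairs are hypothetical.

```python
from math import sqrt

# Hypothetical bivariate data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = s_xy / s_xx              # gradient of the regression line of y on x
a = y_bar - b * x_bar        # intercept: predicted y when x = 0
r = s_xy / sqrt(s_xx * s_yy) # product moment correlation coefficient

def predict(x_new):
    """Return the predicted y and whether x_new lies inside the observed x range."""
    return a + b * x_new, min(x) <= x_new <= max(x)

print(round(a, 2), round(b, 2), round(r, 3))  # 2.2 0.6 0.775
print(predict(10)[1])                         # False: extrapolation, so unreliable
```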
A common pitfall is confusing correlation with causation. A strong r does not mean changing x causes y to change — there may be a confounding variable. Another is reading off the gradient without units; in an exam, "the gradient is 2.3 hours of revision per mark" is a stronger answer than "the gradient is 2.3."
For PMCC computation and prediction practice with reliability commentary, see the Correlation and Regression lesson.
Conditional Probability
Conditional probability is the probability of one event given that another has occurred, written P(A∣B) and defined by
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
provided P(B)>0. This is one of the higher-yield Year 2 topics on the AQA specification because it ties together the probability rules, Venn diagrams, tree diagrams, and the new formal definition of independence. AQA examines it heavily, often as the "stretch" probability question.
The standard exam patterns are: (1) compute a conditional probability from a Venn diagram or two-way table; (2) verify or refute independence using $P(A \cap B) = P(A) \times P(B)$; (3) apply the law of total probability $P(A) = \sum_i P(A \mid B_i)\,P(B_i)$, where the $B_i$ partition the sample space; and (4) Bayes-style reversal, computing $P(B \mid A)$ when given $P(A \mid B)$. AQA does not require Bayes' theorem by name, but the manipulation $P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$ is fair game.
A worked example. A test for a disease has 95% sensitivity and 90% specificity, with prevalence 1%. What is $P(\text{diseased} \mid \text{positive})$? Let $D$ = diseased, $T$ = positive. The false-positive rate is $1 - 0.90 = 0.10$, so $P(T) = 0.95 \times 0.01 + 0.10 \times 0.99 = 0.1085$. So $P(D \mid T) = \frac{0.95 \times 0.01}{0.1085} \approx 0.0876$. Despite the test's 95% sensitivity, only about 9% of positives are genuine, a classic counterintuitive result AQA likes to set up.
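The disease-test example maps directly onto the two branches of a tree diagram. A minimal sketch, with illustrative variable names:

```python
sensitivity = 0.95   # P(T | D)
specificity = 0.90   # P(not T | not D)
prevalence = 0.01    # P(D)

false_positive_rate = 1 - specificity   # P(T | not D)

# Law of total probability over the partition {D, not D}
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Reversal: P(D | T) = P(T | D) P(D) / P(T)
p_diseased_given_positive = sensitivity * prevalence / p_positive

print(round(p_positive, 4), round(p_diseased_given_positive, 4))  # 0.1085 0.0876
```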
A common pitfall is confusing P(A∣B) with P(B∣A) — they are different probabilities and rarely equal. Another is computing P(A∩B) and dividing by P(A) when the conditioning event is B. Always read the conditioning event carefully and divide by its probability.
For tree-diagram and two-way-table practice with reversal questions, see the Conditional Probability lesson.
Further Hypothesis Testing
Year 2 extends hypothesis testing to two new contexts: tests on the mean of a normally distributed population with known variance, and tests on the product moment correlation coefficient. Both follow the same five-step structure as the binomial test, but the test statistics are different.
For a test on the mean $\mu$, given a sample of size $n$ from $N(\mu, \sigma^2)$ with known $\sigma^2$, the sample mean $\bar{X}$ is distributed as $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$. Under $H_0$: $\mu = \mu_0$, the test statistic standardises to $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$. The p-value is computed from the standard normal, and the comparison and conclusion follow the same structure as before.
A worked example. A battery's mean lifetime is claimed to be 50 hours, with known $\sigma = 4$. A sample of 25 gives $\bar{x} = 48.5$. Test at 5% whether the mean is less than claimed. $H_0$: $\mu = 50$, $H_1$: $\mu < 50$. Under $H_0$, $\bar{X} \sim N(50, 16/25)$. $Z = \frac{48.5 - 50}{4/\sqrt{25}} = \frac{-1.5}{0.8} = -1.875$. p-value $= P(Z < -1.875) \approx 0.030 < 0.05$, so reject $H_0$. There is sufficient evidence at the 5% level to conclude the mean lifetime is less than 50 hours.
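The battery example can be checked with the standard library's `NormalDist`. This is a sketch of the lower-tailed calculation only; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 50      # claimed mean under H0 (hours)
sigma = 4      # known population standard deviation
n = 25         # sample size
x_bar = 48.5   # observed sample mean
alpha = 0.05   # significance level

standard_error = sigma / sqrt(n)       # sigma / sqrt(n), not sigma
z = (x_bar - mu_0) / standard_error

# Lower-tailed test (H1: mu < 50), so the p-value is P(Z < z)
p_value = NormalDist().cdf(z)
reject_h0 = p_value < alpha

print(round(z, 3), round(p_value, 3), reject_h0)  # -1.875 0.03 True
```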
For a test on the PMCC, the hypotheses are $H_0$: $\rho = 0$ (no linear correlation in the population) versus $H_1$: $\rho \neq 0$ (or one-tailed, $\rho > 0$ or $\rho < 0$). The test statistic is the sample $r$, and you compare it to a critical value read from a table given in the formula booklet, indexed by sample size and significance level. If $|r|$ exceeds the critical value (or for one-tailed, $r$ exceeds the signed critical value in the right direction), reject $H_0$.
A common pitfall on the mean test is using $\sigma$ where $\sigma/\sqrt{n}$ is required: the standard error of the sample mean is not the population standard deviation. Another is using the wrong tail when the question describes a one-sided concern ("the mean has decreased"). On the PMCC test, the most common error is forgetting that the table gives critical values for $r$, not $r^2$, and confusing the row and column indices.
For both tests with worked five-step structures, see the Further Hypothesis Testing lesson.
Common Mark-Loss Patterns Across Statistics
Across the whole statistics strand, a small set of habits accounts for a disproportionate share of lost marks. None of these are about content you do not know — they are about content you do know, applied carelessly.
- Off-by-one errors on binomial inequalities. P(X≥3)=1−P(X≤2), not 1−P(X≤3). The single most common mistake on Paper 3.
- Missing or wrong continuity corrections when using the normal approximation to the binomial. Always shift by 0.5 in the right direction.
- Conclusions to hypothesis tests not given in context. "Reject H0" alone is partial credit; the full conclusion ties back to the original claim and acknowledges uncertainty.
- Confusing σ with σ2 when entering parameters into the calculator for the normal distribution.
- Confusing P(A∣B) with P(B∣A) in conditional probability. Always identify the conditioning event before dividing.
- Using frequency rather than frequency density on histograms with unequal class widths. AQA marks this as wrong.
- Treating mutually exclusive events as independent. For non-trivial events, the two concepts are incompatible.
- Extrapolating regression lines without flagging the unreliability.
- Forgetting the standard error $\sigma/\sqrt{n}$ in tests on the mean, a Year 2 trap that catches candidates who learned the Year 1 binomial test thoroughly.
- Not stating the distribution of the test statistic under H0 in hypothesis tests. The mark scheme explicitly looks for this line.
A revision plan that explicitly drills these habits — not just the content — will move your Paper 3 grade more than another pass through the textbook.
Recommended Six-Week Revision Plan
This plan assumes about 4-5 hours per week on this strand. Adjust pace if you are starting earlier or later.
| Week | Topics | Practice |
|---|---|---|
| 1 | Statistical sampling; data presentation; Large Data Set familiarisation | 10 sampling-design questions; 15 summary-statistic and histogram questions; one Large Data Set exploration session |
| 2 | Probability; statistical distributions | 15 Venn-diagram and tree-diagram problems; 10 expected-value and variance problems |
| 3 | Binomial distribution; normal distribution (forward problems) | 15 binomial probability problems; 15 forward normal problems including the approximation with continuity correction |
| 4 | Normal distribution (inverse and unknown μ / σ); hypothesis testing (binomial) | 10 inverse-normal problems; 10 full five-step binomial hypothesis tests, one- and two-tailed |
| 5 | Correlation and regression; conditional probability | 10 PMCC and regression problems with reliability commentary; 15 conditional probability problems including reversal |
| 6 | Further hypothesis testing (means, PMCC); mixed Paper 3 statistics practice | 10 hypothesis tests on means; 5 PMCC tests; one full Paper 3 statistics section per day |
The point of the plan is to keep moving forward while maintaining contact with earlier topics. Do not spend three weeks on the binomial and run out of time before conditional probability. By the end of week 5, every topic in the strand should have had focused contact. Week 6 is consolidation, with the higher-yield Year 2 hypothesis tests getting the final push.
How LearningBro's AQA A-Level Maths Statistics Course Helps
LearningBro's AQA A-Level Maths: Statistics course is built around the structure of this guide. Each of the ten lessons covers one section of the 7357 statistics specification, in the order AQA teaches it, with worked examples, practice questions and full mark-scheme-style solutions. The hypothesis-testing lessons drill the five-step structure until it is automatic.
The course is designed to be used in two ways. As a first pass, work through the lessons in order, building each topic on the last and connecting back to the Large Data Set throughout. As a revision tool, drop into any lesson and work the practice independently — for example, drilling the normal distribution for a week before mocks. The AI tutor gives targeted hints when you get stuck and marks your written hypothesis tests with structured feedback against the AQA mark scheme.
If you want one place to revise this strand of the spec well, start with the AQA A-Level Maths: Statistics course.