Hypothesis Testing

This lesson covers the fundamentals of hypothesis testing at A-Level. Hypothesis testing is a formal procedure for making decisions about a population parameter based on sample evidence. It is one of the most important topics in A-Level statistics and appears frequently in examinations.

The Hypothesis Testing Framework

A hypothesis test uses sample data to assess evidence against a claim about a population parameter.

Term	Definition
Null hypothesis $H_0$	The default assumption; assumed true unless there is strong evidence against it
Alternative hypothesis $H_1$	The claim we are trying to find evidence for
Test statistic	A value calculated from the sample data used to decide the outcome
Significance level $\alpha$	The probability of rejecting $H_0$ when it is actually true (usually 5% or 1%)
Critical value	The boundary value that determines the rejection region
Critical region	The set of values of the test statistic that would lead to rejection of $H_0$
p-value	The probability, assuming $H_0$ is true, of obtaining a result at least as extreme as the one observed

Steps for a Hypothesis Test

State the hypotheses $H_0$ and $H_1$.
State the significance level $\alpha$.
Calculate the test statistic from the sample data.
Find the critical value or calculate the p-value.
Make a decision: reject $H_0$ if the test statistic falls in the critical region (or if the p-value is less than $\alpha$).
State your conclusion in context, relating it back to the original problem.

Exam Tip: You must state both hypotheses and your conclusion in the context of the question. Writing "reject $H_0$" without explaining what this means in the real-world context will lose marks.

One-Tailed and Two-Tailed Tests

One-Tailed Test

Tests for a change in one direction only:

$H_1: p > k$ (right-tailed) or $H_1: p < k$ (left-tailed)
The entire significance level is in one tail.

Two-Tailed Test

Tests for a change in either direction:

$H_1: p \neq k$
The significance level is split equally between both tails (e.g., 2.5% in each tail for a 5% test).
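To make the difference between the two kinds of test concrete, the sketch below finds critical regions for a binomial test by direct summation. The function names are illustrative, not from any syllabus or library, and the numerical inputs are borrowed from examples used elsewhere in this lesson ($B(20, 0.1)$ for a one-tailed test, $B(30, 0.5)$ for a two-tailed test). It assumes the common convention that each tail probability must not exceed its share of $\alpha$.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def lower_critical_value(n, p0, tail_alpha):
    """Largest c with P(X <= c) <= tail_alpha, or None if there is none."""
    cumulative = 0.0
    best = None
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, p0)
        if cumulative <= tail_alpha:
            best = k
        else:
            break
    return best

def upper_critical_value(n, p0, tail_alpha):
    """Smallest c with P(X >= c) <= tail_alpha, or None if there is none."""
    for c in range(n + 1):
        upper_tail = sum(binom_pmf(k, n, p0) for k in range(c, n + 1))
        if upper_tail <= tail_alpha:
            return c
    return None

# One-tailed (upper) test at 5%: the whole 5% sits in one tail.
one_tailed_upper = upper_critical_value(20, 0.1, 0.05)

# Two-tailed test at 5%: 2.5% is allowed in each tail.
two_tailed_lower = lower_critical_value(30, 0.5, 0.025)
two_tailed_upper = upper_critical_value(30, 0.5, 0.025)

print(one_tailed_upper)                    # 5, so the critical region is X >= 5
print(two_tailed_lower, two_tailed_upper)  # 9 21, so X <= 9 or X >= 21
```

Because the binomial distribution is discrete, the probability actually captured by each tail is usually a little below the nominal share of $\alpha$.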

Hypothesis Test for a Binomial Proportion

If $X \sim B(n, p)$, we test $H_0: p = p_0$ against $H_1: p > p_0$, $H_1: p < p_0$, or $H_1: p \neq p_0$.

Example: A manufacturer claims that 10% of items are defective. A sample of 20 items contains 5 defectives. Test at the 5% significance level whether the proportion of defectives has increased.

$H_0: p = 0.1$, $H_1: p > 0.1$ (one-tailed, right tail)

$X \sim B(20, 0.1)$ under $H_0$

$P(X \geq 5) = 1 - P(X \leq 4) = 1 - 0.9568 = 0.0432$

Since $0.0432 < 0.05$, reject $H_0$. There is sufficient evidence at the 5% level to suggest the proportion of defectives has increased.
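The tail probability in this example can be checked without tables. The following is a minimal sketch using only the Python standard library; the function name `upper_tail_p_value` is made up for illustration.

```python
from math import comb

def upper_tail_p_value(x, n, p0):
    """P(X >= x) for X ~ B(n, p0), summed term by term."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x, n + 1))

alpha = 0.05
p_value = upper_tail_p_value(5, 20, 0.1)   # 5 defectives observed in 20 items
reject_h0 = p_value < alpha

print(f"P(X >= 5) = {p_value:.4f}; reject H0: {reject_h0}")
# P(X >= 5) = 0.0432; reject H0: True
```

Summing the upper tail directly avoids the subtraction $1 - P(X \leq 4)$, but both routes give the same value to four decimal places.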

Errors in Hypothesis Testing

	$H_0$ is true	$H_0$ is false
Reject $H_0$	Type I error (probability = $\alpha$)	Correct decision
Do not reject $H_0$	Correct decision	Type II error

Type I error: Rejecting $H_0$ when it is actually true (false positive).
Type II error: Failing to reject $H_0$ when it is actually false (false negative).

The significance level $\alpha$ is the probability of a Type I error.
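As an illustration, both error probabilities can be computed for the defective-items test above, taking the critical region to be $X \geq 5$ (the 5% upper-tail region for $B(20, 0.1)$). This is a sketch under stated assumptions: the Type II calculation needs a specific true value of $p$, and $p = 0.3$ is a hypothetical value chosen only for the example. Because $X$ is discrete, the probability of a Type I error for a fixed critical region can be slightly below the nominal $\alpha$.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_in_region(region, n, p):
    """Probability that X ~ B(n, p) takes a value in `region`."""
    return sum(binom_pmf(k, n, p) for k in region)

n = 20
critical_region = range(5, n + 1)   # reject H0 when X >= 5
non_rejection_region = range(0, 5)  # do not reject H0 when X <= 4

# Type I error: H0 is true (p = 0.1) but X falls in the critical region.
type_1 = prob_in_region(critical_region, n, 0.1)

# Type II error: H0 is false but X misses the critical region.
# The true value p = 0.3 is an assumption made purely for illustration.
type_2 = prob_in_region(non_rejection_region, n, 0.3)

print(f"P(Type I error)            = {type_1:.4f}")
print(f"P(Type II error | p = 0.3) = {type_2:.4f}")
```

The Type II probability depends on how far the true $p$ is from $p_0$: the closer the two values, the harder the difference is to detect with a sample of 20.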

Summary

Hypothesis testing is a formal procedure for assessing evidence against a null hypothesis.
Always state $H_0$ , $H_1$ , the significance level, and your conclusion in context.
Use p-values or critical values to make decisions.
A Type I error is rejecting a true $H_0$ ; a Type II error is failing to reject a false $H_0$ .
The significance level $\alpha$ controls the probability of a Type I error.

Exam Tip: When writing your conclusion, always use the phrase "there is sufficient evidence to suggest..." or "there is insufficient evidence to suggest..." rather than definitively stating that the hypothesis is true or false. Hypothesis tests assess evidence — they do not prove anything.

A-Level Deep Dive: Hypothesis Testing for the Binomial Distribution

Spec mapping

AQA 7357 specification, Paper 3 (Statistics), Section S: Statistical Hypothesis Testing. AQA requires candidates to "understand and apply the language of statistical hypothesis testing" and to "conduct a statistical hypothesis test for the proportion in the binomial distribution $B(n, p)$ using either a one-tailed or two-tailed test, and interpret the results in context". This sub-strand sits inside the Statistics route on Paper 3 and connects forward to Section T (sampling distribution of the sample mean and $Z$-test for a normal mean), Section R (correlation coefficient hypothesis test for $\rho = 0$) and laterally into Section P (probability) and Section Q (the binomial model itself). The AQA formula booklet lists the binomial probability mass function but does not tabulate cumulative binomial probabilities, so candidates must use the calculator's binomial CDF for critical-region work.

Worked example with full mark scheme

Question (8 marks):

A bakery claims that 30% of its sourdough loaves contain visible starter bubbles on the crust. A quality inspector suspects the true proportion is lower and inspects a random sample of 20 loaves, finding that only 2 have visible bubbles.

(a) Stating your hypotheses clearly, test at the 5% significance level whether there is evidence that the proportion is lower than claimed. (6)

(b) State, with a reason, whether your conclusion would change at the 1% significance level. (2)

Solution with mark scheme:

(a) Step 1 — define the variable and state hypotheses.

Let $X$ be the number of loaves with visible bubbles in a sample of 20, and let $p$ be the true proportion of loaves with visible bubbles. Then $X \sim B(20, p)$, and the bakery's claim corresponds to $p = 0.3$.

$H_0: p = 0.3$ (the bakery's claim is correct)
$H_1: p < 0.3$ (the proportion is lower than claimed)

B1 — both hypotheses correctly stated in terms of the population proportion $p$ , not in terms of $X$ or $\bar{x}$ . A common error is writing $H_0: X = 0.3$ — that scores zero because $X$ is a count, not a proportion.

Step 2 — identify the test as one-tailed (lower).

The inspector's suspicion ("lower than claimed") fixes a one-tailed lower test. Significance level $\alpha = 0.05$.

B1 — recognising one-tailed test and stating $\alpha$ .

Step 3 — compute the p-value.

Under $H_0$, $X \sim B(20, 0.3)$. The observed value is $x = 2$. For a lower-tailed test, the p-value is

$P(X \leq 2 \mid p = 0.3) = \sum_{k=0}^{2} \binom{20}{k}(0.3)^k(0.7)^{20-k}$

Using the binomial CDF: $P(X \leq 2) \approx 0.0355$.

M1 — correct probability statement (cumulative, lower tail, under $H_0$ ). A1 — correct numerical value to 3 significant figures or better.

Step 4 — compare and decide.

$0.0355 < 0.05$, so the result lies in the critical region. Reject $H_0$.

M1 — explicit comparison of p-value with $\alpha$ and a decision about $H_0$ .

Step 5 — conclusion in context.

There is sufficient evidence at the 5% significance level to suggest that the true proportion of sourdough loaves with visible bubbles is lower than the bakery's claimed 30%.

A1 — non-assertive conclusion ("sufficient evidence to suggest"), in the context of the question (sourdough loaves, bakery), with the direction of the alternative hypothesis stated.

(b) At $\alpha = 0.01$: $0.0355 > 0.01$, so we do not reject $H_0$.

M1 — correct comparison at the new significance level.

There is insufficient evidence at the 1% level to suggest the proportion is lower than 30%; the conclusion changes.

A1 — context-aware statement of the changed conclusion, with the reason being that the p-value now exceeds $\alpha$ .

Total: 8 marks (B2 M3 A3, split as shown).
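The arithmetic in parts (a) and (b) can be reproduced with a short script. This is only a checking sketch, not part of the mark scheme; the variable names are chosen for this example. It builds the three terms of the sum for $P(X \leq 2)$ and compares the total with each significance level.

```python
from math import comb

n, p0, observed = 20, 0.3, 2

# The three terms of the sum for P(X <= 2): k = 0, 1, 2.
terms = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(observed + 1)]
p_value = sum(terms)

# Compare the same p-value with each significance level in turn.
decisions = {}
for alpha in (0.05, 0.01):
    decisions[alpha] = "reject H0" if p_value < alpha else "do not reject H0"

print([round(t, 4) for t in terms])
print(f"p-value = {p_value:.4f}")
for alpha, decision in decisions.items():
    print(f"alpha = {alpha}: {decision}")
```

The p-value itself does not depend on $\alpha$; only the threshold it is compared with changes, which is why the same data lead to rejection at 5% but not at 1%.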

Specimen question modelled on the AQA Paper 3 format

Question (6 marks): A coin is suspected of being biased. It is tossed 30 times and lands heads on 21 occasions. Test, at the 5% significance level, whether there is evidence that the coin is biased.

Mark scheme decomposition by AO:

B1 (AO1.1a): Let $p$ be the probability of heads and $X$ the number of heads in 30 tosses, so $X \sim B(30, p)$. State $H_0: p = 0.5$, $H_1: p \neq 0.5$.
B1 (AO1.1b): Identify two-tailed test; significance level split $\alpha/2 = 0.025$ in each tail.
M1 (AO1.1b): Compute $P(X \geq 21 \mid p = 0.5)$ as the relevant tail probability (since 21 is above the mean of 15).
A1 (AO1.1b): Numerical value: $P(X \geq 21) = 1 - P(X \leq 20) \approx 0.0214$.
M1 (AO2.5): Compare $0.0214$ with $0.025$ and decide to reject $H_0$ (or, equivalently, double the tail probability and compare with 0.05).
A1 (AO3.5): Conclusion in context: there is sufficient evidence at the 5% level to suggest that the coin is biased.

Total: 6 marks split AO1 = 4, AO2 = 1, AO3 = 1. AQA awards the AO3 mark only when the conclusion is fully contextualised: "the coin is biased" is acceptable; "reject $H_0$" alone is not.
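The two equivalent comparisons for this two-tailed test (tail probability against $\alpha/2$, or doubled tail probability against $\alpha$) can be sketched as follows. The helper name `tail_probabilities` is illustrative, and doubling the smaller tail is used here as one common convention for a two-tailed p-value rather than the only possible definition.

```python
from math import comb

def tail_probabilities(x, n, p0):
    """Return (P(X <= x), P(X >= x)) for X ~ B(n, p0)."""
    pmf = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
    return sum(pmf[: x + 1]), sum(pmf[x:])

alpha = 0.05
lower, upper = tail_probabilities(21, 30, 0.5)   # 21 heads in 30 tosses

# 21 is above the mean of 15, so the upper tail is the relevant (smaller) one.
tail = min(lower, upper)

reject_by_half_alpha = tail < alpha / 2   # compare the tail with 2.5%
reject_by_doubling = 2 * tail < alpha     # compare the doubled tail with 5%

print(f"P(X >= 21) = {upper:.4f}")
print(f"doubled tail = {2 * tail:.4f}")
print(reject_by_half_alpha, reject_by_doubling)
```

The two comparisons are the same inequality written in two ways, so they always agree.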

Synoptic links

Connects to:

Section Q — the binomial distribution itself. Hypothesis testing for $p$ is impossible without confidence in $B(n, p)$ probability calculation, the assumptions of independence and constant probability, and the cumulative distribution function. A candidate who cannot compute $P(X \leq k)$ accurately cannot run any test.
Section R — correlation coefficient $r$ . The structure "test $H_0: \rho = 0$ vs $H_1: \rho \neq 0$ at the 5% level" reuses every element of binomial testing — null/alternative, one- vs two-tailed, p-value or critical-value comparison, contextual conclusion. Only the test statistic changes. AQA explicitly examines this parallel.
Section T — hypothesis test for a normal mean. When the sample size is large enough that $\bar{X} \sim N(\mu, \sigma^2/n)$ approximately, the same testing framework applies with $Z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$ as the test statistic. Candidates who internalise the binomial test transfer it cleanly.
Statistical inference (broader). Confidence intervals are the dual of hypothesis tests: a 95% confidence interval for $p$ that excludes $p_0$ corresponds to rejecting $H_0: p = p_0$ at the 5% level (two-tailed). This duality is implicit at A-Level but explicit at university.
Modelling assumptions. The binomial test assumes $n$ independent trials with constant probability $p$. If trials are correlated (e.g. multiple loaves from the same batch), the model breaks and inference is invalid. AQA examines this critical-thinking step in synoptic Paper 3 questions.

Mark-scheme literacy

Hypothesis testing on AQA Paper 3 distributes AO marks distinctively:

AO	Typical share	Earned by
AO1 (knowledge / procedure)	50–60%	Stating $H_0$ and $H_1$ correctly, computing the p-value or critical value, identifying one- vs two-tailed
AO2 (reasoning / interpretation)	20–30%	Choosing the correct tail, comparing p-value with $\alpha$, justifying rejection or non-rejection
AO3 (problem-solving / modelling)	15–25%	Interpreting the conclusion in the context of the question, commenting on model validity, addressing changes in $\alpha$

Examiner-rewarded phrasing: "there is sufficient/insufficient evidence at the $X\%$ level to suggest …"; "since the p-value ($0.0355$) is less than the significance level ($0.05$), we reject $H_0$"; "in the context of this problem, we conclude that …". Phrases that lose marks: "the bakery is lying" (assertive, not contextual); "accept $H_0$" ($H_0$ is never accepted, only "not rejected"); "$H_0: \bar{x} = 0.3$" (wrong variable: hypotheses are about the parameter $p$, not the statistic).

A specific AQA pattern to watch: a question saying "test at the 5% significance level" without specifying one- or two-tailed expects you to read the alternative hypothesis from the wording. "Suspects the proportion is lower" is one-tailed lower; "wonders whether the proportion has changed" is two-tailed. Mis-reading direction costs the entire AO2 mark.

Grade-band model answers

3-mark question

Question: Write down the null and alternative hypotheses for testing whether a coin showing heads on 12 of 20 tosses is biased toward heads. Identify whether the test is one-tailed or two-tailed.

Grade C response (~150 words):

Let $p$ be the probability of heads.

$H_0: p = 0.5$
$H_1: p > 0.5$

This is a one-tailed test because we are testing whether the coin is biased toward heads (a specific direction).

Examiner commentary: Full marks (3/3). The candidate correctly states $H_0$ and $H_1$ in terms of the parameter $p$, identifies the one-tailed direction, and gives a one-line justification linking "biased toward heads" to "$p > 0.5$". The reasoning is brief but every step is correct. This is the standard Grade C answer for a procedural question: efficient and correctly contextualised.

Grade A response (~190 words):

Define $p$ as the population probability that the coin lands heads on a single toss, modelling $X \sim B(20, p)$ where $X$ is the number of heads in 20 tosses.

$H_0: p = 0.5$ (the coin is fair)
$H_1: p > 0.5$ (the coin is biased toward heads)

The phrase "biased toward heads" specifies the direction of departure from fairness, so this is a one-tailed (upper) test. The entire significance level is concentrated in the upper tail of the binomial distribution under $H_0$.
