Hypothesis Testing for the Binomial Distribution

This lesson covers hypothesis testing using the binomial distribution as required by the Edexcel A-Level Mathematics specification (9MA0), Paper 3 Section A -- Statistics. You need to set up null and alternative hypotheses, calculate p-values, find critical regions, and draw conclusions in context.

What is a Hypothesis Test?

A hypothesis test determines whether there is enough evidence in a sample to reject a claim about a population parameter.

The five-step framework

State the hypotheses (H0 and H1).
State the significance level (alpha).
Calculate the test statistic or p-value.
Compare with the critical value or alpha.
Draw a conclusion in context.

Null and Alternative Hypotheses

H0: p = p0 (the null -- assumed true unless evidence says otherwise)
H1: p < p0 (one-tailed, lower)
H1: p > p0 (one-tailed, upper)
H1: p ≠ p0 (two-tailed)

Exam Tip: H0 always contains "=". H1 contains <, > or ≠. Always state both clearly.

Significance Level

The significance level (alpha) is the probability of rejecting H0 when it is actually true (Type I error). Common values: 0.05 (5%), 0.01 (1%), 0.10 (10%).

Using p-values

The p-value is the probability, calculated assuming H0 is true, of obtaining a result at least as extreme as the one observed, in the direction of H1.

If p-value ≤ alpha: reject H0.
If p-value > alpha: do not reject H0.

Calculating the p-value

Under H0, X ~ B(n, p0).

Lower tail: p-value = P(X ≤ x)
Upper tail: p-value = P(X ≥ x) = 1 - P(X ≤ x - 1)
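Two-tailed: double the probability in the relevant tail (the tail in which the observed value lies) and compare with alpha. This is equivalent to comparing the single-tail probability with alpha/2.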

Example

A shop claims 30% of customers buy product A. In a sample of 20, only 2 did. Test at 5% whether the proportion is lower.

H0: p = 0.3, H1: p < 0.3. X ~ B(20, 0.3).

p-value = P(X ≤ 2) = 0.000798 + 0.006839 + 0.02785 = 0.0355

0.0355 < 0.05, so reject H0. Sufficient evidence that the proportion is less than 0.3.
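The tail rules above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names (`binom_pmf`, `binom_cdf`, `p_value`) are illustrative choices, not part of any specification, and the two-tailed branch assumes the "relevant tail" is the smaller of the two tail probabilities.

```python
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p); returns 0 when x < 0."""
    return sum(binom_pmf(r, n, p) for r in range(0, x + 1))

def p_value(x, n, p0, tail):
    """p-value for an observed count x under H0: p = p0.

    tail is "lower" (H1: p < p0), "upper" (H1: p > p0) or "two".
    """
    lower = binom_cdf(x, n, p0)          # P(X <= x)
    upper = 1 - binom_cdf(x - 1, n, p0)  # P(X >= x)
    if tail == "lower":
        return lower
    if tail == "upper":
        return upper
    if tail == "two":
        # Assumption: double the smaller tail, capped at 1.
        return min(1.0, 2 * min(lower, upper))
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

# Shop example: n = 20, p0 = 0.3, observed x = 2, H1: p < 0.3
p = p_value(2, 20, 0.3, "lower")
print(round(p, 4))  # 0.0355
```

The printed value agrees with the hand calculation in the example, and comparing it with alpha = 0.05 gives the same decision to reject H0.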

Using Critical Regions

The critical region is the set of values leading to rejection of H0.

Lower tail (H1: p < p0): Find the largest c such that P(X ≤ c) ≤ alpha. Critical region: X ≤ c.

Upper tail (H1: p > p0): Find the smallest c such that P(X ≥ c) ≤ alpha. Critical region: X ≥ c.

The actual significance level

Because the binomial is discrete, P(X ≤ c) may not equal alpha exactly. The actual significance level is the exact probability of the critical region.

Example

X ~ B(15, 0.5) under H0. H1: p < 0.5. alpha = 0.05.

P(X ≤ 3) = 0.01758 ≤ 0.05 (in critical region)
P(X ≤ 4) = 0.05923 > 0.05 (not in critical region)

Critical region: X ≤ 3. Actual significance = 1.758%.
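A critical value can also be found by a direct search over the cumulative probabilities instead of reading tables. The sketch below is one possible way to do it in plain Python; the function names are hypothetical, and both functions return `None` when no value satisfies the condition.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p); returns 0 when x < 0."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

def lower_critical_value(n, p0, alpha):
    """Largest c with P(X <= c) <= alpha, or None if no such c exists."""
    c = None
    for k in range(n + 1):
        if binom_cdf(k, n, p0) <= alpha:
            c = k
        else:
            break  # the CDF only increases, so stop at the first failure
    return c

def upper_critical_value(n, p0, alpha):
    """Smallest c with P(X >= c) <= alpha, or None if no such c exists."""
    for k in range(n + 1):
        if 1 - binom_cdf(k - 1, n, p0) <= alpha:
            return k
    return None

# Example above: X ~ B(15, 0.5), H1: p < 0.5, alpha = 0.05
c = lower_critical_value(15, 0.5, 0.05)
print(c, round(binom_cdf(c, 15, 0.5), 5))  # 3 0.01758
```

The second printed number is the actual significance level, the exact probability of the critical region under H0.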

Two-Tailed Tests

H1: p ≠ p0. Split alpha equally: each tail has alpha/2.

Example

H0: p = 0.5, H1: p ≠ 0.5, alpha = 0.05. X ~ B(20, 0.5).

Lower: P(X ≤ 5) = 0.0207 ≤ 0.025, but P(X ≤ 6) = 0.0577 > 0.025, so the lower critical region is X ≤ 5. Upper (by symmetry, since p0 = 0.5): X ≥ 15.

Critical region: X ≤ 5 or X ≥ 15.
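For a two-tailed test the same search is run in each tail with alpha/2. The following sketch, under the equal-split convention used in this lesson, returns both critical values and the actual significance level; `two_tailed_region` is an illustrative name.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p); returns 0 when x < 0."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

def two_tailed_region(n, p0, alpha):
    """Return (lower_c, upper_c, actual_level) with alpha/2 in each tail.

    lower_c is the largest c with P(X <= c) <= alpha/2 and upper_c the
    smallest c with P(X >= c) <= alpha/2; either may be None.
    """
    half = alpha / 2
    lower_c = None
    for k in range(n + 1):
        if binom_cdf(k, n, p0) <= half:
            lower_c = k
        else:
            break
    upper_c = None
    for k in range(n, -1, -1):  # walk down from n while the tail stays small
        if 1 - binom_cdf(k - 1, n, p0) <= half:
            upper_c = k
        else:
            break
    actual = 0.0
    if lower_c is not None:
        actual += binom_cdf(lower_c, n, p0)
    if upper_c is not None:
        actual += 1 - binom_cdf(upper_c - 1, n, p0)
    return lower_c, upper_c, actual

lo, hi, level = two_tailed_region(20, 0.5, 0.05)
print(lo, hi, round(level, 4))  # 5 15 0.0414
```

Here the actual significance level is the sum of the two tail probabilities, about 4.14%, which is below the nominal 5%.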

Drawing Conclusions

Reject H0: "There is sufficient evidence at the [alpha x 100]% significance level to conclude that [H1 in context]."

Do not reject H0: "There is insufficient evidence at the [alpha x 100]% significance level to conclude that [H1 in context]."

Never say "accept H0". Say "do not reject H0" or "insufficient evidence".
Always give your conclusion in context.
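The two conclusion templates can be expressed as a small helper, which makes the decision rule and the required wording explicit. This is only a sketch: `conclusion` and its parameter `h1_in_context` are hypothetical names, and the wording uses "suggest" in place of "conclude" to stay non-assertive.

```python
def conclusion(p_value, alpha, h1_in_context):
    """Return a contextual conclusion sentence for a hypothesis test."""
    level = f"{alpha * 100:g}%"
    if p_value <= alpha:
        return (f"Reject H0. There is sufficient evidence at the {level} "
                f"significance level to suggest that {h1_in_context}.")
    return (f"Do not reject H0. There is insufficient evidence at the {level} "
            f"significance level to suggest that {h1_in_context}.")

# Shop example: p-value 0.0355 at the 5% level
print(conclusion(0.0355, 0.05, "fewer than 30% of customers buy product A"))
```

Note that the helper never returns "accept H0"; the only two outcomes are rejecting and not rejecting.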

Type I and Type II Errors

	H0 is true	H0 is false
Reject H0	Type I error (prob = alpha)	Correct
Do not reject H0	Correct	Type II error
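Both error probabilities in the table can be computed once a critical region is fixed. The sketch below reuses the critical region X ≤ 3 for B(15, 0.5); the value `true_p = 0.3` is a hypothetical true proportion chosen purely for illustration, since a Type II error probability can only be calculated for a specific alternative value of p.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p); returns 0 when x < 0."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

# Lower-tailed test with critical region X <= c.
n, p0, c = 15, 0.5, 3
true_p = 0.3  # hypothetical true proportion, chosen only for illustration

type_1 = binom_cdf(c, n, p0)          # P(reject H0 | H0 true)
type_2 = 1 - binom_cdf(c, n, true_p)  # P(do not reject H0 | p = true_p)
print(round(type_1, 4), round(type_2, 4))
```

For a discrete test the Type I error probability is the actual significance level, which here is smaller than the nominal alpha of 0.05.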

Summary

H0: p = p0. H1: p < p0, p > p0, or p ≠ p0.
p-value = P(observed or more extreme | H0).
p-value ≤ alpha: reject H0. p-value > alpha: do not reject.
Critical region: found by solving P(X ≤ c) ≤ alpha or P(X ≥ c) ≤ alpha.
Two-tailed: split alpha between both tails.
Always conclude in context. Never "accept H0".

A-Level Deep Dive: Hypothesis Testing for the Binomial Distribution

Spec mapping

Edexcel 9MA0-03 specification section 8 (Statistical hypothesis testing), sub-strands 8.1, 8.2 and 8.3, covers the language of statistical hypothesis testing, developed through a binomial model: null hypothesis, alternative hypothesis, significance level, test statistic, 1-tail test, 2-tail test, critical value, critical region, acceptance region, p-value (refer to the official specification document for exact wording). This sits in Paper 3 (Statistics and Mechanics), but draws on section 4 (The binomial distribution) for the underlying model and section 7 (Probability) for conditional probability concepts under $H_0$, and is extended in Year 2 to hypothesis tests for the mean of a normal distribution and for a correlation coefficient. The Edexcel formula booklet provides the binomial PMF $P(X = r) = \binom{n}{r}p^r(1-p)^{n-r}$ but does not provide critical values; these come from cumulative binomial tables or calculator functions.

Worked example with full mark scheme

Question (8 marks):

A manufacturer claims that 30% of its chocolate bars contain a "lucky" wrapper. A consumer group buys a random sample of 25 bars and finds that only 3 contain a lucky wrapper. Test, at the 5% significance level, whether there is evidence that the proportion of lucky wrappers is less than 30%.

Solution with mark scheme:

Step 1 — define variable and state hypotheses.

Let $X$ be the number of lucky wrappers in a sample of 25 bars. Under $H_0$, $X \sim B(25, 0.3)$.

$H_0: p = 0.3$
$H_1: p < 0.3$ (one-tailed test)

B1 — correct definition of $X$ and parameter $p$ (the population proportion).

B1 — both hypotheses stated correctly with strict equality in $H_0$ and the correct one-tailed inequality in $H_1$ . Common error: writing $H_0: p < 0.3$ — this confuses null and alternative. Another error: writing $H_1: p \neq 0.3$ — that would be a two-tailed test, inconsistent with the question wording "less than 30%".

Step 2 — identify the test statistic and significance level.

Test statistic: $X = 3$ (observed). Significance level: $\alpha = 0.05$.

Step 3 — compute the P-value (lower tail).

$P(X \leq 3 \mid p = 0.3) = \sum_{r=0}^{3} \binom{25}{r}(0.3)^r(0.7)^{25-r}$

From cumulative binomial tables (or calculator): $P(X \leq 3) = 0.0332$ (4 d.p.).

M1 — correct probability statement $P(X \leq 3)$ under $H_0$ (lower-tail because $H_1$ is "less than"). Common error: computing $P(X = 3)$ alone — this is not the P-value; the P-value is the probability of an observation at least as extreme as the one obtained, in the direction of $H_1$ .

A1 — correct numerical P-value $0.0332$ .
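If tables are not to hand, the cumulative probability can be built up term by term from the binomial PMF. The expansion below is a check on the tabulated value, with each term rounded to 6 d.p.:

$$
\begin{aligned}
P(X \leq 3) &= P(X=0) + P(X=1) + P(X=2) + P(X=3) \\
&= (0.7)^{25} + 25(0.3)(0.7)^{24} + 300(0.3)^2(0.7)^{23} + 2300(0.3)^3(0.7)^{22} \\
&\approx 0.000134 + 0.001437 + 0.007390 + 0.024280 \\
&\approx 0.0332
\end{aligned}
$$

Here $\binom{25}{1} = 25$, $\binom{25}{2} = 300$ and $\binom{25}{3} = 2300$.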

Step 4 — compare with significance level.

$0.0332 < 0.05$, so the observed result lies in the critical region.

M1 — explicit comparison of P-value to $\alpha$ . Examiners want the inequality stated, not just "significant".

Step 5 — state conclusion in context.

Reject $H_0$. There is sufficient evidence at the 5% level to support the consumer group's suspicion that the proportion of lucky wrappers is less than 30%.

A1 — conclusion in context, naming the proportion of lucky wrappers (not just "reject $H_0$ "). Examiners require the conclusion to refer back to the original scenario.

B1 — final mark for stating both the statistical decision (reject $H_0$ ) and the contextual interpretation, with the significance level explicitly named.

Total: the scheme above itemises 7 marks (B1 B1 M1 A1 M1 A1 B1) against the 8 stated in the question. The final B1 also requires non-assertive language such as "evidence to suggest" rather than "proof".

Specimen question modelled on the Edexcel 9MA0 Paper 3 format

Question (6 marks): A teacher claims that 40% of students at a large college study a foreign language. A researcher takes a random sample of 20 students and finds that 12 study a foreign language. Stating your hypotheses clearly, test at the 10% significance level whether the teacher's claim is incorrect.

Mark scheme decomposition by AO:

B1 (AO1.2) — $X \sim B(20, 0.4)$ under $H_0$ , with $X$ defined as the number of language students in the sample.
B1 (AO2.5) — hypotheses $H_0: p = 0.4$ , $H_1: p \neq 0.4$ (two-tailed, because "incorrect" admits either direction).
M1 (AO1.1b) — recognising that with $\alpha = 0.10$ two-tailed, each tail uses $\alpha/2 = 0.05$ .
M1 (AO1.1b) — computing $P(X \geq 12 \mid p = 0.4) = 1 - P(X \leq 11) = 1 - 0.9435 = 0.0565$ .
A1 (AO2.2b) — comparing $0.0565$ with $0.05$ : not in the critical region (just outside).
A1 (AO3.5b) — conclusion in context: insufficient evidence at the 10% significance level to reject the teacher's claim that 40% of students study a foreign language.

Total: 6 marks split AO1 = 3, AO2 = 2, AO3 = 1. Two-tailed binomial tests are an Edexcel favourite because they reward precisely the AO2 reasoning (halving $\alpha$ ) that distinguishes mid-grade from top-grade candidates. The "just outside" verdict is also pedagogically deliberate: many candidates who incorrectly use $\alpha = 0.10$ in one tail will reject $H_0$ wrongly.
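The "just outside" verdict can be reproduced numerically. The following sketch, in plain Python with illustrative variable names, computes the upper-tail probability and contrasts the correct two-tailed comparison with the common one-tailed mistake described above.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p); returns 0 when x < 0."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

n, p0, x, alpha = 20, 0.4, 12, 0.10

upper_tail = 1 - binom_cdf(x - 1, n, p0)     # P(X >= 12) under H0
reject_two_tailed = upper_tail <= alpha / 2  # correct: compare with 0.05
reject_one_tailed = upper_tail <= alpha      # the common mistake

print(round(upper_tail, 4), reject_two_tailed, reject_one_tailed)
# 0.0565 False True
```

The same tail probability leads to opposite decisions depending on whether it is compared with 0.05 or 0.10, which is why the halving step carries its own mark.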

Synoptic links

Connects to:

Section 4 — The binomial distribution $B(n, p)$ : the entire test rests on assuming $X \sim B(n, p_0)$ under $H_0$ . Without confidence in $P(X = r) = \binom{n}{r}p^r(1-p)^{n-r}$ and cumulative $P(X \leq r)$ , the test cannot be carried out. Independence and constant probability across trials must be checked or stated.
Year 2 — Hypothesis test for the mean of a normal distribution: the same logic (state hypotheses, compute test statistic, compare to critical value or P-value) extends to $\bar{X} \sim N(\mu, \sigma^2/n)$ under $H_0: \mu = \mu_0$ . The vocabulary of Type I error rate $= \alpha$ transfers identically.
Year 2 — Hypothesis test for a correlation coefficient $r$ : $H_0: \rho = 0$ vs $H_1: \rho \neq 0$ , comparing the sample $r$ against tabulated critical values for given $n$ and $\alpha$ . Same five-step structure.
Section 7 — Probability and conditional probability: the P-value is $P(\text{data this extreme or more} \mid H_0)$ — a conditional probability. Misunderstanding this conditioning is the root of nearly every misinterpretation of P-values.
Year 2 modelling assumptions: the binomial requires fixed $n$ , independent trials, constant $p$ , and binary outcomes. Real data routinely violate these — students sampled from one tutor group are not independent if friends influence each other's choices. AO3 marks reward explicit acknowledgement of which assumption is fragile.

Mark-scheme literacy

Hypothesis-testing questions on 9MA0-03 split AO marks more evenly than procedural topics:

AO	Typical share	Earned by
AO1 (knowledge / procedure)	40–50%	Stating hypotheses, computing binomial probabilities, comparing to $\alpha$
AO2 (reasoning / interpretation)	30–40%	Choosing one- vs two-tailed correctly, halving $\alpha$ for two-tail, identifying critical region from tables
AO3 (problem-solving / modelling)	15–25%	Conclusion in context, commenting on modelling assumptions, evaluating Type I/II risk

Examiner-rewarded phrasing: "There is sufficient evidence at the 5% significance level to suggest that …"; "Since the P-value $0.0332 < 0.05$, we reject $H_0$"; "Assuming each bar is independently sampled with constant probability $p$ of containing a lucky wrapper, $X \sim B(25, p)$". Phrases that lose marks: "we accept $H_0$" (you can only fail to reject); "this proves the manufacturer is wrong" (hypothesis tests give evidence, not proof); writing $H_0: p < 0.3$ ($H_0$ must be a strict equality).

A specific Edexcel pattern to watch: when a question says "test whether $p$ has changed", that signals a two-tailed test. When it says "test whether $p$ has increased" or "decreased", that signals a one-tailed test. The verb determines the form of $H_1$ .

Grade-band model answers

3-mark question

Question: A coin is suspected of being biased towards heads. It is tossed 20 times and lands heads 14 times. State suitable hypotheses for a test at the 5% significance level.

Grade C response (~140 words):

Let $X$ be the number of heads in 20 tosses, $X \sim B(20, p)$.

$H_0: p = 0.5$
$H_1: p > 0.5$

This is a one-tailed test at the 5% significance level.

Examiner commentary: Full marks (3/3). The candidate defines $X$, states the binomial model, and writes both hypotheses with correct strict equality in $H_0$ and the correct one-tailed direction in $H_1$. Naming the test as one-tailed is good practice. Many Grade C candidates lose marks by writing $H_0: p \leq 0.5$ or by omitting the definition of $X$; this answer avoids both pitfalls.

Grade A response (~210 words):
