Hypothesis Testing in Context

This lesson covers hypothesis testing as applied to real-world data — specifically in the contexts of correlation, proportion, and the mean. The emphasis is on setting up tests correctly, carrying them out using data from the AQA large data set, and interpreting the results in context.

Recap: The Hypothesis Testing Framework

A hypothesis test follows a structured procedure:

State the null hypothesis ( $H_0$ ) — the default assumption (no effect, no change, no correlation).
State the alternative hypothesis ( $H_1$ ) — the claim being tested.
Choose the significance level ( $\alpha$ ) — typically 5% (0.05) or 1% (0.01).
Calculate the test statistic or find the probability of the observed result (or more extreme) under $H_0$ .
Compare with the critical value or compare the p-value with $\alpha$ .
State the conclusion in context — either reject $H_0$ or state there is insufficient evidence to reject $H_0$ .

Important Terminology

Term	Definition
Null hypothesis ( $H_0$ )	The default assumption — no effect, no difference, no correlation
Alternative hypothesis ( $H_1$ )	The claim we are testing for
Significance level ( $\alpha$ )	The probability of rejecting $H_0$ when it is true (Type I error)
Test statistic	A value calculated from the data that is used to make the decision
Critical value	The boundary of the critical region: reject $H_0$ if the test statistic is at least as extreme as this value, in the direction(s) given by $H_1$
Critical region	The set of values of the test statistic that lead to rejection of $H_0$
p-value	The probability of obtaining the observed result (or more extreme) under $H_0$

Hypothesis Test for a Proportion

This test uses the binomial distribution. Use it to test a claim about the probability of success $p$ in a fixed number of independent trials.

Setting Up the Test

$H_0: p = p_0$ (the proportion is a stated value)
$H_1: p > p_0$ (one-tailed, upper), or $p < p_0$ (one-tailed, lower), or $p \neq p_0$ (two-tailed)

Under $H_0$ , the number of successes $X \sim B(n, p_0)$ .

Example from the LDS

A student claims that the proportion of days with measurable rainfall at Hurn in July is 0.30. From the large data set, in a sample of 31 July days, 14 have measurable rainfall.

$H_0: p = 0.30$ , $H_1: p > 0.30$ (one-tailed test at 5% level)

Under $H_0$ , $X \sim B(31, 0.30)$ .

Calculate $P(X \geq 14) = 1 - P(X \leq 13)$ .

Using cumulative binomial tables or a calculator: $P(X \leq 13) \approx 0.9466$ , so $P(X \geq 14) \approx 0.0534$ .

Since $0.0534 > 0.05$ , do not reject $H_0$ . There is insufficient evidence at the 5% significance level to suggest that the proportion of rainy days at Hurn in July is greater than 0.30.

Finding the Critical Region

The critical region for this test is the set of values of $X$ for which we reject $H_0$ . We need the smallest value $c$ such that $P(X \geq c) \leq 0.05$ .

From the cumulative binomial distribution: $P(X \geq 15) \approx 0.0239 < 0.05$ , $P(X \geq 14) \approx 0.0534 > 0.05$ .

So the critical region is $X \geq 15$ , and the actual significance level is 0.0239 (2.39%). The observed value $X = 14$ lies just outside the critical region, which agrees with the decision not to reject $H_0$ .
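The tail probability and critical region for this test can be checked without tables. The following is a minimal sketch using only the Python standard library; the function names `binom_pmf`, `upper_tail` and `upper_critical_value` are illustrative choices, not part of any exam board resource.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ B(n, p), summed directly from the pmf."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def upper_critical_value(n: int, p: float, alpha: float) -> int:
    """Smallest c such that P(X >= c) <= alpha (upper-tailed test)."""
    for c in range(n + 1):
        if upper_tail(c, n, p) <= alpha:
            return c
    return n + 1  # no outcome is extreme enough: the critical region is empty


# Values from the Hurn example: H0: p = 0.30, H1: p > 0.30, 5% level
n, p0, alpha, observed = 31, 0.30, 0.05, 14

p_value = upper_tail(observed, n, p0)
c = upper_critical_value(n, p0, alpha)
actual_level = upper_tail(c, n, p0)

print(f"P(X >= {observed}) = {p_value:.4f}")
print(f"Critical region: X >= {c} (actual significance level {actual_level:.4f})")
print("Reject H0" if p_value <= alpha else "Insufficient evidence to reject H0")
```

Summing the pmf directly is slower than a library routine but keeps every step visible, which mirrors how the probability is built up from cumulative tables.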

Hypothesis Test for the Mean

When testing a claim about the population mean $\mu$ of a normally distributed variable with known variance:

Setting Up the Test

$H_0: \mu = \mu_0$
$H_1: \mu > \mu_0$ or $\mu < \mu_0$ or $\mu \neq \mu_0$

Under $H_0$ , if $X \sim N(\mu, \sigma^2)$ and we take a sample of size $n$ :

$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right)$

The test statistic is:

$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

Example from the LDS

The long-term mean daily mean temperature at Leuchars in October is 9.5°C, with a standard deviation of 2.0°C. A sample of 25 October days from the large data set gives a mean of 10.3°C. Test at the 5% level whether the mean has increased.

$H_0: \mu = 9.5$ , $H_1: \mu > 9.5$ (one-tailed)

$z = \frac{10.3 - 9.5}{2.0 / \sqrt{25}} = \frac{0.8}{0.4} = 2.0$

Critical value at 5% (one-tailed): $z = 1.6449$

Since $2.0 > 1.6449$ , reject $H_0$ . There is sufficient evidence at the 5% level to suggest that the mean daily mean temperature at Leuchars in October has increased.
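As a check on the arithmetic, the same test can be sketched in Python with the standard library's `statistics.NormalDist`. The helper name `z_statistic` is an illustrative choice; the numbers are those quoted in the Leuchars example.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Standardise the sample mean under H0: mu = mu0 with known sigma."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


# Leuchars example: H0: mu = 9.5, H1: mu > 9.5, 5% level
z = z_statistic(sample_mean=10.3, mu0=9.5, sigma=2.0, n=25)
critical = NormalDist().inv_cdf(0.95)   # upper 5% point of N(0, 1)
p_value = 1 - NormalDist().cdf(z)       # P(Z >= z) under H0

print(f"z = {z:.3f}, critical value = {critical:.4f}, p-value = {p_value:.4f}")
# z = 2.000, critical value = 1.6449, p-value = 0.0228
print("Reject H0" if z > critical else "Insufficient evidence to reject H0")
```

The critical-value comparison and the p-value comparison always give the same decision; the p-value simply reports how far into the tail the observed result lies.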

Hypothesis Test for Correlation

To test whether there is significant linear correlation in the population:

Setting Up the Test

$H_0: \rho = 0$ (no linear correlation in the population)
$H_1: \rho > 0$ (positive correlation), or $\rho < 0$ (negative correlation), or $\rho \neq 0$ (any correlation)

The sample PMCC $r$ is compared with critical values from the PMCC table, which depend on the sample size $n$ and the significance level $\alpha$ .

Example from the LDS

A student calculates the PMCC between daily mean temperature and daily total sunshine at Camborne for 15 days in June, obtaining $r = 0.58$ .

$H_0: \rho = 0$ , $H_1: \rho > 0$ (one-tailed, at 5%)

Critical value for $n = 15$ , one-tailed, 5%: $0.4409$

Since $0.58 > 0.4409$ , reject $H_0$ . There is sufficient evidence at the 5% level of a positive correlation between daily mean temperature and daily total sunshine at Camborne in June.
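The sketch below shows one way the sample PMCC could be computed from paired data and then compared with a tabulated critical value. The data pairs are made up purely for illustration and are not taken from the large data set; the function names are likewise hypothetical.

```python
from math import sqrt


def pmcc(xs: list[float], ys: list[float]) -> float:
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def significant_positive(r: float, critical_value: float) -> bool:
    """Upper-tailed test of H0: rho = 0 against H1: rho > 0."""
    return r > critical_value


# Made-up illustrative pairs (NOT large data set values)
temperature = [14.1, 15.3, 13.8, 16.2, 17.0, 15.9]   # degrees C
sunshine = [4.2, 6.1, 3.0, 7.5, 9.8, 5.5]            # hours
r_demo = pmcc(temperature, sunshine)
print(f"r for the illustrative data = {r_demo:.3f}")

# Values quoted in the Camborne example: r = 0.58, critical value 0.4409 (n = 15)
reject = significant_positive(0.58, 0.4409)
print("Reject H0" if reject else "Insufficient evidence to reject H0")
```

The critical value still has to come from the PMCC table for the relevant $n$ and significance level; the code only automates the calculation of $r$ and the final comparison.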

One-Tailed vs Two-Tailed Tests

Test type	$H_1$	Critical region
One-tailed (upper)	$\mu > \mu_0$ or $p > p_0$ or $\rho > 0$	Upper tail only
One-tailed (lower)	$\mu < \mu_0$ or $p < p_0$ or $\rho < 0$	Lower tail only
Two-tailed	$\mu \neq \mu_0$ or $p \neq p_0$ or $\rho \neq 0$	Both tails ( $\alpha/2$ in each)

The choice between one-tailed and two-tailed depends on the question. If the question asks whether a parameter has increased, use a one-tailed (upper) test. If it asks whether it has changed, use a two-tailed test.
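To see how the choice of tails changes the critical value for a z-test, here is a small sketch using `statistics.NormalDist`. The helper `z_critical` and the test statistic 1.8 are hypothetical, chosen only to show a case where the two tests disagree.

```python
from statistics import NormalDist


def z_critical(alpha: float, tails: int) -> float:
    """Upper critical z-value; a two-tailed test puts alpha/2 in each tail."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)


one_tailed = z_critical(0.05, tails=1)
two_tailed = z_critical(0.05, tails=2)
print(f"one-tailed 5%: {one_tailed:.4f}")   # one-tailed 5%: 1.6449
print(f"two-tailed 5%: {two_tailed:.4f}")   # two-tailed 5%: 1.9600

# Hypothetical test statistic lying between the two critical values
z_demo = 1.8
print("one-tailed (upper):", "reject H0" if z_demo > one_tailed else "do not reject H0")
print("two-tailed:", "reject H0" if abs(z_demo) > two_tailed else "do not reject H0")
```

A result that is significant in a one-tailed test need not be significant in a two-tailed test at the same level, which is why the alternative hypothesis must be fixed by the wording of the question before looking at the data.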

Interpreting Conclusions in Context

This is one of the most commonly examined skills and a frequent source of lost marks.

Correct Conclusion Format

If rejecting $H_0$ : "There is sufficient evidence at the [significance level] to suggest that [contextual statement matching $H_1$ ]."

If not rejecting $H_0$ : "There is insufficient evidence at the [significance level] to suggest that [contextual statement matching $H_1$ ]."
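The two templates above can be captured in a short helper, sketched here as an aid to practising the wording. The function name `conclusion` and its parameters are illustrative assumptions; the example call uses the Leuchars z-value of 2.0.

```python
from statistics import NormalDist


def conclusion(p_value: float, alpha: float, claim: str) -> str:
    """Build a contextual conclusion; `claim` is the statement matching H1."""
    level = f"{alpha:.0%}"
    if p_value <= alpha:
        return (f"Reject H0. There is sufficient evidence at the {level} "
                f"significance level to suggest that {claim}.")
    return (f"Do not reject H0. There is insufficient evidence at the {level} "
            f"significance level to suggest that {claim}.")


# Upper-tailed p-value for the Leuchars example (z = 2.0)
p_value = 1 - NormalDist().cdf(2.0)
claim = "the mean daily mean temperature at Leuchars in October has increased"
print(conclusion(p_value, 0.05, claim))
print(conclusion(p_value, 0.01, claim))
```

Note that both branches mention the same contextual claim: only the strength of the evidence changes, never the statement being tested.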

Common Errors to Avoid

Error	Correct approach
"Accept $H_0$ "	Say "there is insufficient evidence to reject $H_0$ "
"Prove $H_1$ "	Say "there is evidence to suggest..." — hypothesis tests provide evidence, not proof
Stating the conclusion without context	Always relate back to the original claim and the real-world situation
Using the sample statistic in the hypotheses	Hypotheses are about population parameters ( $\mu$ , $p$ , $\rho$ )

Example of a Good Contextual Conclusion

"There is sufficient evidence at the 5% significance level to suggest that the mean daily mean temperature at Leuchars in October has increased from the long-term average of 9.5°C. This could be due to climate change, although other factors such as natural variability and the urban heat island effect may also play a role."

Type I and Type II Errors

Error type	Definition	Probability
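Type I	Rejecting $H_0$ when it is true	Equal to the significance level $\alpha$ for a continuous test statistic; for a discrete one, the actual significance level (at most $\alpha$ )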
Type II	Failing to reject $H_0$ when it is false	Depends on the true parameter value and sample size

The relationship between Type I and Type II errors:

Decreasing $\alpha$ (e.g., from 5% to 1%) reduces the risk of a Type I error but increases the risk of a Type II error.
Increasing the sample size reduces the risk of a Type II error without changing the Type I error rate.
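Both statements can be illustrated numerically for an upper-tailed z-test. The sketch below assumes a hypothetical true mean of 10.3°C alongside the Leuchars values $\mu_0 = 9.5$ and $\sigma = 2.0$ ; the function name `type_ii_probability` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_probability(mu0: float, mu_true: float, sigma: float,
                        n: int, alpha: float) -> float:
    """P(fail to reject H0 | true mean = mu_true) for an upper-tailed z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # How many standard errors the true mean lies above mu0
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    return std_normal.cdf(z_crit - shift)


# mu_true = 10.3 is an ASSUMED value for illustration only
beta_5pct = type_ii_probability(9.5, 10.3, 2.0, n=25, alpha=0.05)
beta_1pct = type_ii_probability(9.5, 10.3, 2.0, n=25, alpha=0.01)
beta_large_n = type_ii_probability(9.5, 10.3, 2.0, n=100, alpha=0.05)

print(f"alpha = 5%, n = 25:  P(Type II) = {beta_5pct:.3f}")
print(f"alpha = 1%, n = 25:  P(Type II) = {beta_1pct:.3f}")
print(f"alpha = 5%, n = 100: P(Type II) = {beta_large_n:.3f}")
```

The Type II probability is only defined once a specific alternative value of $\mu$ is supposed, which is why it cannot be read off from the significance level alone.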

Using Real Data: Practical Considerations

When working with the large data set for hypothesis testing:

Define the population clearly. For example: "All daily mean temperatures at Leuchars in October."
Justify the model. Why is the binomial or normal distribution appropriate?
State any assumptions — independence, constant probability, known variance, etc.
Acknowledge limitations. Real data may not perfectly satisfy model assumptions. Discuss how this affects the reliability of the test.

Summary

Hypothesis tests follow a structured procedure: state hypotheses, choose significance level, calculate test statistic, compare with critical value, state conclusion in context.
Tests for proportions use the binomial distribution.
Tests for means use the normal distribution (when variance is known).
Tests for correlation compare the sample PMCC with critical values from tables.
Conclusions must be stated in context and must not claim to "prove" or "accept."
Type I and Type II errors represent the two ways a hypothesis test can give the wrong answer.

Exam Tip: The most common reason for losing marks on hypothesis testing questions is giving a non-contextual conclusion. Never just write "reject $H_0$ " — always translate this into a sentence about the real-world situation: "There is sufficient evidence at the 5% level to suggest that the mean daily rainfall at Camborne has increased from the historical average of 3.2 mm."

A-Level Deep Dive: Hypothesis Testing in Large Data Set Context

Spec mapping

The AQA 7357 specification, Paper 3 — Statistics, sub-strands N1–N3 (hypothesis testing) and S1–S6 (probability and statistical distributions), covers the language of statistical hypotheses and requires candidates to conduct a statistical hypothesis test for the proportion in the binomial distribution $B(n, p)$ , for the mean of a normal distribution with known, given or assumed variance, and for a zero correlation coefficient using a given critical value (refer to the official specification document for exact wording). Hypothesis testing is the AO3-rich climax of A-Level Statistics: it threads through section S2 (binomial distribution and its tests), section S5 (normal distribution and tests on $\mu$ ), section S6 (correlation, regression and the product-moment test) and connects synoptically to section S1 (sampling — every test rests on assumed sampling distributions). The AQA Large Data Set (LDS) — the published weather dataset spanning UK and overseas locations across two contrasting periods — is the explicit setting from which Paper 3 hypothesis-testing questions are drawn. The formula booklet provides binomial PMF, normal CDF inversion, and the product-moment correlation formula, but does not list the test procedure — candidates must structure the test themselves.

Worked example with full mark scheme

Question (8 marks):

A meteorologist claims that, in the months covered by the AQA Large Data Set, the daily mean wind speed at one of the UK locations exceeds the long-run climatological mean of $\mu_0 = 9.5$ knots. A random sample of $n = 25$ days is taken from the LDS for that location. The sample mean wind speed is $\bar{x} = 10.4$ knots. Assume daily mean wind speed is normally distributed with known standard deviation $\sigma = 2.1$ knots.

Test, at the 5% significance level, whether the data support the meteorologist's claim. State your hypotheses, test statistic, P-value (or critical comparison), and conclusion in context. (8)

Solution with mark scheme:

Step 1 — state hypotheses.

Let $\mu$ be the population mean daily wind speed (in knots) at the LDS location over the period sampled.

$H_0: \mu = 9.5 \qquad H_1: \mu > 9.5$

B1 — correct $H_0$ stated as an equality on the population parameter $\mu$ (not on $\bar{x}$ ).

B1 — correct one-tailed $H_1$ matching the directional claim "exceeds". Common error: writing $H_1: \mu \neq 9.5$ (two-tailed) when the claim is directional. A two-tailed test splits the 5% between the two tails, raising the upper critical value from 1.645 to 1.960, which can change the conclusion.

Step 2 — identify the test and sampling distribution.

Under $H_0$ , since the population is normal with known $\sigma$ , the sample mean satisfies:

$\bar{X} \sim N\left(\mu_0, \dfrac{\sigma^2}{n}\right) = N\left(9.5, \dfrac{2.1^2}{25}\right) = N(9.5, 0.1764)$

so the standard error is $\sigma/\sqrt{n} = 2.1/5 = 0.42$ .

M1 — correct sampling distribution of $\bar{X}$ under $H_0$ , including the $\sigma^2/n$ variance.

Step 3 — compute the test statistic.

$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{10.4 - 9.5}{0.42} = \dfrac{0.9}{0.42} \approx 2.143$

M1 — correct standardisation. A1 — correct value to at least 3 s.f.

Step 4 — find the P-value (or compare with critical value).

For the one-tailed upper test, the P-value is $P(Z \geq 2.143)$ . Using the standard normal CDF: $P(Z \geq 2.143) = 1 - \Phi(2.143) \approx 1 - 0.9839 = 0.0161$ .

Equivalently, the upper 5% critical value is $z_{0.05} = 1.645$ , and $2.143 > 1.645$ .

M1 — correct identification of the upper-tail probability or correct critical value retrieval.

A1 — correct P-value $\approx 0.016$ (or correct comparison $2.143 > 1.645$ stated explicitly).

Step 5 — conclusion in context.

Since $0.0161 < 0.05$ , we reject $H_0$ at the 5% significance level. There is sufficient evidence to support the meteorologist's claim that the mean daily wind speed at this LDS location exceeds 9.5 knots over the period sampled.

A1 — conclusion that (i) refers to rejecting/not rejecting $H_0$ in the language of evidence, not certainty, and (ii) is phrased in context — naming wind speed, the location, and the directional claim. A bare "reject $H_0$ " earns no context mark.

Total: 8 marks (B2 M3 A3, split as shown).

Specimen question modelled on the AQA 7357 Paper 3 format

Question (6 marks): A student claims that, at a UK LDS location, the proportion of "rain days" (days on which daily rainfall $\geq 1\,\text{mm}$ ) during May–October is at most $0.3$ . From a random sample of $n = 30$ days drawn from the LDS, $X = 14$ are rain days. Conduct a hypothesis test at the 5% significance level to assess whether the data contradict the student's claim. State your conclusion in context.

Mark scheme decomposition by AO:

B1 (AO2.5) — define $p$ as the population proportion of rain days at the LDS location in May–October, and state $H_0: p = 0.3$ , $H_1: p > 0.3$ (the contradiction direction).
M1 (AO1.1b) — model $X \sim B(30, 0.3)$ under $H_0$ .
M1 (AO1.1b) — compute the upper-tail P-value $P(X \geq 14 \mid p = 0.3) = 1 - P(X \leq 13)$ from the binomial CDF.
A1 (AO1.1b) — correct P-value, approximately $0.0401$ (3 s.f.).
A1 (AO3.5b) — compare with $0.05$ and reject $H_0$ .
A1 (AO3.5a) — conclusion in context: "There is sufficient evidence at the 5% level to reject the student's claim; the proportion of rain days appears to exceed 0.3."

Total: 6 marks split AO1 = 3, AO2 = 1, AO3 = 2. This is an AO3-heavy specimen by Statistics standards. The student's claim ( $p \leq 0.3$ ) sits on the null side and the test looks for evidence against it, so framing $H_0$ as the boundary equality $p = 0.3$ and $H_1$ as the strict inequality $p > 0.3$ is the AO2 reasoning step that many candidates get wrong. The AO3 marks reward the comparison and the contextual conclusion.
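A frequent slip in this kind of question is an off-by-one error in the cumulative probability. The sketch below, using only the standard library, evaluates the tail for $B(30, 0.3)$ both ways; the helper name `binom_cdf` is an illustrative choice.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p0, observed = 30, 0.3, 14

# Correct: P(X >= 14) = 1 - P(X <= 13)
p_value = 1 - binom_cdf(observed - 1, n, p0)

# Off by one: 1 - P(X <= 14) is P(X >= 15), which excludes the observed value
off_by_one = 1 - binom_cdf(observed, n, p0)

print(f"P(X >= 14) = {p_value:.4f}")      # P(X >= 14) = 0.0401
print(f"P(X >= 15) = {off_by_one:.4f}")   # P(X >= 15) = 0.0169
```

Here both values fall below 0.05, so the decision is the same either way, but the accuracy mark depends on the correct tail and in a borderline case the slip would change the conclusion.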

Synoptic links

Connects to:

Section S1 — Sampling: every hypothesis test assumes a random sample. If the LDS days are not a simple random sample (for example consecutive days, or a systematic sample of every 5th day), positive autocorrelation in weather data (today's wind correlates with yesterday's) makes the effective sample size smaller than $n$ , so $\sigma/\sqrt{n}$ understates the true standard error and the test rejects $H_0$ too readily. AO3 questions sometimes ask "comment on the validity of the test" — naming this autocorrelation issue is the A* response.
Section S2 — Binomial distribution: binomial tests on $p$ require modelling each LDS day as an independent Bernoulli trial. The exchange between "rain / no rain" Bernoulli outcomes and the binomial count is the conceptual hinge. Independence is suspect for consecutive days — flagging this is AO3.
Section S5 — Normal distribution: tests on $\mu$ with known $\sigma$ use $Z = (\bar{X} - \mu_0)/(\sigma/\sqrt{n})$ . The Central Limit Theorem (Year 2) means even non-normal weather variables (rainfall is right-skewed) yield approximately normal sample means once $n$ is large — this licenses the test under broader conditions than strict normality.
Section S6 — Correlation: the product-moment correlation test $H_0: \rho = 0$ versus $H_1: \rho \neq 0$ uses the sample $r$ against tabulated critical values depending on $n$ . Testing whether daily mean temperature and daily mean wind speed are correlated at an LDS location is a canonical Paper 3 question.
Modelling assumptions across the LDS: AQA explicitly tests whether candidates can criticise the modelling — assuming wind speed is normal when it is bounded below by 0; assuming rain days are independent when storm systems persist for days; assuming the LDS sample period is representative when it covers only specific months. Questions phrased "comment on the suitability of this model" are AO3.5a calls.
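To give a feel for how much autocorrelation can matter, the sketch below uses a standard large-sample approximation for a first-order autoregressive, AR(1), series: the effective sample size is roughly $n(1 - \phi)/(1 + \phi)$ , where $\phi$ is the lag-1 autocorrelation. This goes beyond the A-Level specification, and the value $\phi = 0.5$ is a hypothetical choice for illustration, not an estimate from the large data set.

```python
from math import sqrt


def effective_sample_size(n: int, phi: float) -> float:
    """Approximate effective sample size for the mean of an AR(1) series."""
    if not -1 < phi < 1:
        raise ValueError("phi must lie strictly between -1 and 1")
    return n * (1 - phi) / (1 + phi)


n, sigma = 25, 2.1   # values from the wind speed worked example
phi = 0.5            # ASSUMED lag-1 autocorrelation, for illustration only

n_eff = effective_sample_size(n, phi)
naive_se = sigma / sqrt(n)
adjusted_se = sigma / sqrt(n_eff)

print(f"effective sample size = {n_eff:.1f}")
print(f"naive standard error = {naive_se:.3f}, adjusted = {adjusted_se:.3f}")
```

With positively correlated days the adjusted standard error is larger than $\sigma/\sqrt{n}$ , so a z-value computed from the naive formula overstates the strength of the evidence.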

Mark-scheme literacy

Hypothesis-testing questions on AQA 7357 Paper 3 split AO marks more evenly than algebra topics.
