Non-Parametric Tests

Non-parametric (or distribution-free) tests draw conclusions without assuming the data come from a normal — or any specific — distribution. They are the right tool when normality is doubtful, when the data are only ordinal (ranks, ratings), or when the sample is too small to invoke the Central Limit Theorem. This lesson develops three core tests: the sign test (signs only), the Wilcoxon signed-rank test (signs and magnitudes, for one sample or paired data), and the Wilcoxon rank-sum / Mann–Whitney test (two independent samples). Each replaces real values by counts or ranks, which is exactly what makes the null distribution computable without a normality assumption.

Where this sits in AQA 7367

This is Paper 3 optional content: Statistics (7367/3S), taken with one of Mechanics (7367/3M) or Discrete (7367/3D). Paper 3 is 2 hours, 100 marks, AO1 40% / AO2 25% / AO3 35%. Carrying out the ranking and computing the statistic is AO1; justifying the choice of a non-parametric test (and stating its assumptions) is AO2; a multi-stage applied test is AO3. The topic builds on the binomial distribution (the sign test's null model is $B(n, \tfrac12)$) and on the hypothesis-testing framework of earlier lessons; it is the rank-based cousin of the $t$-tests, used when their normality assumption fails.

When to use a non-parametric test

Situation	Parametric test (needs normality)	Non-parametric alternative
One-sample location (median)	one-sample $t$-test	sign test or Wilcoxon signed-rank test
Paired data (matched pairs)	paired $t$-test	sign test or Wilcoxon signed-rank test on the differences
Two independent samples	two-sample $t$-test	Wilcoxon rank-sum (Mann–Whitney) test

The trade-off is power vs robustness: when the normality assumptions do hold, a non-parametric test is slightly less powerful (it discards information by using ranks/signs), but when those assumptions fail it is far more robust and its stated significance level remains valid. The sign test is the least powerful (it keeps only the sign of each difference); the Wilcoxon tests recover power by also using the order of magnitudes.

Why not always use the $t$-test? Because the $t$-test's validity rests on the sampling distribution of $\bar X$ being normal, which for a small sample requires the population itself to be (roughly) normal. If the data are heavily skewed, contain outliers, or are merely ordinal (ranks, ratings, preferences), the $t$-test's stated $5\%$ significance level can be badly wrong: it might reject a true null far more (or less) often than $5\%$ of the time. The non-parametric tests sidestep this: their null distributions ($B(n,\tfrac12)$ for the sign test, tabulated rank distributions for the Wilcoxon tests) are derived from the combinatorics of ranks and signs, not from any assumed population shape. The true significance level therefore never exceeds the stated one whatever the population looks like, provided each test's own mild assumptions hold (independence, and symmetry for the signed-rank test). That is the precise sense in which they are "distribution-free."

The Sign Test

The sign test is the simplest non-parametric test. It tests the median of a distribution, using only the signs of the deviations from the hypothesised median — not their sizes. Because only the direction of each deviation matters, the test needs no assumption about the shape of the distribution.

One-sample sign test

$H_0\!: \text{population median} = m_0, \qquad H_1\!: \text{median} \neq m_0 \ (\text{or} > m_0,\ < m_0).$

Procedure. (1) For each observation form $x_i - m_0$. (2) Discard any zero differences and let $n$ be the number remaining. (3) Count the positive differences $S^+$ and the negative differences $S^-$, so that $S^+ + S^- = n$. (4) Under $H_0$ a value is equally likely to be above or below the median, so $S^+ \sim B(n, \tfrac12)$. (5) Compute the binomial tail probability and compare with $\alpha$.
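The procedure can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation; the function name `sign_test` and its `alternative` argument are assumptions made for this sketch. The data are the packet weights from Worked Example 1.

```python
from math import comb

def sign_test(data, m0, alternative="two-sided"):
    """Sign test of H0: median = m0. Returns (s_plus, s_minus, n, p_value)."""
    diffs = [x - m0 for x in data if x != m0]   # discard zero differences
    n = len(diffs)
    s_plus = sum(d > 0 for d in diffs)          # number of positive differences
    s_minus = n - s_plus                        # number of negative differences

    def cdf(k):
        # P(S <= k) for S ~ B(n, 1/2)
        return sum(comb(n, j) for j in range(k + 1)) / 2 ** n

    if alternative == "greater":      # H1: median > m0, large S+ is evidence
        p = 1 - cdf(s_plus - 1)
    elif alternative == "less":       # H1: median < m0, small S+ is evidence
        p = cdf(s_plus)
    else:                             # two-sided: double the smaller tail, cap at 1
        p = min(1.0, 2 * cdf(min(s_plus, s_minus)))
    return s_plus, s_minus, n, p

weights = [498, 502, 497, 501, 503, 499, 504, 496, 505, 500, 498, 502]
print(sign_test(weights, 500))   # (6, 5, 11, 1.0)
```

Doubling the smaller tail and capping at $1$ is one common convention for the two-sided $p$-value; other conventions exist and give the same decision in clear-cut cases.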

Worked Example 1 — sign test, no significant result (with mark scheme)

A manufacturer claims the median weight of packets is $500$ g. A sample of $12$ packets gives $498, 502, 497, 501, 503, 499, 504, 496, 505, 500, 498, 502$.

Differences from $500$: $-2, +2, -3, +1, +3, -1, +4, -4, +5, 0, -2, +2$. Discard the zero, so $n = 11$, with $S^+ = 6$, $S^- = 5$. For a two-tailed test the statistic is $\min(S^+, S^-) = 5$:

$\text{under } H_0,\ S \sim B(11, \tfrac12); \quad p\text{-value} = 2\,P(S \le 5). \quad (\text{M1 } B(n,\tfrac12); \ \text{M1 tail})$

Because $B(11,\tfrac12)$ is symmetric about $5.5$, $P(S\le 5) = 0.5$, so the two-tailed $p$-value $= 1$. Do not reject $H_0$: there is no evidence against the claimed median. (M1 for the binomial null model; M1 for the correct tail; A1 conclusion. The near-equal counts make non-rejection obvious.)

Worked Example 2 — sign test, a significant result (with mark scheme)

A trainer claims a new programme raises times. For $10$ athletes the change (after − before) is positive for $9$ and negative for $1$ (no ties). Test at the $5\%$ level whether the median change is positive.

$H_0\!: \text{median change} = 0, \quad H_1\!: \text{median change} > 0; \quad S^+ = 9,\ S^- = 1,\ n = 10. \quad (\text{B1 hypotheses})$

One-tailed, large $S^+$ supports $H_1$, so find the upper tail under $B(10,\tfrac12)$:

$P(S^+ \ge 9) = P(9) + P(10) = \binom{10}{9}(\tfrac12)^{10} + (\tfrac12)^{10} = \frac{10 + 1}{1024} = 0.0107. \quad (\text{M1; A1})$

Since $0.0107 < 0.05$, reject $H_0$: evidence at the $5\%$ level that the programme raises times. (B1 hypotheses; M1 upper-tail binomial; A1 $0.0107$; A1 contextual conclusion. Note the sign test uses only the $9$-versus-$1$ split; the sizes of the changes are ignored.)
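An equivalent way to reach the same decision is to find the critical region first. The sketch below (standard library only; the helper names `upper_tail` and `upper_critical_value` are hypothetical, chosen for this illustration) searches for the smallest count $c$ with $P(S^+ \ge c) \le \alpha$ under $B(n, \tfrac12)$.

```python
from math import comb

def upper_tail(n, k):
    """P(S+ >= k) for S+ ~ B(n, 1/2)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

def upper_critical_value(n, alpha):
    """Smallest c with P(S+ >= c) <= alpha, or None if no such c exists."""
    for c in range(n + 1):
        if upper_tail(n, c) <= alpha:
            return c
    return None

print(upper_tail(10, 9))               # 11/1024, about 0.0107
print(upper_tail(10, 8))               # 56/1024, about 0.0547
print(upper_critical_value(10, 0.05))  # 9
```

With $n = 10$ the one-tailed $5\%$ critical region is therefore $S^+ \ge 9$: eight positive changes out of ten would not have been enough to reject $H_0$.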

Paired Sign Test

For paired data $(x_i, y_i)$ (the same subject measured twice, before and after, or matched pairs) form the differences $d_i = x_i - y_i$ and apply the one-sample sign test to the $d_i$ with $m_0 = 0$. The hypotheses become $H_0$: median difference $= 0$ (no change) against the appropriate alternative. This is the non-parametric analogue of the paired $t$-test, and needs no assumption of normality of the differences, only that a "no change" state corresponds to a median difference of zero, with each pair independently equally likely to increase or decrease under $H_0$. It is the natural choice when the paired differences are skewed or merely ordinal.

The Wilcoxon Signed-Rank Test

The sign test throws away a lot of information — it records only whether each deviation is positive or negative, ignoring how big it is. The Wilcoxon signed-rank test recovers that lost information by ranking the magnitudes of the deviations and attaching their signs, so a large deviation counts for more than a small one. This makes it more powerful than the sign test, at the modest extra cost of one assumption (symmetry of the differences).

Assumptions

The differences are independent.
The distribution of differences is symmetric about the median (so positive and negative deviations of a given size are equally likely under $H_0$).
The magnitudes of the differences can be meaningfully ranked (so the sizes of the differences, not just their directions, must be comparable).

Procedure

Compute $d_i = x_i - m_0$ (or paired differences $d_i = x_i - y_i$).
Discard zero differences and reduce $n$ accordingly.
Rank the absolute differences $|d_i|$ from smallest to largest, averaging any ties.
Attach to each rank the sign of the corresponding $d_i$.
Sum the positive ranks: $T^+ = \sum \text{ranks of positive } d_i$.
Sum the negative ranks: $T^- = \sum \text{ranks of negative } d_i$.
The test statistic is $T = \min(T^+, T^-)$.
Compare $T$ with the critical value from the Wilcoxon signed-rank table.
Reject $H_0$ if $T \le$ critical value (note: small $T$, not large).
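The ranking steps above can be sketched in Python as follows. This is an illustrative implementation using only the standard library; the function name `signed_rank` is an assumption of this sketch. Ties in $|d_i|$ receive the average of the ranks they occupy, and the built-in check confirms that the ranks total $\tfrac{n(n+1)}{2}$. The sample and hypothesised median are those of Worked Example 3a.

```python
def signed_rank(diffs):
    """Return (T_plus, T_minus) for a list of differences."""
    d = [x for x in diffs if x != 0]               # discard zero differences
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        # extend j to cover the run of tied absolute values
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + 1 + j + 1) / 2                  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    t_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    t_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    n = len(d)
    assert t_plus + t_minus == n * (n + 1) / 2     # ranks must total n(n+1)/2
    return t_plus, t_minus

sample = [22, 16, 25, 13, 29, 31, 19]
t_plus, t_minus = signed_rank([x - 20 for x in sample])
print(t_plus, t_minus, min(t_plus, t_minus))       # 19.0 9.0 9.0
```

The function returns both sums rather than only $T = \min(T^+, T^-)$ so that the rank-total check stays visible to the caller.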

Worked Example 3a: one-sample Wilcoxon (with mark scheme)

Test $H_0$: population median $= 20$ against $H_1$: median $\neq 20$, at $5\%$, using the sample $22, 16, 25, 13, 29, 31, 19$.

Deviations $d_i = x_i - 20$: $+2, -4, +5, -7, +9, +11, -1$ (no zeros, $n = 7$). Rank the absolute values $1, 2, 4, 5, 7, 9, 11$ as $1, 2, 3, 4, 5, 6, 7$, then re-attach signs:

$d$	$+2$	$-4$	$+5$	$-7$	$+9$	$+11$	$-1$
$|d|$	2	4	5	7	9	11	1
rank	2	3	4	5	6	7	1
signed rank	$+2$	$-3$	$+4$	$-5$	$+6$	$+7$	$-1$

$T^+ = 2 + 4 + 6 + 7 = 19, \quad T^- = 3 + 5 + 1 = 9; \quad \text{check } 19 + 9 = 28 = \tfrac{7\times 8}{2}.\ \checkmark \quad (\text{M1; A1})$

$T = \min(19, 9) = 9. \quad (\text{A1})$

The two-tailed $5\%$ critical value ($n = 7$) is $2$; since $T = 9 > 2$, do not reject $H_0$: there is insufficient evidence that the median differs from $20$. (M1 ranking absolute deviations; A1 signed-rank sums with the check; A1 for $T$; the comparison uses $T \le$ critical value.)
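The tabulated critical value can be reproduced by counting sign patterns. Under $H_0$ (with no ties and no zeros) each of the $2^n$ ways of attaching signs to the ranks $1, \dots, n$ is equally likely, so the null distribution of $T^+$ comes from counting subsets of the ranks by their sum. The sketch below is one way to do that count; the function names `null_counts` and `critical_value` are hypothetical, and published tables may follow slightly different conventions.

```python
def null_counts(n):
    """counts[t] = number of sign patterns giving T+ = t (no ties, no zeros)."""
    max_t = n * (n + 1) // 2
    counts = [0] * (max_t + 1)
    counts[0] = 1                           # no rank is positive
    for r in range(1, n + 1):               # decide whether rank r is positive
        for t in range(max_t, r - 1, -1):   # descending, so each rank is used once
            counts[t] += counts[t - r]
    return counts

def critical_value(n, alpha, tails=2):
    """Largest c with tails * P(T+ <= c) <= alpha, or None if there is none."""
    counts = null_counts(n)
    total = 2 ** n
    cum, best = 0, None
    for t, c in enumerate(counts):
        cum += c
        if tails * cum / total <= alpha:
            best = t
        else:
            break
    return best

print(critical_value(7, 0.05))    # 2
print(critical_value(10, 0.05))   # 8
```

For $n = 7$ only three of the $128$ sign patterns give $T^+ \le 2$, so the two-tailed probability is $2 \times \tfrac{3}{128} \approx 0.047$, just under $5\%$; allowing $T^+ \le 3$ would push it to about $0.078$.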

Worked Example 3b: paired Wilcoxon (with mark scheme)

Ten students take a test before and after a study programme:

| Student | Before | After | $d$ = After − Before | $\lvert d\rvert$ | Rank | Signed rank |
|---------|--------|-------|------|------|------|-------------|
| 1 | 45 | 52 | 7 | 7 | 5.5 | +5.5 |
| 2 | 60 | 58 | -2 | 2 | 1.5 | -1.5 |
| 3 | 55 | 63 | 8 | 8 | 7 | +7 |
| 4 | 70 | 72 | 2 | 2 | 1.5 | +1.5 |
| 5 | 40 | 50 | 10 | 10 | 8.5 | +8.5 |
| 6 | 65 | 60 | -5 | 5 | 4 | -4 |
| 7 | 50 | 57 | 7 | 7 | 5.5 | +5.5 |
| 8 | 75 | 78 | 3 | 3 | 3 | +3 |
| 9 | 48 | 58 | 10 | 10 | 8.5 | +8.5 |
| 10 | 52 | 40 | -12 | 12 | 10 | -10 |

$T^+ = 5.5 + 7 + 1.5 + 8.5 + 5.5 + 3 + 8.5 = 39.5, \qquad T^- = 1.5 + 4 + 10 = 15.5. \quad (\text{M1 rank}; \ \text{M1 sum signed ranks})$

Check (always do this): $T^+ + T^- = 39.5 + 15.5 = 55 = \dfrac{10\times 11}{2}$, confirming the ranks total $\tfrac{n(n+1)}{2}$.

$T = \min(T^+, T^-) = \min(39.5, 15.5) = 15.5. \quad (\text{A1})$

From Wilcoxon signed-rank tables ($n = 10$, two-tailed, $5\%$) the critical value is $8$; we reject only if $T \le$ critical value:

$T = 15.5 > 8 \;\Rightarrow\; \text{do not reject } H_0. \quad (\text{A1 conclusion})$

There is insufficient evidence of a change in test scores. (M1 for the absolute-value ranking with averaged ties; M1 for the signed-rank sums; A1 for $T = 15.5$; A1 for comparing against the table value the correct way: Wilcoxon rejects for small $T$, the opposite of most tests. The tied values share averaged ranks: $1.5, 1.5$ for the two $|d| = 2$, $5.5, 5.5$ for the two $|d| = 7$, and $8.5, 8.5$ for the two $|d| = 10$.)
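For comparison, the paired sign test from earlier in the lesson can be run on the same ten students. The sketch below (standard library only; the variable names are illustrative) keeps just the sign of each after − before difference and computes the two-tailed $p$-value under $B(n, \tfrac12)$.

```python
from math import comb

before = [45, 60, 55, 70, 40, 65, 50, 75, 48, 52]
after = [52, 58, 63, 72, 50, 60, 57, 78, 58, 40]

# after - before, with any zero differences dropped
diffs = [a - b for a, b in zip(after, before) if a != b]
n = len(diffs)
s_plus = sum(d > 0 for d in diffs)
s_minus = n - s_plus
s = min(s_plus, s_minus)

# two-tailed p-value under S ~ B(n, 1/2), capped at 1
p_value = min(1.0, 2 * sum(comb(n, k) for k in range(s + 1)) / 2 ** n)
print(s_plus, s_minus, p_value)   # 7 3 0.34375
```

With a $7$-versus-$3$ split the sign test gives $p \approx 0.34$, so it also fails to reject $H_0$ at $5\%$. Here both tests agree; the signed-rank test differs from the sign test only in also using the sizes of the differences.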
