Non-parametric (or distribution-free) tests draw conclusions without assuming the data come from a normal — or any specific — distribution. They are the right tool when normality is doubtful, when the data are only ordinal (ranks, ratings), or when the sample is too small to invoke the Central Limit Theorem. This lesson develops three core tests: the sign test (signs only), the Wilcoxon signed-rank test (signs and magnitudes, for one sample or paired data), and the Wilcoxon rank-sum / Mann–Whitney test (two independent samples). Each replaces real values by counts or ranks, which is exactly what makes the null distribution computable without a normality assumption.
This is Paper 3 optional content: Statistics (7367/3S), taken with one of Mechanics (7367/3M) or Discrete (7367/3D). Paper 3 is 2 hours, 100 marks, AO1 40% / AO2 25% / AO3 35%. Carrying out the ranking and statistic is AO1; justifying the choice of a non-parametric test (and stating its assumptions) is AO2; a multi-stage applied test is AO3. It builds on the binomial distribution (the sign test's null model is $B(n, \tfrac{1}{2})$) and on the hypothesis-testing framework of earlier lessons; it is the rank-based cousin of the t-tests, used when their normality assumption fails.
| Situation | Parametric test (needs normality) | Non-parametric alternative |
|---|---|---|
| One-sample location (median) | one-sample t-test | sign test or Wilcoxon signed-rank test |
| Paired data (matched pairs) | paired t-test | sign test or Wilcoxon signed-rank test on the differences |
| Two independent samples | two-sample t-test | Wilcoxon rank-sum (Mann–Whitney) test |
The trade-off is power vs robustness: when the normality assumptions do hold, a non-parametric test is slightly less powerful (it discards information by using ranks/signs), but when those assumptions fail it is far more robust and its stated significance level remains valid. The sign test is the least powerful (it keeps only the sign of each difference); the Wilcoxon tests recover power by also using the order of magnitudes.
Why not always use the t-test? Because the t-test's validity rests on the sampling distribution of $\bar{X}$ being normal, which for a small sample requires the population itself to be (roughly) normal. If the data are heavily skewed, contain outliers, or are merely ordinal (ranks, ratings, preferences), the t-test's stated 5% significance level can be badly wrong: it might reject a true null far more (or less) than 5% of the time. The non-parametric tests sidestep this: their null distributions ($B(n, \tfrac{1}{2})$ for the sign test, tabulated rank distributions for the Wilcoxon tests) are derived from the combinatorics of ranks and signs, not from any assumed population shape, so the tail probabilities used to set the significance level do not depend on what the population looks like (beyond the symmetry the signed-rank test assumes). That is the precise sense in which they are "distribution-free."
The sign test is the simplest non-parametric test. It tests the median of a distribution, using only the signs of the deviations from the hypothesised median — not their sizes. Because only the direction of each deviation matters, the test needs no assumption about the shape of the distribution.
$$H_0: \text{population median} = m_0, \qquad H_1: \text{median} \neq m_0 \ (\text{or } > m_0,\ < m_0).$$
Procedure. (1) For each observation form $x_i - m_0$. (2) Count the positive differences $S_+$ and the negative differences $S_-$. (3) Discard any zero differences and let $n$ be the number remaining. (4) Under $H_0$ a value is equally likely to be above or below the median, so $S_+ \sim B(n, \tfrac{1}{2})$. (5) Compute the binomial tail probability and compare with $\alpha$.
A manufacturer claims the median weight of packets is 500 g. A sample of 12 packets gives 498,502,497,501,503,499,504,496,505,500,498,502.
Differences from 500: $-2, +2, -3, +1, +3, -1, +4, -4, +5, 0, -2, +2$. Discard the zero, so $n = 11$, with $S_+ = 6$, $S_- = 5$. For a two-tailed test the statistic is $S = \min(S_+, S_-) = 5$:
$$\text{under } H_0,\ S \sim B(11, \tfrac{1}{2}); \qquad p\text{-value} = 2P(S \le 5).$$ (M1 $B(n, \tfrac{1}{2})$; M1 tail)
Because $B(11, \tfrac{1}{2})$ is symmetric about 5.5, $P(S \le 5) = 0.5$, so the two-tailed p-value is $2 \times 0.5 = 1$. Do not reject $H_0$: there is no evidence against the claimed median. (M1 for the binomial null model; M1 for the correct tail; A1 conclusion. The near-equal counts make non-rejection obvious.)
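The five-step procedure can be sketched in Python as a check on the packet example. This is a minimal illustration using only the standard library; the function name `sign_test` and its `alternative` parameter are assumptions made for this sketch, not part of the lesson.

```python
from math import comb


def sign_test(data, m0, alternative="two-sided"):
    """Exact sign test of H0: median = m0 using the B(n, 1/2) null model."""
    diffs = [x - m0 for x in data]
    s_plus = sum(d > 0 for d in diffs)
    s_minus = sum(d < 0 for d in diffs)
    n = s_plus + s_minus  # zero differences are discarded

    def lower_tail(k):
        # P(S <= k) for S ~ B(n, 1/2)
        return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

    if alternative == "greater":
        # H1: median > m0. P(S+ >= s_plus) = P(S- <= s_minus) by symmetry.
        p = lower_tail(s_minus)
    elif alternative == "less":
        # H1: median < m0. Small S+ supports H1.
        p = lower_tail(s_plus)
    else:
        # Two-tailed: double the smaller tail, capped at 1.
        p = min(1.0, 2 * lower_tail(min(s_plus, s_minus)))
    return s_plus, s_minus, n, p


packets = [498, 502, 497, 501, 503, 499, 504, 496, 505, 500, 498, 502]
print(sign_test(packets, 500))  # (6, 5, 11, 1.0)
```

The returned p-value of 1.0 matches the hand calculation, so $H_0$ is not rejected at any usual significance level.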
A trainer claims a new programme raises times. For 10 athletes the change (after − before) is positive for 9 and negative for 1 (no ties). Test at 5% that the median change is positive.
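$$H_0: \text{median change} = 0, \qquad H_1: \text{median change} > 0; \qquad S_+ = 9,\ S_- = 1,\ n = 10.$$ (B1 hypotheses)
One-tailed, large $S_+$ supports $H_1$, so find the upper tail under $B(10, \tfrac{1}{2})$:
$$P(S_+ \ge 9) = P(9) + P(10) = \binom{10}{9}\left(\tfrac{1}{2}\right)^{10} + \left(\tfrac{1}{2}\right)^{10} = \frac{10 + 1}{1024} = 0.0107.$$ (M1; A1)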
Since 0.0107<0.05, reject H0: evidence at the 5% level that the programme raises times. (B1 hypotheses; M1 upper-tail binomial; A1 0.0107; A1 contextual conclusion. Note the sign test uses only the 9-versus-1 split — the sizes of the changes are ignored.)
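The upper-tail arithmetic can be confirmed with exact fractions. The helper name `upper_tail` is an assumption introduced for this sketch.

```python
from fractions import Fraction
from math import comb


def upper_tail(n, k):
    """P(S >= k) for S ~ B(n, 1/2), as an exact fraction."""
    return Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2 ** n)


p = upper_tail(10, 9)
print(p, float(p))           # 11/1024 0.0107421875
print(p < Fraction(5, 100))  # True, so H0 is rejected at the 5% level
```

Working in fractions avoids any rounding question when the p-value is close to the significance level.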
For paired data $(x_i, y_i)$, where the same subject is measured twice (before/after) or the observations are matched pairs, form the differences $d_i = x_i - y_i$ and apply the one-sample sign test to the $d_i$ with $m_0 = 0$. The hypotheses become $H_0$: median difference $= 0$ (no change) against the appropriate alternative. This is the non-parametric analogue of the paired t-test, and needs no assumption of normality of the differences: only that a "no change" state corresponds to a median difference of zero, with each pair independently equally likely to increase or decrease under $H_0$. It is the natural choice when the paired differences are skewed or merely ordinal.
The sign test throws away a lot of information — it records only whether each deviation is positive or negative, ignoring how big it is. The Wilcoxon signed-rank test recovers that lost information by ranking the magnitudes of the deviations and attaching their signs, so a large deviation counts for more than a small one. This makes it more powerful than the sign test, at the modest extra cost of one assumption (symmetry of the differences).
Test $H_0$: population median $= 20$ against $H_1$: median $\neq 20$, at 5%, using the sample $22, 16, 25, 13, 29, 31, 19$.
Deviations $d_i = x_i - 20$: $+2, -4, +5, -7, +9, +11, -1$ (no zeros, $n = 7$). Rank the absolute values $1, 2, 4, 5, 7, 9, 11$ as $1, 2, 3, 4, 5, 6, 7$, then re-attach signs:
| d | +2 | −4 | +5 | −7 | +9 | +11 | −1 |
|---|---|---|---|---|---|---|---|
| $\lvert d \rvert$ | 2 | 4 | 5 | 7 | 9 | 11 | 1 |
| rank | 2 | 3 | 4 | 5 | 6 | 7 | 1 |
| signed | +2 | −3 | +4 | −5 | +6 | +7 | −1 |
$$T_+ = 2 + 4 + 6 + 7 = 19, \qquad T_- = 3 + 5 + 1 = 9; \qquad \text{check } 19 + 9 = 28 = \tfrac{7 \times 8}{2}. \ \checkmark$$ (M1; A1) $$T = \min(19, 9) = 9.$$ (A1)
The two-tailed 5% critical value (n=7) is 2; since T=9>2, do not reject H0 — insufficient evidence the median differs from 20. (M1 ranking absolute deviations; A1 signed-rank sums with the check; A1 for T; the comparison uses T≤ critical value.)
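The ranking step, and the tabulated critical value itself, can both be reproduced by brute force for small $n$. The sketch below assumes no zero deviations and no tied magnitudes (true for this sample). The function names are hypothetical, and enumerating all $2^n$ sign patterns is only one way to obtain the null distribution.

```python
from itertools import product


def signed_rank_sums(data, m0):
    """Return (T_plus, T_minus); assumes no zero deviations and no tied |d|."""
    devs = [x - m0 for x in data]
    ordered = sorted(devs, key=abs)  # smallest |d| gets rank 1
    t_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    t_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return t_plus, t_minus


def two_tailed_critical_value(n, alpha=0.05):
    """Largest c with P(min(T+, T-) <= c) <= alpha under H0.

    Under H0 each rank 1..n is equally likely to carry a + or - sign,
    so every one of the 2^n sign patterns has probability 1 / 2^n.
    Returns None if no value of c achieves the level.
    """
    total = n * (n + 1) // 2
    counts = [0] * (total + 1)
    for signs in product((0, 1), repeat=n):
        t_plus = sum(rank for rank, s in enumerate(signs, start=1) if s)
        counts[min(t_plus, total - t_plus)] += 1
    cumulative, critical = 0, None
    for c, count in enumerate(counts):
        cumulative += count
        if cumulative / 2 ** n > alpha:
            break
        critical = c
    return critical


sample = [22, 16, 25, 13, 29, 31, 19]
print(signed_rank_sums(sample, 20))   # (19, 9)
print(two_tailed_critical_value(7))   # 2
```

For $n = 7$ only 6 of the 128 sign patterns give $T \le 2$, a probability of about 0.047, which is why 2 is the two-tailed 5% critical value.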
Ten students take a test before and after a study programme:
| Student | Before | After | $d$ (After − Before) | $\lvert d \rvert$ | Rank | Signed rank |
|---|---|---|---|---|---|---|
| 1 | 45 | 52 | 7 | 7 | 5.5 | +5.5 |
| 2 | 60 | 58 | −2 | 2 | 1.5 | −1.5 |
| 3 | 55 | 63 | 8 | 8 | 7 | +7 |
| 4 | 70 | 72 | 2 | 2 | 1.5 | +1.5 |
| 5 | 40 | 50 | 10 | 10 | 8.5 | +8.5 |
| 6 | 65 | 60 | −5 | 5 | 4 | −4 |
| 7 | 50 | 57 | 7 | 7 | 5.5 | +5.5 |
| 8 | 75 | 78 | 3 | 3 | 3 | +3 |
| 9 | 48 | 58 | 10 | 10 | 8.5 | +8.5 |
| 10 | 52 | 40 | −12 | 12 | 10 | −10 |
$$T_+ = 5.5 + 7 + 1.5 + 8.5 + 5.5 + 3 + 8.5 = 39.5, \qquad T_- = 1.5 + 4 + 10 = 15.5.$$ (M1 rank; M1 sum signed ranks)
Check (always do this): $T_+ + T_- = 39.5 + 15.5 = 55 = \tfrac{10 \times 11}{2}$, confirming the ranks total $\tfrac{n(n+1)}{2}$.
$$T = \min(T_+, T_-) = \min(39.5, 15.5) = 15.5.$$ (A1)
From Wilcoxon signed-rank tables ($n = 10$, two-tailed, 5%) the critical value is 8; we reject only if $T \le$ critical value:
$$T = 15.5 > 8 \ \Rightarrow\ \text{do not reject } H_0.$$ (A1 conclusion)
There is insufficient evidence of a change in test scores. (M1 for the absolute-value ranking with averaged ties; M1 for the signed-rank sums; A1 for T=15.5; A1 for comparing against the table value the correct way — Wilcoxon rejects for small T, the opposite of most tests. The shared ranks 1.5,1.5 (the two ∣d∣=2) and 8.5,8.5 (the two ∣d∣=10) and 5.5,5.5 (the two ∣d∣=7) are averaged.)
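The averaged-rank bookkeeping is the step most prone to slips, so here is a minimal Python sketch of it applied to the ten students. The helper `average_ranks` is a hypothetical name, and the critical value 8 is taken from the table quoted in the lesson rather than computed.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


before = [45, 60, 55, 70, 40, 65, 50, 75, 48, 52]
after = [52, 58, 63, 72, 50, 60, 57, 78, 58, 40]

diffs = [a - b for a, b in zip(after, before) if a != b]  # drop zero differences
ranks = average_ranks([abs(d) for d in diffs])
t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
n = len(diffs)

print(t_plus, t_minus)                    # 39.5 15.5
print(t_plus + t_minus == n * (n + 1) / 2)  # True: the n(n+1)/2 check
print(min(t_plus, t_minus) <= 8)          # False, so do not reject H0
```

Averaging tied ranks keeps the total at $\tfrac{n(n+1)}{2}$, so the check still works. Note that the tabulated critical values are derived assuming no ties, so with ties the comparison is an approximation.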