A contingency table (also called a two-way table) displays the frequency distribution of two categorical variables. The chi-squared test of association determines whether the two variables are independent.
An r×c contingency table has r rows and c columns:
| | Column 1 | Column 2 | ⋯ | Column c | Row Total |
|---|---|---|---|---|---|
| Row 1 | $O_{11}$ | $O_{12}$ | ⋯ | $O_{1c}$ | $R_1$ |
| Row 2 | $O_{21}$ | $O_{22}$ | ⋯ | $O_{2c}$ | $R_2$ |
| ⋮ | ⋮ | ⋮ | | ⋮ | ⋮ |
| Row r | $O_{r1}$ | $O_{r2}$ | ⋯ | $O_{rc}$ | $R_r$ |
| Col Total | $C_1$ | $C_2$ | ⋯ | $C_c$ | $N$ |
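where $O_{ij}$ is the observed frequency in row $i$ and column $j$, $R_i$ and $C_j$ are the row and column totals, and $N$ is the grand total.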
H0: There is no association between the two variables (they are independent).
H1: There is an association between the two variables (they are not independent).
Under H0 (independence), the expected frequency for cell (i,j) is:
$$E_{ij} = \frac{R_i \times C_j}{N}$$
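This follows from the multiplication rule for independent events, $P(\text{row } i \cap \text{col } j) = P(\text{row } i) \times P(\text{col } j)$. Estimating $P(\text{row } i)$ by $R_i/N$ and $P(\text{col } j)$ by $C_j/N$ gives $E_{ij} = N \times \frac{R_i}{N} \times \frac{C_j}{N} = \frac{R_i \times C_j}{N}$.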
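As a minimal sketch of the expected-frequency formula, the Python function below computes every $E_{ij}$ from a table of observed counts. The function name `expected_frequencies`, the list-of-lists layout, and the small 2×2 table are illustrative assumptions, not part of the lesson.

```python
def expected_frequencies(observed):
    """Return expected counts E_ij = R_i * C_j / N under independence.

    `observed` is a list of rows, each row a list of observed counts.
    """
    row_totals = [sum(row) for row in observed]          # R_i
    col_totals = [sum(col) for col in zip(*observed)]    # C_j
    grand_total = sum(row_totals)                        # N
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2x2 table, used only to illustrate the call.
example = [[10, 20],
           [30, 40]]
expected = expected_frequencies(example)
print(expected)
```

Note that the expected table has the same row and column totals as the observed table, which is a quick sanity check on any hand calculation.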
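The test statistic is:
$$X^2 = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Under H0, $X^2$ approximately follows a chi-squared distribution with $\nu$ degrees of freedom, where:
$$\nu = (r-1)(c-1)$$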
| Table size | ν |
|---|---|
| 2×2 | 1 |
| 2×3 | 2 |
| 3×3 | 4 |
| 3×4 | 6 |
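The test statistic and the degrees of freedom can be sketched in a few lines of Python. The helper names `chi_squared_statistic` and `degrees_of_freedom` are hypothetical, and the observed and expected tables in the usage lines are made-up illustrative values rather than data from the lesson.

```python
def chi_squared_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for an r x c contingency table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


# Illustrative 2x2 tables (assumed values, not from the lesson).
obs = [[10, 20], [30, 40]]
exp = [[12.0, 18.0], [28.0, 42.0]]
stat = chi_squared_statistic(obs, exp)

# Reproduce the table of degrees of freedom for the sizes listed above.
sizes = [(2, 2), (2, 3), (3, 3), (3, 4)]
dfs = [degrees_of_freedom(r, c) for r, c in sizes]
print(stat, dfs)
```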
A survey asks 300 students which of subjects A, B, or C they prefer, with responses categorised by gender:
| | Subject A | Subject B | Subject C | Row Total |
|---|---|---|---|---|
| Male | 60 | 40 | 50 | 150 |
| Female | 30 | 50 | 70 | 150 |
| Col Total | 90 | 90 | 120 | 300 |
H0: Gender and subject preference are independent.
Expected frequencies: $E_{ij} = \frac{R_i \times C_j}{300}$
| | Subject A | Subject B | Subject C |
|---|---|---|---|
| Male | $\frac{150 \times 90}{300} = 45$ | $\frac{150 \times 90}{300} = 45$ | $\frac{150 \times 120}{300} = 60$ |
| Female | 45 | 45 | 60 |
Test statistic:
| Cell | O | E | $(O-E)^2/E$ |
|---|---|---|---|
| M, A | 60 | 45 | 5.000 |
| M, B | 40 | 45 | 0.556 |
| M, C | 50 | 60 | 1.667 |
| F, A | 30 | 45 | 5.000 |
| F, B | 50 | 45 | 0.556 |
| F, C | 70 | 60 | 1.667 |
$$X^2 = 5.000 + 0.556 + 1.667 + 5.000 + 0.556 + 1.667 \approx 14.44$$
(The rounded terms sum to 14.446; the unrounded value is 14.444.)
Degrees of freedom: ν=(2−1)(3−1)=2.
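At the 5% significance level, the critical value is $\chi^2_2(5\%) = 5.991$. Since $X^2 \approx 14.4 > 5.991$, H0 is rejected: there is evidence at the 5% level of an association between gender and subject preference.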
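The worked example can be checked numerically with a short script. This is a sketch under one stated assumption: it uses the fact that, for exactly 2 degrees of freedom, the chi-squared upper-tail probability is $P(X > x) = e^{-x/2}$, so the critical value is $-2\ln(0.05)$. That shortcut does not apply to other values of $\nu$, where a statistical table or a library such as SciPy would be needed. All variable names are illustrative.

```python
import math

# Observed counts from the survey table.
observed = [[60, 40, 50],   # Male: subjects A, B, C
            [30, 50, 70]]   # Female: subjects A, B, C

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence: E_ij = R_i * C_j / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Test statistic: sum of (O - E)^2 / E over all cells.
x2 = sum((o - e) ** 2 / e
         for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 2 ONLY: the upper-tail probability is exp(-x / 2).
alpha = 0.05
critical_value = -2 * math.log(alpha)
p_value = math.exp(-x2 / 2)

reject_h0 = x2 > critical_value
print(f"X^2 = {x2:.3f}, df = {df}, critical value = {critical_value:.3f}")
print(f"p-value = {p_value:.5f}, reject H0: {reject_h0}")
```

Because the script sums unrounded contributions, its statistic can differ in the third decimal place from a hand calculation that adds rounded terms.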