This lesson introduces the fundamental building blocks of statistics — understanding the different types of data and the methods used to collect them. A clear grasp of data classification and sampling is essential for the AQA GCSE Mathematics Statistics topic and frequently appears in exam questions worth 2–4 marks.
Data can be classified in several ways. The first distinction is between qualitative and quantitative data.
| Type | Definition | Examples |
|---|---|---|
| Qualitative | Data that describes qualities or characteristics (non-numerical) | Eye colour, favourite subject, type of transport |
| Quantitative | Data that can be measured or counted (numerical) | Height, number of siblings, temperature |
Quantitative data is further divided into two types:
| Type | Definition | Examples |
|---|---|---|
| Discrete | Data that can only take specific values (usually whole numbers from counting) | Number of pets, shoe size, dice score |
| Continuous | Data that can take any value within a range (usually from measuring) | Height (1.65 m), weight (72.3 kg), time (14.7 seconds) |
Exam Tip: A common exam question asks you to classify data. Remember — if you count it, it is discrete; if you measure it, it is continuous. Shoe size is a classic trick question: although it has half sizes (5.5, 6, 6.5), it is still discrete because it can only take specific values, not any value in a range.
Data can also be classified by how it was collected.
| Type | Definition | Advantages | Disadvantages |
|---|---|---|---|
| Primary data | Data you collect yourself for a specific purpose | Tailored to your needs; you know how it was collected | Time-consuming and expensive to collect |
| Secondary data | Data collected by someone else, often for a different purpose | Quick and cheap to obtain | May not exactly match your needs; may be out of date or biased |
Examples of primary data: surveys, experiments, questionnaires, observations.
Examples of secondary data: government statistics, newspaper reports, internet databases, school records.
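In statistics, the population is the whole group you want to find out about, and a sample is a smaller group selected from that population.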
We use samples because it is usually impractical (too expensive, too time-consuming) to survey an entire population.
A good sample should be representative of the population, free from bias, and large enough to give reliable results.
Exam Tip: If a question asks you to criticise a sampling method, check whether the sample is biased (certain groups are excluded or over-represented), too small, or unrepresentative of the population.
There are several methods for selecting a sample. You need to know the following five:
Random sampling: Every member of the population has an equal chance of being selected. Names or numbers are drawn at random (e.g. using a random number generator or pulling names from a hat).
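As a rough illustration of drawing names at random with a random number generator, here is a minimal Python sketch. The register of 30 numbered students and the function name `simple_random_sample` are made up for this example; the seed is only there so the draw can be repeated when checking.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size different members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical register: students numbered 1 to 30.
register = list(range(1, 31))
chosen = simple_random_sample(register, 5, seed=1)
print(chosen)
```

`random.sample` never picks the same member twice, which matches pulling names from a hat without putting them back.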
Systematic sampling: Members are selected at regular intervals from an ordered list (e.g. every 10th person on a register).
Stratified sampling: The population is divided into groups (strata) based on a characteristic (e.g. age, gender, year group). A random sample is then taken from each group, in proportion to the size of that group in the population.
The number to sample from each stratum is calculated using:
Number from stratum = (number in stratum / total population) × sample size
A school has the following students:
| Year Group | Number of Students |
|---|---|
| Year 7 | 180 |
| Year 8 | 160 |
| Year 9 | 200 |
| Year 10 | 150 |
| Year 11 | 110 |
| Total | 800 |
A stratified sample of 80 students is needed.
Year 7: (180 / 800) × 80 = 18 students
Year 8: (160 / 800) × 80 = 16 students
Year 9: (200 / 800) × 80 = 20 students
Year 10: (150 / 800) × 80 = 15 students
Year 11: (110 / 800) × 80 = 11 students
Check: 18 + 16 + 20 + 15 + 11 = 80 (correct)
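The calculation above can be checked with a short Python sketch. This is one possible way to write it, not an official method: the function name `stratified_allocation` is invented here, and `Fraction` is used so the arithmetic stays exact rather than relying on decimals.

```python
from fractions import Fraction

def stratified_allocation(strata, sample_size):
    """Number from stratum = (number in stratum / total population) x sample size."""
    total = sum(strata.values())
    return {name: Fraction(count, total) * sample_size
            for name, count in strata.items()}

# Year-group sizes from the worked example.
school = {"Year 7": 180, "Year 8": 160, "Year 9": 200,
          "Year 10": 150, "Year 11": 110}

allocation = stratified_allocation(school, 80)
for year, number in allocation.items():
    print(year, number)

# The individual values should add up to the required sample size.
print("Total:", sum(allocation.values()))  # Total: 80
```

In this example every answer is a whole number, so no rounding is needed.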
Quota sampling: The researcher decides how many people from each group to include (sets a quota) and then selects people until each quota is filled. Unlike stratified sampling, the selection within each group is not random.
Convenience sampling: The researcher simply surveys whoever is easiest to reach or most readily available.
Exam Tip: In AQA exams, stratified sampling calculation questions are very common. Always show the fraction (stratum size / total population) multiplied by the sample size. Round to the nearest whole number if necessary, and always check that your values add up to the required sample size.
Bias occurs when a sample does not fairly represent the population, leading to misleading results.
Common sources of bias include:
| Source of bias | Description |
|---|---|
| Selection bias | Certain groups are excluded from the sample |
| Question bias | Leading or confusing questions |
| Response bias | People lie or exaggerate their answers |
| Non-response bias | Some groups do not respond |
| Timing bias | Data is collected at an unrepresentative time |
Exam Tip: When asked to suggest improvements to a data collection method, always consider whether the sample is large enough, whether it is representative, and whether any groups have been excluded. Mentioning specific sources of bias will gain you marks.
A college has 430 students in Year 12 and 370 students in Year 13. A stratified sample of 60 students is to be drawn.
Sample size from Year 12 = (430 / 800) × 60 = 32.25, which rounds to 32 students.
Sample size from Year 13 = (370 / 800) × 60 = 27.75, which rounds to 28 students.
Check: 32 + 28 = 60. Correct.
Had naive rounding produced 32 and 27 (or 33 and 28), we would adjust one group by 1 so the totals match the required sample size of 60.
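The text does not say which group to adjust when rounding goes wrong. One reasonable rule, assumed here purely for illustration, is to round every value down first and then give the spare places to the groups with the largest decimal parts. The function name `allocate_with_rounding` is hypothetical.

```python
from fractions import Fraction

def allocate_with_rounding(strata, sample_size):
    """Whole-number stratified allocation that always totals sample_size."""
    total = sum(strata.values())
    exact = {name: Fraction(count, total) * sample_size
             for name, count in strata.items()}
    # Round every stratum down first (int() truncates, which is a floor here).
    rounded = {name: int(value) for name, value in exact.items()}
    # Places still to hand out so the total matches the required sample size.
    shortfall = sample_size - sum(rounded.values())
    # Assumed rule: strata with the largest decimal parts get the spare places.
    by_remainder = sorted(exact, key=lambda name: exact[name] - rounded[name],
                          reverse=True)
    for name in by_remainder[:shortfall]:
        rounded[name] += 1
    return rounded

college = {"Year 12": 430, "Year 13": 370}
result = allocate_with_rounding(college, 60)
print(result)  # {'Year 12': 32, 'Year 13': 28}
```

For the college example this gives the same answer as ordinary rounding (32.25 becomes 32 and 27.75 becomes 28), but the total is guaranteed to be 60 whatever the numbers are.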
A researcher stands outside a vegan cafe at 11 am on a Tuesday and asks 50 adults whether they eat meat. She concludes that 94% of the UK population is vegetarian.
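Criticisms: the location (a vegan cafe) over-represents people who do not eat meat; the time (11 am on a Tuesday) excludes many people who are at work or school; a sample of 50 is too small to support a conclusion about the whole UK; and only adults were asked.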
The sample is therefore unrepresentative. The conclusion is unreliable — in fact, the true UK vegetarian proportion is closer to 5–7%.
A youth club has 120 members aged 11–13, 90 members aged 14–16, and 40 members aged 17–18. A stratified sample is to be drawn so that 15 members from the 14–16 group are surveyed. What is the total sample size?
Let the total sample size be n. The proportion from the 14–16 group is 90 / 250, so:
(90 / 250) × n = 15
n = (15 × 250) / 90 ≈ 41.67
Round up to 42 members. Then the 11–13 stratum provides (120 / 250) × 42 = 20.16 ≈ 20, and the 17–18 stratum provides (40 / 250) × 42 = 6.72 ≈ 7. Check: 20 + 15 + 7 = 42.
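The rearrangement above can be checked in Python. This is only a sketch: the function name `total_sample_size` and the dictionary keys are chosen for this example.

```python
import math
from fractions import Fraction

def total_sample_size(stratum_size, population_size, stratum_sample):
    """Solve (stratum_size / population_size) x n = stratum_sample for n."""
    return Fraction(stratum_sample * population_size, stratum_size)

# Youth club membership from the worked example.
club = {"11-13": 120, "14-16": 90, "17-18": 40}
population = sum(club.values())  # 250 members in total

n_exact = total_sample_size(club["14-16"], population, 15)  # 125/3, about 41.67
n = math.ceil(n_exact)  # round up to a whole number of members

# Numbers from the other two strata, rounded to the nearest whole number.
younger = round(Fraction(club["11-13"], population) * n)
older = round(Fraction(club["17-18"], population) * n)
print(n, younger, older)  # 42 20 7
```

Rounding n up rather than down keeps at least 15 members in the 14–16 group, since (90 / 250) × 42 = 15.12.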
A supermarket has a list of 2,400 loyalty card holders and wants a systematic sample of 80. The sampling interval is 2400 / 80 = 30. A random starting point between 1 and 30 is chosen (say 17), then every 30th customer is surveyed: 17, 47, 77, 107, ...
Potential bias: if customers are ordered by sign-up date and every 30th entry happens to coincide with a promotional weekend, the sample may over-represent promotion-driven shoppers.
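The supermarket example can be reproduced with a small Python sketch. It assumes the population size divides exactly by the sample size, as it does here; the function name `systematic_sample` is made up for illustration.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return list positions chosen at a fixed interval from a random start."""
    # Assumes population_size is an exact multiple of sample_size.
    interval = population_size // sample_size
    if start is None:
        # Random starting point between 1 and the interval, inclusive.
        start = random.Random(seed).randint(1, interval)
    return [start + i * interval for i in range(sample_size)]

# 2,400 card holders, sample of 80, starting point 17 as in the example.
positions = systematic_sample(2400, 80, start=17)
print(positions[:4])   # [17, 47, 77, 107]
print(len(positions))  # 80
```

The last position chosen is 17 + 79 × 30 = 2387, so every selected customer is within the list of 2,400.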
Rewrite the leading question "Don't you agree that the new dress code is unfair?" as a neutral question.
Improved: "To what extent do you agree or disagree with the new dress code?" with a five-point Likert scale from strongly agree to strongly disagree. This removes the leading framing and allows respondents to express a range of views.
Exam-style question: A school has 900 students: 270 in Year 10, 240 in Year 11, 210 in Year 12, and 180 in Year 13. A stratified sample of 60 is taken. (a) Calculate the number of Year 11 students in the sample. (b) Explain one disadvantage of instead using convenience sampling at the school gates.
Grades 3–4 answer: (a) Year 11 has 240 students. Sample = 240÷900×60=16. (b) The sample would be biased because only students near the gate at that time would be picked.
Grades 5–6 answer: (a) Year 11 fraction = 240 / 900. Sample number = (240 / 900) × 60 = 16. (b) Convenience sampling is biased: it only includes students who happen to be near the gate, so Year 13 students who drive, or students with after-school clubs, would be under-represented. This means the sample is not representative of the population.
Grades 7–9 answer: (a) Using stratified sampling, Year 11 = (240 / 900) × 60 = 16 students. I would also verify the total: the four year-group samples are 18, 16, 14, 12, summing to 60 as required. (b) Convenience sampling introduces selection bias because the probability of inclusion is not equal across the population. Students who arrive at peak times are over-represented, while students with different timetables or transport patterns are under-represented. This increases the risk that the sample mean x̄ diverges systematically from the population mean μ, reducing the validity of any inference drawn.
AQA alignment: This content is aligned with AQA GCSE Mathematics (8300) specification — specifically Topic S1 (Infer properties of populations from a sample, while knowing the limitations of sampling) and underpins later work in S4 and S5 where representative data is assumed. Assessed on Papers 2 and 3.