Data Types and Sampling Methods

This lesson introduces the fundamental building blocks of statistics: the different types of data and the methods used to collect them. Data classification and sampling are essential parts of the AQA GCSE Mathematics Statistics topic and frequently appear in exam questions worth 2–4 marks.

Types of Data

Data can be classified in several ways. The first distinction is between qualitative and quantitative data.

Type	Definition	Examples
Qualitative	Data that describes qualities or characteristics (non-numerical)	Eye colour, favourite subject, type of transport
Quantitative	Data that can be measured or counted (numerical)	Height, number of siblings, temperature

Quantitative Data: Discrete vs Continuous

Quantitative data is further divided into two types:

Type	Definition	Examples
Discrete	Data that can only take specific values (usually whole numbers from counting)	Number of pets, shoe size, dice score
Continuous	Data that can take any value within a range (usually from measuring)	Height (1.65 m), weight (72.3 kg), time (14.7 seconds)

Exam Tip: A common exam question asks you to classify data. Remember — if you count it, it is discrete; if you measure it, it is continuous. Shoe size is a classic trick question: although it has half sizes (5.5, 6, 6.5), it is still discrete because it can only take specific values, not any value in a range.

Primary and Secondary Data

Data can also be classified by how it was collected.

Type	Definition	Advantages	Disadvantages
Primary data	Data you collect yourself for a specific purpose	Tailored to your needs; you know how it was collected	Time-consuming and expensive to collect
Secondary data	Data collected by someone else, often for a different purpose	Quick and cheap to obtain	May not exactly match your needs; may be out of date or biased

Examples of primary data: surveys, experiments, questionnaires, observations.

Examples of secondary data: government statistics, newspaper reports, internet databases, school records.

Populations and Samples

In statistics:

The population is the entire group you are interested in studying.
A sample is a smaller subset of the population that you actually collect data from.

We use samples because it is usually impractical (too expensive, too time-consuming) to survey an entire population.

What Makes a Good Sample?

A good sample should be:

Representative — it should reflect the characteristics of the whole population.
Large enough — bigger samples give more reliable results.
Unbiased — every member of the population should have a fair chance of being selected.

Exam Tip: If a question asks you to criticise a sampling method, check whether the sample is biased (certain groups are excluded or over-represented), too small, or unrepresentative of the population.

Sampling Methods

There are several methods for selecting a sample. You need to know the following five:

1. Random Sampling

Every member of the population has an equal chance of being selected. Names or numbers are drawn at random (e.g. using a random number generator, pulling names from a hat).

Advantage: No bias in selection; every member has the same chance.
Disadvantage: Requires a complete list of the population; may not represent all subgroups.
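
As an optional illustration (not something the exam asks for), simple random sampling can be sketched in Python. The register of 30 students and the function name `simple_random_sample` are made up for this example; `random.sample` is the standard-library call that picks items without replacement, so nobody is chosen twice.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical register of 30 students, numbered 1 to 30.
register = [f"Student {i}" for i in range(1, 31)]
sample = simple_random_sample(register, 5, seed=1)
print(sample)
```

Note that the sketch needs the full `register` before it can run, which mirrors the disadvantage above: random sampling requires a complete list of the population.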

2. Systematic Sampling

Members are selected at regular intervals from an ordered list (e.g. every 10th person on a register).

Advantage: Simple and easy to carry out once you have an ordered list.
Disadvantage: Can introduce bias if there is a hidden pattern in the list.

3. Stratified Sampling

The population is divided into groups (strata) based on a characteristic (e.g. age, gender, year group). A random sample is then taken from each group, in proportion to the size of that group in the population.

The number to sample from each stratum is calculated using:

Number from stratum = (number in stratum / total population) x sample size

Worked Example

A school has the following students:

Year Group	Number of Students
Year 7	180
Year 8	160
Year 9	200
Year 10	150
Year 11	110
Total	800

A stratified sample of 80 students is needed.

Year 7: (180 / 800) x 80 = 18 students

Year 8: (160 / 800) x 80 = 16 students

Year 9: (200 / 800) x 80 = 20 students

Year 10: (150 / 800) x 80 = 15 students

Year 11: (110 / 800) x 80 = 11 students

Check: 18 + 16 + 20 + 15 + 11 = 80 (correct)

Advantage: Proportionally representative of each subgroup.
Disadvantage: Requires prior knowledge of the population's characteristics.
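
The proportional calculation in the worked example can also be checked with a short Python sketch. The function name `stratified_allocation` is an assumption made for this illustration; the year-group figures are the ones from the table above.

```python
def stratified_allocation(strata, sample_size):
    """Return the proportional share of the sample for each stratum."""
    total = sum(strata.values())
    # (number in stratum / total population) x sample size
    return {name: count * sample_size / total for name, count in strata.items()}

school = {"Year 7": 180, "Year 8": 160, "Year 9": 200, "Year 10": 150, "Year 11": 110}
allocation = stratified_allocation(school, 80)
for year, number in allocation.items():
    print(year, number)
print("Total:", sum(allocation.values()))
```

Here every share is a whole number, so no rounding is needed; the final line repeats the check that the shares add up to the sample size of 80.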

4. Quota Sampling

The researcher decides how many people from each group to include (sets a quota) and then selects people until each quota is filled. Unlike stratified sampling, the selection within each group is not random.

Advantage: Quick and cheap; no complete list of population needed.
Disadvantage: The researcher chooses who to include, which introduces bias.

5. Convenience (Opportunity) Sampling

The researcher simply surveys whoever is easiest to reach or most readily available.

Advantage: Quick, easy, and cheap.
Disadvantage: Very likely to be biased and unrepresentative.

Exam Tip: In AQA exams, stratified sampling calculation questions are very common. Always show the fraction (stratum size / total population) multiplied by the sample size. Round to the nearest whole number if necessary, and always check that your values add up to the required sample size.

Bias in Data Collection

Bias occurs when a sample does not fairly represent the population, leading to misleading results.

Common sources of bias include:

Selection bias — certain groups are excluded (e.g. surveying only students who stay after school).
Question bias — leading or confusing questions push respondents towards a particular answer.
Response bias — people may lie or exaggerate (e.g. about exercise habits).
Non-response bias — certain types of people may not respond to a survey, skewing the results.
Timing bias — collecting data at a particular time may miss certain groups (e.g. surveying a high street at 10 am on a weekday misses people who work).

Diagram: Sources of bias, branching into selection bias (certain groups excluded from the sample), question bias (leading or confusing questions), response bias (people lie or exaggerate answers), non-response bias (some groups do not respond) and timing bias (data collected at an unrepresentative time).

Summary

Data can be qualitative (descriptive) or quantitative (numerical).
Quantitative data is either discrete (counted) or continuous (measured).
Primary data is collected first-hand; secondary data comes from existing sources.
A sample is a subset of the population; it should be representative and unbiased.
Key sampling methods: random, systematic, stratified, quota, and convenience.
Stratified sampling requires a proportional calculation for each stratum.
Bias can arise from poor sampling, leading questions, or unrepresentative timing.

Exam Tip: When asked to suggest improvements to a data collection method, always consider whether the sample is large enough, whether it is representative, and whether any groups have been excluded. Mentioning specific sources of bias will gain you marks.

Extended Worked Examples

Worked Example 1 — Stratified Sample with Rounding

A college has 430 students in Year 12 and 370 students in Year 13. A stratified sample of 60 students is to be drawn.

Sample size from Year 12 $= \dfrac{430}{800} \times 60 = 32.25$, which rounds to 32 students.

Sample size from Year 13 $= \dfrac{370}{800} \times 60 = 27.75$, which rounds to 28 students.

Check: $32 + 28 = 60$. Correct.

Had naive rounding produced 32 and 27 (or 33 and 28), we would adjust one group by 1 so the totals match the required sample size of 60.
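
The lesson does not say which group to adjust when the rounded values miss the target. One possible rule, assumed here purely for illustration, is to round every share down and then give the leftover places to the strata with the largest decimal parts (sometimes called the largest remainder method). The function name `allocate_with_rounding` is hypothetical.

```python
import math

def allocate_with_rounding(strata, sample_size):
    """Proportional allocation in whole numbers that sum to sample_size."""
    total = sum(strata.values())
    exact = {name: count * sample_size / total for name, count in strata.items()}
    # Start by rounding every share down.
    result = {name: math.floor(value) for name, value in exact.items()}
    shortfall = sample_size - sum(result.values())
    # Hand the leftover places to the strata with the largest decimal parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - result[name], reverse=True)
    for name in by_remainder[:shortfall]:
        result[name] += 1
    return result

college = {"Year 12": 430, "Year 13": 370}
print(allocate_with_rounding(college, 60))  # {'Year 12': 32, 'Year 13': 28}
```

For the college data this agrees with ordinary rounding, because 27.75 has the larger decimal part and is the one rounded up.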

Worked Example 2 — Critiquing a Biased Sample

A researcher stands outside a vegan cafe at 11 am on a Tuesday and asks 50 adults whether they eat meat. She concludes that 94% of the UK population is vegetarian.

Criticisms:

Selection bias — customers of a vegan cafe are far more likely to be vegetarian than the general population.
Timing bias — 11 am on a Tuesday excludes people at work.
Location bias — a single location cannot represent a whole country.
Sample size — 50 is small relative to a UK population of over 67 million.

The sample is therefore unrepresentative. The conclusion is unreliable — in fact, the true UK vegetarian proportion is closer to 5–7%.

Worked Example 3 — Calculating Required Sample Size

A youth club has 120 members aged 11–13, 90 members aged 14–16, and 40 members aged 17–18. A stratified sample is to be drawn so that 15 members from the 14–16 group are surveyed. What is the total sample size?

Let the total sample size be $n$. The proportion from the 14–16 group is $\dfrac{90}{250}$, so:

$\dfrac{90}{250} \times n = 15$

$n = \dfrac{15 \times 250}{90} \approx 41.67$

A sample size must be a whole number, so round up to 42 members. With $n = 42$, the 14–16 stratum provides $\dfrac{90}{250} \times 42 = 15.12 \approx 15$ as required, the 11–13 stratum provides $\dfrac{120}{250} \times 42 = 20.16 \approx 20$, and the 17–18 stratum provides $\dfrac{40}{250} \times 42 = 6.72 \approx 7$. Check: $20 + 15 + 7 = 42$.
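
The same working can be sketched in Python as a check. The function name `total_sample_size` is made up for this illustration, and it assumes the total is always rounded up, as in the example.

```python
import math

def total_sample_size(strata, stratum, target):
    """Smallest whole sample size giving `stratum` a share of at least `target`."""
    total = sum(strata.values())
    # Rearranged from (stratum size / total) x n = target.
    return math.ceil(target * total / strata[stratum])

club = {"11-13": 120, "14-16": 90, "17-18": 40}
n = total_sample_size(club, "14-16", 15)

total = sum(club.values())
# Round each proportional share to the nearest whole number.
shares = {name: round(count * n / total) for name, count in club.items()}
print(n, shares)
```

Running it gives a total of 42 with shares of 20, 15 and 7, matching the working above.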

Worked Example 4 — Systematic Sampling

A supermarket has a list of 2,400 loyalty card holders and wants a systematic sample of 80. The sampling interval is $\dfrac{2400}{80} = 30$. A random starting point between 1 and 30 is chosen (say 17), then every 30th customer is surveyed: 17, 47, 77, 107, ...

Potential bias: if customers are ordered by sign-up date and every 30th entry happens to coincide with a promotional weekend, the sample may over-represent promotion-driven shoppers.
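
A minimal Python sketch of this selection is shown here for illustration. The function name `systematic_sample` is an assumption, and the sketch assumes the population size divides exactly by the sample size, as it does in this example.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Return the 1-based list positions chosen by systematic sampling."""
    interval = population_size // sample_size
    if start is None:
        # Random starting point within the first interval (both ends included).
        start = random.randint(1, interval)
    return [start + i * interval for i in range(sample_size)]

positions = systematic_sample(2400, 80, start=17)
print(positions[:4])   # [17, 47, 77, 107]
print(positions[-1])   # 2387
print(len(positions))  # 80
```

The last position selected is 2,387, so the sample stays within the list of 2,400 customers.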

Worked Example 5 — Designing an Unbiased Questionnaire

Rewrite the leading question "Don't you agree that the new dress code is unfair?" as a neutral question.

Improved: "To what extent do you agree or disagree with the new dress code?" with a five-point Likert scale from strongly agree to strongly disagree. This removes the leading framing and allows respondents to express a range of views.

Answering at different grade levels

Exam-style question: A school has 900 students: 270 in Year 10, 240 in Year 11, 210 in Year 12, and 180 in Year 13. A stratified sample of 60 is taken. (a) Calculate the number of Year 11 students in the sample. (b) Explain one disadvantage of instead using convenience sampling at the school gates.

Grades 3–4 answer: (a) Year 11 has 240 students. Sample = $240 \div 900 \times 60 = 16$. (b) The sample would be biased because only students near the gate at that time would be picked.

Grades 5–6 answer: (a) Year 11 fraction = $\dfrac{240}{900}$. Sample number = $\dfrac{240}{900} \times 60 = 16$. (b) Convenience sampling is biased: it only includes students who happen to be near the gate, so Year 13 students who drive, or students with after-school clubs, would be under-represented. This means the sample is not representative of the population.

Grades 7–9 answer: (a) Using stratified sampling, Year 11 = $\dfrac{240}{900} \times 60 = 16$ students. I would also verify the total: the four year-group samples are 18, 16, 14, 12, summing to 60 as required. (b) Convenience sampling introduces selection bias because the probability of inclusion is not equal across the population. Students who arrive at peak times are over-represented, while students with different timetables or transport patterns are under-represented. This increases the risk that the sample mean $\bar{x}$ diverges systematically from the population mean $\mu$, reducing the validity of any inference drawn.

AQA alignment: This content is aligned with AQA GCSE Mathematics (8300) specification — specifically Topic S1 (Infer properties of populations from a sample, while knowing the limitations of sampling) and underpins later work in S4 and S5 where representative data is assumed. Assessed on Papers 2 and 3.
