Data Types and Sampling

Every piece of statistics work begins with two questions: what kind of data am I dealing with, and where did it come from? Getting these right is the foundation of the whole Statistics strand of OCR GCSE Mathematics (J560). If you misclassify your data or collect it from a biased sample, every average, chart and conclusion that follows will be unreliable. This lesson builds the vocabulary and reasoning you need before you touch a single calculation.

This topic is assessed across both the calculator and non-calculator OCR papers. It is mostly AO1 (knowing the definitions of data types and sampling methods) and AO2 (reasoning about why a sample might be biased and how to fix it). OCR likes short "Describe" and "Give a reason for your answer" questions here, so precise written explanations earn the marks, not just one-word labels.

Key Vocabulary

Term	Meaning
Population	The entire group you want to find out about (for example, every Year 11 student in a school)
Sample	A smaller group chosen from the population to represent it
Census	Collecting data from every member of the population
Primary data	Data you collect yourself, first-hand
Secondary data	Data collected by someone else that you reuse
Qualitative data	Non-numerical data describing a quality or category
Quantitative data	Numerical data that is counted or measured
Discrete data	Quantitative data taking only separate, fixed values (you count it)
Continuous data	Quantitative data taking any value in a range (you measure it)
Bias	When a sample does not fairly represent the population
Random sample	A sample where every member has an equal chance of selection
Stratified sample [H]	A sample taken in proportion from each group (stratum) of the population

Types of Data

The first split is between qualitative and quantitative data.

Qualitative (also called categorical) data describes a quality or category and is not a number. Examples: favourite subject, eye colour, type of pet, method of travel to school.
Quantitative data is numerical — it has been counted or measured. Examples: number of texts sent, height in centimetres, time to run 100 m.

Quantitative data then splits again into discrete and continuous:

Discrete data can only take separate, fixed values. You count it. The number of goals a team scores can be 0, 1, 2, 3, … but never 2.4. Shoe size (4, 4.5, 5, 5.5, …) is also discrete — even though it includes halves, the values jump in fixed steps.
Continuous data can take any value within a range. You measure it, and with a finer instrument you always get more decimal places. Height, mass, temperature and time are all continuous: a height could be 162.3 cm, 162.34 cm, and so on.

A useful test: ask "could a value sensibly sit between two of my readings?" If a value of 17.6 is meaningful, the data is continuous; if only whole steps make sense, it is discrete.

Worked Example 1

Classify each variable as qualitative or quantitative, and if quantitative, as discrete or continuous. (a) The colour of cars in a car park. (b) The number of pages in library books. (c) The mass of apples in a crate. (d) The temperature of a cup of tea every minute.

Solution:

(a) Qualitative — colour is a category, not a number.
(b) Quantitative, discrete — you count whole pages; a book cannot have 84.5 pages.
(c) Quantitative, continuous — mass is measured and can take any value, e.g. 132.7 g.
(d) Quantitative, continuous — temperature is measured and can be any value, e.g. 64.2 °C.

Common error: writing "shoe size is continuous" because of the halves. Shoe sizes jump in fixed steps, so they are discrete.

Worked Example 2

State whether each is discrete or continuous: (a) the time taken to download a file; (b) the number of students absent each day; (c) the length of a leaf; (d) a GCSE grade on the 9–1 scale.

Solution:

(a) Continuous — time is measured.
(b) Discrete — you count whole students.
(c) Continuous — length is measured.
(d) Discrete — grades take fixed values 1, 2, …, 9.

Primary and Secondary Data

Primary data is data you collect yourself — through a survey, an experiment or by observation. It is tailored exactly to your question and you control its accuracy, but it takes time and money to gather.

Secondary data is data someone else has already collected that you reuse — from a website, a newspaper, a government database or a textbook. It is quick and cheap to obtain and can give you very large data sets, but it may not match your question precisely, may be out of date, and you cannot be sure how carefully it was collected.

Worked Example 3

Jordan wants to know the average daily rainfall in his town last year. He finds the figures on the Met Office website. Is this primary or secondary data? Give one advantage and one disadvantage of using it.

Solution: It is secondary data — Jordan did not measure the rainfall himself.

Advantage: he gets a full year of accurate, professionally measured data instantly, with no equipment.
Disadvantage: the figures are for the official weather station, which may be several kilometres away, so they may not exactly represent rainfall where he lives.

Populations, Censuses and Samples

The population is the whole group you are interested in. A census collects data from every single member of that population. A sample is a smaller group chosen to stand in for the whole population.

Why not always take a census? Because it is often:

too expensive or too slow (asking all 600,000 GCSE students in a country),
or even impossible — testing the lifetime of every battery a factory makes would destroy the whole stock, leaving none to sell.

A good sample is representative: it reflects the make-up of the population. The bigger and more carefully chosen the sample, the more reliable your conclusions — but a larger sample also costs more, so there is a trade-off.

Worked Example 4

A company makes 50,000 light bulbs a day and wants to know how long they last on average. Explain why the company should use a sample rather than a census.

Solution: Testing how long a bulb lasts means running it until it fails, which destroys it. A census would destroy all 50,000 bulbs, leaving none to sell. A sample lets the company estimate the average lifetime while keeping the rest of the stock to sell.

Sampling Methods

Simple random sampling

In a simple random sample, every member of the population has an equal chance of being chosen, and choices do not influence each other. The standard method is:

Give every member of the population a number.
Use a random number generator (or draw numbered tickets) to pick the required quantity.

This removes selection bias because no person or group is favoured. Its drawback is that you need a complete numbered list of the whole population (a sampling frame), and by chance a small random sample can still miss out parts of the population.
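The two steps above can be sketched in a few lines of Python. This is a minimal illustration that assumes the population has already been numbered 1 to N (a complete sampling frame). The function name `simple_random_sample` and the fixed seed are hypothetical choices made for the demo, not part of the OCR method. The figures match Worked Example 5 (240 members, 30 chosen).

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Hypothetical helper: pick sample_size members from a list numbered
    1..population_size, without replacement, so every member has the same chance."""
    rng = random.Random(seed)                   # seeded only so the demo is repeatable
    members = range(1, population_size + 1)     # step 1: every member has a number
    chosen = rng.sample(members, sample_size)   # step 2: random pick, no repeats
    return sorted(chosen)

sample = simple_random_sample(240, 30, seed=1)
print(sample)
```

`random.Random.sample` draws without replacement, so no member can be chosen twice. Leave the seed out for a genuinely unpredictable draw.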

Worked Example 5

A youth club has 240 members. The manager numbers them 001 to 240 and uses a random number generator to choose 30 for a survey. (a) What is the probability that a particular member is chosen? (b) Explain why this is a fair method.

Solution: (a) $P(\text{chosen}) = \dfrac{30}{240} = \dfrac{1}{8}$. (b) Every member has the same probability $\tfrac{1}{8}$ of being selected and no group is favoured, so the selection method is fair and free of selection bias.
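To see the equal-chance idea numerically, here is a rough check in Python, assuming the same 240 members and a sample of 30. It computes the exact probability with `fractions.Fraction`, then repeats the whole draw many times and records how often one particular member (member 1, picked arbitrarily) is chosen. The simulated proportion should land close to, but not exactly on, $\tfrac{1}{8}$.

```python
import random
from fractions import Fraction

population_size, sample_size = 240, 30

# Exact probability that one particular member is chosen: 30 places among 240 members
p_exact = Fraction(sample_size, population_size)
print(p_exact)  # 1/8

# Simulation: repeat the draw many times and count how often member 1 is picked
rng = random.Random(2024)  # fixed seed so the demo is repeatable
trials = 50_000
members = range(1, population_size + 1)
hits = sum(1 in rng.sample(members, sample_size) for _ in range(trials))
p_estimate = hits / trials
print(f"simulated: {p_estimate:.3f}, exact: {float(p_exact):.3f}")
```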

Worked Example 6

Explain one practical difficulty with taking a simple random sample of all the shoppers who use a supermarket in a year.

Solution: There is no complete numbered list of every shopper for the year (no sampling frame), so you cannot give each shopper a number to select from. Without a full list you cannot guarantee every shopper has an equal chance, so true simple random sampling is impractical here.

Stratified sampling [Higher]

When a population is made of clearly different groups (strata) — such as year groups, departments or age bands — a stratified sample takes from each group in proportion to its size. This guarantees that small groups are not under-represented. The number taken from each stratum is:

$\text{number from a stratum} = \dfrac{\text{size of stratum}}{\text{total population}} \times \text{sample size}$

Within each stratum, members are then chosen by simple random sampling.
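The proportional formula translates directly into code. The sketch below is an illustration rather than an official procedure: `stratified_allocation` is a hypothetical helper, and it keeps exact fractions (via `fractions.Fraction`) so you can see at once whether any stratum needs rounding. It is run on the figures from Worked Example 7.

```python
from fractions import Fraction

def stratified_allocation(strata, sample_size):
    """Hypothetical helper: number to take from each stratum, as an exact fraction.

    Implements (size of stratum / total population) * sample size.
    """
    total = sum(strata.values())
    return {name: Fraction(size, total) * sample_size for name, size in strata.items()}

# Figures from Worked Example 7
college = {"Sciences": 360, "Arts": 240, "Sport": 120}
allocation = stratified_allocation(college, 60)
for subject, n in allocation.items():
    print(subject, n)  # Sciences 30, Arts 20, Sport 10 (one per line)

# Check that the strata add back to the sample size
print(sum(allocation.values()))  # 60
```

If any value comes out as a non-whole number, the allocations have to be rounded and the rounded values checked against the sample size, as in Worked Example A.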

Worked Example 7 [H]

A sixth-form college has students spread across three subjects. A stratified sample of 60 students is required.

Subject	Number of students
Sciences	360
Arts	240
Sport	120
Total	720

Work out how many students should be sampled from each subject.

Solution: The sampling fraction is $\dfrac{60}{720} = \dfrac{1}{12}$.

Sciences: $360 \times \dfrac{1}{12} = 30$
Arts: $240 \times \dfrac{1}{12} = 20$
Sport: $120 \times \dfrac{1}{12} = 10$
Check: $30 + 20 + 10 = 60$ ✓

Common error: dividing 60 by 3 to get 20 from each subject. That ignores the different group sizes and over-samples Sport while under-sampling Sciences.

Worked Example 8 [H]

A gym has 900 members: 540 adults, 270 students and 90 children. A stratified sample of 50 is taken. How many children are in the sample?

Solution: Number of children $= \dfrac{90}{900} \times 50 = 0.1 \times 50 = 5$ children.

Worked Example 9 [H]

In a stratified sample of 40 from 800 employees, 6 employees came from the night shift. How many night-shift employees are there altogether?

Solution: The sampling fraction is $\dfrac{40}{800} = \dfrac{1}{20}$. If 6 in the sample represent the night shift, then $6 = \dfrac{1}{20} \times (\text{night-shift total})$, so the night-shift total $= 6 \times 20 = 120$ employees.
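Running the stratified formula backwards, as this example does, amounts to dividing the count in the sample by the sampling fraction. Here is a minimal sketch. The helper name `stratum_size_from_sample` is hypothetical.

```python
from fractions import Fraction

def stratum_size_from_sample(count_in_sample, sample_size, population_size):
    """Hypothetical helper: reverse the stratified formula.

    count_in_sample = (stratum size / population) * sample size,
    so stratum size = count_in_sample / (sample size / population).
    """
    sampling_fraction = Fraction(sample_size, population_size)
    return count_in_sample / sampling_fraction

night_shift = stratum_size_from_sample(6, 40, 800)
print(night_shift)  # 120
```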

Bias and How to Avoid It

A sample is biased when it systematically over- or under-represents part of the population. Common causes:

Cause of bias	Example
Non-random selection	Only asking your own friends
Wrong time or place	Surveying a high street at 11 a.m. misses people at work
Too small a sample	A handful of people is unlikely to reflect everyone
Leading questions	"Don't you agree the canteen food is poor?" pushes a "yes"
Non-response	People who ignore a survey may differ from those who reply
Self-selection	Only people who feel strongly bother to respond

Worked Example 10

Mia wants to know what students across her whole school think of the new timetable. She asks the 28 students in her own form. Give two reasons why this may not give a representative sample, and suggest a better method.

Solution:

Her form is a single year group, so the views of other year groups — who may be affected differently by the timetable — are missing.
28 students is a small fraction of the school, so the sample is unlikely to reflect everyone.

Better method: take a stratified random sample across all year groups, so each year is represented in proportion to its size.

Worked Example 11

A radio station asks listeners to phone in to vote on whether a new bypass should be built. Give one reason why the result may be biased.

Solution: Only listeners who feel strongly enough to phone in will respond (self-selection), and only people who happen to listen to that station are reached. These groups may not represent the views of the whole local population, so the result is biased.

Extended Worked Examples

Worked Example A [H] — Stratified sampling with rounding

A school has 1,150 students: 250 in Year 9, 230 in Year 10, 220 in Year 11, 230 in Year 12 and 220 in Year 13. A stratified sample of 60 is required. Work out how many to sample from each year group.

Step 1 — sampling fraction: $\dfrac{60}{1150} = 0.05217\ldots$

Step 2 — apply to each stratum (round to the nearest whole number):

Year	Population	Calculation	Sample
9	250	$250 \times 0.05217 = 13.04$	13
10	230	$230 \times 0.05217 = 12.00$	12
11	220	$220 \times 0.05217 = 11.48$	11
12	230	$230 \times 0.05217 = 12.00$	12
13	220	$220 \times 0.05217 = 11.48$	11

Step 3 — check the total: $13 + 12 + 11 + 12 + 11 = 59$. This is one short because every non-whole value was rounded down. Add one to the stratum whose value was closest to rounding up (Year 11 or Year 13, both 11.48); adding one to Year 11 gives 12 and a total of 60. Always finish by checking the strata add to the required sample size.
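One systematic way to carry out Step 3 is largest-remainder rounding. You round every value down, then hand out the missing places, one each, to the strata with the biggest decimal parts. The sketch below assumes that rule, which is a common convention rather than one stated by OCR. It breaks ties by taking the stratum listed first, which reproduces the choice of Year 11 above. Exact fractions are used so the two 11.48 values tie exactly instead of differing in a floating-point digit. `stratified_whole_numbers` is a hypothetical helper name.

```python
from fractions import Fraction
from math import floor

def stratified_whole_numbers(strata, sample_size):
    """Hypothetical helper using largest-remainder rounding.

    Round every exact allocation down, then give the places still missing,
    one each, to the strata with the largest fractional parts. Ties keep the
    order in which the strata were listed.
    """
    total = sum(strata.values())
    exact = {k: Fraction(size * sample_size, total) for k, size in strata.items()}
    counts = {k: floor(q) for k, q in exact.items()}
    shortfall = sample_size - sum(counts.values())
    # sorted() stays stable with reverse=True, so equal remainders keep list order
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:shortfall]:
        counts[k] += 1
    return counts

# Figures from Worked Example A (year group: number of students)
school = {9: 250, 10: 230, 11: 220, 12: 230, 13: 220}
sample = stratified_whole_numbers(school, 60)
print(sample)                # {9: 13, 10: 12, 11: 12, 12: 12, 13: 11}
print(sum(sample.values()))  # 60
```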

Worked Example B — Designing an unbiased survey

A council wants residents' opinions on weekend parking charges. A researcher stands outside one car park at 10 a.m. on a Wednesday and asks the first 40 drivers. Identify two sources of bias and describe a better approach.

Solution:

Time bias: 10 a.m. on a weekday misses most working people and, crucially, weekend visitors — yet the survey is about weekend charges.
Location/method bias: only drivers at one car park are asked; pedestrians, cyclists and bus users — who are also residents — are excluded.

Better approach: take a stratified random sample from the electoral roll, grouping residents by, for example, age band and area, so every type of resident is represented in proportion. Send the survey by post or online with reminders to reduce non-response.

Worked Example C — Choosing a sampling method

A factory making 12,000 phone cases a day wants to monitor quality. Suggest a suitable sampling method and explain why a census is unsuitable.

Solution: A systematic sample with a random start is practical here. Choose a random case among the first 200, then test every 200th case after it (60 cases a day), which spreads the checks evenly across the whole day's production. A census is unsuitable because inspecting all 12,000 cases every day would be far too slow and expensive, and any destructive test (such as a drop test) would ruin saleable stock.
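A systematic sample like this is simple to express in code. The sketch assumes the cases are numbered 1 to 12,000 in the order they are made. `systematic_sample` is a hypothetical helper, and the seed only makes the demonstration repeatable.

```python
import random

def systematic_sample(population_size, interval, seed=None):
    """Hypothetical helper: random start within the first interval,
    then every interval-th item after it."""
    rng = random.Random(seed)
    start = rng.randint(1, interval)  # random starting case, 1..interval inclusive
    return list(range(start, population_size + 1, interval))

checked = systematic_sample(12_000, 200, seed=7)
print(len(checked))  # 60
```

Because the start lies between 1 and 200 and the step is fixed at 200, the list always holds exactly 60 cases (12,000 ÷ 200), spread across the whole run.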

Answering at different grade levels

Specimen question modelled on the OCR J560 paper format (Higher, 4 marks): A leisure centre has 1,600 members made up of 800 adults, 600 students and 200 children. A stratified sample of 80 members is taken. Work out how many of each type are in the sample and explain why stratified sampling is appropriate here.

Grades 3–4 response: "$80 \div 1600 = 0.05$. Adults $800 \times 0.05 = 40$, students $600 \times 0.05 = 30$, children $200 \times 0.05 = 10$." Examiner-style commentary: correct figures earn the method and accuracy marks, but with no explanation the reasoning mark is lost.

Grades 5–6 response: "Fraction $= \tfrac{80}{1600} = \tfrac{1}{20}$. Adults $= 40$, students $= 30$, children $= 10$ (check $40+30+10 = 80$). Stratified sampling keeps the right proportion of each type." Examiner-style commentary: correct working, a check and a basic reason, which is close to full marks.

Grades 7–9 response: "Using fraction $\tfrac{80}{1600} = \tfrac{1}{20}$: adults 40, students 30, children 10 (total 80 ✓). Stratified sampling is appropriate because the three groups are very different in size; a simple random sample of 80 could, by chance, contain too few children and so under-represent them. Sampling each group in proportion guarantees fair representation, and members within each stratum are then chosen at random to avoid bias." Examiner-style commentary: full marks, with correct values, a check, and precise justification using stratum, proportion and bias.

Common Mistakes and Misconceptions

Calling shoe size or dress size continuous — they take fixed steps, so they are discrete.
Confusing primary and secondary data: if you collected it, it is primary; if you reused someone else's, it is secondary.
Dividing the sample size equally between strata instead of using the proportional formula (a Higher-tier error).
Forgetting to check the strata add back to the required sample size after rounding.
Saying a sample "is biased because it is too small" without explaining why a small sample may fail to represent the population.
Describing a sample as "random" when people were simply chosen by convenience — convenience sampling is not random.

Going Further

The sampling ideas you meet here scale up to professional statistics. Opinion pollsters use quota and stratified sampling to predict elections from samples of around 1,000 people; the smaller the sample, the wider the margin of error they must quote. Ecologists estimating animal populations cannot list every animal, so they use capture–recapture: tag a first sample, release it, then see what fraction of a later sample carries tags. Recognising why full censuses are rare, and how careful sampling controls bias, is exactly the reasoning that underpins real-world data science, medical trials and quality control in industry.
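The capture–recapture idea rests on one proportion. The fraction of tagged animals in the second sample is assumed to match the fraction tagged in the whole population, so $\dfrac{\text{tagged in second sample}}{\text{second sample size}} \approx \dfrac{\text{number tagged first}}{N}$. Rearranging gives the Lincoln–Petersen estimate of $N$. The figures below are hypothetical and chosen only to show the arithmetic. The estimate also assumes that tags are not lost and that animals mix freely between the two samples.

```python
def capture_recapture_estimate(tagged_first, caught_second, tagged_in_second):
    """Lincoln-Petersen estimate of a population size N.

    Assumes tagged_in_second / caught_second = tagged_first / N,
    so N = tagged_first * caught_second / tagged_in_second.
    """
    if tagged_in_second == 0:
        raise ValueError("no tagged animals recaptured, so the estimate is undefined")
    return tagged_first * caught_second / tagged_in_second

# Hypothetical figures: 50 fish tagged, later 40 caught, of which 8 carry tags
print(capture_recapture_estimate(50, 40, 8))  # 250.0
```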

Exam Tips

AO1: learn the exact definitions. OCR awards marks for correctly classifying data (qualitative/quantitative, discrete/continuous) and for naming sampling methods.
AO2: when a question says "Give a reason for your answer" or "Describe", write a full sentence that links your point to the context — "the sample misses people at work" beats "it is biased".
AO3: for "design a better method" questions, name a specific method (usually stratified random sampling) and say why it removes the bias you identified.
Watch the command words: "Work out" and "Calculate" expect numbers with working shown; "Describe" and "Explain" expect written reasoning.
In stratified-sampling questions, always end with a check that your strata add to the required total — it is a quick, free accuracy mark.

Summary

Data is qualitative (categories) or quantitative (numbers); quantitative data is discrete (counted) or continuous (measured).
Primary data you collect yourself; secondary data you reuse from another source.
A census covers the whole population; a sample is a representative subset, used when a census is too costly, too slow or impossible.
In simple random sampling everyone has an equal chance; stratified sampling [H] takes from each group in proportion using $\frac{\text{stratum}}{\text{population}} \times \text{sample size}$.
Bias arises from non-random selection, poor timing or place, small samples, leading questions and non-response — and is reduced by random, proportional sampling.

This content is aligned with the OCR GCSE Mathematics (J560) specification.
