Every piece of statistics work begins with two questions: what kind of data am I dealing with? and where did it come from? Getting these right is the foundation of the whole Statistics strand of OCR GCSE Mathematics (J560). If you misclassify your data or collect it from a biased sample, every average, chart and conclusion that follows will be unreliable. This lesson builds the vocabulary and reasoning you need before you touch a single calculation.
This topic is assessed across both the calculator and non-calculator OCR papers. It is mostly AO1 (knowing the definitions of data types and sampling methods) and AO2 (reasoning about why a sample might be biased and how to fix it). OCR likes short "Describe" and "Give a reason for your answer" questions here, so precise written explanations earn the marks, not just one-word labels.
| Term | Meaning |
|---|---|
| Population | The entire group you want to find out about (for example, every Year 11 student in a school) |
| Sample | A smaller group chosen from the population to represent it |
| Census | Collecting data from every member of the population |
| Primary data | Data you collect yourself, first-hand |
| Secondary data | Data collected by someone else that you reuse |
| Qualitative data | Non-numerical data describing a quality or category |
| Quantitative data | Numerical data that is counted or measured |
| Discrete data | Quantitative data taking only separate, fixed values (you count it) |
| Continuous data | Quantitative data taking any value in a range (you measure it) |
| Bias | When a sample does not fairly represent the population |
| Random sample | A sample where every member has an equal chance of selection |
| Stratified sample [H] | A sample taken in proportion from each group (stratum) of the population |
The first split is between qualitative and quantitative data.
Quantitative data then splits again into discrete and continuous:
A useful test: ask "could a value sensibly sit between two of my readings?" If a value of 17.6 is meaningful, the data is continuous; if only whole steps make sense, it is discrete.
Classify each variable as qualitative or quantitative, and if quantitative, as discrete or continuous. (a) The colour of cars in a car park. (b) The number of pages in library books. (c) The mass of apples in a crate. (d) The temperature of a cup of tea every minute.
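Solution: (a) Qualitative: colour is a category, not a number. (b) Quantitative and discrete: pages are counted in whole numbers. (c) Quantitative and continuous: mass is measured and can take any value in a range. (d) Quantitative and continuous: temperature is measured and can take any value in a range.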
Common error: writing "shoe size is continuous" because of the halves. Shoe sizes jump in fixed steps, so they are discrete.
State whether each is discrete or continuous: (a) the time taken to download a file; (b) the number of students absent each day; (c) the length of a leaf; (d) a GCSE grade on the 9–1 scale.
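Solution: (a) Continuous: time is measured. (b) Discrete: students are counted. (c) Continuous: length is measured. (d) Discrete: a grade can only take the fixed values 1 to 9.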
Primary data is data you collect yourself — through a survey, an experiment or by observation. It is tailored exactly to your question and you control its accuracy, but it takes time and money to gather.
Secondary data is data someone else has already collected that you reuse — from a website, a newspaper, a government database or a textbook. It is quick and cheap to obtain and can give you very large data sets, but it may not match your question precisely, may be out of date, and you cannot be sure how carefully it was collected.
Jordan wants to know the average daily rainfall in his town last year. He finds the figures on the Met Office website. Is this primary or secondary data? Give one advantage and one disadvantage of using it.
Solution: It is secondary data — Jordan did not measure the rainfall himself.
The population is the whole group you are interested in. A census collects data from every single member of that population. A sample is a smaller group chosen to stand in for the whole population.
Why not always take a census? Because it is often:
A good sample is representative: it reflects the make-up of the population. The bigger and more carefully chosen the sample, the more reliable your conclusions — but a larger sample also costs more, so there is a trade-off.
A company makes 50,000 light bulbs a day and wants to know how long they last on average. Explain why the company should use a sample rather than a census.
Solution: Testing how long a bulb lasts means running it until it fails, which destroys it. A census would destroy all 50,000 bulbs, leaving none to sell. A sample lets the company estimate the average lifetime while keeping the rest of the stock to sell.
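In a simple random sample, every member of the population has an equal chance of being chosen, and choices do not influence each other. The standard method is to give every member of the population a number and then use a random number generator to pick as many numbers as the sample needs, ignoring any repeats.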
This removes selection bias because no person or group is favoured. Its drawback is that you need a complete numbered list of the whole population (a sampling frame), and by chance a small random sample can still miss out parts of the population.
A youth club has 240 members. The manager numbers them 001 to 240 and uses a random number generator to choose 30 for a survey. (a) What is the probability that a particular member is chosen? (b) Explain why this is a fair method.
Solution: (a) $P(\text{chosen}) = \frac{30}{240} = \frac{1}{8}$. (b) Every member has the same probability $\frac{1}{8}$ of being selected, and no group is favoured, so the sample is unbiased.
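The youth club selection can be sketched in Python. This is only an illustration: the function name `simple_random_sample` and the `seed` argument are assumptions made here so that a run can be repeated, and the standard library's `random.sample` stands in for the manager's random number generator.

```python
import random
from fractions import Fraction

POPULATION_SIZE = 240  # members numbered 1 to 240, as in the question
SAMPLE_SIZE = 30


def simple_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct member numbers, each equally likely."""
    rng = random.Random(seed)  # the seed only makes a run repeatable
    member_numbers = range(1, population_size + 1)
    # random.sample never picks the same member twice
    return sorted(rng.sample(member_numbers, sample_size))


chosen = simple_random_sample(POPULATION_SIZE, SAMPLE_SIZE, seed=1)
chance = Fraction(SAMPLE_SIZE, POPULATION_SIZE)

print(chosen)  # 30 different member numbers between 1 and 240
print(chance)  # 1/8
```

Changing the seed gives a different set of 30 members, but the chance of any one member being picked stays at 1/8.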
Explain one practical difficulty with taking a simple random sample of all the shoppers who use a supermarket in a year.
Solution: There is no complete numbered list of every shopper for the year (no sampling frame), so you cannot give each shopper a number to select from. Without a full list you cannot guarantee every shopper has an equal chance, so true simple random sampling is impractical here.
When a population is made of clearly different groups (strata) — such as year groups, departments or age bands — a stratified sample takes from each group in proportion to its size. This guarantees that small groups are not under-represented. The number taken from each stratum is:
$$\text{number from a stratum} = \frac{\text{size of stratum}}{\text{total population}} \times \text{sample size}$$
Within each stratum, members are then chosen by simple random sampling.
A sixth-form college has students spread across three subjects. A stratified sample of 60 students is required.
| Subject | Number of students |
|---|---|
| Sciences | 360 |
| Arts | 240 |
| Sport | 120 |
| Total | 720 |
Work out how many students should be sampled from each subject.
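Solution: The sampling fraction is $\frac{60}{720} = \frac{1}{12}$. Applying it to each subject gives Sciences $360 \times \frac{1}{12} = 30$, Arts $240 \times \frac{1}{12} = 20$ and Sport $120 \times \frac{1}{12} = 10$. Check: $30 + 20 + 10 = 60$.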
Common error: dividing 60 by 3 to get 20 from each subject. That ignores the different group sizes and over-samples Sport while under-sampling Sciences.
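As a check on the college example, the proportional calculation can be written as a short Python sketch. The function name `stratified_allocation` is made up for this illustration; it simply multiplies each stratum size by the sampling fraction, using exact fractions so no rounding creeps in.

```python
from fractions import Fraction


def stratified_allocation(strata, sample_size):
    """Return each stratum's exact share of the sample."""
    total = sum(strata.values())
    sampling_fraction = Fraction(sample_size, total)
    return {name: size * sampling_fraction for name, size in strata.items()}


college = {"Sciences": 360, "Arts": 240, "Sport": 120}
allocation = stratified_allocation(college, 60)

for subject, count in allocation.items():
    print(subject, count)
# Sciences 30
# Arts 20
# Sport 10
print("Total:", sum(allocation.values()))  # Total: 60
```

Compare this with the equal split of 20 each: Sport would be given twice its fair share and Sciences only two thirds of its fair share.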
A gym has 900 members: 540 adults, 270 students and 90 children. A stratified sample of 50 is taken. How many children are in the sample?
Solution: Number of children $= \frac{90}{900} \times 50 = 0.1 \times 50 = 5$ children.
In a stratified sample of 40 from 800 employees, 6 employees came from the night shift. How many night-shift employees are there altogether?
Solution: The sampling fraction is $\frac{40}{800} = \frac{1}{20}$. If 6 in the sample represent the night shift, then $6 = \frac{1}{20} \times (\text{night-shift total})$, so the night-shift total $= 6 \times 20 = 120$ employees.
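Working backwards like this is just the stratified formula rearranged: divide the number sampled from the stratum by the sampling fraction. A minimal Python sketch, with a function name invented for this illustration:

```python
from fractions import Fraction


def stratum_size_from_sample(total_population, sample_size, sampled_from_stratum):
    """Reverse the stratified formula to recover the size of one stratum."""
    sampling_fraction = Fraction(sample_size, total_population)
    return sampled_from_stratum / sampling_fraction


night_shift = stratum_size_from_sample(800, 40, 6)
print(night_shift)  # 120
```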
A sample is biased when it systematically over- or under-represents part of the population. Common causes:
| Cause of bias | Example |
|---|---|
| Non-random selection | Only asking your own friends |
| Wrong time or place | Surveying a high street at 11 a.m. misses people at work |
| Too small a sample | A handful of people is unlikely to reflect everyone |
| Leading questions | "Don't you agree the canteen food is poor?" pushes a "yes" |
| Non-response | People who ignore a survey may differ from those who reply |
| Self-selection | Only people who feel strongly bother to respond |
Mia wants to know what students across her whole school think of the new timetable. She asks the 28 students in her own form. Give two reasons why this may not give a representative sample, and suggest a better method.
Solution:
Better method: take a stratified random sample across all year groups, so each year is represented in proportion to its size.
A radio station asks listeners to phone in to vote on whether a new bypass should be built. Give one reason why the result may be biased.
Solution: Only listeners who feel strongly enough to phone in will respond (self-selection), and only people who happen to listen to that station are reached. These groups may not represent the views of the whole local population, so the result is biased.
A school has 1,150 students: 250 in Year 9, 230 in Year 10, 220 in Year 11, 230 in Year 12 and 220 in Year 13. A stratified sample of 60 is required. Work out how many to sample from each year group.
Step 1 — sampling fraction: $\frac{60}{1150} = 0.05217\ldots$
Step 2 — apply to each stratum (round to the nearest whole number):
| Year | Population | Calculation | Sample |
|---|---|---|---|
| 9 | 250 | 250×0.05217=13.04 | 13 |
| 10 | 230 | 230×0.05217=12.00 | 12 |
| 11 | 220 | 220×0.05217=11.48 | 11 |
| 12 | 230 | 230×0.05217=12.00 | 12 |
| 13 | 220 | 220×0.05217=11.48 | 11 |
Step 3 — check the total: 13+12+11+12+11=59. This is one short because of rounding down. Add one to the stratum whose value was closest to rounding up (Year 11 or Year 13, both 11.48); adding one to Year 11 gives 12 and a total of 60. Always finish by checking the strata add to the required sample size.
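One systematic way to handle the rounding is the largest remainder method: round every stratum down, then hand out any spare places to the strata with the largest decimal parts. This is a swapped-in technique rather than the exact wording of the steps above, but it reaches the same answer for this school. The function name and the tie-break rule (the stratum listed first wins) are assumptions for this sketch.

```python
from fractions import Fraction
from math import floor


def rounded_allocation(strata, sample_size):
    """Whole-number stratified allocation that always sums to sample_size."""
    total = sum(strata.values())
    exact = {name: Fraction(size * sample_size, total) for name, size in strata.items()}
    # Start by rounding every stratum down.
    counts = {name: floor(value) for name, value in exact.items()}
    shortfall = sample_size - sum(counts.values())
    # Give the spare places to the largest decimal parts. sorted() is stable,
    # so when two strata tie, the one listed first is chosen.
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:shortfall]:
        counts[name] += 1
    return counts


school = {"Year 9": 250, "Year 10": 230, "Year 11": 220, "Year 12": 230, "Year 13": 220}
sample = rounded_allocation(school, 60)

print(sample)
# {'Year 9': 13, 'Year 10': 12, 'Year 11': 12, 'Year 12': 12, 'Year 13': 11}
print(sum(sample.values()))  # 60
```

Year 11 and Year 13 tie on 11.48, so giving the extra place to Year 13 instead would be equally acceptable in an exam answer.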
A council wants residents' opinions on weekend parking charges. A researcher stands outside one car park at 10 a.m. on a Wednesday and asks the first 40 drivers. Identify two sources of bias and describe a better approach.
Solution:
Better approach: take a stratified random sample from the electoral roll, grouping residents by, for example, age band and area, so every type of resident is represented in proportion. Send the survey by post or online with reminders to reduce non-response.
A factory making 12,000 phone cases a day wants to monitor quality. Suggest a suitable sampling method and explain why a census is unsuitable.
Solution: A systematic sample with a random starting point (for instance, testing a randomly chosen case and then every 200th case afterwards) is practical and spreads checks across the whole day's production. A census is unsuitable because inspecting all 12,000 cases every day would be far too slow and expensive, and any destructive test (such as a drop test) would ruin saleable stock.
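The selection rule in this answer can be sketched in Python. The sketch assumes the cases are numbered 1 to 12,000 in production order and that the first case is picked at random from the first 200; both details, and the function name `systematic_sample`, are illustrative choices rather than part of the question.

```python
import random


def systematic_sample(population_size, interval, seed=None):
    """Pick a random start in the first interval, then every interval-th item."""
    rng = random.Random(seed)  # the seed only makes a run repeatable
    start = rng.randint(1, interval)  # randint includes both end points
    return list(range(start, population_size + 1, interval))


cases = systematic_sample(12_000, 200, seed=7)

print(len(cases))  # 60
print(cases[:3])   # the first three case numbers, 200 apart
```

Whatever the starting point, this rule tests 60 cases a day, which is 0.5% of production.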
Specimen question modelled on the OCR J560 paper format (Higher, 4 marks): A leisure centre has 1,600 members made up of 800 adults, 600 students and 200 children. A stratified sample of 80 members is taken. Work out how many of each type are in the sample and explain why stratified sampling is appropriate here.
Grades 3–4 response: "80÷1600=0.05. Adults 800×0.05=40, students 600×0.05=30, children 200×0.05=10." Examiner-style commentary: correct figures earn the method and accuracy marks, but with no explanation the reasoning mark is lost.
Grades 5–6 response: "Fraction $= \frac{80}{1600} = \frac{1}{20}$. Adults $= 40$, students $= 30$, children $= 10$ (check $40 + 30 + 10 = 80$). Stratified sampling keeps the right proportion of each type." Examiner-style commentary: correct working, a check, and a basic reason, so close to full marks.
Grades 7–9 response: "Using fraction $\frac{80}{1600} = \frac{1}{20}$: adults 40, students 30, children 10 (total 80 ✓). Stratified sampling is appropriate because the three groups are very different in size; a simple random sample of 80 could, by chance, contain too few children and so under-represent them. Sampling each group in proportion guarantees fair representation, and members within each stratum are then chosen at random to avoid bias." Examiner-style commentary: full marks for correct values, a check, and precise justification using stratum, proportion and bias.
The sampling ideas you meet here scale up to professional statistics. Opinion pollsters use quota and stratified sampling to predict elections from samples of around 1,000 people; the smaller the sample, the wider the margin of error they must quote. Ecologists estimating animal populations cannot list every animal, so they use capture–recapture: tag a first sample, release it, then see what fraction of a later sample carries tags. Recognising why full censuses are rare, and how careful sampling controls bias, is exactly the reasoning that underpins real-world data science, medical trials and quality control in industry.
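The capture–recapture idea rests on one proportion: the fraction of tagged animals in the second sample should roughly match the fraction of tagged animals in the whole population. The sketch below uses the simplest form of this estimate (often called the Lincoln–Petersen estimate). The figures are hypothetical and chosen only to show the arithmetic.

```python
def capture_recapture_estimate(tagged_first, second_sample_size, tagged_in_second):
    """Estimate population size N from tagged_first / N = tagged_in_second / second_sample_size."""
    if tagged_in_second == 0:
        raise ValueError("no tagged animals recaptured, so no estimate is possible")
    return tagged_first * second_sample_size / tagged_in_second


# Hypothetical figures: 50 animals tagged, then 8 of a later sample of 40 carry tags.
estimate = capture_recapture_estimate(50, 40, 8)
print(estimate)  # 250.0
```

Here 8 out of 40 is one fifth, so the 50 tagged animals are taken to be about one fifth of the population, giving an estimate of 250.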
This content is aligned with the OCR GCSE Mathematics (J560) specification.