Probability Models

This lesson covers the use of probability distributions — particularly the binomial and normal distributions — as models for real-world data. You will learn how to select an appropriate model, check its suitability, and compare model predictions with observed data from the large data set.

What Is a Probability Model?

A probability model is a mathematical description of a random process. It specifies the possible outcomes and the probability of each outcome. In A-Level Mathematics, the two most important probability models are:

Binomial distribution — for counting the number of successes in a fixed number of independent trials.
Normal distribution — for modelling continuous data that is symmetrically distributed around the mean.

The key idea is that we use these mathematical models to approximate real-world situations. No model is a perfect representation of reality, but a good model captures the essential features of the data and allows us to make useful predictions.

The Binomial Distribution as a Model

Conditions for the Binomial Model

The random variable $X$ follows a binomial distribution $X \sim B(n, p)$ if:

There is a fixed number of trials, $n$.
Each trial has exactly two outcomes (success or failure).
The probability of success, $p$, is constant for each trial.
The trials are independent.

Key Formulae

$P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$

$E(X) = np, \quad \text{Var}(X) = np(1-p)$
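As a quick illustration of these formulae, take $n = 10$, $p = 0.4$ and $r = 3$ (values chosen here purely for illustration):

$P(X = 3) = \binom{10}{3}(0.4)^3(0.6)^7 = 120 \times 0.064 \times 0.0279936 \approx 0.2150$

$E(X) = 10 \times 0.4 = 4, \quad \text{Var}(X) = 10 \times 0.4 \times 0.6 = 2.4$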

Modelling with the Binomial Distribution

Example from the LDS: Suppose historical data shows that it rains on approximately 40% of days in October at Camborne. If we select 10 random October days, we might model the number of rainy days as $X \sim B(10, 0.4)$.

Checking the conditions:

Condition	Assessment
Fixed number of trials	Yes — 10 days
Two outcomes	Yes — rain or no rain (we need a clear definition of "rain", e.g., daily rainfall > 0.2 mm)
Constant probability	Approximately — the probability may vary slightly depending on the weather pattern, but 0.4 is a reasonable average
Independence	Approximately — weather on consecutive days is not truly independent (weather systems persist), but if the days are randomly selected from different years, independence is more reasonable

Model predictions vs observed data:

We could calculate $P(X = 0), P(X = 1), \ldots, P(X = 10)$ from the model and compare with the actual frequencies observed in the data set. If the model is a good fit, the predicted and observed frequencies should be similar.
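The model probabilities described above can be tabulated with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `binomial_pmf` and the figure of 50 samples are assumptions introduced for illustration and do not come from the large data set.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Return P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


N_TRIALS, P_RAIN = 10, 0.4  # model from the text: X ~ B(10, 0.4)
N_SAMPLES = 50  # hypothetical number of 10-day samples, for illustration only

probabilities = [binomial_pmf(r, N_TRIALS, P_RAIN) for r in range(N_TRIALS + 1)]
expected = [N_SAMPLES * prob for prob in probabilities]

for r, (prob, exp_freq) in enumerate(zip(probabilities, expected)):
    print(f"r={r:2d}  P(X=r)={prob:.4f}  expected frequency={exp_freq:.1f}")
```

The expected frequencies in the last column are what would be set against the observed counts of 0, 1, ..., 10 rainy days.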

When the Binomial Model Is Inappropriate

If the probability of success changes between trials (e.g., the chance of rain increases as the month progresses).
If the trials are not independent (e.g., consecutive days with the same weather system).
If the number of trials is not fixed.

The Normal Distribution as a Model

Properties of the Normal Distribution

The normal distribution $X \sim N(\mu, \sigma^2)$ is characterised by:

A bell-shaped, symmetric curve centred on the mean $\mu$.
The mean, median, and mode are all equal.
Approximately 68% of values lie within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.
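The 68-95-99.7 figures can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `proportion_within` is an assumption made for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def proportion_within(k):
    """P(mu - k*sigma < X < mu + k*sigma), the same for every normal distribution."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sd: {proportion_within(k):.4f}")
```

Because standardising removes $\mu$ and $\sigma$, the standard normal is enough to confirm the rule for any normal model.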

Modelling with the Normal Distribution

Many continuous variables in the large data set are approximately normally distributed:

Daily mean temperature at a given station during a specific month tends to follow a roughly normal distribution.
Daily mean pressure is often approximately normal.
Daily total sunshine may be skewed (especially in winter), so the normal model may not be appropriate.

Example: If the daily mean temperatures at Heathrow in July have a mean of 19.5°C and a standard deviation of 2.3°C, we might model the temperature as $X \sim N(19.5, 2.3^2)$.

Checking Model Suitability

To assess whether a normal distribution is a suitable model for a set of data, consider:

Shape of the distribution: Plot a histogram or frequency polygon. Does it look roughly bell-shaped and symmetric?
Mean vs median: For a normal distribution, these should be approximately equal. A large difference suggests skewness.
68-95-99.7 rule: Check whether approximately 68% of the data lies within 1 standard deviation of the mean, 95% within 2, etc.
Outliers and skewness: A normal distribution has thin tails — if there are many extreme values or the distribution is clearly skewed, the normal model may not be appropriate.

Check	Normal model suitable	Normal model may not be suitable
Histogram shape	Roughly symmetric, bell-shaped	Clearly skewed or bimodal
Mean ≈ median	Yes	Large difference
68% rule	Approximately 68% within $\mu \pm \sigma$	Significantly more or fewer
Outliers	Few or none	Many extreme values
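The numerical checks in the table can be sketched in Python as follows. The function name `normal_checks` and the sample values are hypothetical, invented for illustration rather than taken from the large data set; the histogram check still has to be done by eye.

```python
from statistics import mean, median, stdev


def normal_checks(data):
    """Summary figures for judging whether a normal model is plausible."""
    m, s = mean(data), stdev(data)  # sample mean and sample standard deviation
    within_1 = sum(1 for x in data if abs(x - m) <= s) / len(data)
    within_2 = sum(1 for x in data if abs(x - m) <= 2 * s) / len(data)
    return {
        "mean": m,
        "median": median(data),
        "sd": s,
        "within_1sd": within_1,
        "within_2sd": within_2,
    }


# Hypothetical daily mean temperatures (°C), invented for illustration only.
sample = [16.2, 17.8, 18.4, 18.9, 19.1, 19.5, 19.6, 20.0, 20.7, 21.3, 22.5, 23.0]
for name, value in normal_checks(sample).items():
    print(f"{name}: {value:.3f}")
```

A mean close to the median, with roughly 68% and 95% of values inside one and two standard deviations, supports the normal model; with a sample this small the proportions can only be a rough guide.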

Example: Comparing Model with Observed Data

Suppose the model predicts that $P(X > 24) = 0.025$ for daily mean temperature at Heathrow in July (approximately 2.5% of days). If there are 31 days in July, the model predicts approximately $31 \times 0.025 \approx 0.8$ days with a mean temperature above 24°C. If the actual data shows 2 such days out of 31, the model's prediction is in the right ballpark — but with such a small sample, we would not expect exact agreement.
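The tail probability quoted here follows from the $N(19.5, 2.3^2)$ model introduced earlier. A minimal sketch of the calculation, using the Python standard library (the variable names are assumptions for this example):

```python
from statistics import NormalDist

july_temp = NormalDist(mu=19.5, sigma=2.3)  # model from the text

p_above_24 = 1 - july_temp.cdf(24)  # upper-tail probability P(X > 24)
expected_days = 31 * p_above_24  # expected number of such days in July

print(f"P(X > 24) = {p_above_24:.4f}")
print(f"expected days above 24 in July = {expected_days:.2f}")
```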

Comparing Model Predictions with Observed Data

The process of comparing model predictions with real data is fundamental to statistical modelling:

Step 1: Choose the Model

Select the appropriate distribution (binomial or normal) based on the type of data and the conditions.

Step 2: Estimate Parameters

Use the observed data to estimate the parameters:

For the binomial: estimate $p$ from the observed proportion of successes.
For the normal: estimate $\mu$ and $\sigma$ from the sample mean and standard deviation.

Step 3: Calculate Expected Frequencies

Use the model to predict the expected number of observations in each category or range.

For the binomial: Calculate $P(X = r)$ for each $r$ and multiply by the total number of observations.

For the normal: Calculate the probability of falling in each class interval and multiply by the total frequency.
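For the normal case, the expected frequencies can be sketched as below. The function name `expected_frequencies` and the class boundaries are assumptions chosen for illustration; the parameters reuse the Heathrow July model from earlier in the lesson, with a total frequency of 31 days.

```python
from statistics import NormalDist


def expected_frequencies(boundaries, mu, sigma, total):
    """Expected counts for the classes below b0, b0 to b1, ..., above b_last."""
    model = NormalDist(mu, sigma)
    # Cumulative probabilities at each boundary, with 0 and 1 for the open ends.
    cdf_values = [0.0] + [model.cdf(b) for b in boundaries] + [1.0]
    return [total * (hi - lo) for lo, hi in zip(cdf_values, cdf_values[1:])]


boundaries = [16.0, 18.0, 20.0, 22.0]  # hypothetical class boundaries in °C
frequencies = expected_frequencies(boundaries, mu=19.5, sigma=2.3, total=31)

edges = [float("-inf")] + boundaries + [float("inf")]
for lower, upper, freq in zip(edges, edges[1:], frequencies):
    print(f"{lower} to {upper}: {freq:.2f}")
```

Including the two open-ended classes means the expected frequencies add up to the total frequency, which is a useful check on the calculation.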

Step 4: Compare

Compare the expected frequencies with the observed frequencies. A good model will produce expected frequencies that are close to the observed frequencies.

Step 5: Evaluate

If the model fits well, it can be used for prediction and inference. If not, consider:

Is a different distribution more appropriate?
Are the model assumptions violated?
Is the sample size too small for reliable comparison?

Combining Models: Normal Approximation to the Binomial

When $n$ is large and $p$ is not too close to 0 or 1, the binomial distribution can be approximated by the normal distribution:

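If $X \sim B(n, p)$, then $X$ is approximately distributed as $Y \sim N(np, np(1-p))$, where $Y$ is the normal variable used in place of $X$.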

The conditions for this approximation to be valid are:

$np > 5$ and $n(1-p) > 5$

When using this approximation, a continuity correction must be applied because we are approximating a discrete distribution with a continuous one:

Binomial probability	Normal approximation (with continuity correction)
$P(X \leq k)$	$P(Y \leq k + 0.5)$
$P(X < k)$	$P(Y < k - 0.5)$
$P(X \geq k)$	$P(Y \geq k - 0.5)$
$P(X > k)$	$P(Y > k + 0.5)$
$P(X = k)$	$P(k - 0.5 < Y < k + 0.5)$
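The effect of the continuity correction can be seen by comparing an exact binomial probability with its normal approximation. The sketch below takes $n = 30$, $p = 0.45$ and $k = 12$ as illustrative values (so $np = 13.5$ and $n(1-p) = 16.5$, both above 5); the helper name `binomial_cdf` is an assumption for this example.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Return the exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))


n, p, k = 30, 0.45, 12  # illustrative values

# Approximating normal variable Y ~ N(np, np(1-p)).
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

exact = binomial_cdf(k, n, p)
with_correction = approx.cdf(k + 0.5)  # P(X <= k) becomes P(Y <= k + 0.5)
without_correction = approx.cdf(k)  # no correction applied, for comparison

print(f"exact binomial:           {exact:.4f}")
print(f"normal, with correction:  {with_correction:.4f}")
print(f"normal, no correction:    {without_correction:.4f}")
```

In this case the corrected value lies much closer to the exact probability than the uncorrected one.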

Practical Application: Weather Data

Example 1: Binomial Model for Rain Days

Using the large data set, count the number of days with measurable rainfall (> 0.2 mm) at Hurn in September over several years. If the proportion is $\hat{p} = 0.45$, model the number of rain days in a random sample of 30 September days as $X \sim B(30, 0.45)$.

Predicted mean: $E(X) = 30 \times 0.45 = 13.5$ rain days. Compare this with the observed mean number of rain days per 30-day sample in the data, and evaluate how well the model fits.
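When judging how far an observed count could reasonably sit from 13.5, the spread of the model is also useful. Under the same $B(30, 0.45)$ model:

$\text{Var}(X) = 30 \times 0.45 \times 0.55 = 7.425, \quad \sqrt{7.425} \approx 2.72$

So observed counts within a few rain days of 13.5 would not by themselves cast doubt on the model. This is an informal guide, not a formal test.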

Example 2: Normal Model for Temperature

Daily mean temperatures at Leuchars in March: sample mean $\bar{x} = 5.8\,°C$, sample standard deviation $s = 2.1\,°C$. Model: $X \sim N(5.8, 2.1^2)$.

Calculate $P(X < 2)$ from the model and compare with the proportion of March days in the data set where the temperature was below 2°C.
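A sketch of the model side of this comparison, standardising in the usual way and using standard Normal table values:

$Z = \dfrac{2 - 5.8}{2.1} \approx -1.81$

$P(X < 2) = \Phi(-1.81) = 1 - \Phi(1.81) \approx 1 - 0.9649 = 0.0351$

The model therefore predicts that roughly 3.5% of March days, about 1 day in a 31-day month, have a mean temperature below 2°C.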

Summary

Probability models (binomial and normal) are used to approximate real-world data.
The binomial distribution models the number of successes in a fixed number of independent trials with constant probability.
The normal distribution models continuous, symmetric data.
Always check whether the model conditions are satisfied before applying a distribution.
Compare model predictions with observed data to evaluate model fit.
The normal distribution can approximate the binomial when $n$ is large and $np > 5$ and $n(1-p) > 5$, with a continuity correction.
No model is perfect — always discuss the assumptions and limitations.

Exam Tip: When a question asks you to "comment on the suitability of a model", do not just say "it is suitable" or "it is not suitable." Explain why by checking the conditions (e.g., "The binomial model may not be fully appropriate because the probability of rain is likely to vary across the month, violating the constant probability condition. However, it provides a reasonable approximation for estimation purposes.")

A-Level Deep Dive: Probability Models in the LDS Context

Spec mapping

This lesson maps to the AQA 7357 specification, Paper 3 (Statistics), sub-strands N (Statistical distributions) and O (Statistical hypothesis testing), set within the Large Data Set context of section M. The content covers the binomial distribution as a model and calculating probabilities with it; understanding and using the Normal distribution as a model and finding probabilities with it; and selecting an appropriate probability distribution for a context, with appropriate reasoning, including recognising when the binomial or Normal model may not be appropriate (refer to the official specification document for exact wording).

The LDS (daily weather observations from a number of UK and overseas weather stations) supplies the context: rainfall amounts, daily mean temperatures, sunshine hours, wind directions, and binary events such as "rain on a given day" are all routinely modelled probabilistically.

The AQA formula booklet supplies neither the binomial probability mass function nor the Normal density; the binomial pmf must be memorised, and Normal probabilities must be looked up via the standard tables provided in the formula booklet for the standard Normal $Z = (X - \mu)/\sigma$.

Worked example with full mark scheme

Question (8 marks):

A student investigates the LDS for Heathrow during May to October. Two scenarios:

(a) The student records, for each of the 184 days in the period, whether the daily total rainfall is at least $1\text{ mm}$. The historical proportion suggests that the long-run probability of such a "wet day" is $p = 0.30$. Let $X$ be the number of wet days in a randomly chosen 14-day window from this period. Stating the conditions you assume, find $P(X \geq 5)$. (5)

(b) The student then considers daily mean temperature $T\,(°\text{C})$ across the same period and proposes the model $T \sim N(15.4,\,3.2^2)$. Using this model, find the probability that $T$ exceeds $18°\text{C}$ on a randomly chosen day, and comment on whether the binomial distribution would have been an appropriate alternative. (3)

Solution with mark scheme:

(a) Step 1: state the model and conditions.

Let $X$ be the number of wet days in 14 days. Model $X \sim B(14, 0.30)$ provided:

each day is either "wet" or "not wet" (two outcomes per trial);
$p = 0.30$ is constant across the 14 days;
the 14 days are independent (rainfall on one day does not influence another);
the number of trials $n = 14$ is fixed in advance.

B1: identifying the binomial model with parameters $n = 14$, $p = 0.30$ and at least two named conditions (typically "fixed $n$" and "constant $p$" or "independence"). Examiners reward the condition explicitly written in context, e.g. "we assume rainfall on different days is independent". Stating only "binomial" earns nothing.

Step 2: express the required probability.

$P(X \geq 5) = 1 - P(X \leq 4)$

M1: using the complement to convert a tail probability into a cumulative probability that can be looked up or computed.

Step 3: compute $P(X \leq 4)$.

Using the binomial cdf with $n = 14$, $p = 0.30$:

$P(X \leq 4) = \sum_{k=0}^{4} \binom{14}{k}(0.30)^k (0.70)^{14-k}$

Evaluating term by term gives $0.0068 + 0.0407 + 0.1134 + 0.1943 + 0.2290$, so $P(X \leq 4) \approx 0.5842$.

M1: correct cumulative-binomial setup (calculator or table). The structural mark is for the sum from $k = 0$ to $k = 4$; numerical accuracy is rewarded separately.

A1: $P(X \leq 4) \approx 0.5842$ to four decimal places (accept $0.584$).

Step 4: answer.

$P(X \geq 5) = 1 - 0.5842 = 0.4158$

A1: $P(X \geq 5) \approx 0.416$ (accept anywhere in the range $0.415$ to $0.417$).

(b) Step 1: standardise.

$Z = \dfrac{T - \mu}{\sigma} = \dfrac{18 - 15.4}{3.2} = \dfrac{2.6}{3.2} = 0.8125$

M1: correct standardisation. Watch the sign and the order: $(x - \mu)/\sigma$, not $(\mu - x)/\sigma$.

Step 2: read the standard Normal table.

$P(Z < 0.8125) \approx 0.7917$, so $P(T > 18) = 1 - 0.7917 = 0.2083$.

A1: $P(T > 18) \approx 0.208$.

Step 3: comparative comment.

The binomial model is not appropriate for daily mean temperature because temperature is a continuous quantity, not a count of successes in a fixed number of trials. A binomial random variable takes only the integer values $0, 1, 2, \dots, n$, whereas $T$ varies on a continuum.

E1: clear statement that temperature is continuous, the binomial models discrete counts, hence the binomial is inappropriate.

Total: 8 marks.
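The numerical answers in this worked example can be checked in Python. This is a verification sketch only, not an exam method; the helper name `binomial_cdf` and the variable names are assumptions for this example.

```python
from math import comb
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Return P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))


# Part (a): X ~ B(14, 0.30)
p_at_most_4 = binomial_cdf(4, 14, 0.30)
p_at_least_5 = 1 - p_at_most_4  # complement, as in Step 2

# Part (b): T ~ N(15.4, 3.2^2)
p_above_18 = 1 - NormalDist(mu=15.4, sigma=3.2).cdf(18)

print(f"(a) P(X <= 4) = {p_at_most_4:.4f}, P(X >= 5) = {p_at_least_5:.4f}")
print(f"(b) P(T > 18) = {p_above_18:.4f}")
```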

Specimen question modelled on the AQA 7357 Paper 3 format

Question (6 marks): The LDS records daily mean wind speed $W$ (knots) at Leeming. A student proposes $W \sim N(11.5,\,4.5^2)$.

(a) Find $P(8 < W < 14)$. (3)

(b) The Beaufort scale defines "moderate breeze" as wind speed strictly between $11$ and $16$ knots. Estimate the probability that, on a randomly chosen day, the wind speed is a "moderate breeze" according to this Beaufort definition under the proposed model. (3)

Mark scheme decomposition by AO:

(a)

M1 (AO1.1a): standardising both endpoints, $Z_1 = (8 - 11.5)/4.5 = -0.7778$ and $Z_2 = (14 - 11.5)/4.5 = 0.5556$.
M1 (AO1.1b): using the symmetry $\Phi(-z) = 1 - \Phi(z)$ and computing $\Phi(0.5556) - \Phi(-0.7778) = \Phi(0.5556) - (1 - \Phi(0.7778))$.
A1 (AO1.1b): final answer $\approx 0.7107 - 0.2183 = 0.4924$, accept $0.49$ to two decimal places.

(b)

M1 (AO1.1b): standardising $Z_1 = (11 - 11.5)/4.5 = -0.1111$ and $Z_2 = (16 - 11.5)/4.5 = 1.0$.
M1 (AO1.1b): $P(11 < W < 16) = \Phi(1.0) - \Phi(-0.1111) = 0.8413 - 0.4558$.
A1 (AO2.5): $\approx 0.3855$, accept $0.385$ to $0.386$.

Total: 6 marks split AO1 = 5, AO2 = 1. AQA reserves the AO2 mark for the recognition that the strict-inequality boundary in (b) is irrelevant for a continuous distribution: $P(W = 11) = 0$ exactly under the Normal model.
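Both interval probabilities can be checked against the mark scheme with a short Python sketch. The helper name `interval_probability` is an assumption for this example; in the exam the values would come from a calculator's Normal distribution function.

```python
from statistics import NormalDist

wind = NormalDist(mu=11.5, sigma=4.5)  # proposed model W ~ N(11.5, 4.5^2)


def interval_probability(model, lower, upper):
    """Return P(lower < W < upper) as a difference of two cdf values."""
    return model.cdf(upper) - model.cdf(lower)


p_a = interval_probability(wind, 8, 14)  # part (a)
p_b = interval_probability(wind, 11, 16)  # part (b), "moderate breeze"

print(f"(a) P(8 < W < 14)  = {p_a:.4f}")
print(f"(b) P(11 < W < 16) = {p_b:.4f}")
```

Because the model is continuous, the same function serves for strict and non-strict inequalities.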

Synoptic links

Connects to:

The binomial distribution. The natural binary-event LDS quantities are "rain on a given day", "daily maximum gust exceeds a threshold", "wind direction in a chosen sector". For each, modelling the count of successes in $n$ days as $B(n, p)$ is appropriate provided trials are independent and $p$ is constant. Independence is the assumption most often violated in real meteorological data: weather on consecutive days is correlated.
The Normal distribution. Continuous LDS measurements — daily mean temperature, daily mean wind speed, mean cloud cover, sunshine hours — are candidates for a Normal model when the marginal distribution is approximately symmetric and bell-shaped. Skewed quantities (rainfall amounts, which pile up at zero) usually fail the visual symmetry test.
Sampling and estimation. The LDS provides empirical estimates $\bar{x}$ and $s$ that play the role of $\mu$ and $\sigma$ in the proposed model. Awareness that estimates carry uncertainty is examined in section O (hypothesis testing for the mean of a Normal distribution with known variance).
Modelling assumptions. Both binomial and Normal models depend on stated assumptions; the AO3 (problem-solving) mark in extended LDS questions is awarded for naming and critically evaluating these assumptions.
