Binomial Distribution

This lesson covers the binomial distribution, one of the most important discrete probability distributions at A-Level. The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success.

Conditions for a Binomial Distribution

A random variable $X$ follows a binomial distribution, written $X \sim B(n, p)$, if:

There is a fixed number of trials, $n$.
Each trial has exactly two outcomes: success or failure.
The probability of success, $p$, is constant for each trial.
The trials are independent of each other.

Exam Tip: When asked to justify a binomial model, you must state all four conditions and relate them to the context of the problem. Simply listing the conditions without context will not gain full marks.

The Binomial Probability Formula

The probability of exactly $r$ successes in $n$ trials is:

$P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$

where $\binom{n}{r} = \frac{n!}{r!(n-r)!}$ is the binomial coefficient (the number of ways to choose $r$ items from $n$).

Example: A fair coin is tossed 8 times. Find $P(X = 3)$ where $X$ is the number of heads.

$X \sim B(8, 0.5)$

$P(X = 3) = \binom{8}{3} (0.5)^3 (0.5)^5 = 56 \times (0.5)^8 = 56 \times \frac{1}{256} = 0.21875 \approx 0.2188$
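The coin example can be checked numerically by evaluating the formula directly. The following is a minimal sketch using only Python's standard library; the function name `binomial_pmf` is an illustrative choice and not part of the lesson.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p), evaluated straight from the formula."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Fair coin tossed 8 times: probability of exactly 3 heads.
coin = binomial_pmf(3, 8, 0.5)
print(f"P(X = 3) = {coin:.4f}")
```

Here `comb(8, 3)` plays the role of $\binom{8}{3} = 56$, so the result should agree with the hand calculation above.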

Mean and Variance

For $X \sim B(n, p)$:

$E(X) = np$
$\text{Var}(X) = np(1-p) = npq \quad \text{where } q = 1-p$
$\sigma = \sqrt{npq}$

Example: If $X \sim B(20, 0.3)$:

$E(X) = 20 \times 0.3 = 6$
$\text{Var}(X) = 20 \times 0.3 \times 0.7 = 4.2$
$\sigma = \sqrt{4.2} \approx 2.049$
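The same three results can be reproduced in a few lines of code. This is a small sketch, assuming Python; the helper name `binomial_summary` is hypothetical and chosen only for illustration.

```python
from math import sqrt

def binomial_summary(n: int, p: float):
    """Return (mean, variance, standard deviation) for X ~ B(n, p)."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, sqrt(variance)

# The example above: X ~ B(20, 0.3).
mean, variance, sd = binomial_summary(20, 0.3)
print(f"E(X) = {mean:.1f}, Var(X) = {variance:.1f}, sd = {sd:.3f}")
```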

Cumulative Probabilities

To find $P(X \leq r)$, sum the individual probabilities:

$P(X \leq r) = \sum_{k=0}^{r} P(X = k)$

Use the cumulative binomial tables or your calculator's binomial CDF function.

Useful Relationships

Probability	Equivalent form
$P(X \leq r)$	Cumulative probability (directly from tables)
$P(X < r)$	$P(X \leq r - 1)$
$P(X > r)$	$1 - P(X \leq r)$
$P(X \geq r)$	$1 - P(X \leq r - 1)$
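The conversions in the table can be sketched in code as follows. The values $n = 20$, $p = 0.3$ and $r = 5$ are illustrative assumptions, and `binomial_cdf` is a hypothetical helper standing in for a table lookup or a calculator's cumulative function.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def binomial_cdf(r: int, n: int, p: float) -> float:
    """P(X <= r); the sum is empty (so 0) when r < 0."""
    return sum(binomial_pmf(k, n, p) for k in range(r + 1))

n, p, r = 20, 0.3, 5  # illustrative values only

at_most = binomial_cdf(r, n, p)           # P(X <= r)
less_than = binomial_cdf(r - 1, n, p)     # P(X < r)  = P(X <= r - 1)
greater_than = 1 - binomial_cdf(r, n, p)  # P(X > r)  = 1 - P(X <= r)
at_least = 1 - binomial_cdf(r - 1, n, p)  # P(X >= r) = 1 - P(X <= r - 1)

print(f"{at_most:.4f} {less_than:.4f} {greater_than:.4f} {at_least:.4f}")
```

A useful sanity check is that $P(X \geq r) - P(X > r)$ equals $P(X = r)$, which is exactly the term that a strict/non-strict mix-up gains or loses.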

Exam Tip: The most common error in binomial questions is confusing strict and non-strict inequalities. Always convert to $P(X \leq \text{something})$ form before using tables. Write out the conversion step explicitly.

Modelling Assumptions

When using the binomial distribution to model a real-world situation, consider:

Are the trials truly independent?
Is the probability of success truly constant?
Is the number of trials truly fixed?
Are there really only two outcomes?

If any condition is violated, the binomial model may not be appropriate.
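To see how a violated condition changes the answer, the sketch below compares the binomial model with an exact count for sampling without replacement, where the trials are not independent. The batch sizes (10 and 1000) and the 30% defective rate are hypothetical figures chosen for illustration, and the function name is an assumption.

```python
from math import comb

def prob_no_defectives(batch: int, defective: int, sample: int) -> float:
    """Exact P(no defectives) when `sample` items are drawn WITHOUT replacement."""
    return comb(batch - defective, sample) / comb(batch, sample)

sample, p = 5, 0.3

# Binomial model: treats the 5 draws as independent with constant p.
binomial_model = (1 - p) ** sample

# Small batch (10 items, 3 defective): each draw changes the odds noticeably.
small_batch = prob_no_defectives(10, 3, sample)

# Large batch (1000 items, 300 defective): each draw barely changes the odds.
large_batch = prob_no_defectives(1000, 300, sample)

print(f"binomial {binomial_model:.4f}, small batch {small_batch:.4f}, "
      f"large batch {large_batch:.4f}")
```

Under these assumptions the small-batch probability differs substantially from the binomial value, while the large-batch probability is close to it.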

Summary

$X \sim B(n, p)$: a fixed number of trials $n$, constant probability of success $p$, independent trials, two outcomes per trial.
$P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$.
$E(X) = np$, $\text{Var}(X) = np(1-p)$.
Use cumulative probabilities for $P(X \leq r)$, $P(X > r)$, etc.
Always verify the four conditions before applying the binomial model in context.

Exam Tip: When a question says "the probability that at least 3 ...", this means $P(X \geq 3) = 1 - P(X \leq 2)$. Translating the words into the correct inequality is a crucial skill.

A-Level Deep Dive: The Binomial Distribution

Spec mapping

AQA 7357 specification, Paper 3 — Statistics, Section R covers the binomial distribution as a model; calculate probabilities using the binomial distribution; recognise the conditions under which the binomial distribution is an appropriate model (refer to the official specification document for exact wording). This sub-strand sits inside the Statistics half of Paper 3 (the other half being Mechanics) and is examined alongside Section O (probability), Section Q (statistical distributions, including the link to the normal distribution as an approximation) and Section S (hypothesis testing, where the binomial provides the exact test for a population proportion). The AQA formula booklet does provide $P(X = r) = \binom{n}{r} p^r (1 - p)^{n - r}$ and the mean / variance results, but candidates are still expected to identify when the model applies and to justify its use.

Worked example with full mark scheme

Question (8 marks):

A factory production line produces components, of which a long-run proportion $p = 0.15$ are defective. A quality inspector samples $n = 20$ components at random from a large batch.

(a) State two conditions that must hold for the binomial distribution $X \sim B(20, 0.15)$ to be an appropriate model for the number of defective components in the sample. (2)

(b) Find $P(X \leq 3)$, giving your answer to 4 decimal places. (3)

(c) Find $E(X)$ and $\mathrm{Var}(X)$, and hence the standard deviation of $X$, giving the standard deviation to 3 significant figures. (3)

Solution with mark scheme:

(a) Step 1 — state two conditions.

Two of the following four (any pair earns the marks):

The trials are independent — selecting one component does not change the probability that the next is defective. This is reasonable here because the batch is large, so sampling without replacement is approximately equivalent to sampling with replacement.
The probability of "defective" is constant at $p = 0.15$ across all trials.
There are exactly two outcomes per trial (defective or not defective).
The number of trials $n = 20$ is fixed in advance.

B1 — first valid condition stated in context. B1 — second valid condition stated in context. Generic answers (" $n$ is fixed", " $p$ is constant") with no reference to the production-line context typically still earn the marks at A-Level, but examiners reward contextual phrasing.

(b) Step 2 — set up the cumulative probability.

$P(X \leq 3) = \sum_{r = 0}^{3} \binom{20}{r} (0.15)^r (0.85)^{20 - r}$

M1 — correct cumulative form, or equivalent calculator command binomcdf(20, 0.15, 3).

Step 3 — evaluate.

Term-by-term:

$P(X = 0) = (0.85)^{20} \approx 0.038760$
$P(X = 1) = 20 \cdot 0.15 \cdot (0.85)^{19} \approx 0.136798$
$P(X = 2) = \binom{20}{2}(0.15)^2(0.85)^{18} \approx 0.229338$
$P(X = 3) = \binom{20}{3}(0.15)^3(0.85)^{17} \approx 0.242829$

Sum: $0.038760 + 0.136798 + 0.229338 + 0.242829 \approx 0.6477$.

M1 — at least three correct individual probabilities (or use of cumulative function).

A1 — $P(X \leq 3) \approx 0.6477$ to 4 d.p.
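The term-by-term working in part (b) can be verified with a short script. This is a sketch assuming Python's standard library; it plays the same role as a calculator's cumulative binomial function.

```python
from math import comb

n, p = 20, 0.15

# Individual probabilities P(X = r) for r = 0, 1, 2, 3.
terms = [comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(4)]

for r, term in enumerate(terms):
    print(f"P(X = {r}) = {term:.6f}")

cumulative = sum(terms)
print(f"P(X <= 3) = {cumulative:.4f}")
```

Keeping six decimal places in the individual terms, as the written solution does, avoids rounding drift in the fourth decimal place of the total.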

(c) Step 4: apply the mean and variance formulas.

$E(X) = np = 20 \cdot 0.15 = 3$.

$\mathrm{Var}(X) = np(1 - p) = 20 \cdot 0.15 \cdot 0.85 = 2.55$.

$\mathrm{SD}(X) = \sqrt{2.55} = 1.5968\ldots \approx 1.60$ (3 s.f.).

B1 — correct mean. B1 — correct variance. A1 — standard deviation to the requested accuracy.

Total: 8 marks (B4 M2 A2, split as shown).

Specimen question modelled on the AQA 7357 Paper 3 format

Question (6 marks): A multiple-choice quiz has 10 questions, each with 4 options of which exactly one is correct. A student guesses every answer at random. Let $Y$ be the number of correct answers.

(a) Justify modelling $Y \sim B(10, 0.25)$, stating each binomial condition in context. (3)

(b) Find the probability that the student gets at least 4 correct. (3)

Mark scheme decomposition by AO:

(a)

B1 (AO3.3) — fixed number of trials: 10 questions, set in advance.
B1 (AO3.3) — two outcomes per trial: each question is correct or incorrect.
B1 (AO3.3) — independent trials with constant $p = 0.25$ : random guessing means each question has probability $\tfrac{1}{4}$ of being correct, and one guess does not influence another.

(b)

M1 (AO1.1a) — recognising $P(Y \geq 4) = 1 - P(Y \leq 3)$ .
M1 (AO1.1b) — correct evaluation of $P(Y \leq 3)$ using $\binom{10}{r}(0.25)^r(0.75)^{10 - r}$ for $r = 0, 1, 2, 3$ .
A1 (AO1.1b) — $P(Y \geq 4) = 1 - 0.7759\ldots \approx 0.2241$ (4 d.p.).
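The complement step in part (b) can be checked with a short calculation. This sketch assumes Python and uses only `math.comb`; the variable names are illustrative.

```python
from math import comb

n, p = 10, 0.25

# P(Y <= 3): sum the individual probabilities for r = 0, 1, 2, 3.
p_at_most_3 = sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(4))

# "At least 4" is the complement of "at most 3".
p_at_least_4 = 1 - p_at_most_3

print(f"P(Y <= 3) = {p_at_most_3:.4f}")
print(f"P(Y >= 4) = {p_at_least_4:.4f}")
```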

Total: 6 marks split AO1 = 3, AO3 = 3. AQA's modelling AO (AO3) is heavily front-loaded on binomial questions: the first two or three marks almost always reward justifying the model rather than computing with it. Candidates who jump straight to numbers and skip the conditions concede the AO3 marks.

Synoptic links

Connects to:

Section O — Probability: the binomial coefficient $\binom{n}{r}$ counts the number of ways to choose $r$ "successes" from $n$ trials, which is the same combinatorial object that appears in tree-diagram and counting-rule problems. Every binomial probability is, fundamentally, an enumeration of equally likely outcome sequences weighted by their probabilities.
Section Q — Normal approximation: for large $n$ and $p$ not too close to 0 or 1 (rule of thumb $np > 5$ and $n(1 - p) > 5$ ), $B(n, p)$ is well-approximated by $N(np, np(1 - p))$ . This bridge between discrete and continuous distributions is the practical face of the Central Limit Theorem and lets candidates handle large- $n$ binomial probabilities by standard-normal table lookup.
Section J (Pure) — Binomial expansion: the algebraic identity $(p + q)^n = \sum_{r = 0}^{n} \binom{n}{r} p^r q^{n - r}$ is exactly the statement that binomial probabilities sum to 1 when $q = 1 - p$ . The AQA scheme deliberately reuses the symbol $\binom{n}{r}$ across both topics; the link is not coincidental.
Section S — Hypothesis testing: for testing a population proportion, the binomial distribution gives the exact null distribution of the sample count of successes. Critical regions for one- and two-tailed tests at the 5% level are read off from cumulative binomial tables; this is the canonical AQA hypothesis-test setup at AS level.
Section P — Modelling assumptions: every binomial application requires the candidate to articulate why the four conditions hold in context. This is the same modelling discipline tested across mechanics (point-particle, smooth surface, light string) and is increasingly the AO3 spine of statistics questions on the AQA paper.
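Two of these links can be illustrated numerically: the normal approximation and the fact that binomial probabilities sum to 1. The sketch below assumes Python 3.8 or later (for `statistics.NormalDist`). The values $n = 100$, $p = 0.4$ and $r = 45$ are illustrative, and the continuity correction of $+0.5$ is an added assumption that the text above does not discuss.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(r: int, n: int, p: float) -> float:
    """Exact P(X <= r) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r + 1))

# Illustrative values: np = 40 and n(1 - p) = 60, both above 5.
n, p, r = 100, 0.4, 45

exact = binomial_cdf(r, n, p)

# Normal approximation N(np, np(1 - p)). The +0.5 is a continuity
# correction, assumed here because X is discrete.
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = normal.cdf(r + 0.5)

# Binomial expansion link: summing over r = 0..n gives (p + q)^n = 1.
total = binomial_cdf(n, n, p)

print(f"exact  P(X <= {r}) = {exact:.4f}")
print(f"normal approximation = {approx:.4f}")
print(f"sum of all probabilities = {total:.6f}")
```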

Mark-scheme literacy

Binomial-distribution questions on 7357 spread marks more evenly across the three assessment objectives than most pure topics do:

AO	Typical share	Earned by
AO1 (knowledge / procedure)	50–60%	Computing $P(X = r)$, $P(X \leq r)$, mean $np$ and variance $np(1 - p)$; using calculator binomial functions correctly
AO2 (reasoning / interpretation)	15–25%	Recognising "at least one", "more than", "at most" and translating to the correct cumulative inequality; interpreting probabilities in context
AO3 (modelling / problem-solving)	20–30%	Justifying that the binomial conditions hold in context; criticising the model when they fail (e.g. dependence between trials, varying $p$)

Examiner-rewarded phrasing: "Let $X$ be the number of defective components in the sample of 20, so $X \sim B(20, 0.15)$, assuming each component is defective with probability 0.15 independently of the others"; "since the batch is large, sampling without replacement is approximately equivalent to sampling with replacement, so independence is a reasonable assumption"; "we reject the null hypothesis at the 5% significance level". Phrases and habits that lose marks: "the binomial works because there are two outcomes" with no further justification; computing a probability without first defining the random variable or stating its distribution; reading the cumulative table for the wrong tail.

A specific AQA pattern to watch: when the question says "find the probability that at least one component is defective", the standard manipulation is $P(X \geq 1) = 1 - P(X = 0) = 1 - (1 - p)^n$. Candidates who try to sum $P(X = 1) + P(X = 2) + \ldots + P(X = n)$ directly waste exam time and frequently make arithmetic errors. For "at least one" questions the complement is always cheaper.
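The equivalence of the two routes can be confirmed numerically. The sketch below reuses $n = 20$ and $p = 0.15$ from the worked example as an illustration; the variable names are assumptions.

```python
from math import comb

n, p = 20, 0.15

# Route 1: the complement, a single power.
via_complement = 1 - (1 - p) ** n

# Route 2: summing P(X = 1) + ... + P(X = n), twenty separate terms.
via_direct_sum = sum(comb(n, r) * p**r * (1 - p) ** (n - r)
                     for r in range(1, n + 1))

print(f"complement: {via_complement:.4f}")
print(f"direct sum: {via_direct_sum:.4f}")
```

Both routes give the same probability, but the first needs one evaluation where the second needs $n$.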

Grade-band model answers

3-mark question

Question: A fair six-sided die is rolled 8 times. Find the probability of obtaining exactly 2 sixes.

Grade C response (~180 words):

Let $X$ be the number of sixes. Each roll has probability $\tfrac{1}{6}$ of being a six, the rolls are independent, and there are 8 rolls, so $X \sim B(8, \tfrac{1}{6})$.

$P(X = 2) = \binom{8}{2}(\tfrac{1}{6})^2(\tfrac{5}{6})^6 = 28 \cdot \tfrac{1}{36} \cdot \tfrac{15625}{46656}$.

This gives $P(X = 2) \approx 0.2605$.

Examiner commentary: Full marks (3/3). The candidate correctly identifies the distribution and parameters, applies the binomial probability formula, and produces a reasonable decimal answer. The structure — identify variable, state distribution, apply formula — is the standard procedure rewarded across binomial questions. Working is brief but every step is justifiable. A typical Grade C answer for a procedural question.

Grade A response (~210 words):

Let $X$ denote the number of sixes obtained in 8 rolls of a fair die. The rolls are independent (the outcome of one roll does not affect the next), the probability of a six is constant at $p = \tfrac{1}{6}$, the number of trials $n = 8$ is fixed, and each trial has two outcomes (six or not-six). Hence $X \sim B(8, \tfrac{1}{6})$.

By the binomial probability formula:

$P(X = 2) = \binom{8}{2} \left(\dfrac{1}{6}\right)^2 \left(\dfrac{5}{6}\right)^6$

Computing: $\binom{8}{2} = 28$, $(\tfrac{1}{6})^2 = \tfrac{1}{36}$, $(\tfrac{5}{6})^6 = \tfrac{15625}{46656}$.

$P(X = 2) = 28 \cdot \dfrac{1}{36} \cdot \dfrac{15625}{46656} = \dfrac{28 \cdot 15625}{36 \cdot 46656} \approx 0.2605$
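The fraction arithmetic above can be checked exactly rather than in floating point. This is a minimal sketch, assuming Python's `fractions.Fraction`; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 6)  # probability of a six on one roll
n, r = 8, 2

# Exact value of C(8, 2) * (1/6)^2 * (5/6)^6, reduced automatically.
exact = comb(n, r) * p**r * (1 - p) ** (n - r)

print(exact)                  # exact fraction in lowest terms
print(f"{float(exact):.4f}")  # decimal form, 4 d.p.
```

Working with exact fractions until the final step mirrors the exam habit of rounding only once, at the end.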
