The Poisson Distribution

The Poisson distribution is the second great discrete model of Further Statistics. It counts the number of events in a fixed interval of time or space when events occur singly, independently and at a constant average rate. It is the natural model for "rare events in continuous opportunity" — calls to a switchboard, flaws in a roll of cloth, decays of a radioactive sample — and it is the bridge between the binomial (Lesson 3) and the chi-squared goodness-of-fit test (later in 3S).

1. Where this sits in AQA 7367

This is core content of the Paper 3 Statistics option (7367/3S) (AO weighting AO1 40% / AO2 25% / AO3 35%). The arithmetic of $P(X=r)$ is AO1; choosing and justifying a Poisson model in context is AO2; and the multi-step problems (changing the interval, combining sources, "at least one" complements, modelling decisions) are AO3. The prerequisites are the binomial distribution from A-Level Mathematics, the index laws, and the exponential function $e^x$ with its Maclaurin series.

2. Core theory

When the Poisson model applies

Condition	Meaning in context
Events occur singly	Two events cannot coincide at the same instant
Events occur independently	One occurrence does not change the chance of another
Constant average rate $\lambda$	The mean rate does not drift over the interval
Proportionality	$P(\text{one event in }\delta t) \approx \lambda\,\delta t$, and $P(\geq 2 \text{ in }\delta t)$ is negligible

Typical Poisson variables: emails per hour, typing errors per page, cars past a point per minute, radioactive decays per second.

The probability function

If $X \sim \text{Po}(\lambda)$ with parameter $\lambda > 0$,

$P(X = r) = \frac{e^{-\lambda}\lambda^r}{r!}, \qquad r = 0, 1, 2, \ldots$

$\lambda$ is the mean number of events in the stated interval. The probabilities sum to 1 by the Maclaurin series $e^{\lambda} = \sum_{r\geq 0}\lambda^r/r!$:

$\sum_{r=0}^{\infty}\frac{e^{-\lambda}\lambda^r}{r!} = e^{-\lambda}\sum_{r=0}^{\infty}\frac{\lambda^r}{r!} = e^{-\lambda}e^{\lambda} = 1.$
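As a quick numerical check of the formula and of the sum-to-one property, here is a minimal Python sketch. The function name `poisson_pmf` and the rate `lam = 3.2` are illustrative choices, not part of the specification, and the infinite sum is truncated at $r = 59$ on the assumption that the remaining tail is negligible for a rate this small.

```python
import math

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam), computed directly from the formula."""
    return math.exp(-lam) * lam**r / math.factorial(r)

lam = 3.2  # illustrative rate, not taken from the lesson

# Partial sum of the series; terms beyond r = 59 are far below float precision here.
probs = [poisson_pmf(r, lam) for r in range(60)]
total = sum(probs)

print(f"P(X = 0) = {probs[0]:.4f}")
print(f"Sum of P(X = r) for r = 0..59: {total:.12f}")
```

The printed total should agree with 1 to within floating-point rounding, mirroring the Maclaurin-series argument above.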

Mean and variance — and why they are equal

Property	Value
$E(X)$	$\lambda$
$\text{Var}(X)$	$\lambda$
$\text{SD}(X)$	$\sqrt{\lambda}$

The defining feature is $E(X) = \text{Var}(X) = \lambda$. We can prove the mean from the definition:

$\begin{aligned} E(X) &= \sum_{r=0}^{\infty} r\,\frac{e^{-\lambda}\lambda^r}{r!} = e^{-\lambda}\sum_{r=1}^{\infty}\frac{\lambda^r}{(r-1)!} \\ &= \lambda e^{-\lambda}\sum_{s=0}^{\infty}\frac{\lambda^{s}}{s!} = \lambda e^{-\lambda}e^{\lambda} = \lambda, \end{aligned}$

substituting $s = r-1$. To obtain the variance we first find $E\big(X(X-1)\big)$, the second factorial moment, which is cleaner than $E(X^2)$ for the Poisson because the factor $r(r-1)$ cancels the two largest factors of $r!$, leaving $(r-2)!$:

$\begin{aligned} E\big(X(X-1)\big) &= \sum_{r=0}^{\infty} r(r-1)\,\frac{e^{-\lambda}\lambda^r}{r!} = e^{-\lambda}\sum_{r=2}^{\infty}\frac{\lambda^r}{(r-2)!} \\ &= \lambda^2 e^{-\lambda}\sum_{s=0}^{\infty}\frac{\lambda^{s}}{s!} = \lambda^2 e^{-\lambda}e^{\lambda} = \lambda^2, \end{aligned}$

substituting $s = r-2$. Since $E\big(X(X-1)\big) = E(X^2) - E(X)$, we have $E(X^2) = \lambda^2 + \lambda$, and therefore

$\text{Var}(X) = E(X^2) - \big(E(X)\big)^2 = (\lambda^2 + \lambda) - \lambda^2 = \lambda.$

So both the mean and the variance equal $\lambda$, confirmed from first principles. The equality of mean and variance is a powerful diagnostic: if sample data have a variance much larger than the mean (overdispersion, often caused by clustering) or much smaller (underdispersion, often caused by regularity), a Poisson model is doubtful.

If asked whether a Poisson model fits, compare the sample mean and sample variance; near-equality supports Poisson.
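The two derivations above can be sanity-checked numerically. The sketch below is only an illustration: the rate `lam = 2.5` and the truncation point `R = 80` are arbitrary assumptions, chosen so that the neglected tail of each series is negligible.

```python
import math

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)

lam = 2.5  # illustrative rate
R = 80     # truncation point for the infinite sums (assumed large enough)

# E(X) and the second factorial moment E(X(X-1)), summed term by term.
mean = sum(r * poisson_pmf(r, lam) for r in range(R))
factorial_moment = sum(r * (r - 1) * poisson_pmf(r, lam) for r in range(R))

# Var(X) = E(X(X-1)) + E(X) - E(X)^2, as in the derivation.
variance = factorial_moment + mean - mean**2

print(f"E(X)       = {mean:.6f}")
print(f"E(X(X-1))  = {factorial_moment:.6f}")
print(f"Var(X)     = {variance:.6f}")
```

All three printed values should match $\lambda$, $\lambda^2$ and $\lambda$ respectively, up to rounding.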

Choosing a Poisson model in practice

Examiners frequently ask not for a calculation but for a judgement: is the Poisson distribution an appropriate model for a stated situation? The decision turns on four checks, and each can fail in a recognisable way.

Singly. Events must occur one at a time. Counting "people entering a shop" is fine when they arrive individually, but "cars arriving" can fail if vehicles travel in convoys (several arrive together), breaching the singly condition.
Independently. One occurrence must not change the chance of another. Goals in football are not well modelled by a pure Poisson if a team that scores then attacks more (or sits back) — the occurrences become dependent. Radioactive decays, by contrast, are genuinely independent.
Constant rate. The mean rate must not drift. Calls to a helpline over a whole day violate this (a lunchtime peak), but calls within a single quiet hour may be acceptably constant. When the rate varies, scaling a single rate $\lambda$ across the whole period is not justified, although a constant-rate Poisson model may still hold over a short sub-interval.
Counts in a fixed interval. The variable must count occurrences in a fixed window of time, length, area or volume — "flaws per square metre of sheet metal", "weeds per square metre of lawn".

A worked judgement: "The number of misprints on a randomly chosen page of a long novel." Misprints plausibly occur singly (one wrong character at a time), independently (one typo does not make the next more likely), and at a roughly constant average rate across uniform pages — so a Poisson model is reasonable. A sensible caveat worth a mark: if the typesetter tired toward the end of long chapters, later pages might carry more errors, weakening the constant-rate assumption. This kind of "model, then critique" answer is exactly what the AO2 marks reward, and it distinguishes a top-band script from a merely correct one.

The recurrence relation and efficient tabulation

Computing many Poisson probabilities from scratch is wasteful, because each $P(X = r) = e^{-\lambda}\lambda^r/r!$ recomputes a factorial and a power. A far quicker route uses the ratio of consecutive probabilities:

$\frac{P(X = r+1)}{P(X = r)} = \frac{e^{-\lambda}\lambda^{r+1}/(r+1)!}{e^{-\lambda}\lambda^{r}/r!} = \frac{\lambda}{r+1},$

which rearranges to the recurrence relation

$P(X = r+1) = \frac{\lambda}{r+1}\,P(X = r), \qquad P(X = 0) = e^{-\lambda}.$

Starting from $P(X=0) = e^{-\lambda}$, each successive probability is obtained by a single multiplication. The recurrence also reveals the mode without any calculus: the probabilities increase while $\frac{\lambda}{r+1} > 1$, i.e. while $r + 1 < \lambda$, and decrease once $r + 1 > \lambda$. So when $\lambda$ is not an integer the most likely value is the largest integer below $\lambda$; when $\lambda$ is itself an integer, $\frac{\lambda}{r+1} = 1$ at $r + 1 = \lambda$, making $P(X = \lambda - 1) = P(X = \lambda)$ and the distribution bimodal.
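The recurrence translates almost line for line into code. The following Python sketch is an illustration only: the helper name `poisson_table` and the rate 3.5 are assumptions made for the example.

```python
import math

def poisson_table(lam, r_max):
    """Return [P(X=0), ..., P(X=r_max)] for X ~ Po(lam) using the recurrence
    P(X = r+1) = lam / (r+1) * P(X = r), starting from P(X = 0) = e^{-lam}."""
    probs = [math.exp(-lam)]
    for r in range(r_max):
        probs.append(probs[-1] * lam / (r + 1))
    return probs

table = poisson_table(3.5, 10)  # illustrative non-integer rate

# The mode is the index of the largest tabulated probability.
mode = max(range(len(table)), key=table.__getitem__)

for r, p in enumerate(table):
    print(f"P(X = {r:2d}) = {p:.4f}")
print("mode:", mode)
```

Running the same function with an integer rate, for example `poisson_table(4.0, 10)`, should show two equal largest entries at $r = 3$ and $r = 4$, which is the bimodal case described above.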

The recurrence is also the cleanest way to compare a binomial with its Poisson approximation. Take $X \sim B(10, 0.2)$, so $np = 2$, versus $\text{Po}(2)$. The exact binomial gives $P(X = 0) = 0.8^{10} = 0.1074$, $P(X = 1) = 10(0.2)(0.8^9) = 0.2684$, $P(X = 2) = \binom{10}{2}(0.2)^2(0.8^8) = 0.3020$; the Poisson gives $P(X = 0) = e^{-2} = 0.1353$, $P(X = 1) = 2e^{-2} = 0.2707$, $P(X = 2) = \frac{2^2}{2!}e^{-2} = 2e^{-2} = 0.2707$. The two agree in broad shape but differ by up to about three percentage points because $n = 10$ is small and $p = 0.2$ is not tiny, a vivid illustration that the Poisson approximation to the binomial from Lesson 3 improves as $n$ grows and $p$ shrinks. Tabulating both with the recurrence makes the convergence (or its absence) easy to see at a glance.
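To see the approximation tighten as $n$ grows with $np$ held at 2, a short Python sketch can tabulate the largest gap between the two sets of probabilities. The function names and the extra pairs $(100, 0.02)$ and $(1000, 0.002)$ are illustrative assumptions, not values from the lesson.

```python
import math

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)

def max_abs_gap(n, p, r_max=10):
    """Largest |binomial - Poisson| over r = 0..min(n, r_max), with lam = n * p."""
    lam = n * p
    return max(
        abs(binomial_pmf(r, n, p) - poisson_pmf(r, lam))
        for r in range(min(n, r_max) + 1)
    )

# Same mean np = 2 each time, with n growing and p shrinking.
gaps = [max_abs_gap(n, p) for n, p in [(10, 0.2), (100, 0.02), (1000, 0.002)]]
for (n, p), gap in zip([(10, 0.2), (100, 0.02), (1000, 0.002)], gaps):
    print(f"n = {n:4d}, p = {p:.3f}: largest gap = {gap:.5f}")
```

The gaps should shrink from one row to the next, which is the convergence the paragraph above describes.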

From data to a fitted model

In practice the rate $\lambda$ is rarely handed to you; it is estimated from data as the sample mean. This is the first step of the chi-squared goodness-of-fit test you will meet later in 3S, and it is worth rehearsing here. Suppose the number of telephone calls arriving at a small office in each of 50 one-minute intervals is recorded:

Calls $r$	0	1	2	3	4
Frequency	17	18	9	5	1

The sample mean is

$\bar{x} = \frac{0(17) + 1(18) + 2(9) + 3(5) + 4(1)}{50} = \frac{0 + 18 + 18 + 15 + 4}{50} = \frac{55}{50} = 1.1,$

so we would fit $X \sim \text{Po}(1.1)$. Before trusting the model, the mean-equals-variance diagnostic is worth a glance: the sample variance here is $\frac{\sum f r^2}{50} - \bar{x}^2 = \frac{0 + 18 + 36 + 45 + 16}{50} - 1.1^2 = \frac{115}{50} - 1.21 = 2.3 - 1.21 = 1.09$, reassuringly close to the mean $1.1$ and so consistent with a Poisson model. The fitted distribution then predicts, for example,

$P(X = 0) = e^{-1.1} = 0.3329, \qquad P(X \geq 2) = 1 - e^{-1.1}(1 + 1.1) = 1 - 2.1e^{-1.1} = 1 - 0.6990 = 0.3010,$

so over 50 intervals we would expect about $50 \times 0.3329 = 16.6$ intervals with no calls — strikingly close to the observed 17. Comparing such expected frequencies with the observed counts is exactly what the goodness-of-fit test formalises; estimating $\lambda = \bar{x}$ here, and noting that doing so costs a degree of freedom later, is the conceptual bridge from this lesson to that test.
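The whole fitting step can be reproduced in a few lines of Python. This is a sketch under one simplifying assumption: the last class is treated as exactly 4 calls, whereas a formal goodness-of-fit test would normally treat it as "4 or more" so that the expected frequencies sum to 50. The variable names are illustrative.

```python
import math

calls = [0, 1, 2, 3, 4]      # values of r from the table
freq = [17, 18, 9, 5, 1]     # observed frequencies

n = sum(freq)
mean = sum(r * f for r, f in zip(calls, freq)) / n
variance = sum(f * r * r for r, f in zip(calls, freq)) / n - mean**2

# Expected frequencies under Po(mean); the last class is treated as exactly 4
# (a simplification, see the note above).
expected = [n * math.exp(-mean) * mean**r / math.factorial(r) for r in calls]

print(f"n = {n}, mean = {mean:.2f}, variance = {variance:.2f}")
for r, obs, exp_f in zip(calls, freq, expected):
    print(f"r = {r}: observed {obs:2d}, expected {exp_f:5.2f}")
```

Printing observed and expected side by side gives the comparison that the goodness-of-fit test later turns into a single statistic.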

The Poisson process and waiting times

The Poisson distribution is one half of a beautiful pair. If events occur as a Poisson process — singly, independently, at constant rate $\lambda$ per unit time — then the count of events in a fixed interval is Poisson, but the waiting time between consecutive events follows a different, continuous distribution: the exponential, which you will meet among the continuous distributions later in 3S. The two are linked by a single elegant observation. Let $T$ be the time until the first event. Then "no event has occurred by time $t$ " is the same as "the Poisson count over $[0,t]$ is zero", so

$P(T > t) = P(\text{count in } [0,t] = 0) = e^{-\lambda t},$

because the count over an interval of length $t$ is $\text{Po}(\lambda t)$ and $P(\text{Po}(\lambda t) = 0) = e^{-\lambda t}$. This is exactly the tail of the exponential distribution $\text{Exp}(\lambda)$, whose mean waiting time is $1/\lambda$, sensibly the reciprocal of the rate. So if calls arrive at $\lambda = 5$ per hour, the average gap between calls is $1/5$ hour, i.e. 12 minutes.

This duality is worth carrying forward because it deepens your grasp of both distributions and is a favourite source of synoptic exam questions. It also clarifies the meaning of the Poisson conditions: "constant rate" and "independence" are precisely the assumptions that make the gaps between events memoryless and exponentially distributed. A worked snippet: with $\lambda = 5$ per hour, the probability of no calls in a given 12-minute window is $P(\text{Po}(1) = 0) = e^{-1} = 0.368$ (using $\lambda t = 5\times\tfrac{12}{60} = 1$), which is identical to $P(T > 0.2\text{ h}) = e^{-5(0.2)} = e^{-1}$: the count view and the waiting-time view agree exactly, as they must.
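The agreement between the two views can also be illustrated by simulation. The Python sketch below draws exponential gaps with the standard library's `random.expovariate` and compares them with the exact value $e^{-1}$; the seed and the sample size of 200,000 are arbitrary assumptions made so the run is repeatable.

```python
import math
import random

lam = 5.0      # calls per hour, as in the worked snippet
t = 12 / 60    # a 12-minute window, in hours

# Count view: P(Po(lam * t) = 0).
exact_tail = math.exp(-lam * t)

# Waiting-time view: simulate gaps T ~ Exp(lam) and estimate P(T > t).
rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
gaps = [rng.expovariate(lam) for _ in range(200_000)]
mean_gap = sum(gaps) / len(gaps)
frac_long = sum(g > t for g in gaps) / len(gaps)

print(f"exact P(T > t)       = {exact_tail:.4f}")
print(f"simulated P(T > t)   = {frac_long:.4f}")
print(f"simulated mean gap   = {mean_gap:.4f} hours")
```

The simulated proportion should sit close to $e^{-1}$ and the simulated mean gap close to $1/\lambda = 0.2$ hours, up to sampling noise.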

Combining independent periods

A subtle but examinable point concerns probabilities spanning separate independent intervals. Because disjoint intervals of a Poisson process are independent, the count in one window tells you nothing about the count in another, and joint probabilities multiply. Suppose accidents at a junction follow $\text{Po}(2)$ per week, independently from week to week. The probability of no accidents in each of two consecutive weeks is the product

$P(\text{0 in week 1}) \times P(\text{0 in week 2}) = e^{-2}\times e^{-2} = e^{-4} = 0.0183.$
