The Poisson distribution is the second great discrete model of Further Statistics. It counts the number of events in a fixed interval of time or space when events occur singly, independently and at a constant average rate. It is the natural model for "rare events in continuous opportunity" — calls to a switchboard, flaws in a roll of cloth, decays of a radioactive sample — and it is the bridge between the binomial (Lesson 3) and the chi-squared goodness-of-fit test (later in 3S).
This is core content of the Paper 3 Statistics option (7367/3S) (AO weighting AO1 40% / AO2 25% / AO3 35%). The arithmetic of $P(X=r)$ is AO1; choosing and justifying a Poisson model in context is AO2; and the multi-step problems (changing the interval, combining sources, "at least one" complements, modelling decisions) are AO3. The prerequisites are the A-Level Mathematics binomial distribution, the index laws, and the exponential function $e^{x}$ with its series.
| Condition | Meaning in context |
|---|---|
| Events occur singly | Two events cannot coincide at the same instant |
| Events occur independently | One occurrence does not change the chance of another |
| Constant average rate λ | The mean rate does not drift over the interval |
| Proportionality | P(one event in δt)≈λδt, and P(≥2 in δt) is negligible |
Typical Poisson variables: emails per hour, typing errors per page, cars past a point per minute, radioactive decays per second.
If X∼Po(λ) with parameter λ>0,
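$$P(X=r)=\frac{e^{-\lambda}\lambda^{r}}{r!},\qquad r=0,1,2,\dots$$
$\lambda$ is the mean number of events in the stated interval. The probabilities sum to 1 by the Maclaurin series $e^{\lambda}=\sum_{r\ge 0}\lambda^{r}/r!$:
$$\sum_{r=0}^{\infty}\frac{e^{-\lambda}\lambda^{r}}{r!}=e^{-\lambda}\sum_{r=0}^{\infty}\frac{\lambda^{r}}{r!}=e^{-\lambda}e^{\lambda}=1.$$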
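The formula and the sum-to-one property can be checked numerically. The following is a minimal sketch in Python; the function name `poisson_pmf` and the rate 2.5 are illustrative choices, not part of the lesson, and the infinite sum is truncated at 60 terms on the assumption that the remaining tail is negligible for this rate.

```python
import math


def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam), computed directly from the formula."""
    return math.exp(-lam) * lam**r / math.factorial(r)


lam = 2.5  # illustrative rate (an assumption, not a value from the lesson)

# Truncate the infinite sum at r = 59; the tail beyond that is negligible here.
probs = [poisson_pmf(r, lam) for r in range(60)]
total = sum(probs)

print(f"P(X=0) = {probs[0]:.4f}")
print(f"sum over r = 0..59 = {total:.12f}")
```

The truncated total agrees with 1 to within floating-point rounding, which mirrors the Maclaurin-series argument.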
| Property | Value |
|---|---|
| E(X) | λ |
| Var(X) | λ |
| SD(X) | √λ |
The defining feature is E(X)=Var(X)=λ. We can prove the mean from the definition:
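$$E(X)=\sum_{r=0}^{\infty}r\,\frac{e^{-\lambda}\lambda^{r}}{r!}=e^{-\lambda}\sum_{r=1}^{\infty}\frac{\lambda^{r}}{(r-1)!}=\lambda e^{-\lambda}\sum_{s=0}^{\infty}\frac{\lambda^{s}}{s!}=\lambda e^{-\lambda}e^{\lambda}=\lambda,$$
substituting $s=r-1$. To obtain the variance we first find $E(X(X-1))$, the second factorial moment, which is cleaner than $E(X^{2})$ for the Poisson because the $r(r-1)$ factor cancels two factors of $r!$:
$$E(X(X-1))=\sum_{r=0}^{\infty}r(r-1)\,\frac{e^{-\lambda}\lambda^{r}}{r!}=e^{-\lambda}\sum_{r=2}^{\infty}\frac{\lambda^{r}}{(r-2)!}=\lambda^{2}e^{-\lambda}\sum_{s=0}^{\infty}\frac{\lambda^{s}}{s!}=\lambda^{2}e^{-\lambda}e^{\lambda}=\lambda^{2},$$
substituting $s=r-2$. Since $E(X(X-1))=E(X^{2})-E(X)$, we have $E(X^{2})=\lambda^{2}+\lambda$, and therefore
$$\operatorname{Var}(X)=E(X^{2})-(E(X))^{2}=(\lambda^{2}+\lambda)-\lambda^{2}=\lambda.$$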
So both the mean and the variance equal λ, confirmed from first principles. The equality of mean and variance is a powerful diagnostic: if sample data have a variance much larger than the mean (overdispersion, often caused by clustering) or much smaller (underdispersion, often caused by regularity), a Poisson model is doubtful.
If asked whether a Poisson model fits, compare the sample mean and sample variance; near-equality supports Poisson.
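The first-principles results for the mean, the second factorial moment and the variance can be confirmed by summing the series numerically. This is a sketch only: the helper names and the choice of λ = 3 are assumptions made for illustration, and each infinite sum is truncated at 100 terms.

```python
import math


def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)


def poisson_moments(lam: float, terms: int = 100):
    """Return (E(X), E(X(X-1)), Var(X)) from truncated sums over r."""
    mean = sum(r * poisson_pmf(r, lam) for r in range(terms))
    fact2 = sum(r * (r - 1) * poisson_pmf(r, lam) for r in range(terms))
    # Var(X) = E(X^2) - E(X)^2, with E(X^2) = E(X(X-1)) + E(X)
    variance = fact2 + mean - mean**2
    return mean, fact2, variance


mean, fact2, variance = poisson_moments(3.0)  # illustrative lambda = 3
print(f"E(X) = {mean:.6f}")
print(f"E(X(X-1)) = {fact2:.6f}")
print(f"Var(X) = {variance:.6f}")
```

For λ = 3 the three values come out as λ, λ² and λ respectively, up to rounding error.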
Examiners frequently ask not for a calculation but for a judgement: is the Poisson distribution an appropriate model for a stated situation? The decision turns on the four conditions, and each can fail in a recognisable way.
A worked judgement: "The number of misprints on a randomly chosen page of a long novel." Misprints plausibly occur singly (one wrong character at a time), independently (one typo does not make the next more likely), and at a roughly constant average rate across uniform pages — so a Poisson model is reasonable. A sensible caveat worth a mark: if the typesetter tired toward the end of long chapters, later pages might carry more errors, weakening the constant-rate assumption. This kind of "model, then critique" answer is exactly what the AO2 marks reward, and it distinguishes a top-band script from a merely correct one.
Computing many Poisson probabilities from scratch is wasteful, because each $P(X=r)=e^{-\lambda}\lambda^{r}/r!$ recomputes a factorial and a power. A far quicker route uses the ratio of consecutive probabilities:
$$\frac{P(X=r+1)}{P(X=r)}=\frac{e^{-\lambda}\lambda^{r+1}/(r+1)!}{e^{-\lambda}\lambda^{r}/r!}=\frac{\lambda}{r+1},$$
which rearranges to the recurrence relation
$$P(X=r+1)=\frac{\lambda}{r+1}\,P(X=r),\qquad P(X=0)=e^{-\lambda}.$$
Starting from $P(X=0)=e^{-\lambda}$, each successive probability is obtained by a single multiplication. The recurrence also reveals the mode without any calculus: the probabilities increase while $\frac{\lambda}{r+1}>1$, i.e. while $r+1<\lambda$, and decrease once $r+1>\lambda$. So when $\lambda$ is not an integer, the most likely value is the largest integer below $\lambda$, that is $\lfloor\lambda\rfloor$. When $\lambda$ is itself an integer, $\frac{\lambda}{r+1}=1$ at $r+1=\lambda$, making $P(X=\lambda-1)=P(X=\lambda)$, so the distribution is bimodal with equal modes at $\lambda-1$ and $\lambda$.
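The recurrence translates directly into a short loop. The sketch below builds a table of probabilities with one multiplication per step and then reads off the mode or modes; the function names and the two test rates (2.5 and 3) are illustrative assumptions.

```python
import math


def poisson_table(lam: float, r_max: int) -> list[float]:
    """P(X=0), ..., P(X=r_max) using P(X=r+1) = lam/(r+1) * P(X=r)."""
    probs = [math.exp(-lam)]  # starting value P(X=0)
    for r in range(r_max):
        probs.append(probs[-1] * lam / (r + 1))
    return probs


def modes(probs: list[float]) -> list[int]:
    """Values of r whose probability equals the maximum (up to rounding)."""
    peak = max(probs)
    return [r for r, p in enumerate(probs) if math.isclose(p, peak, rel_tol=1e-12)]


for lam in (2.5, 3.0):
    table = poisson_table(lam, 10)
    print(f"lambda = {lam}: modes = {modes(table)}")
    for r, p in enumerate(table[:6]):
        print(f"  P(X={r}) = {p:.4f}")
```

With a non-integer rate the table has a single peak at the largest integer below λ, while an integer rate gives two equal peaks at λ − 1 and λ. The tolerance in `modes` is there because the two tied probabilities may differ in the last floating-point digit.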
The recurrence is also the cleanest way to compare a binomial with its Poisson approximation. Take $X\sim B(10,0.2)$, so $np=2$, versus $\mathrm{Po}(2)$. The exact binomial gives $P(X=0)=0.8^{10}=0.1074$, $P(X=1)=10(0.2)(0.8^{9})=0.2684$, $P(X=2)=\binom{10}{2}(0.2)^{2}(0.8^{8})=0.3020$; the Poisson gives $P(X=0)=e^{-2}=0.1353$, $P(X=1)=2e^{-2}=0.2707$, $P(X=2)=\frac{2^{2}}{2!}e^{-2}=2e^{-2}=0.2707$. The two agree in broad shape but differ by up to about 0.03 in probability because $n=10$ is small and $p=0.2$ is not tiny: a vivid illustration that the Poisson approximation to the binomial from Lesson 3 improves as $n$ grows and $p$ shrinks. Tabulating both with the recurrence makes the convergence (or its absence) easy to see at a glance.
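The comparison can be tabulated in a few lines. The sketch below uses the direct formulas rather than the recurrence, purely for brevity, and adds a second case, $B(1000, 0.002)$, which is not in the lesson and is included only as an assumed example with the same mean $np=2$ but larger $n$ and smaller $p$.

```python
import math


def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p)."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)


def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)


def max_gap(n: int, p: float, r_max: int = 10) -> float:
    """Largest absolute difference between B(n, p) and Po(np) for r = 0..r_max."""
    lam = n * p
    return max(
        abs(binomial_pmf(r, n, p) - poisson_pmf(r, lam)) for r in range(r_max + 1)
    )


print(" r  B(10,0.2)  Po(2)")
for r in range(5):
    print(f"{r:2d}  {binomial_pmf(r, 10, 0.2):.4f}     {poisson_pmf(r, 2.0):.4f}")

# Same mean np = 2, but larger n and smaller p (illustrative second case).
print(f"largest gap for B(10, 0.2):     {max_gap(10, 0.2):.4f}")
print(f"largest gap for B(1000, 0.002): {max_gap(1000, 0.002):.4f}")
```

The second gap is much smaller than the first, which is the convergence the paragraph describes.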
In practice the rate λ is rarely handed to you; it is estimated from data as the sample mean. This is the first step of the chi-squared goodness-of-fit test you will meet later in 3S, and it is worth rehearsing here. Suppose the number of telephone calls arriving at a small office in each of 50 one-minute intervals is recorded:
| Calls r | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Frequency | 17 | 18 | 9 | 5 | 1 |
The sample mean is
$$\bar{x}=\frac{0(17)+1(18)+2(9)+3(5)+4(1)}{50}=\frac{0+18+18+15+4}{50}=\frac{55}{50}=1.1,$$
so we would fit $X\sim\mathrm{Po}(1.1)$. Before trusting the model, the mean-equals-variance diagnostic is worth a glance: the sample variance here is $\frac{\sum fr^{2}}{50}-\bar{x}^{2}=\frac{0+18+36+45+16}{50}-1.1^{2}=\frac{115}{50}-1.21=2.3-1.21=1.09$, reassuringly close to the mean 1.1 and so consistent with a Poisson model. The fitted distribution then predicts, for example,
$$P(X=0)=e^{-1.1}=0.3329,\qquad P(X\ge 2)=1-e^{-1.1}(1+1.1)=1-0.3329(2.1)=1-0.6990=0.3010,$$
so over 50 intervals we would expect about $50\times 0.3329=16.6$ intervals with no calls, strikingly close to the observed 17. Comparing such expected frequencies with the observed counts is exactly what the goodness-of-fit test formalises; estimating $\lambda$ by $\bar{x}$ here, and noting that doing so costs a degree of freedom later, is the conceptual bridge from this lesson to that test.
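The fitting steps above can be sketched in Python using the frequency table from the lesson. The variable names are illustrative, and the variance uses divisor $n$ to match the calculation in the text; the expected frequencies cover only $r=0$ to $4$, so the small probability of five or more calls is left out.

```python
import math

calls = [0, 1, 2, 3, 4]          # observed values of r
freq = [17, 18, 9, 5, 1]         # number of intervals with r calls

n = sum(freq)                                              # 50 intervals
mean = sum(r * f for r, f in zip(calls, freq)) / n         # estimate of lambda
variance = sum(r * r * f for r, f in zip(calls, freq)) / n - mean**2


def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)


# Expected frequencies under the fitted model Po(mean), for r = 0..4 only.
expected = [n * poisson_pmf(r, mean) for r in calls]
p_at_least_2 = 1 - poisson_pmf(0, mean) - poisson_pmf(1, mean)

print(f"sample mean = {mean:.2f}, sample variance = {variance:.2f}")
print(f"P(X >= 2) = {p_at_least_2:.4f}")
for r, obs, exp in zip(calls, freq, expected):
    print(f"r = {r}: observed {obs:2d}, expected {exp:.1f}")
```

Laying the observed and expected columns side by side is the raw material for the goodness-of-fit test; the test statistic itself is not computed here.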
The Poisson distribution is one half of a beautiful pair. If events occur as a Poisson process — singly, independently, at constant rate λ per unit time — then the count of events in a fixed interval is Poisson, but the waiting time between consecutive events follows a different, continuous distribution: the exponential, which you will meet among the continuous distributions later in 3S. The two are linked by a single elegant observation. Let T be the time until the first event. Then "no event has occurred by time t" is the same as "the Poisson count over [0,t] is zero", so
$$P(T>t)=P(\text{count in }[0,t]=0)=e^{-\lambda t},$$
because the count over an interval of length $t$ is $\mathrm{Po}(\lambda t)$ and $P(\mathrm{Po}(\lambda t)=0)=e^{-\lambda t}$. This is exactly the tail of the exponential distribution $\mathrm{Exp}(\lambda)$, whose mean waiting time is $1/\lambda$, sensibly the reciprocal of the rate. So if calls arrive at $\lambda=5$ per hour, the average gap between calls is $1/5$ hour, i.e. 12 minutes.
This duality is worth carrying forward because it deepens your grasp of both distributions and is a favourite source of synoptic exam questions. It also clarifies the meaning of the Poisson conditions: "constant rate" and "independence" are precisely the assumptions that make the gaps between events memoryless and exponentially distributed. A worked snippet: with $\lambda=5$ per hour, the probability of no calls in a given 12-minute window is $P(\mathrm{Po}(1)=0)=e^{-1}=0.368$ (using $\lambda t=5\times\frac{12}{60}=1$), which is identical to $P(T>0.2\text{ h})=e^{-5(0.2)}=e^{-1}$: the count view and the waiting-time view agree exactly, as they must.
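The agreement between the two views can also be seen in a small simulation. The sketch below generates exponential gaps at rate 5 per hour, counts how many events land in a 12-minute window, and compares the proportion of empty windows with $e^{-1}$. The function name, the seed and the number of trials are arbitrary assumptions, and a simulated proportion will only be close to the exact value, not equal to it.

```python
import math
import random


def count_events(rate: float, window: float, rng: random.Random) -> int:
    """Count events in [0, window] by accumulating exponential gaps."""
    elapsed = rng.expovariate(rate)  # time of the first event
    count = 0
    while elapsed <= window:
        count += 1
        elapsed += rng.expovariate(rate)  # gap to the next event
    return count


rng = random.Random(0)           # fixed seed so the run is repeatable
rate = 5.0                       # calls per hour
window = 12 / 60                 # 12 minutes, in hours
trials = 100_000

counts = [count_events(rate, window, rng) for _ in range(trials)]
frac_zero = counts.count(0) / trials
mean_count = sum(counts) / trials
exact = math.exp(-rate * window)  # P(Po(1) = 0) = P(T > 0.2 h)

print(f"simulated P(no calls) = {frac_zero:.3f}, exact = {exact:.3f}")
print(f"simulated mean count  = {mean_count:.3f}, lambda * t = {rate * window:.3f}")
```

An empty window is exactly the case where the first gap exceeds the window length, so the simulation is the count view and the waiting-time view run through the same code.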
A subtle but examinable point concerns probabilities spanning separate independent intervals. Because disjoint intervals of a Poisson process are independent, the count in one window tells you nothing about the count in another, and joint probabilities multiply. Suppose accidents at a junction follow Po(2) per week, independently from week to week. The probability of no accidents in each of two consecutive weeks is the product
$$P(0\text{ in week 1})\times P(0\text{ in week 2})=e^{-2}\times e^{-2}=e^{-4}=0.0183.$$
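The product rule for independent weeks is a one-line calculation. As a sketch, the helper below (a hypothetical name) raises the single-week probability to the power of the number of weeks, using the rate of 2 accidents per week from the example.

```python
import math


def prob_no_events(rate_per_week: float, weeks: int) -> float:
    """P(no events in each of `weeks` independent weeks), as a product."""
    single_week = math.exp(-rate_per_week)  # P(0 in one week)
    return single_week**weeks


two_weeks = prob_no_events(2.0, 2)
print(f"P(no accidents in two consecutive weeks) = {two_weeks:.4f}")
```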