You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
The probability generating function (PGF) packages an entire discrete distribution into a single power series GX(t)=E(tX). Once you have it, the whole distribution is at your fingertips: probabilities are its coefficients, the mean and variance come from its derivatives at t=1, and — most powerfully — the PGF of a sum of independent variables is just the product of their PGFs. This last property turns otherwise hard convolution problems (the distribution of X+Y) into simple multiplication, and gives slick proofs that, for instance, a sum of independent Poissons is Poisson.
This is Paper 3 optional content — Statistics (7367/3S), chosen with Mechanics (7367/3M) or Discrete (7367/3D). Paper 3 is 2 hours, 100 marks, AO1 40% / AO2 25% / AO3 35%. Deriving a PGF and differentiating it for E(X), Var(X) is AO1; the elegant "PGF of a sum = product of PGFs" proofs are AO2 (reasoning and proof); applying PGFs to a modelled scenario is AO3. It builds on A-Level Maths discrete distributions (binomial, Poisson, geometric) and the series/Maclaurin work of the pure modules — a PGF is a power series, and recovering probabilities is just reading off Maclaurin coefficients.
For a discrete random variable X taking non-negative integer values 0,1,2,…, the probability generating function is the expectation of tX:
GX(t)=E(tX)=∑r=0∞P(X=r)tr=P(X=0)+P(X=1)t+P(X=2)t2+⋯
This is a power series in the dummy variable t whose coefficient of tr is exactly P(X=r). Because the probabilities sum to 1, the series certainly converges for ∣t∣≤1, and GX(t) is finite there. The variable t has no meaning of its own — it is a bookkeeping device that lets calculus extract information about X.
For example, a variable with P(X=0)=0.2, P(X=1)=0.5, P(X=2)=0.3 has PGF GX(t)=0.2+0.5t+0.3t2 — you simply read the probabilities off as the coefficients, and conversely. A finite distribution gives a polynomial PGF; an infinite distribution (geometric, Poisson) gives an infinite series, which can often be summed into closed form. The whole power of the method is that operations on the random variable (taking expectations, adding independent copies) become algebraic operations on these series.
Because GX(t) is a power series with P(X=r) as the coefficient of tr, the probabilities are its Maclaurin coefficients — exactly the series machinery of the pure modules. Differentiating r times and setting t=0 isolates the rth coefficient:
P(X=r)=r!GX(r)(0).
In particular, evaluating low derivatives at t=0:
P(X=0)=GX(0),P(X=1)=GX′(0),P(X=2)=2GX′′(0).
Two universal checks follow immediately: GX(1)=∑rP(X=r)=1 always (the probabilities sum to one), and GX(0)=P(X=0) (only the constant term survives at t=0). Contrast the two special points: t=0 yields probabilities, t=1 yields moments — never confuse them.
| Distribution | PGF GX(t) |
|---|---|
| Bernoulli(p) | 1−p+pt=q+pt |
| B(n,p) | (q+pt)n |
| Geometric(p) (starting from 1) | 1−qtpt, $ |
| Po(λ) | eλ(t−1) |
Derivation for the binomial. A binomial is a sum of n independent Bernoulli(p) trials, each with PGF q+pt (since E(tX)=q⋅t0+p⋅t1). By the product rule (below), the binomial PGF is the n-fold product. Directly from the definition, using the binomial theorem:
GX(t)=∑r=0n(rn)prqn−rtr=∑r=0n(rn)(pt)rqn−r=(q+pt)n.
Derivation for the Poisson. Here the exponential series ex=∑r≥0xr/r! does the work:
GX(t)=∑r=0∞r!e−λλrtr=e−λ∑r=0∞r!(λt)r=e−λ⋅eλt=eλ(t−1).
These four PGFs (Bernoulli, binomial, geometric, Poisson) are worth knowing by heart; every other result in this lesson is obtained by differentiating or multiplying them. Notice the structural echoes: the Bernoulli q+pt is the binomial with n=1, and the binomial (q+pt)n is its nth power — a first glimpse of the product rule. The geometric's PGF is a rational function (an infinite series summed in closed form), reflecting its infinite support {1,2,…}; the Poisson's is a transcendental exponential, reflecting its infinite support {0,1,2,…} with rapidly decaying probabilities.
The factorial-moment idea. Each differentiation of GX brings down one more falling factor: GX(k)(1)=E(X(X−1)⋯(X−k+1)), the kth factorial moment. The mean uses k=1; the variance uses k=2 (then converts E(X(X−1)) to E(X2)). This is why G′′(1), not G′′(0), is the gateway to the variance.
E(X)=GX′(1).
Proof. Differentiate the series term by term (legitimate inside the radius of convergence):
GX′(t)=dtd∑r=0∞P(X=r)tr=∑r=1∞rP(X=r)tr−1
(the r=0 term vanishes on differentiating a constant). Setting t=1,
GX′(1)=∑r=1∞rP(X=r)=∑r=0∞rP(X=r)=E(X).
The factor r brought down by differentiation is exactly the weight in the expectation — which is why the derivative computes the mean.
Differentiating a second time brings down a factor r(r−1), so
GX′′(t)=∑r=2∞r(r−1)P(X=r)tr−2⇒GX′′(1)=∑rr(r−1)P(X=r)=E(X(X−1)).
Since E(X(X−1))=E(X2)−E(X), we recover the second moment and hence the variance:
E(X2)=GX′′(1)+GX′(1),
Var(X)=GX′′(1)+GX′(1)−(GX′(1))2
For X∼Po(λ), GX(t)=eλ(t−1). Differentiate (chain rule):
GX′(t)=λeλ(t−1)⇒GX′(1)=λe0=λ⇒E(X)=λ.(M1 differentiate; A1 E(X)=λ) GX′′(t)=λ2eλ(t−1)⇒GX′′(1)=λ2=E(X(X−1)).(M1 second derivative) Var(X)=GX′′(1)+GX′(1)−[GX′(1)]2=λ2+λ−λ2=λ.(A1)
Confirming the Poisson signature mean = variance = λ. (M1/A1 for the mean; M1 for G′′(1); A1 for the variance. Always evaluate the derivatives at t=1.)
Let X∼Geometric(p) on {1,2,3,…} with P(X=r)=qr−1p, q=1−p. Derive GX(t) and hence E(X).
GX(t)=∑r=1∞qr−1ptr=pt∑r=1∞(qt)r−1=1−qtpt,∣t∣<q1.(M1 geometric series; A1)
Differentiate by the quotient rule:
GX′(t)=(1−qt)2p(1−qt)−pt(−q)=(1−qt)2p.(M1) E(X)=GX′(1)=(1−q)2p=p2p=p1.(A1)
So the mean of a geometric is 1/p — exactly the familiar result, now derived from the PGF. (M1 summing the geometric series; A1 for GX(t); M1 quotient-rule differentiation; A1 for E(X)=1/p. A second derivative gives Var(X)=q/p2.)
For X∼B(n,p), GX(t)=(q+pt)n. Differentiate using the chain rule:
GX′(t)=n(q+pt)n−1⋅p=np(q+pt)n−1⇒GX′(1)=np(q+p)n−1=np,
since q+p=1. So E(X)=np, the familiar binomial mean. For the variance,
GX′′(t)=n(n−1)p2(q+pt)n−2⇒GX′′(1)=n(n−1)p2=E(X(X−1)), Var(X)=GX′′(1)+GX′(1)−[GX′(1)]2=n(n−1)p2+np−n2p2.
Expanding, n2p2−np2+np−n2p2=np−np2=np(1−p)=npq. Thus Var(X)=npq — the standard result, derived cleanly from one PGF.
If X and Y are independent, then E(tX+Y)=E(tXtY)=E(tX)E(tY), giving the key multiplicative rule:
GX+Y(t)=GX(t)GY(t).
This rests on the fact that for independent variables the expectation of a product factorises: E(tXtY)=E(tX)E(tY) precisely because independence lets the joint probabilities split. It extends to any number of independents: GX1+⋯+Xn(t)=∏iGXi(t).
The uniqueness theorem. Two non-negative-integer random variables with the same PGF have the same distribution. This is clear from the coefficient interpretation: if GX(t)=GY(t) as power series, then matching coefficients of tr forces P(X=r)=P(Y=r) for every r. Uniqueness is what licenses the final step of every proof below — once we recognise a product as a known PGF, we may conclude the sum has that distribution, not merely "a distribution with the same PGF." Together, the product rule and uniqueness form a proof machine for "sum of … is …" results.
Application 1 — sum of binomials. For independent X∼B(n1,p), Y∼B(n2,p) (same p),
GX+Y(t)=(q+pt)n1(q+pt)n2=(q+pt)n1+n2,
the PGF of B(n1+n2,p). By uniqueness X+Y∼B(n1+n2,p) — obvious from "combine the trials," but the PGF proves it instantly.
Application 2 — sum of Poissons. For independent X∼Po(λ), Y∼Po(μ),
GX+Y(t)=eλ(t−1)eμ(t−1)=e(λ+μ)(t−1),
the PGF of Po(λ+μ). Hence X+Y∼Po(λ+μ): a sum of independent Poissons is Poisson, with the rates adding. This is far slicker than the direct convolution P(X+Y=n)=∑k=0nP(X=k)P(Y=n−k), which would require summing a product of two Poisson terms and recognising the binomial theorem inside — the PGF makes the whole calculation a one-line multiplication of exponentials.
Application 3 — sum of geometrics. The sum of r independent Geometric(p) variables (each on {1,2,…}) has PGF
(1−qtpt)r,
which is the PGF of the negative binomial distribution (the number of trials to achieve the rth success). So "sum of r geometrics = negative binomial" drops out instantly — the discrete analogue of "sum of n exponentials = Gamma" from the continuous-distributions lesson.
| Feature | PGF GX(t) | MGF MX(t) |
|---|---|---|
| Definition | E(tX) | E(etX) |
| Applies to | Non-negative integer-valued X | Any X |
| Relationship | MX(t)=GX(et) | GX(t)=MX(lnt) |
| Mean | GX′(1) | MX′(0) |
| Probabilities | Coefficients of tr | Not directly available |
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.