Probability Generating Functions

The probability generating function (PGF) packages an entire discrete distribution into a single power series $G_X(t) = E(t^X)$ . Once you have it, the whole distribution is at your fingertips: probabilities are its coefficients, the mean and variance come from its derivatives at $t = 1$ , and — most powerfully — the PGF of a sum of independent variables is just the product of their PGFs. This last property turns otherwise hard convolution problems (the distribution of $X + Y$ ) into simple multiplication, and gives slick proofs that, for instance, a sum of independent Poissons is Poisson.

Where this sits in AQA 7367

This is Paper 3 optional content — Statistics (7367/3S), chosen with Mechanics (7367/3M) or Discrete (7367/3D). Paper 3 is 2 hours, 100 marks, AO1 40% / AO2 25% / AO3 35%. Deriving a PGF and differentiating it for $E(X)$ , $\operatorname{Var}(X)$ is AO1; the elegant "PGF of a sum = product of PGFs" proofs are AO2 (reasoning and proof); applying PGFs to a modelled scenario is AO3. It builds on A-Level Maths discrete distributions (binomial, Poisson, geometric) and the series/Maclaurin work of the pure modules — a PGF is a power series, and recovering probabilities is just reading off Maclaurin coefficients.

Definition

For a discrete random variable $X$ taking non-negative integer values $0, 1, 2, \ldots$ , the probability generating function is the expectation of $t^X$ :

$G_X(t) = E(t^X) = \sum_{r=0}^{\infty} P(X = r)\, t^r = P(X{=}0) + P(X{=}1)\,t + P(X{=}2)\,t^2 + \cdots$

This is a power series in the dummy variable $t$ whose coefficient of $t^r$ is exactly $P(X = r)$ . Because the probabilities sum to $1$ , the series certainly converges for $|t| \le 1$ , and $G_X(t)$ is finite there. The variable $t$ has no meaning of its own — it is a bookkeeping device that lets calculus extract information about $X$ .

For example, a variable with $P(X = 0) = 0.2$ , $P(X = 1) = 0.5$ , $P(X = 2) = 0.3$ has PGF $G_X(t) = 0.2 + 0.5t + 0.3t^2$ — you simply read the probabilities off as the coefficients, and conversely. A finite distribution gives a polynomial PGF; an infinite distribution (geometric, Poisson) gives an infinite series, which can often be summed into closed form. The whole power of the method is that operations on the random variable (taking expectations, adding independent copies) become algebraic operations on these series.
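
The coefficient view above can be sketched in a few lines of Python. This is a minimal illustration, not part of the specification: the list `probs` and the helper `pgf` are hypothetical names, with `probs[r]` holding $P(X = r)$ .

```python
# PGF of a finite distribution stored as a coefficient list:
# probs[r] is P(X = r), i.e. the coefficient of t**r.
probs = [0.2, 0.5, 0.3]

def pgf(probs, t):
    """Evaluate G_X(t) = sum of P(X = r) * t**r (hypothetical helper)."""
    return sum(p * t**r for r, p in enumerate(probs))

total_probability = pgf(probs, 1)   # every probability counted once
prob_zero = pgf(probs, 0)           # only the constant term survives
half = pgf(probs, 0.5)              # 0.2 + 0.5*0.5 + 0.3*0.25
```

Storing the distribution as a list of coefficients means that evaluating the PGF is just polynomial evaluation, which is the sense in which the probabilities and the PGF carry the same information.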

Recovering Probabilities

Because $G_X(t)$ is a power series with $P(X = r)$ as the coefficient of $t^r$ , the probabilities are its Maclaurin coefficients — exactly the series machinery of the pure modules. Differentiating $r$ times and setting $t = 0$ isolates the $r$ th coefficient:

$P(X = r) = \frac{G_X^{(r)}(0)}{r!}.$

In particular, evaluating low derivatives at $t = 0$ :

$P(X = 0) = G_X(0), \qquad P(X = 1) = G_X'(0), \qquad P(X = 2) = \frac{G_X''(0)}{2}.$

Two universal checks follow immediately: $G_X(1) = \sum_r P(X = r) = 1$ always (the probabilities sum to one), and $G_X(0) = P(X = 0)$ (only the constant term survives at $t = 0$ ). Contrast the two special points: $t = 0$ yields probabilities, $t = 1$ yields moments — never confuse them.
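
The formula $P(X = r) = G_X^{(r)}(0)/r!$ can be tried out on a polynomial PGF. The sketch below assumes the PGF is held as a list of coefficients (constant term first); the helper names `differentiate` and `probability` are illustrative only.

```python
from math import factorial

def differentiate(coeffs):
    """Coefficients of G'(t), given coefficients of G(t) (constant term first)."""
    return [r * c for r, c in enumerate(coeffs)][1:]

def probability(coeffs, r):
    """P(X = r) = G^(r)(0) / r!: differentiate r times, read the constant term."""
    for _ in range(r):
        coeffs = differentiate(coeffs)
    constant_term = coeffs[0] if coeffs else 0.0
    return constant_term / factorial(r)

G = [0.2, 0.5, 0.3]   # G_X(t) = 0.2 + 0.5t + 0.3t^2
recovered = [probability(G, r) for r in range(4)]
```

For this PGF the recovered values are the original probabilities, followed by zero for $r = 3$ , since a quadratic has no $t^3$ term.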

Standard PGFs

Distribution	PGF $G_X(t)$
$\text{Bernoulli}(p)$	$1 - p + pt = q + pt$
$B(n, p)$	$(q + pt)^n$
$\text{Geometric}(p)$ (starting from 1)	$\frac{pt}{1 - qt}$ , $|t| < \tfrac{1}{q}$
$\text{Po}(\lambda)$	$e^{\lambda(t - 1)}$

Derivation for the binomial. A binomial is a sum of $n$ independent Bernoulli $(p)$ trials, each with PGF $q + pt$ (since $E(t^X) = q\cdot t^0 + p\cdot t^1$ ). By the multiplicative rule for sums of independent variables (below), the binomial PGF is the $n$ -fold product $(q + pt)^n$ . The same result follows directly from the definition, using the binomial theorem:

$G_X(t) = \sum_{r=0}^{n}\binom{n}{r}p^r q^{n-r}\, t^r = \sum_{r=0}^{n}\binom{n}{r}(pt)^r q^{n-r} = (q + pt)^n.$

Derivation for the Poisson. Here the exponential series $e^x = \sum_{r\ge 0} x^r/r!$ does the work:

$G_X(t) = \sum_{r=0}^{\infty} \frac{e^{-\lambda}\lambda^r}{r!}\, t^r = e^{-\lambda} \sum_{r=0}^{\infty} \frac{(\lambda t)^r}{r!} = e^{-\lambda}\cdot e^{\lambda t} = e^{\lambda(t-1)}.$

These four PGFs (Bernoulli, binomial, geometric, Poisson) are worth knowing by heart; every other result in this lesson is obtained by differentiating or multiplying them. Notice the structural echoes: the Bernoulli $q + pt$ is the binomial with $n = 1$ , and the binomial $(q+pt)^n$ is its $n$ th power — a first glimpse of the product rule. The geometric's PGF is a rational function (an infinite series summed in closed form), reflecting its infinite support $\{1, 2, \ldots\}$ ; the Poisson's is a transcendental exponential, reflecting its infinite support $\{0, 1, 2, \ldots\}$ with rapidly decaying probabilities.

The factorial-moment idea. Each differentiation of $G_X$ brings down one more falling factor: $G_X^{(k)}(1) = E\big(X(X-1)\cdots(X-k+1)\big)$ , the $k$ th factorial moment. The mean uses $k = 1$ ; the variance uses $k = 2$ (then converts $E(X(X-1))$ to $E(X^2)$ ). This is why $G''(1)$ , not $G''(0)$ , is the gateway to the variance.

Finding the Mean Using the PGF

$E(X) = G_X'(1).$

Proof. Differentiate the series term by term (legitimate inside the radius of convergence, and valid at $t = 1$ as a limit whenever $E(X)$ is finite):

$G_X'(t) = \frac{d}{dt}\sum_{r=0}^{\infty} P(X = r)\,t^r = \sum_{r=1}^{\infty} r\, P(X = r)\, t^{r-1}$

(the $r = 0$ term vanishes on differentiating a constant). Setting $t = 1$ ,

$G_X'(1) = \sum_{r=1}^{\infty} r\, P(X = r) = \sum_{r=0}^{\infty} r\, P(X = r) = E(X).$

The factor $r$ brought down by differentiation is exactly the weight in the expectation — which is why the derivative computes the mean.

Finding the Variance Using the PGF

Differentiating a second time brings down a factor $r(r-1)$ , so

$G_X''(t) = \sum_{r=2}^{\infty} r(r-1)\, P(X = r)\, t^{r-2} \;\Rightarrow\; G_X''(1) = \sum_{r} r(r-1) P(X=r) = E\big(X(X-1)\big).$

Since $E(X(X-1)) = E(X^2) - E(X)$ , we recover the second moment and hence the variance:

$E(X^2) = G_X''(1) + G_X'(1),$

$\text{Var}(X) = G_X''(1) + G_X'(1) - (G_X'(1))^2$
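
For a finite distribution the two derivatives at $t = 1$ are finite sums, so the mean and variance formulas can be checked numerically. A minimal sketch, assuming the distribution is given as a list `probs` with `probs[r]` equal to $P(X = r)$ ; the function names are hypothetical.

```python
def derivative_at_one(probs, k):
    """G^(k)(1) = sum of r(r-1)...(r-k+1) * P(X = r), the k-th factorial moment."""
    total = 0.0
    for r, p in enumerate(probs):
        falling = 1
        for j in range(k):
            falling *= (r - j)      # falling factorial r(r-1)...(r-k+1)
        total += falling * p
    return total

def mean_and_variance(probs):
    g1 = derivative_at_one(probs, 1)    # G'(1)  = E(X)
    g2 = derivative_at_one(probs, 2)    # G''(1) = E(X(X-1))
    return g1, g2 + g1 - g1**2

mean, var = mean_and_variance([0.2, 0.5, 0.3])
```

For the distribution with PGF $0.2 + 0.5t + 0.3t^2$ this gives $G_X'(1) = 1.1$ and $G_X''(1) = 0.6$ , so the variance is $0.6 + 1.1 - 1.21 = 0.49$ , matching the direct calculation $E(X^2) - [E(X)]^2 = 1.7 - 1.21$ .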

Worked Example 1: Poisson PGF — mean and variance (with mark scheme)

For $X \sim \text{Po}(\lambda)$ , $G_X(t) = e^{\lambda(t-1)}$ . Differentiate (chain rule):

$G_X'(t) = \lambda e^{\lambda(t-1)} \;\Rightarrow\; G_X'(1) = \lambda e^{0} = \lambda \;\Rightarrow\; E(X) = \lambda. \quad (\text{M1 differentiate; A1 } E(X)=\lambda)$

$G_X''(t) = \lambda^2 e^{\lambda(t-1)} \;\Rightarrow\; G_X''(1) = \lambda^2 = E(X(X-1)). \quad (\text{M1 second derivative})$

$\operatorname{Var}(X) = G_X''(1) + G_X'(1) - [G_X'(1)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda. \quad (\text{A1})$

This confirms the Poisson signature: mean = variance = $\lambda$ . (M1/A1 for the mean; M1 for $G''(1)$ ; A1 for the variance. Always evaluate the derivatives at $t = 1$ .)
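
The same result can be checked numerically by truncating the Poisson series. The sketch below is illustrative only: the rate `lam = 2.5` and the cut-off `terms = 60` are arbitrary choices, and the cut-off simply needs to be large enough that the neglected tail is negligible.

```python
from math import exp, factorial

lam = 2.5       # illustrative rate (an assumption, not from the text)
terms = 60      # truncation point; the tail beyond this is negligible here
probs = [exp(-lam) * lam**r / factorial(r) for r in range(terms)]

g1 = sum(r * p for r, p in enumerate(probs))             # G'(1)  = E(X)
g2 = sum(r * (r - 1) * p for r, p in enumerate(probs))   # G''(1) = E(X(X-1))
variance = g2 + g1 - g1**2
```

Both `g1` and `variance` come out equal to `lam` up to floating-point error, in line with the algebra.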

Worked Example 2: Geometric PGF — derive and use (with mark scheme)

Let $X \sim \text{Geometric}(p)$ on $\{1, 2, 3, \ldots\}$ with $P(X = r) = q^{r-1}p$ , $q = 1 - p$ . Derive $G_X(t)$ and hence $E(X)$ .

$G_X(t) = \sum_{r=1}^{\infty} q^{r-1}p\, t^r = pt\sum_{r=1}^{\infty}(qt)^{r-1} = \frac{pt}{1 - qt}, \quad |t| < \tfrac1q. \quad (\text{M1 geometric series; A1})$

Differentiate by the quotient rule:

$G_X'(t) = \frac{p(1 - qt) - pt(-q)}{(1-qt)^2} = \frac{p}{(1-qt)^2}. \quad (\text{M1})$

$E(X) = G_X'(1) = \frac{p}{(1-q)^2} = \frac{p}{p^2} = \frac1p. \quad (\text{A1})$

So the mean of a geometric is $1/p$ — exactly the familiar result, now derived from the PGF. (M1 summing the geometric series; A1 for $G_X(t)$ ; M1 quotient-rule differentiation; A1 for $E(X) = 1/p$ . A second derivative gives $\operatorname{Var}(X) = q/p^2$ .)
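
The variance quoted above can be checked by carrying the differentiation one step further. A sketch, starting from $G_X'(t) = p(1 - qt)^{-2}$ and using $\operatorname{Var}(X) = G_X''(1) + G_X'(1) - [G_X'(1)]^2$ :

$G_X''(t) = \frac{2pq}{(1 - qt)^3} \;\Rightarrow\; G_X''(1) = \frac{2pq}{p^3} = \frac{2q}{p^2},$

$\operatorname{Var}(X) = \frac{2q}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{2q + p - 1}{p^2} = \frac{q}{p^2},$

using $p - 1 = -q$ in the last step.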

Worked Example 2b: binomial mean and variance from the PGF

For $X \sim B(n, p)$ , $G_X(t) = (q + pt)^n$ . Differentiate using the chain rule:

$G_X'(t) = n(q + pt)^{n-1}\cdot p = np(q + pt)^{n-1} \;\Rightarrow\; G_X'(1) = np(q + p)^{n-1} = np,$

since $q + p = 1$ . So $E(X) = np$ , the familiar binomial mean. For the variance,

$G_X''(t) = n(n-1)p^2(q + pt)^{n-2} \;\Rightarrow\; G_X''(1) = n(n-1)p^2 = E\big(X(X-1)\big),$

$\operatorname{Var}(X) = G_X''(1) + G_X'(1) - [G_X'(1)]^2 = n(n-1)p^2 + np - n^2p^2.$

Expanding, $n^2p^2 - np^2 + np - n^2p^2 = np - np^2 = np(1 - p) = npq$ . Thus $\operatorname{Var}(X) = npq$ — the standard result, derived cleanly from one PGF.

PGF of a Sum of Independent Variables

If $X$ and $Y$ are independent, then $E(t^{X+Y}) = E(t^X t^Y) = E(t^X)E(t^Y)$ , giving the key multiplicative rule:

$G_{X+Y}(t) = G_X(t)\, G_Y(t).$

This rests on the fact that for independent variables the expectation of a product factorises: $E(t^X t^Y) = E(t^X)E(t^Y)$ precisely because independence lets the joint probabilities split. It extends to any number of independents: $G_{X_1 + \cdots + X_n}(t) = \prod_i G_{X_i}(t)$ .

The uniqueness theorem. Two non-negative-integer random variables with the same PGF have the same distribution. This is clear from the coefficient interpretation: if $G_X(t) = G_Y(t)$ as power series, then matching coefficients of $t^r$ forces $P(X = r) = P(Y = r)$ for every $r$ . Uniqueness is what licenses the final step of every proof below — once we recognise a product as a known PGF, we may conclude the sum has that distribution, not merely "a distribution with the same PGF." Together, the product rule and uniqueness form a proof machine for "sum of … is …" results.

Application 1 — sum of binomials. For independent $X \sim B(n_1, p)$ , $Y \sim B(n_2, p)$ (same $p$ ),

$G_{X+Y}(t) = (q + pt)^{n_1}(q + pt)^{n_2} = (q + pt)^{n_1 + n_2},$

the PGF of $B(n_1 + n_2, p)$ . By uniqueness $X + Y \sim B(n_1 + n_2, p)$ — obvious from "combine the trials," but the PGF proves it instantly.
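
Multiplying two PGFs is ordinary polynomial multiplication, which is the same operation as convolving the two probability lists. The following sketch checks Application 1 numerically for $n_1 = 2$ , $n_2 = 3$ ; the value `p = 0.3` and the helper names are illustrative assumptions.

```python
from math import comb

def binomial_pgf(n, p):
    """Coefficients of (q + pt)^n: entry r is P(X = r) for X ~ B(n, p)."""
    q = 1 - p
    return [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]

def multiply(a, b):
    """Coefficients of the product of two polynomials (a discrete convolution)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

p = 0.3  # illustrative value
product = multiply(binomial_pgf(2, p), binomial_pgf(3, p))
target = binomial_pgf(5, p)
max_gap = max(abs(x - y) for x, y in zip(product, target))
```

The coefficient of $t^n$ in the product is $\sum_k P(X = k)P(Y = n - k)$ , so `multiply` performs exactly the convolution that the PGF method replaces, and `max_gap` is zero up to rounding.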

Application 2 — sum of Poissons. For independent $X \sim \text{Po}(\lambda)$ , $Y \sim \text{Po}(\mu)$ ,

$G_{X+Y}(t) = e^{\lambda(t-1)}e^{\mu(t-1)} = e^{(\lambda + \mu)(t-1)},$

the PGF of $\text{Po}(\lambda + \mu)$ . Hence $X + Y \sim \text{Po}(\lambda + \mu)$ : a sum of independent Poissons is Poisson, with the rates adding. This is far slicker than the direct convolution $P(X + Y = n) = \sum_{k=0}^{n} P(X = k)P(Y = n - k)$ , which would require summing a product of two Poisson terms and recognising the binomial theorem inside — the PGF makes the whole calculation a one-line multiplication of exponentials.
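
For comparison, the direct convolution can be written out as follows. This is a sketch of the longer route, using only the Poisson probabilities and the binomial theorem:

$P(X + Y = n) = \sum_{k=0}^{n} \frac{e^{-\lambda}\lambda^k}{k!}\cdot\frac{e^{-\mu}\mu^{n-k}}{(n-k)!} = \frac{e^{-(\lambda+\mu)}}{n!}\sum_{k=0}^{n}\binom{n}{k}\lambda^k\mu^{n-k} = \frac{e^{-(\lambda+\mu)}(\lambda+\mu)^n}{n!},$

which is the $\text{Po}(\lambda + \mu)$ probability of $n$ , in agreement with the PGF argument.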

Application 3 — sum of geometrics. The sum of $r$ independent $\text{Geometric}(p)$ variables (each on $\{1, 2, \ldots\}$ ) has PGF

$\left(\frac{pt}{1 - qt}\right)^{\!r},$

which is the PGF of the negative binomial distribution (the number of trials to achieve the $r$ th success). So "sum of $r$ geometrics = negative binomial" drops out instantly — the discrete analogue of "sum of $n$ exponentials = Gamma" from the continuous-distributions lesson.
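
This identification can be checked numerically by multiplying truncated series. The sketch below assumes the standard negative binomial probabilities $P(N = n) = \binom{n-1}{r-1}p^r q^{n-r}$ for $n \ge r$ , which the text above does not state explicitly; the values `p = 0.4`, `r = 3`, `terms = 12` and the helper names are illustrative.

```python
from math import comb

def geometric_pgf(p, terms):
    """First `terms` coefficients of pt/(1 - qt): P(X = n) = q^(n-1) p for n >= 1."""
    q = 1 - p
    return [0.0] + [q**(n - 1) * p for n in range(1, terms)]

def multiply_truncated(a, b, terms):
    """Product of two power series, keeping coefficients of t^0 .. t^(terms-1)."""
    out = [0.0] * terms
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < terms:
                out[i + j] += ai * bj
    return out

p, r, terms = 0.4, 3, 12                 # illustrative values
series = [1.0] + [0.0] * (terms - 1)     # PGF of the constant 0, i.e. G(t) = 1
for _ in range(r):
    series = multiply_truncated(series, geometric_pgf(p, terms), terms)

q = 1 - p
negative_binomial = [comb(n - 1, r - 1) * p**r * q**(n - r) if n >= r else 0.0
                     for n in range(terms)]
max_gap = max(abs(x - y) for x, y in zip(series, negative_binomial))
```

Truncation does not affect the comparison, because the coefficient of $t^n$ in a product depends only on coefficients up to $t^n$ in each factor.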

PGF vs MGF

Feature	PGF $G_X(t)$	MGF $M_X(t)$
Definition	$E(t^X)$	$E(e^{tX})$
Applies to	Non-negative integer-valued $X$	Any $X$ for which $E(e^{tX})$ exists
Relationship	$M_X(t) = G_X(e^t)$	$G_X(t) = M_X(\ln t)$
Mean	$G_X'(1)$	$M_X'(0)$
Probabilities	Coefficients of $t^r$	Not directly available
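
As a quick check of the relationship row, here is a sketch for $X \sim \text{Po}(\lambda)$ , substituting $e^t$ for $t$ in the Poisson PGF $e^{\lambda(t-1)}$ :

$M_X(t) = G_X(e^t) = e^{\lambda(e^t - 1)} \;\Rightarrow\; M_X'(t) = \lambda e^t e^{\lambda(e^t - 1)} \;\Rightarrow\; M_X'(0) = \lambda,$

which agrees with $G_X'(1) = \lambda$ from Worked Example 1.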
