The moment generating function (MGF) is the most elegant idea in the 7367/3S option. A single function $M_X(t)$ encodes every moment of a distribution at once: differentiate it and set $t=0$ to read off $E(X)$, $E(X^2)$, $E(X^3)$, … in turn. Better still, two structural theorems, uniqueness (the MGF determines the distribution) and multiplicativity (the MGF of an independent sum is the product of MGFs), turn otherwise hard "prove the sum is also Poisson/normal" questions into a few clean lines. This lesson derives the standard MGFs from scratch, with full mark schemes, and shows all three exam uses.
This is Paper 3 Statistics option (7367/3S) content (per-paper weighting AO1 40% / AO2 25% / AO3 35%). Deriving an MGF and differentiating it for moments is AO1; the uniqueness and product theorems are reasoning/proof tools (AO2: proving a sum is Poisson is an AO2 "show that"); applying MGFs to an unfamiliar combination of variables is AO3. The prerequisites are the discrete and continuous expectation definitions (Lessons 1, 5, 7), the Maclaurin series for $e^x$ (A-Level Maths / Further Pure), the product and chain rules, and the geometric/exponential series.
The moment generating function is the conceptual high point of the statistics option, and it is examined precisely because it ties so many earlier skills together: series expansions, differentiation, the named distributions and their parameters, and the algebra of independent sums. A student who is fluent with MGFs can find a mean and variance in two derivatives, prove a distributional result in three lines, and identify an unknown distribution by pattern-matching. Without MGFs, each of these would demand laborious summation or integration. Because the technique is so powerful, examiners reward not just the mechanics but the understanding: knowing why $M_X(0)=1$, why independent sums multiply, and why the uniqueness theorem licenses an identification. This lesson builds all three from first principles.
The moment generating function of a random variable $X$ is

$$M_X(t) = E\left(e^{tX}\right),$$

a function of the auxiliary real variable $t$. Concretely,

$$\text{discrete: } M_X(t) = \sum_x e^{tx}\,P(X=x), \qquad \text{continuous: } M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx.$$
The MGF exists (is finite) on some open interval of $t$-values containing 0; note that $M_X(0)=E(e^0)=E(1)=1$ always, which is a useful instant check. If a derived MGF does not give 1 at $t=0$, you have made an algebra error: this single substitution catches a surprising number of slips before they propagate into the moments. The auxiliary variable $t$ has no probabilistic meaning of its own; it is purely a book-keeping device whose powers tag the successive moments. Think of $M_X(t)$ as a "moment dispenser": feed in derivatives and the value $t=0$, and out come $E(X)$, $E(X^2)$, $E(X^3)$, and so on in order.
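The discrete definition and the $M_X(0)=1$ check can be tried numerically. The following is a minimal sketch, not part of the course material: the helper name `mgf_discrete` and the fair-die example are assumptions chosen purely for illustration.

```python
import math

def mgf_discrete(pmf, t):
    """Return M_X(t) = sum over x of e^(t*x) * P(X = x) for a finite pmf.

    `pmf` is a dict mapping each value x to its probability (hypothetical helper).
    """
    return sum(math.exp(t * x) * p for x, p in pmf.items())

# Illustrative distribution (an assumption, not from the lesson): a fair die.
die_pmf = {x: 1 / 6 for x in range(1, 7)}

# The instant check from the text: M_X(0) = E(1) = 1 for any valid pmf.
m_at_zero = mgf_discrete(die_pmf, 0.0)
print(m_at_zero)
```

If the printed value is not 1 (up to rounding), the probabilities do not sum to 1, which mirrors the algebra check described above.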
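Expand $e^{tX}$ as a power series in $t$:

$$e^{tX} = 1 + tX + \frac{(tX)^2}{2!} + \frac{(tX)^3}{3!} + \cdots = \sum_{n=0}^{\infty} \frac{t^n X^n}{n!}.$$

Take expectations term by term:

$$M_X(t) = 1 + t\,E(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \cdots = \sum_{n=0}^{\infty} \frac{t^n}{n!}E(X^n).$$

The coefficient of $\frac{t^n}{n!}$ is the $n$-th moment $E(X^n)$. Equivalently, differentiating $n$ times and setting $t=0$ peels off that coefficient:

$$E(X^n) = M_X^{(n)}(0).$$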
| Derivative at $t=0$ | Gives | Meaning |
|---|---|---|
| $M_X'(0)$ | $E(X)$ | mean |
| $M_X''(0)$ | $E(X^2)$ | second moment |
| $M_X'''(0)$ | $E(X^3)$ | third moment |
and hence

$$\operatorname{Var}(X) = M_X''(0) - \left(M_X'(0)\right)^2.$$
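As a rough numerical illustration of "differentiate and set $t=0$", the sketch below approximates $M_X'(0)$ and $M_X''(0)$ by central finite differences and then forms the variance. The function names, the step size `h`, and the fair-die distribution are all assumptions made for this example; finite differences stand in for the exact calculus and are only approximate.

```python
import math

def mgf_discrete(pmf, t):
    """M_X(t) for a finite pmf given as {value: probability} (hypothetical helper)."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

def moments_from_mgf(mgf, h=1e-3):
    """Approximate M'(0) and M''(0) using central differences with step h."""
    m0 = mgf(0.0)
    m_plus, m_minus = mgf(h), mgf(-h)
    first = (m_plus - m_minus) / (2 * h)            # approximates E(X)
    second = (m_plus - 2 * m0 + m_minus) / (h * h)  # approximates E(X^2)
    return first, second

# Illustrative distribution (an assumption): a fair six-sided die.
die_pmf = {x: 1 / 6 for x in range(1, 7)}
mean, second_moment = moments_from_mgf(lambda t: mgf_discrete(die_pmf, t))
variance = second_moment - mean ** 2
print(mean, second_moment, variance)
```

For the die the exact values are $E(X)=\frac{7}{2}$, $E(X^2)=\frac{91}{6}$ and $\operatorname{Var}(X)=\frac{35}{12}$, so the printed numbers should agree with these to a few decimal places.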
The uniqueness theorem deserves emphasis because it is what makes the MGF a genuine identifier rather than a mere moment-calculator. Two distributions can share a mean and variance yet be utterly different (a uniform and a normal can both have mean 0 and variance 1, for instance), so matching the first two moments proves nothing. But matching the entire MGF pins the distribution down completely. That is why an exam question can hand you an unfamiliar MGF such as $(1-2t)^{-3}$ and ask "what is the distribution?": you compare it against the known forms, and uniqueness guarantees that a match is conclusive. The product rule for independent sums and the uniqueness theorem work as a team: the product gives you the MGF of a sum, and uniqueness lets you name the resulting distribution. This pairing is the single most powerful technique in the whole option, turning otherwise formidable "prove the sum is also …" questions into a short algebraic recognition.
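Let $X \sim \mathrm{Po}(\lambda)$. Then

$$M_X(t) = \sum_{r=0}^{\infty} e^{tr}\,\frac{e^{-\lambda}\lambda^r}{r!} = e^{-\lambda}\sum_{r=0}^{\infty}\frac{(\lambda e^t)^r}{r!} = e^{-\lambda}e^{\lambda e^t} = e^{\lambda(e^t-1)}.$$

(M1 factor $e^{-\lambda}$; M1 recognise $e^x$ series; A1)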
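Mean (product/chain rule):

$$M_X'(t) = \lambda e^t\, e^{\lambda(e^t-1)} \;\Rightarrow\; M_X'(0) = \lambda \cdot 1 \cdot e^0 = \lambda = E(X).$$

(M1 A1)

Second moment (differentiate the product again):

$$M_X''(t) = \left(\lambda e^t + \lambda^2 e^{2t}\right) e^{\lambda(e^t-1)} \;\Rightarrow\; M_X''(0) = \lambda + \lambda^2 = E(X^2).$$

(M1 A1)

Variance: $\operatorname{Var}(X) = (\lambda + \lambda^2) - \lambda^2 = \lambda$ (mean = variance, as it must be for a Poisson). (A1)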
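The Poisson derivation can be sanity-checked by summing the defining series directly and comparing with the closed form $e^{\lambda(e^t-1)}$. This is a sketch only: the function names, the truncation at 200 terms, and the sample values $\lambda=2.5$, $t=0.3$ are assumptions for illustration.

```python
import math

def poisson_mgf_series(lam, t, terms=200):
    """Sum e^(t*r) * e^(-lam) * lam^r / r! for r = 0 .. terms-1 (truncated series)."""
    ratio = lam * math.exp(t)
    term = math.exp(-lam)  # the r = 0 term
    total = 0.0
    for r in range(terms):
        total += term
        term *= ratio / (r + 1)  # step from the r-th term to the (r+1)-th
    return total

def poisson_mgf_closed(lam, t):
    """Closed form e^(lam * (e^t - 1)) derived in the text."""
    return math.exp(lam * (math.exp(t) - 1.0))

lam, t = 2.5, 0.3  # arbitrary illustrative values
series_value = poisson_mgf_series(lam, t)
closed_value = poisson_mgf_closed(lam, t)
print(series_value, closed_value)
```

The two printed values should agree to many decimal places; the truncation is harmless here because the terms decay factorially, though a much larger $\lambda e^t$ would need more terms.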
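Let $X \sim \mathrm{Exp}(\lambda)$, $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$. For $t < \lambda$,

$$M_X(t) = \int_0^{\infty} e^{tx}\,\lambda e^{-\lambda x}\,dx = \lambda\int_0^{\infty} e^{-(\lambda-t)x}\,dx = \lambda\left[\frac{-e^{-(\lambda-t)x}}{\lambda-t}\right]_0^{\infty} = \frac{\lambda}{\lambda-t}.$$

(M1 combine exponents; A1; B1 state $t<\lambda$)

The condition $t<\lambda$ is essential: only then does $e^{-(\lambda-t)x} \to 0$ at infinity. Differentiating,

$$M_X'(t) = \frac{\lambda}{(\lambda-t)^2} \;\Rightarrow\; M_X'(0) = \frac{1}{\lambda}, \qquad M_X''(t) = \frac{2\lambda}{(\lambda-t)^3} \;\Rightarrow\; M_X''(0) = \frac{2}{\lambda^2},$$

so $E(X) = \frac{1}{\lambda}$, $\operatorname{Var}(X) = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$. (M1 A1 A1) This matches Lesson 7 exactly, with no integration of $x^2 f$.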
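To see the integral and the role of the condition $t<\lambda$ concretely, the sketch below approximates $\int_0^\infty e^{tx}\lambda e^{-\lambda x}\,dx$ with a simple midpoint rule and compares it with $\frac{\lambda}{\lambda-t}$. The function name, the cutoff `upper=40.0`, and the number of steps are assumptions; the cutoff is adequate only when $\lambda - t$ is not too close to 0.

```python
import math

def exponential_mgf_numeric(lam, t, upper=40.0, steps=200_000):
    """Midpoint-rule estimate of the MGF integral for Exp(lam), truncated at `upper`."""
    if t >= lam:
        # Matches the B1 condition in the text: the integral diverges unless t < lam.
        raise ValueError("the integral diverges unless t < lam")
    width = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width  # midpoint of the i-th strip
        total += math.exp(t * x) * lam * math.exp(-lam * x)
    return total * width

lam, t = 2.0, 0.5  # arbitrary illustrative values with t < lam
numeric_value = exponential_mgf_numeric(lam, t)
closed_value = lam / (lam - t)
print(numeric_value, closed_value)
```

Both values should be close to $\frac{4}{3}$ for these inputs. Calling the function with $t \ge \lambda$ raises an error rather than returning a meaningless number.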
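Let $X \sim \mathrm{Po}(\lambda_1)$, $Y \sim \mathrm{Po}(\lambda_2)$ be independent. Then

$$M_{X+Y}(t) = M_X(t)\,M_Y(t) = e^{\lambda_1(e^t-1)}\,e^{\lambda_2(e^t-1)} = e^{(\lambda_1+\lambda_2)(e^t-1)}.$$

(M1 product; A1)

This is exactly the MGF of $\mathrm{Po}(\lambda_1+\lambda_2)$; by uniqueness, $X+Y \sim \mathrm{Po}(\lambda_1+\lambda_2)$. (A1 conclude with named theorem)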
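The conclusion can be cross-checked without MGFs by convolving the two Poisson probability functions directly, which is the "laborious summation" that the MGF argument avoids. This is an illustrative sketch: the function names and the rates $1.5$ and $2.0$ are assumptions.

```python
import math

def poisson_pmf(lam, k):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def pmf_of_sum(lam1, lam2, k):
    """P(X + Y = k) for independent X ~ Po(lam1), Y ~ Po(lam2), by direct convolution."""
    return sum(poisson_pmf(lam1, j) * poisson_pmf(lam2, k - j) for j in range(k + 1))

lam1, lam2 = 1.5, 2.0  # arbitrary illustrative rates

# Largest discrepancy between the convolution and Po(lam1 + lam2) over k = 0..14.
max_gap = max(
    abs(pmf_of_sum(lam1, lam2, k) - poisson_pmf(lam1 + lam2, k))
    for k in range(15)
)
print(max_gap)
```

The printed gap should be at the level of floating-point rounding, consistent with $X+Y \sim \mathrm{Po}(\lambda_1+\lambda_2)$.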
Let $X \sim B(n,p)$, so $P(X=r) = \binom{n}{r}p^r(1-p)^{n-r}$. Then

$$M_X(t) = \sum_{r=0}^{n} e^{tr}\binom{n}{r}p^r(1-p)^{n-r} = \sum_{r=0}^{n}\binom{n}{r}(pe^t)^r(1-p)^{n-r} = (pe^t + 1 - p)^n,$$

(M1 group as $(pe^t)^r$; M1 binomial theorem; A1)

recognising the sum as the binomial expansion of $\left(pe^t + (1-p)\right)^n$. Differentiating once,

$$M_X'(t) = n(1-p+pe^t)^{n-1}\cdot pe^t \;\Rightarrow\; M_X'(0) = n\cdot 1^{n-1}\cdot p = np = E(X),$$

(M1 A1)

the familiar binomial mean, derived in two lines rather than by the $\sum r\,P(X=r)$ calculation. This example is a template for every "derive the MGF and hence the mean" question: identify the probability function, factor the $e^{tr}$ into the existing structure, recognise a standard series (here the binomial theorem; for the Poisson, the exponential series), then differentiate.
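The binomial template can be checked numerically: sum the definition term by term, compare with the closed form, and estimate the mean from a central difference at $t=0$. The function names, the step size, and the values $n=10$, $p=0.3$, $t=0.4$ are assumptions chosen for illustration.

```python
import math

def binomial_mgf_sum(n, p, t):
    """Direct sum of e^(t*r) * C(n, r) * p^r * (1-p)^(n-r) over r = 0..n."""
    return sum(
        math.comb(n, r) * (p * math.exp(t)) ** r * (1 - p) ** (n - r)
        for r in range(n + 1)
    )

def binomial_mgf_closed(n, p, t):
    """Closed form (p e^t + 1 - p)^n from the binomial theorem."""
    return (p * math.exp(t) + 1 - p) ** n

def mean_from_mgf(n, p, h=1e-4):
    """Central-difference estimate of M'(0), which should be close to n*p."""
    return (binomial_mgf_closed(n, p, h) - binomial_mgf_closed(n, p, -h)) / (2 * h)

n, p, t = 10, 0.3, 0.4  # arbitrary illustrative values
sum_value = binomial_mgf_sum(n, p, t)
closed_value = binomial_mgf_closed(n, p, t)
estimated_mean = mean_from_mgf(n, p)
print(sum_value, closed_value, estimated_mean)
```

The first two printed values should coincide up to rounding, and the third should be close to $np = 3$.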
(Specimen-style — not from any real paper.)
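A discrete random variable $X$ has moment generating function $M_X(t) = \left(\frac{1}{4} + \frac{3}{4}e^t\right)^2$. (a) Find $M_X'(t)$ and hence $E(X)$. (b) Find $\operatorname{Var}(X)$. (c) Identify the distribution of $X$, giving its parameters.

(a) It is cleanest to expand first: $M_X(t) = \left(\frac{1}{4} + \frac{3}{4}e^t\right)^2 = \frac{1}{16} + \frac{6}{16}e^t + \frac{9}{16}e^{2t} = \frac{1}{16} + \frac{3}{8}e^t + \frac{9}{16}e^{2t}$. Then

$$M_X'(t) = \frac{3}{8}e^t + \frac{9}{8}e^{2t}, \qquad M_X'(0) = \frac{3}{8} + \frac{9}{8} = \frac{12}{8} = \frac{3}{2}.$$

Hence $E(X) = \frac{3}{2}$.

(b) Differentiating again, $M_X''(t) = \frac{3}{8}e^t + \frac{9}{4}e^{2t}$, so $M_X''(0) = \frac{3}{8} + \frac{9}{4} = \frac{3+18}{8} = \frac{21}{8}$. Thus $E(X^2) = \frac{21}{8}$ and

$$\operatorname{Var}(X) = \frac{21}{8} - \left(\frac{3}{2}\right)^2 = \frac{21}{8} - \frac{18}{8} = \frac{3}{8}.$$

(c) The MGF has the binomial form $(1-p+pe^t)^n$ with $n=2$ and $p=\frac{3}{4}$ (since $1-p=\frac{1}{4}$), so $X \sim B\left(2, \frac{3}{4}\right)$. Check: $E(X) = np = 2\cdot\frac{3}{4} = \frac{3}{2}$ ✓ and $\operatorname{Var}(X) = np(1-p) = 2\cdot\frac{3}{4}\cdot\frac{1}{4} = \frac{3}{8}$ ✓.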
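The arithmetic in this worked answer can be verified exactly with rational numbers. The sketch below reads the probabilities off the expanded MGF (the coefficient of $e^{xt}$ is $P(X=x)$ for a discrete variable) and compares them with the $B\left(2,\frac{3}{4}\right)$ probabilities. Variable names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

# Coefficients of 1/16 + (3/8)e^t + (9/16)e^(2t): coefficient of e^(x*t) is P(X = x).
pmf = {0: Fraction(1, 16), 1: Fraction(3, 8), 2: Fraction(9, 16)}

mean = sum(x * p for x, p in pmf.items())               # E(X)
second_moment = sum(x * x * p for x, p in pmf.items())  # E(X^2)
variance = second_moment - mean ** 2

# Probabilities of B(2, 3/4) for comparison with part (c).
n, p = 2, Fraction(3, 4)
binomial_pmf = {r: comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)}

print(mean, second_moment, variance)  # 3/2 21/8 3/8
print(pmf == binomial_pmf)            # True
```

Using `Fraction` avoids rounding, so the values match the hand calculation exactly rather than approximately.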
The deeper message is that the MGF is a unifying object. Each distribution you have met separately (Bernoulli, binomial, Poisson, exponential, normal) has a characteristic MGF, and the structural operations on random variables correspond to clean operations on those MGFs: adding independent variables multiplies their MGFs, scaling-and-shifting transforms $M_X(t)$ into $e^{bt}M_X(at)$, and standardising a normal collapses its MGF to the universal $e^{t^2/2}$. Seeing these correspondences turns a collection of distribution-specific facts into a single calculus of distributions, which is exactly the kind of synoptic mastery the Further Maths qualification is designed to develop and the option paper to test.
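The scaling-and-shifting correspondence, $M_{aX+b}(t) = e^{bt}M_X(at)$, can be illustrated on a small discrete distribution. This is a sketch under assumptions: the helper `mgf_discrete`, the three-point distribution, and the constants $a=2$, $b=-1$, $t=0.4$ are made up for the example.

```python
import math

def mgf_discrete(pmf, t):
    """M_X(t) for a finite pmf given as {value: probability} (hypothetical helper)."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

# Made-up three-point distribution for X (an assumption for illustration).
pmf_x = {0: 0.2, 1: 0.5, 3: 0.3}

# Y = aX + b takes value a*x + b with the same probability as X takes x.
a, b = 2.0, -1.0
pmf_y = {a * x + b: p for x, p in pmf_x.items()}

t = 0.4
lhs = mgf_discrete(pmf_y, t)                        # M_{aX+b}(t) computed directly
rhs = math.exp(b * t) * mgf_discrete(pmf_x, a * t)  # e^(bt) * M_X(at)
print(lhs, rhs)
```

The two printed values should agree up to rounding, because $E\left(e^{t(aX+b)}\right) = e^{bt}E\left(e^{(at)X}\right)$.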