Lesson 5 introduced $E(X)$, $\operatorname{Var}(X)$ and the summary measures of a density. This lesson is the integration workshop that makes those formulae robust under exam pressure: polynomial densities that reduce to neat fractions, densities with a parameter that you must carry symbolically, expectations of functions $E(g(X))$, and the integration-by-parts and improper-integral techniques needed for the exponential distribution, the continuous waiting-time model that completes the Poisson story of Lessons 2–3. Throughout, the discipline is the same: every integral is shown in full, every result is verified, and the variance is built from $E(X^2)-[E(X)]^2$.
This is Paper 3 Statistics option (7367/3S) content (per-paper weighting AO1 40% / AO2 25% / AO3 35%). Computing $\int x f(x)\,dx$ and $\int x^2 f(x)\,dx$ is AO1; choosing a method (parts vs direct, or exploiting symmetry to avoid an integral) and interpreting the spread is AO2; multi-step problems (a parameterised density, or a real waiting-time context) are AO3. The prerequisites are A-Level Mathematics integration (including by parts and improper integrals) and the expectation/variance definitions from Lessons 1 and 5.
This lesson is where the continuous-distribution strand becomes genuinely computational. Lesson 4 established what a density is and Lesson 5 catalogued the summary measures; here you build the fluency to evaluate those measures reliably for any density an examiner can pose — polynomial, exponential, trigonometric, or carrying an unknown parameter. Because the option papers reward problem-solving so heavily (the 35% AO3 weighting is the highest of any paper in the qualification), the marks come not from knowing the formulae but from executing the integration cleanly under time pressure and interpreting the answer in context. Every worked example below therefore shows the full integration and a verification step, modelling exactly the discipline the examiners look for.
For a continuous random variable $X$ with PDF $f$ supported on $[a,b]$ (use $\pm\infty$ for an unbounded support):
| Quantity | Formula |
|---|---|
| Mean | $E(X)=\int_a^b x f(x)\,dx$ |
| Second moment | $E(X^2)=\int_a^b x^2 f(x)\,dx$ |
| Function | $E(g(X))=\int_a^b g(x) f(x)\,dx$ |
| Variance | $\operatorname{Var}(X)=E(X^2)-(E(X))^2$ |
| Standard deviation | $\operatorname{SD}(X)=\sqrt{\operatorname{Var}(X)}$ |
The variance identity $\operatorname{Var}(X)=E(X^2)-[E(X)]^2$ is the computational form; the definitional form $\operatorname{Var}(X)=E[(X-\mu)^2]=\int (x-\mu)^2 f(x)\,dx$ gives the same value but is usually messier to integrate. Always compute $E(X)$ and $E(X^2)$ separately, then subtract.
The law of the unconscious statistician. To find $E(g(X))$ you do not need the distribution of $g(X)$: just multiply $g(x)$ by the density and integrate. This is why $E(X^2)$ is simply $\int x^2 f(x)\,dx$. The name is a gentle joke: students apply the rule "unconsciously", without realising the deeper result it relies on (that the expectation of a function can be computed from the original density). It is the single most-used identity in the whole topic: every variance calculation, every moment, every expected-cost or expected-payoff problem flows through it.
The integration techniques you will need. Most 3S densities are polynomials on a finite interval, so the integrals reduce to the power rule and a little arithmetic with fractions. But three other types recur. Improper integrals appear whenever the support is unbounded (the exponential on $[0,\infty)$): write $\int_0^\infty = \lim_{t\to\infty}\int_0^t$ and argue the upper boundary term tends to zero. Integration by parts is needed when the integrand is a product such as $x e^{-\lambda x}$ or $x\sin x$; take the algebraic factor as $u$ (so that $du$ simplifies) and the transcendental factor as $dv$. Trigonometric densities on intervals like $[0,\pi]$ may need a double-angle identity (for a factor such as $\sin^2 x$) or repeated integration by parts (for $x^2\sin x$). Recognising which technique a density calls for, before you start, saves time and avoids dead ends.
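As a rough illustration of the improper-integral argument, the sketch below evaluates the truncated integral of x·λe^(−λx) over [0, t] using the closed form that integration by parts gives, then lets t grow. The function name `truncated_mean` and the rate λ = 1 are illustrative assumptions, not part of the lesson.

```python
import math

def truncated_mean(lam, t):
    """Integral of x * lam * exp(-lam * x) over [0, t], found by parts.

    With u = x and dv = lam * exp(-lam * x) dx we get v = -exp(-lam * x), so
    the integral is [-x exp(-lam x)]_0^t + integral of exp(-lam x) over [0, t].
    """
    boundary_term = -t * math.exp(-lam * t)             # tends to 0 as t grows
    remaining_integral = (1 - math.exp(-lam * t)) / lam
    return boundary_term + remaining_integral

lam = 1.0  # illustrative rate (an assumption for this sketch)
for t in (1, 5, 10, 50):
    print(f"t = {t:>3}: truncated integral = {truncated_mean(lam, t):.6f}")
```

The printed values should climb towards 1/λ, which is the limiting argument written out numerically: the boundary term dies away and only the remaining integral survives.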
Linearity and scaling. Exactly as in the discrete case,
$$E(aX+b)=aE(X)+b,\qquad \operatorname{Var}(aX+b)=a^2\operatorname{Var}(X),\qquad \operatorname{SD}(aX+b)=|a|\operatorname{SD}(X).$$
Expectation is linear; variance squares the scale factor and is blind to the shift $b$ (translating the whole distribution cannot change its spread). These rules are worth dwelling on because exam questions exploit them constantly. A change of units (converting a length in centimetres to inches, a temperature in Celsius to Fahrenheit, or a cost in pounds to a marked-up selling price) is exactly a transformation $Y=aX+b$, and the rules let you write down $E(Y)$ and $\operatorname{Var}(Y)$ from $E(X)$ and $\operatorname{Var}(X)$ without re-integrating. The fact that the standard deviation scales by $|a|$ (the modulus, not $a$ itself) matters when $a<0$: a reflection such as $Y=-2X$ still produces a positive standard deviation $2\operatorname{SD}(X)$, because spread cannot be negative.
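The scaling rules can be sketched in a few lines. The helper name `transform_summary` and the input values (a mean of 20 and a variance of 4 for a temperature in Celsius) are made up purely for illustration.

```python
import math

def transform_summary(mean_x, var_x, a, b):
    """Return (E(Y), Var(Y), SD(Y)) for Y = a*X + b."""
    mean_y = a * mean_x + b             # expectation is linear
    var_y = a ** 2 * var_x              # the shift b drops out; the scale is squared
    sd_y = abs(a) * math.sqrt(var_x)    # the modulus keeps the SD non-negative
    return mean_y, var_y, sd_y

# Hypothetical temperature in Celsius: E(X) = 20, Var(X) = 4 (made-up values).
print(transform_summary(20, 4, 1.8, 32))   # Celsius to Fahrenheit, Y = 1.8X + 32
print(transform_summary(20, 4, -2, 0))     # reflection Y = -2X
```

In the reflection case the mean changes sign but the standard deviation stays positive, which is the point of using the modulus.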
Why the variance identity works. Expanding the definitional form makes the computational identity transparent:
$$\operatorname{Var}(X)=E[(X-\mu)^2]=E[X^2-2\mu X+\mu^2]=E(X^2)-2\mu E(X)+\mu^2=E(X^2)-2\mu^2+\mu^2=E(X^2)-\mu^2,$$
using linearity of expectation and $E(X)=\mu$. This is why $\operatorname{Var}(X)=E(X^2)-[E(X)]^2$ is always valid, discrete or continuous: it is an algebraic identity, not an approximation. It also explains why $E(X^2)\ge[E(X)]^2$: the left side exceeds the right by exactly the (non-negative) variance, an instance of the more general Jensen inequality for the convex function $x\mapsto x^2$.
Choosing a method. Before reaching for an integral, pause for two shortcuts. First, symmetry: if f is symmetric about x=c, then E(X)=c immediately, with no integration. Second, a standard distribution: if you recognise the density as exponential, uniform or one of the named forms, quote its mean and variance directly (unless the command word "by integration" forbids it). Only when neither shortcut applies do you integrate xf and x2f from scratch. Picking the cheapest valid route is itself an AO2 skill the examiners reward.
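Let $f(x)=12x^2(1-x)$ for $0\le x\le 1$.
Validity. $\int_0^1 12x^2(1-x)\,dx=12\left[\frac{x^3}{3}-\frac{x^4}{4}\right]_0^1=12\left(\frac13-\frac14\right)=12\cdot\frac{1}{12}=1$. ✓
Mean.
$$E(X)=\int_0^1 12x^3(1-x)\,dx=12\left[\frac{x^4}{4}-\frac{x^5}{5}\right]_0^1=12\left(\frac14-\frac15\right)=12\cdot\frac{1}{20}=\frac35=0.6. \qquad \text{(M1 } \textstyle\int xf\text{; A1)}$$
Second moment.
$$E(X^2)=\int_0^1 12x^4(1-x)\,dx=12\left[\frac{x^5}{5}-\frac{x^6}{6}\right]_0^1=12\left(\frac15-\frac16\right)=12\cdot\frac{1}{30}=\frac25=0.4. \qquad \text{(M1 A1)}$$
Variance and SD.
$$\operatorname{Var}(X)=0.4-0.6^2=0.4-0.36=0.04,\qquad \operatorname{SD}(X)=\sqrt{0.04}=0.2. \qquad \text{(M1 } E(X^2)-[E(X)]^2\text{; A1)}$$
(This is the Beta(3,2) distribution; the general result $E(X)=\frac{\alpha}{\alpha+\beta}=\frac35$ confirms the mean.)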
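One way to double-check a polynomial-density calculation like this is exact rational arithmetic. The sketch below applies the power rule term by term using Python's `fractions` module; the helper `poly_moment` and its dictionary representation of the density are illustrative choices, not a prescribed method.

```python
from fractions import Fraction

def poly_moment(coeffs, k, a=0, b=1):
    """Exact E(X^k) for a density sum(c * x**p) on [a, b].

    coeffs maps each power p to its coefficient c; the power rule gives
    integral of c * x**(p + k) = c * (b**(p+k+1) - a**(p+k+1)) / (p+k+1).
    """
    total = Fraction(0)
    for p, c in coeffs.items():
        q = p + k + 1
        total += Fraction(c) * (Fraction(b) ** q - Fraction(a) ** q) / q
    return total

# f(x) = 12x^2(1 - x) = 12x^2 - 12x^3 on [0, 1]
density = {2: 12, 3: -12}
area = poly_moment(density, 0)       # validity check: should be 1
mean = poly_moment(density, 1)
second = poly_moment(density, 2)
variance = second - mean ** 2
print(area, mean, second, variance)  # prints: 1 3/5 2/5 1/25
```

Working in fractions avoids the rounding that decimals introduce, so the output can be compared directly with the hand calculation (1/25 = 0.04).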
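Let $f(x)=(n+1)x^n$ for $0\le x\le 1$, with $n$ a non-negative integer. Carry $n$ symbolically.
Validity. $\int_0^1 (n+1)x^n\,dx=(n+1)\cdot\frac{1}{n+1}=1$ for every $n\ge 0$. ✓
Mean.
$$E(X)=\int_0^1 (n+1)x^{n+1}\,dx=(n+1)\cdot\frac{1}{n+2}=\frac{n+1}{n+2}. \qquad \text{(M1 A1)}$$
As $n\to\infty$, $E(X)\to 1$, which is sensible, since the density piles up near $x=1$.
Second moment and variance.
$$E(X^2)=(n+1)\cdot\frac{1}{n+3}=\frac{n+1}{n+3},$$
$$\operatorname{Var}(X)=\frac{n+1}{n+3}-\left(\frac{n+1}{n+2}\right)^2=\frac{(n+1)(n+2)^2-(n+1)^2(n+3)}{(n+2)^2(n+3)}=\frac{n+1}{(n+2)^2(n+3)}. \qquad \text{(M1 common denominator; A1)}$$
Check with $n=2$: this is $f(x)=3x^2$, giving $\operatorname{Var}(X)=\frac{3}{16\cdot 5}=\frac{3}{80}=0.0375$, which matches $E(X^2)-[E(X)]^2=\frac35-\left(\frac34\right)^2=0.6-0.5625=0.0375$. ✓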
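The single check at n = 2 can be extended to several values of n. The sketch below is a spot check of the algebra rather than a proof, and the function names are hypothetical.

```python
from fractions import Fraction

def moments_power_density(n):
    """Exact mean and variance of f(x) = (n + 1) x**n on [0, 1]."""
    mean = Fraction(n + 1, n + 2)      # (n+1) * integral of x**(n+1) over [0, 1]
    second = Fraction(n + 1, n + 3)    # (n+1) * integral of x**(n+2) over [0, 1]
    return mean, second - mean ** 2

def closed_form_variance(n):
    """The simplified result (n + 1) / ((n + 2)**2 * (n + 3))."""
    return Fraction(n + 1, (n + 2) ** 2 * (n + 3))

for n in range(0, 6):
    mean, var = moments_power_density(n)
    # The subtraction E(X^2) - [E(X)]^2 should agree with the simplified form.
    assert var == closed_form_variance(n)
    print(n, mean, var)
```

The case n = 0 is the uniform density on [0, 1], where the output reduces to the familiar mean 1/2 and variance 1/12.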
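Let $f(x)=2(1-x)$ for $0\le x\le 1$.
$$E(X^3)=\int_0^1 x^3\cdot 2(1-x)\,dx=2\left[\frac{x^4}{4}-\frac{x^5}{5}\right]_0^1=2\left(\frac14-\frac15\right)=2\cdot\frac{1}{20}=\frac{1}{10}. \qquad \text{(M1 A1)}$$
Notice the power of the unconscious statistician at work: to find $E(X^3)$ we did not need the distribution of $X^3$; we simply multiplied $x^3$ by the density and integrated. The same density $f(x)=2(1-x)$ has $E(X)=2\int_0^1 (x-x^2)\,dx=2\left(\frac12-\frac13\right)=\frac13$ and $E(X^2)=2\int_0^1 (x^2-x^3)\,dx=2\left(\frac13-\frac14\right)=\frac16$, so $\operatorname{Var}(X)=\frac16-\frac19=\frac{3-2}{18}=\frac{1}{18}$, a complete summary obtained from three near-identical integrals.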
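The "multiply by the density and integrate" recipe works for any function g, which a small numerical sketch makes concrete. The helper `expect` below uses Simpson's rule, so it returns an approximation rather than an exact value; its name and the step count are assumptions made for illustration.

```python
def expect(g, f, a, b, steps=1000):
    """Approximate E(g(X)) = integral of g(x) * f(x) over [a, b] by Simpson's rule.

    steps must be even; the result is a numerical estimate, not an exact value.
    """
    if steps % 2:
        raise ValueError("steps must be even")
    h = (b - a) / steps
    total = g(a) * f(a) + g(b) * f(b)
    for i in range(1, steps):
        x = a + i * h
        weight = 4 if i % 2 else 2       # Simpson weights 4, 2, 4, 2, ...
        total += weight * g(x) * f(x)
    return total * h / 3

def density(x):
    """f(x) = 2(1 - x) on [0, 1]."""
    return 2 * (1 - x)

print(expect(lambda x: x ** 3, density, 0, 1))   # close to 1/10
print(expect(lambda x: x, density, 0, 1))        # close to 1/3
print(expect(lambda x: x ** 2, density, 0, 1))   # close to 1/6
```

Only the function g changes between the three calls; the density and the integration routine stay the same.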
Let $X\sim U(2,8)$, the continuous uniform distribution on $[2,8]$, with constant density $f(x)=\frac{1}{8-2}=\frac16$. Deriving the mean and variance by integration (rather than quoting the formulae) is instructive.
$$E(X)=\int_2^8 x\cdot\frac16\,dx=\frac16\left[\frac{x^2}{2}\right]_2^8=\frac16\cdot\frac{64-4}{2}=\frac16\cdot 30=5, \qquad \text{(M1 A1)}$$
which matches the midpoint $\frac{2+8}{2}=5$ expected from symmetry. For the variance,
$$E(X^2)=\int_2^8 x^2\cdot\frac16\,dx=\frac16\left[\frac{x^3}{3}\right]_2^8=\frac16\cdot\frac{512-8}{3}=\frac16\cdot 168=28,$$
$$\operatorname{Var}(X)=28-5^2=28-25=3=\frac{(8-2)^2}{12}=\frac{36}{12}, \qquad \text{(M1 A1)}$$
confirming the standard result $\operatorname{Var}(U(a,b))=\frac{(b-a)^2}{12}$. The uniform distribution is the natural model for "rounding error" (a value rounded to the nearest unit has error $\sim U(-0.5,0.5)$) and for a quantity known only to lie in an interval with no further information, where it is the maximum-entropy choice.
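A quick simulation offers an informal sanity check on both claims: the mean and variance of U(2, 8), and the rounding-error model. This is a sketch only; the seed and sample size are arbitrary choices, and simulated values will differ slightly from the exact results.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)

# Simulate X ~ U(2, 8) and compare with E(X) = 5, Var(X) = 3.
sample = [random.uniform(2, 8) for _ in range(50_000)]
sample_mean = statistics.fmean(sample)
sample_var = statistics.pvariance(sample)
print(f"mean ~ {sample_mean:.3f}, variance ~ {sample_var:.3f}")

# Rounding error: the gap between each value and its nearest integer.
errors = [x - round(x) for x in sample]
error_var = statistics.pvariance(errors)
print(f"rounding-error variance ~ {error_var:.4f} (theory 1/12 ~ {1 / 12:.4f})")
```

The rounding errors lie in [-0.5, 0.5], so their variance should sit near (0.5 - (-0.5))²/12 = 1/12.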
(Specimen-style — not from any real paper.)
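The lifetime $X$ (in years) of a component is modelled by $f(x)=\frac14 e^{-x/4}$ for $x\ge 0$. (a) State the value of $\lambda$ and write down $E(X)$. (b) Find $\operatorname{Var}(X)$ by integration. (c) Find $P(X>6)$.
(a) Comparing with $\lambda e^{-\lambda x}$ gives $\lambda=\frac14$, so $X\sim\operatorname{Exp}\left(\frac14\right)$ and $E(X)=\frac{1}{\lambda}=4$ years.
(b) $E(X^2)=\int_0^\infty x^2\cdot\frac14 e^{-x/4}\,dx$. Using the standard result $\int_0^\infty x^2\lambda e^{-\lambda x}\,dx=\frac{2}{\lambda^2}$ (derived in §10 by parts), $E(X^2)=\frac{2}{(1/4)^2}=2\cdot 16=32$. Hence
$$\operatorname{Var}(X)=E(X^2)-[E(X)]^2=32-16=16=\frac{1}{\lambda^2},\qquad \operatorname{SD}(X)=4.$$
(c) Using the exponential CDF $F(x)=1-e^{-x/4}$: $P(X>6)=e^{-6/4}=e^{-1.5}\approx 0.2231$.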
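The three answers can be reproduced in a few lines from the standard exponential results quoted above. The variable and function names are illustrative, and the sketch takes the formulae for E(X) and E(X²) as given rather than deriving them.

```python
import math
from fractions import Fraction

lam = Fraction(1, 4)                   # rate read off from f(x) = (1/4) e^{-x/4}

mean = 1 / lam                         # E(X) = 1/lambda
second_moment = 2 / lam ** 2           # E(X^2) = 2/lambda^2 (standard result)
variance = second_moment - mean ** 2   # equals 1/lambda^2
sd = math.sqrt(variance)

def survival(x, rate=lam):
    """P(X > x) = 1 - F(x) = exp(-rate * x) for x >= 0."""
    return math.exp(-float(rate) * x)

print(mean, second_moment, variance, sd)   # prints: 4 32 16 4.0
print(round(survival(6), 4))               # prints: 0.2231
```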