Expectation and Variance of Continuous Distributions

Lesson 5 introduced $E(X)$, $\text{Var}(X)$ and the summary measures of a density. This lesson is the integration workshop that makes those formulae robust under exam pressure: polynomial densities that reduce to neat fractions, densities with a parameter that you must carry symbolically, expectations of functions $E(g(X))$, and the integration-by-parts and improper-integral techniques needed for the exponential distribution, the continuous waiting-time model that completes the Poisson story of Lessons 2–3. Throughout, the discipline is the same: every integral is shown in full, every result is verified, and the variance is built from $E(X^2) - [E(X)]^2$.

1. Where this sits in AQA 7367

This is Paper 3 Statistics option (7367/3S) content (per-paper weighting AO1 40% / AO2 25% / AO3 35%). Computing $\int x f\,dx$ and $\int x^2 f\,dx$ is AO1; choosing a method (parts vs direct, or exploiting symmetry to avoid an integral) and interpreting the spread is AO2; multi-step problems, such as a parameterised density or a real waiting-time context, are AO3. The prerequisites are integration from A-Level Mathematics (including integration by parts), improper integrals from the Further Mathematics core pure content, and the expectation/variance definitions from Lessons 1 and 5.

This lesson is where the continuous-distribution strand becomes genuinely computational. Lesson 4 established what a density is and Lesson 5 catalogued the summary measures; here you build the fluency to evaluate those measures reliably for any density an examiner can pose — polynomial, exponential, trigonometric, or carrying an unknown parameter. Because the option papers reward problem-solving so heavily (the 35% AO3 weighting is the highest of any paper in the qualification), the marks come not from knowing the formulae but from executing the integration cleanly under time pressure and interpreting the answer in context. Every worked example below therefore shows the full integration and a verification step, modelling exactly the discipline the examiners look for.

2. Core theory

For a continuous random variable $X$ with PDF $f$ supported on $[a,b]$ (use $\pm\infty$ for an unbounded support):

Quantity	Formula
Mean	$E(X) = \displaystyle\int_a^b x\,f(x)\,dx$
Second moment	$E(X^2) = \displaystyle\int_a^b x^2 f(x)\,dx$
Function	$E\big(g(X)\big) = \displaystyle\int_a^b g(x)\,f(x)\,dx$
Variance	$\text{Var}(X) = E(X^2) - \big(E(X)\big)^2$
Standard deviation	$\text{SD}(X) = \sqrt{\text{Var}(X)}$

The variance identity $\text{Var}(X) = E(X^2) - [E(X)]^2$ is the computational form; the definitional form $\text{Var}(X) = E\big[(X-\mu)^2\big] = \int (x-\mu)^2 f\,dx$ gives the same value but is usually messier to integrate. Always compute $E(X)$ and $E(X^2)$ separately, then subtract.
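As a numerical companion to the table, here is a minimal Python sketch (standard library only) that approximates $E(X)$ and $E(X^2)$ with composite Simpson's rule and then applies the computational form of the variance. The helper names `simpson` and `summary` are illustrative, not part of any library, and the density $12x^2(1-x)$ on $[0,1]$ is used only as a test case; a check like this supports, but does not replace, exact integration in an exam.

```python
def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)  # interior weights 4, 2, 4, ...
    return total * h / 3


def summary(f, a, b, n=1000):
    """Return (E(X), E(X^2), Var(X), SD(X)) for a density f supported on [a, b]."""
    mean = simpson(lambda x: x * f(x), a, b, n)
    second = simpson(lambda x: x * x * f(x), a, b, n)
    var = second - mean ** 2  # computational form E(X^2) - [E(X)]^2
    return mean, second, var, var ** 0.5


# Test case: f(x) = 12x^2(1 - x) on [0, 1].
mean, second, var, sd = summary(lambda x: 12 * x**2 * (1 - x), 0, 1)
print(mean, second, var, sd)
```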

The law of the unconscious statistician. To find $E(g(X))$ you do not need the distribution of $g(X)$: just multiply $g(x)$ by the density of $X$ and integrate. This is why $E(X^2)$ is simply $\int x^2 f\,dx$. The name is a gentle joke: students apply the rule "unconsciously", as though it were the definition of $E(g(X))$, without realising that it is a theorem. Strictly, $E(Y)$ for $Y = g(X)$ is defined through the distribution of $Y$, and the law guarantees that integrating $g(x)f(x)$ against the original density gives the same answer. It is the single most-used identity in the whole topic: every variance calculation, every moment, every expected-cost or expected-payoff problem flows through it.
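To see the law in action, the sketch below (a hypothetical example, not from the lesson) takes $X \sim U(0,1)$ and $Y = X^2$. It computes $E(Y)$ once by integrating $x^2 f_X(x)$ and once the long way, by first deriving $f_Y(y) = \tfrac{1}{2\sqrt{y}}$ on $(0,1]$. The midpoint rule is an assumed choice of quadrature, used because it never evaluates $f_Y$ at the singular endpoint $y = 0$. Both routes should agree with the exact value $\tfrac13$.

```python
import math


def midpoint(g, a, b, n=100_000):
    """Composite midpoint rule; it never evaluates g at the endpoints a and b."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def f_X(x):
    """Density of X ~ U(0, 1)."""
    return 1.0


def f_Y(y):
    """Density of Y = X^2 on (0, 1]: P(Y <= y) = P(X <= sqrt(y)) = sqrt(y), then differentiate."""
    return 1 / (2 * math.sqrt(y))


# Route 1 (LOTUS): E(X^2) = integral of x^2 f_X(x); the distribution of Y is never needed.
lotus = midpoint(lambda x: x**2 * f_X(x), 0.0, 1.0)

# Route 2 (the long way): find f_Y first, then compute E(Y) = integral of y f_Y(y).
direct = midpoint(lambda y: y * f_Y(y), 0.0, 1.0)

print(lotus, direct)  # both approximate 1/3
```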

The integration techniques you will need. Most 3S densities are polynomials on a finite interval, so the integrals reduce to the power rule and a little arithmetic with fractions. But three other types recur. Improper integrals appear whenever the support is unbounded (the exponential on $[0,\infty)$): write $\int_0^\infty = \lim_{t\to\infty}\int_0^t$ and show that the upper boundary term tends to zero. Integration by parts is needed when the integrand is a product such as $x\,e^{-\lambda x}$ or $x\sin x$; take the algebraic factor as $u$ (so that $du$ simplifies) and the transcendental factor as $dv$. Trigonometric densities on intervals like $[0,\pi]$ often need repeated integration by parts (for example, $\int x^2\sin x\,dx$ needs two applications), and a density proportional to $\sin^2 x$ needs a double-angle identity before it can be integrated. Recognising which technique a density calls for, before you start, saves time and avoids dead ends.
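For the improper-integral technique, the sketch below writes out the truncated integral $\int_0^t x\,\lambda e^{-\lambda x}\,dx$ after one integration by parts and shows the upper boundary term dying away as $t$ grows, with a Simpson's-rule value as an independent check. The rate $\lambda = \tfrac14$ and the function names are illustrative assumptions.

```python
import math


def truncated_mean(lam, t):
    """int_0^t x * lam * e^{-lam x} dx, evaluated with one integration by parts
    (u = x, dv = lam e^{-lam x} dx, so v = -e^{-lam x}):
    [-x e^{-lam x}]_0^t + int_0^t e^{-lam x} dx = -t e^{-lam t} + (1 - e^{-lam t}) / lam.
    """
    boundary = -t * math.exp(-lam * t)          # upper boundary term; tends to 0 as t grows
    remaining = (1 - math.exp(-lam * t)) / lam  # the simpler integral left after parts
    return boundary + remaining


def simpson(g, a, b, n=2000):
    """Composite Simpson's rule with an even number n of strips, used as an independent check."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


lam = 0.25
for t in (10, 50, 100, 200):
    numeric = simpson(lambda x: x * lam * math.exp(-lam * x), 0, t)
    print(t, truncated_mean(lam, t), numeric)  # both approach 1/lam = 4 as t grows
```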

Linearity and scaling. Exactly as in the discrete case,

$E(aX + b) = aE(X) + b, \qquad \text{Var}(aX + b) = a^2\,\text{Var}(X), \qquad \text{SD}(aX+b) = |a|\,\text{SD}(X).$

Expectation is linear; variance squares the scale factor and is blind to the shift $b$ (translating the whole distribution cannot change its spread). These rules are worth dwelling on because exam questions exploit them constantly. A change of units (converting a length in centimetres to inches, a temperature in Celsius to Fahrenheit, or a cost in pounds to a marked-up selling price) is exactly a transformation $Y = aX + b$, and the rules let you write down $E(Y)$ and $\text{Var}(Y)$ from $E(X)$ and $\text{Var}(X)$ without re-integrating. The fact that the standard deviation scales by $|a|$ (the modulus, not $a$ itself) matters when $a < 0$: a reflection such as $Y = -2X$ still produces a positive standard deviation $2\,\text{SD}(X)$, because spread cannot be negative.
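The linearity rules can be checked exactly for a polynomial density. This sketch (an illustration using Python's `fractions` module; the helper names are hypothetical) integrates $(ax+b)f(x)$ and $(ax+b)^2 f(x)$ directly for $f(x) = 2(1-x)$ on $[0,1]$ with $a = -2$, $b = 3$, and compares the results with $aE(X) + b$ and $a^2\,\text{Var}(X)$.

```python
from fractions import Fraction as F


def poly_mul(p, q):
    """Product of two polynomials stored as coefficient lists (index = power of x)."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, c in enumerate(p):
        for j, d in enumerate(q):
            out[i + j] += c * d
    return out


def integrate(p, lo, hi):
    """Exact integral of the polynomial p over [lo, hi], term by term with the power rule."""
    lo, hi = F(lo), F(hi)
    return sum(c * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1) for k, c in enumerate(p))


def expect(g, f, lo, hi):
    """E(g(X)) = integral of g(x) f(x) dx for polynomial g and density f (LOTUS)."""
    return integrate(poly_mul(g, f), lo, hi)


f = [F(2), F(-2)]                                   # f(x) = 2 - 2x on [0, 1]
mean_X = expect([F(0), F(1)], f, 0, 1)              # g(x) = x
var_X = expect([F(0), F(0), F(1)], f, 0, 1) - mean_X ** 2

a, b = F(-2), F(3)                                  # Y = -2X + 3: a reflection plus a shift
g = [b, a]                                          # g(x) = b + a x
mean_Y = expect(g, f, 0, 1)                         # integrated directly, no shortcut
var_Y = expect(poly_mul(g, g), f, 0, 1) - mean_Y ** 2

print(mean_Y, a * mean_X + b)    # E(aX + b) and aE(X) + b
print(var_Y, a ** 2 * var_X)     # Var(aX + b) and a^2 Var(X): b has dropped out
```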

Why the variance identity works. Expanding the definitional form makes the computational identity transparent:

$\text{Var}(X) = E\big[(X - \mu)^2\big] = E\big[X^2 - 2\mu X + \mu^2\big] = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2,$

using linearity of expectation and $E(X) = \mu$. This is why $\text{Var}(X) = E(X^2) - [E(X)]^2$ is always valid, discrete or continuous: it is an algebraic identity, not an approximation. It also explains why $E(X^2) \ge [E(X)]^2$: the left side exceeds the right by exactly the variance, which is non-negative. This is an instance of the more general Jensen's inequality for the convex function $x \mapsto x^2$.
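The identity can also be confirmed term by term in exact arithmetic. Assuming polynomials are stored as coefficient lists (an illustrative choice, with hypothetical helper names), the sketch below evaluates both the definitional form $\int (x-\mu)^2 f\,dx$ and the computational form $E(X^2) - \mu^2$ for $f(x) = 12x^2(1-x)$ on $[0,1]$.

```python
from fractions import Fraction as F


def poly_mul(p, q):
    """Product of polynomials stored as coefficient lists (index = power of x)."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, c in enumerate(p):
        for j, d in enumerate(q):
            out[i + j] += c * d
    return out


def integrate(p, lo, hi):
    """Exact integral of the polynomial p over [lo, hi] by the power rule."""
    lo, hi = F(lo), F(hi)
    return sum(c * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1) for k, c in enumerate(p))


f = [F(0), F(0), F(12), F(-12)]                            # f(x) = 12x^2 - 12x^3 on [0, 1]
mu = integrate(poly_mul([F(0), F(1)], f), 0, 1)            # E(X)
second = integrate(poly_mul([F(0), F(0), F(1)], f), 0, 1)  # E(X^2)

computational = second - mu ** 2
# Definitional form: integrate (x - mu)^2 f(x), with (x - mu)^2 = mu^2 - 2 mu x + x^2.
definitional = integrate(poly_mul([mu ** 2, -2 * mu, F(1)], f), 0, 1)

print(mu, second, computational, definitional)  # 3/5 2/5 1/25 1/25
```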

Choosing a method. Before reaching for an integral, pause for two shortcuts. First, symmetry: if $f$ is symmetric about $x = c$, then $E(X) = c$ immediately, with no integration (provided $E(X)$ exists, which is automatic when the support is bounded). Second, a standard distribution: if you recognise the density as exponential, uniform or one of the named forms, quote its mean and variance directly (unless the command word "by integration" forbids it). Only when neither shortcut applies do you integrate $x f$ and $x^2 f$ from scratch. Picking the cheapest valid route is itself an AO2 skill the examiners reward.
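A small sketch of the symmetry shortcut, using a hypothetical density $f(x) = \tfrac34\big(1 - (x-2)^2\big)$ on $[1,3]$, which is symmetric about $c = 2$. The code spot-checks $f(c+d) = f(c-d)$ at a few offsets (a check, not a proof) and then confirms by exact integration that $E(X) = 2$; because $x f(x)$ is a cubic, Simpson's rule in exact fractions gives the integral exactly.

```python
from fractions import Fraction as F


def f(x):
    """Hypothetical density, symmetric about c = 2: f(x) = (3/4)(1 - (x - 2)^2) on [1, 3]."""
    return F(3, 4) * (1 - (x - 2) ** 2)


def simpson(g, a, b, n=2):
    """Composite Simpson's rule; exact (in exact arithmetic) for polynomials of degree <= 3."""
    h = F(b - a, n)
    total = g(F(a)) + g(F(b))
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


c = 2
# Shortcut: spot-check f(c + d) == f(c - d), then read off E(X) = c with no integration.
symmetric = all(f(c + d) == f(c - d) for d in (F(0), F(1, 4), F(1, 2), F(1)))

# Full route, for comparison: total probability and E(X) by integration.
total = simpson(f, 1, 3)
mean = simpson(lambda x: x * f(x), 1, 3)
print(symmetric, total, mean)
```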

3. Worked examples with M1/A1 mark scheme

Example 1 — a polynomial density

Let $f(x) = 12x^2(1-x)$ for $0 \le x \le 1$.

Validity. $\displaystyle\int_0^1 12x^2(1-x)\,dx = 12\!\left[\tfrac{x^3}{3} - \tfrac{x^4}{4}\right]_0^1 = 12\big(\tfrac13 - \tfrac14\big) = 12\cdot\tfrac{1}{12} = 1.$ ✓

Mean.

$E(X) = \int_0^1 12x^3(1-x)\,dx = 12\!\left[\tfrac{x^4}{4} - \tfrac{x^5}{5}\right]_0^1 = 12\big(\tfrac14 - \tfrac15\big) = 12\cdot\tfrac{1}{20} = \tfrac{3}{5} = 0.6. \quad (\textbf{M1}\ \int xf;\ \textbf{A1})$

Second moment.

$E(X^2) = \int_0^1 12x^4(1-x)\,dx = 12\!\left[\tfrac{x^5}{5} - \tfrac{x^6}{6}\right]_0^1 = 12\big(\tfrac15 - \tfrac16\big) = 12\cdot\tfrac{1}{30} = \tfrac{2}{5} = 0.4. \quad (\textbf{M1 A1})$

Variance and SD.

$\text{Var}(X) = 0.4 - 0.6^2 = 0.4 - 0.36 = 0.04, \qquad \text{SD}(X) = \sqrt{0.04} = 0.2. \quad (\textbf{M1}\ E(X^2)-[E(X)]^2;\ \textbf{A1})$

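(This is the Beta(3,2) distribution; the general results $E(X) = \tfrac{\alpha}{\alpha+\beta} = \tfrac{3}{5}$ and $\text{Var}(X) = \tfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} = \tfrac{6}{25\cdot 6} = \tfrac{1}{25}$ confirm the mean and variance.)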
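Example 1 can be verified in exact arithmetic. The sketch below evaluates $E(X^k) = 12\big(\tfrac{1}{k+3} - \tfrac{1}{k+4}\big)$ for $k = 0, 1, 2$ with Python's `fractions` module and compares the results with the standard Beta$(\alpha, \beta)$ formulas, which are quoted rather than derived here.

```python
from fractions import Fraction as F
from math import factorial


def moment(k):
    """E(X^k) for f(x) = 12x^2(1 - x) on [0, 1]: 12 * (1/(k+3) - 1/(k+4))."""
    return 12 * (F(1, k + 3) - F(1, k + 4))


total, mean, second = moment(0), moment(1), moment(2)  # k = 0 is the validity check
var = second - mean ** 2

# Beta(alpha, beta) cross-check (standard results, quoted rather than derived here).
alpha, beta = 3, 2
norm = F(factorial(alpha + beta - 1), factorial(alpha - 1) * factorial(beta - 1))  # 1/B(3, 2)
beta_mean = F(alpha, alpha + beta)
beta_var = F(alpha * beta, (alpha + beta) ** 2 * (alpha + beta + 1))

print(total, mean, second, var, norm, beta_mean, beta_var)
```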

Example 2 — a density with a parameter

Let $f(x) = (n+1)x^n$ for $0 \le x \le 1$, with $n$ a non-negative integer. Carry $n$ symbolically.

Validity. $\int_0^1 (n+1)x^n\,dx = (n+1)\cdot\tfrac{1}{n+1} = 1$ for every $n \ge 0$. ✓

Mean.

$E(X) = \int_0^1 (n+1)x^{\,n+1}\,dx = (n+1)\cdot\frac{1}{n+2} = \frac{n+1}{n+2}. \quad (\textbf{M1 A1})$

As $n \to \infty$, $E(X) \to 1$: sensible, since the density piles up near $x = 1$.

Second moment and variance.

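$E(X^2) = \int_0^1 (n+1)x^{\,n+2}\,dx = (n+1)\cdot\frac{1}{n+3} = \frac{n+1}{n+3},$

$\text{Var}(X) = \frac{n+1}{n+3} - \left(\frac{n+1}{n+2}\right)^2 = \frac{(n+1)(n+2)^2 - (n+1)^2(n+3)}{(n+2)^2(n+3)} = \frac{(n+1)\big[(n+2)^2 - (n+1)(n+3)\big]}{(n+2)^2(n+3)} = \frac{n+1}{(n+2)^2(n+3)}, \quad (\textbf{M1}\ \text{common denom};\ \textbf{A1})$

since the bracket simplifies to $(n^2 + 4n + 4) - (n^2 + 4n + 3) = 1$.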

Check with $n=2$: this is $f = 3x^2$, giving $\text{Var} = \tfrac{3}{16\cdot 5} = \tfrac{3}{80} = 0.0375$, which matches $E(X^2)-[E(X)]^2 = \tfrac35 - (\tfrac34)^2 = 0.6 - 0.5625 = 0.0375$. ✓
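Because the algebra in Example 2 is easy to slip on, a quick exact check over many values of $n$ is reassuring. This sketch (helper names are illustrative) computes $E(X^2) - [E(X)]^2$ from the power rule and compares it with the simplified closed form for $n = 0, 1, \dots, 10$.

```python
from fractions import Fraction as F


def moment(n, k):
    """E(X^k) for f(x) = (n + 1) x^n on [0, 1], by the power rule: (n + 1) / (n + k + 1)."""
    return (n + 1) * F(1, n + k + 1)


def closed_form_var(n):
    """The simplified variance (n + 1) / ((n + 2)^2 (n + 3)) from Example 2."""
    return F(n + 1, (n + 2) ** 2 * (n + 3))


# Compare E(X^2) - [E(X)]^2 with the closed form for a range of n.
checks = {n: moment(n, 2) - moment(n, 1) ** 2 == closed_form_var(n) for n in range(11)}
print(all(checks.values()), closed_form_var(2))  # True 3/80
```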

Example 3 — expectation of a function, $E(X^3)$

Let $f(x) = 2(1-x)$ for $0 \le x \le 1$.

$E(X^3) = \int_0^1 x^3\cdot 2(1-x)\,dx = 2\!\left[\tfrac{x^4}{4} - \tfrac{x^5}{5}\right]_0^1 = 2\big(\tfrac14 - \tfrac15\big) = 2\cdot\tfrac{1}{20} = \frac{1}{10}. \quad (\textbf{M1 A1})$

Notice the law of the unconscious statistician at work: to find $E(X^3)$ we did not need the distribution of $X^3$; we simply multiplied $x^3$ by the density and integrated. The same density $f(x) = 2(1-x)$ has $E(X) = 2\int_0^1(x - x^2)\,dx = 2(\tfrac12 - \tfrac13) = \tfrac13$ and $E(X^2) = 2\int_0^1(x^2 - x^3)\,dx = 2(\tfrac13 - \tfrac14) = \tfrac16$, so $\text{Var}(X) = \tfrac16 - \tfrac19 = \tfrac{3 - 2}{18} = \tfrac{1}{18}$: a complete summary obtained from three near-identical integrals.
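The sketch below reproduces Example 3 exactly and adds a simulation check of $E(X^3)$. The simulation assumes inverse-transform sampling: since $F(x) = 1 - (1-x)^2$ on $[0,1]$, setting $X = 1 - \sqrt{1-U}$ with $U \sim U(0,1)$ gives the required density. The seed and sample size are arbitrary choices, so the estimate is only close to $0.1$, not equal to it.

```python
import random
from fractions import Fraction as F


def moment(k):
    """Exact E(X^k) for f(x) = 2(1 - x) on [0, 1], via LOTUS with g(x) = x^k."""
    return 2 * (F(1, k + 1) - F(1, k + 2))


mean, second, third = moment(1), moment(2), moment(3)
var = second - mean ** 2
print(mean, second, third, var)  # 1/3 1/6 1/10 1/18

# Simulation check: F(x) = 1 - (1 - x)^2, so X = 1 - sqrt(1 - U) with U ~ U(0, 1).
# Averaging x^3 over the sample estimates E(X^3) without ever finding the density of X^3.
rng = random.Random(7367)  # fixed seed so the run is reproducible
N = 200_000
sample = [1 - (1 - rng.random()) ** 0.5 for _ in range(N)]
estimate = sum(x ** 3 for x in sample) / N
print(estimate)  # close to 0.1
```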

Example 4 — a uniform distribution from first principles

Let $X \sim U(2, 8)$, the continuous uniform distribution on $[2, 8]$, with constant density $f(x) = \tfrac{1}{8 - 2} = \tfrac16$. Deriving the mean and variance by integration (rather than quoting the formulae) is instructive.

$E(X) = \int_2^8 x\cdot\tfrac16\,dx = \tfrac16\!\left[\tfrac{x^2}{2}\right]_2^8 = \tfrac16\cdot\tfrac{64 - 4}{2} = \tfrac16\cdot 30 = 5, \quad (\textbf{M1 A1})$

which matches the midpoint $\tfrac{2 + 8}{2} = 5$ expected from symmetry. For the variance,

$E(X^2) = \int_2^8 x^2\cdot\tfrac16\,dx = \tfrac16\!\left[\tfrac{x^3}{3}\right]_2^8 = \tfrac16\cdot\tfrac{512 - 8}{3} = \tfrac16\cdot 168 = 28,$

$\text{Var}(X) = 28 - 5^2 = 28 - 25 = 3 = \frac{(8 - 2)^2}{12} = \frac{36}{12}, \quad (\textbf{M1 A1})$

confirming the standard result $\text{Var}\big(U(a,b)\big) = \tfrac{(b-a)^2}{12}$. The uniform distribution is the natural model for rounding error (a value rounded to the nearest unit has error $\sim U(-0.5, 0.5)$) and for a quantity known only to lie in an interval with no further information; on a bounded interval it is the maximum-entropy choice.
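The same first-principles integration can be packaged for any interval. This sketch (function name illustrative) integrates against the constant density $\tfrac{1}{b-a}$ in exact arithmetic and returns $E(X)$, $E(X^2)$ and $\text{Var}(X)$, covering Example 4 and the rounding-error model $U(-0.5, 0.5)$.

```python
from fractions import Fraction as F


def uniform_summary(a, b):
    """E(X), E(X^2) and Var(X) for X ~ U(a, b), integrating against the density 1/(b - a)."""
    a, b = F(a), F(b)
    mean = (b**2 - a**2) / 2 / (b - a)    # [x^2/2] from a to b, times 1/(b - a)
    second = (b**3 - a**3) / 3 / (b - a)  # [x^3/3] from a to b, times 1/(b - a)
    return mean, second, second - mean**2


print(uniform_summary(2, 8))               # Example 4
print(uniform_summary(F(-1, 2), F(1, 2)))  # rounding error to the nearest unit
```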

4. Specimen-style exam question

(Specimen-style — not from any real paper.)

The lifetime $X$ (in years) of a component is modelled by $f(x) = \tfrac{1}{4}e^{-x/4}$ for $x \ge 0$. (a) By writing $f(x)$ in the form $\lambda e^{-\lambda x}$, state the value of $\lambda$ and write down $E(X)$. (b) Find $\text{Var}(X)$ by integration. (c) Find $P(X > 6)$.

(a) Comparing with $\lambda e^{-\lambda x}$ gives $\lambda = \tfrac14$, so $X \sim \text{Exp}(\tfrac14)$ and $E(X) = \tfrac{1}{\lambda} = 4$ years.

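(b) Integrate by parts with $u = x^2$ and $dv = \tfrac14 e^{-x/4}\,dx$, so that $du = 2x\,dx$ and $v = -e^{-x/4}$:

$E(X^2) = \int_0^\infty x^2\cdot\tfrac14 e^{-x/4}\,dx = \Big[-x^2 e^{-x/4}\Big]_0^\infty + \int_0^\infty 2x\,e^{-x/4}\,dx = 0 + 8\int_0^\infty x\cdot\tfrac14 e^{-x/4}\,dx = 8E(X) = 32,$

because $t^2 e^{-t/4} \to 0$ as $t \to \infty$. (One application of parts in the same way, $\int_0^\infty x\cdot\tfrac14 e^{-x/4}\,dx = \big[-x e^{-x/4}\big]_0^\infty + \int_0^\infty e^{-x/4}\,dx = 0 + 4$, confirms $E(X) = 4$ by integration.) This agrees with the standard result $E(X^2) = \tfrac{2}{\lambda^2} = 2\cdot 16 = 32$. Hence

$\text{Var}(X) = E(X^2) - [E(X)]^2 = 32 - 16 = 16 = \frac{1}{\lambda^2}, \qquad \text{SD}(X) = 4.$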

(c) Using the exponential CDF $F(x) = 1 - e^{-x/4}$: $P(X > 6) = e^{-6/4} = e^{-1.5} \approx 0.2231.$
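As a numerical sanity check on the specimen answers (not a substitute for the integration the question demands), this sketch truncates the improper integrals at $T = 200$, where the neglected tail is negligible, and applies composite Simpson's rule. The truncation point and strip counts are assumptions chosen for accuracy, not values from the question.

```python
import math


def simpson(g, a, b, n):
    """Composite Simpson's rule on [a, b] with an even number n of strips."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


lam = 0.25


def f(x):
    """Exp(1/4) density from the specimen question."""
    return lam * math.exp(-lam * x)


# Truncate the improper integrals at T; the neglected tail is of order T^2 e^{-T/4}.
T = 200.0
mean = simpson(lambda x: x * f(x), 0.0, T, 20_000)
second = simpson(lambda x: x * x * f(x), 0.0, T, 20_000)
var = second - mean**2
p_tail = 1 - simpson(f, 0.0, 6.0, 600)  # P(X > 6) = 1 - P(X <= 6)

print(mean, second, var, p_tail, math.exp(-1.5))
```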

5. Synoptic links

Lesson 5 (PDF summaries): the mean/variance integrals are the same; here the emphasis is the integration technique and parameterised/infinite-support densities.
Lessons 2–3 (Poisson): the exponential distribution is the continuous partner of the Poisson; if events occur as a Poisson process at rate $\lambda$, the waiting time between them is $\text{Exp}(\lambda)$. Mean wait $1/\lambda$ is the reciprocal of the event rate.
A-Level Maths and Further Maths core pure: integration by parts (A-Level Maths) and improper integrals such as $\int_0^\infty x^n e^{-x}\,dx$ (Further Maths core) are the pure tools used here; the factorial pattern $\int_0^\infty x^n e^{-x}\,dx = n!$ follows from the recurrence $I_n = nI_{n-1}$ with $I_0 = 1$, obtained by parts, which links to sequences defined by recurrence relations.
Lesson 8 (MGFs): differentiating the exponential MGF $\tfrac{\lambda}{\lambda - t}$ (valid for $t < \lambda$) reproduces $E(X) = 1/\lambda$ and $\text{Var}(X) = 1/\lambda^2$ without any integration, a powerful cross-check.
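The MGF cross-check can be sketched numerically. Assuming $\lambda = \tfrac14$ as in the specimen question, central differences at $t = 0$ approximate $M'(0) = E(X)$ and $M''(0) = E(X^2)$; the step $h$ (an arbitrary small choice) is kept well inside the region $t < \lambda$ where the MGF exists. The exact derivatives are included alongside for comparison.

```python
from fractions import Fraction as F


def mgf(t, lam):
    """MGF of Exp(lam): M(t) = lam / (lam - t), valid only for t < lam."""
    if t >= lam:
        raise ValueError("the exponential MGF exists only for t < lam")
    return lam / (lam - t)


lam = 0.25
h = 1e-4  # step well inside the region t < lam

# Central differences at t = 0 approximate M'(0) = E(X) and M''(0) = E(X^2).
d1 = (mgf(h, lam) - mgf(-h, lam)) / (2 * h)
d2 = (mgf(h, lam) - 2 * mgf(0.0, lam) + mgf(-h, lam)) / h**2
var_numeric = d2 - d1**2

# Exact derivatives: M'(t) = lam/(lam - t)^2 and M''(t) = 2 lam/(lam - t)^3, so at t = 0
# E(X) = 1/lam and E(X^2) = 2/lam^2; with lam = 1/4 these are 4 and 32.
lam_exact = F(1, 4)
mean_exact = 1 / lam_exact
var_exact = 2 / lam_exact**2 - mean_exact**2

print(d1, d2, var_numeric, mean_exact, var_exact)
```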
