Discrete Random Variables (Further)

This lesson extends your understanding of discrete random variables beyond what is covered in A-Level Mathematics. You will learn to compute expectations of functions of random variables, derive variance rigorously via the $E(X^2)$ method, manipulate the algebra of expectation and variance, and combine independent variables. These skills are the load-bearing foundation for the entire Further Statistics module: the Poisson distribution, the continuous distributions, moment generating functions and the chi-squared tests all rest on a fluent command of $E$ and $\text{Var}$ .

1. Where this sits in AQA 7367

This topic belongs to the Paper 3 Statistics option (7367/3S). Paper 3 carries the more problem-solving-weighted assessment profile (AO1 40% / AO2 25% / AO3 35%), so although the mechanics of computing $E(X)$ and $\text{Var}(X)$ are AO1, examiners deliberately wrap them in unfamiliar contexts and multi-step parameter-finding (AO3) and ask you to justify or interpret results (AO2). The A-Level Mathematics prerequisite is the discrete-distribution work in the applied paper (computing $E(X)$ and $\text{Var}(X)$ from a table); Further Maths adds functions of a random variable, the linear-combination algebra, and sums of independent variables.

Students choose two of Mechanics (3M), Statistics (3S) and Discrete (3D). If you are reading this you are taking Statistics — this lesson is the gateway to the rest of 3S.

2. Core theory: expectation and variance

A discrete random variable $X$ takes a countable set of values $x_1, x_2, \ldots$ , each with probability $P(X = x_i) = p_i$ . For the distribution to be valid:

Property	Requirement
Non-negativity	$p_i \geq 0$ for all $i$
Normalisation	$\displaystyle\sum_i p_i = 1$

We use this running example throughout the lesson:

$x$	1	2	3	4
$P(X = x)$	0.1	0.3	0.4	0.2

Check: $0.1 + 0.3 + 0.4 + 0.2 = 1$ . Valid.

Expectation

The expected value (mean) is the probability-weighted average of the values:

$E(X) = \mu = \sum_x x \, P(X = x).$

It is the long-run mean of $X$ over many repetitions. For the running example,

$E(X) = 1(0.1) + 2(0.3) + 3(0.4) + 4(0.2) = 0.1 + 0.6 + 1.2 + 0.8 = 2.7.$

$E(X)$ need not be an attainable value of $X$ . It is a balance point, not a mode or median.
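The validity check and the expectation sum can be reproduced mechanically. The sketch below is illustrative only: the names `pmf`, `is_valid` and `expectation` are chosen here for convenience, and exact fractions are used so that the arithmetic matches the hand calculation without floating-point noise.

```python
from fractions import Fraction

# Running example: value -> probability, stored as exact fractions.
pmf = {1: Fraction("0.1"), 2: Fraction("0.3"), 3: Fraction("0.4"), 4: Fraction("0.2")}

def is_valid(pmf):
    """Check non-negativity and normalisation."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

def expectation(pmf):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

print(is_valid(pmf))            # True
print(float(expectation(pmf)))  # 2.7
```

Note that the result, 2.7, is not one of the keys of `pmf`, which illustrates the point that $E(X)$ need not be attainable.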

Expectation of a function of $X$

For any function $g$ ,

$E\big(g(X)\big) = \sum_x g(x) \, P(X = x).$

This is the law of the unconscious statistician: you never need the distribution of $g(X)$ — apply $g$ to each value and weight by the original probabilities. The most important case is $g(x) = x^2$ :

$E(X^2) = 1^2(0.1) + 2^2(0.3) + 3^2(0.4) + 4^2(0.2) = 0.1 + 1.2 + 3.6 + 3.2 = 8.1.$
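As a rough sketch of the same rule in code, the hypothetical helper `expect` below takes any function `g` and weights `g(x)` by the original probabilities. The names are assumptions made for illustration. The second call shows a non-linear function for which $E(g(X))$ and $g(E(X))$ disagree.

```python
from fractions import Fraction

# Running example distribution (exact fractions).
pmf = {1: Fraction("0.1"), 2: Fraction("0.3"), 3: Fraction("0.4"), 4: Fraction("0.2")}

def expect(g, pmf):
    """E(g(X)) = sum of g(x) * P(X = x); no distribution of g(X) is needed."""
    return sum(g(x) * p for x, p in pmf.items())

e_x = expect(lambda x: x, pmf)                  # E(X)
e_x2 = expect(lambda x: x ** 2, pmf)            # E(X^2)
e_inv = expect(lambda x: Fraction(1, x), pmf)   # E(1/X)

print(float(e_x2))   # 8.1
print(e_inv)         # 13/30
print(1 / e_x)       # 10/27, so E(1/X) is not 1/E(X)
```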

Variance

The variance measures spread. By definition it is the expected squared deviation from the mean,

$\text{Var}(X) = E\big((X-\mu)^2\big) = \sum_x (x-\mu)^2 P(X=x),$

but the computational form is far quicker. Expanding the square and using linearity:

$\begin{aligned} \text{Var}(X) &= E\big(X^2 - 2\mu X + \mu^2\big) \\ &= E(X^2) - 2\mu E(X) + \mu^2 \\ &= E(X^2) - 2\mu^2 + \mu^2 \\ &= E(X^2) - \mu^2 = E(X^2) - \big(E(X)\big)^2. \end{aligned}$

For the running example,

$\text{Var}(X) = 8.1 - 2.7^2 = 8.1 - 7.29 = 0.81, \qquad \sigma = \sqrt{0.81} = 0.9.$

Because variance is an expected square, $\text{Var}(X) \geq 0$ always, and the identity forces $E(X^2) \geq \big(E(X)\big)^2$ — with equality only when $X$ is constant.
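A quick way to convince yourself that the definition and the computational form agree is to evaluate both on the running example. This is a minimal sketch with function names invented for illustration. It uses exact fractions so the two results can be compared with `==`.

```python
from fractions import Fraction

# Running example distribution (exact fractions).
pmf = {1: Fraction("0.1"), 2: Fraction("0.3"), 3: Fraction("0.4"), 4: Fraction("0.2")}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var_by_definition(pmf):
    """Var(X) = E((X - mu)^2)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def var_by_shortcut(pmf):
    """Var(X) = E(X^2) - (E(X))^2."""
    e_x2 = sum(x ** 2 * p for x, p in pmf.items())
    return e_x2 - mean(pmf) ** 2

print(var_by_definition(pmf))  # 81/100
print(var_by_shortcut(pmf))    # 81/100
```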

The algebra of $E$ and $\text{Var}$

Expectation is linear; variance is not (it is quadratic in the scaling constant and blind to shifts):

Operation	Expectation	Variance
Scale by $a$	$E(aX) = aE(X)$	$\text{Var}(aX) = a^2\text{Var}(X)$
Shift by $b$	$E(X+b) = E(X)+b$	$\text{Var}(X+b) = \text{Var}(X)$
Linear map	$E(aX+b) = aE(X)+b$	$\text{Var}(aX+b) = a^2\text{Var}(X)$

A shift of $b$ slides the whole distribution along the axis without changing its shape, so the spread — and hence the variance — is untouched. A scaling by $a$ stretches deviations by $a$ , so squared deviations stretch by $a^2$ .

We can confirm linearity numerically. Computing $E(3X+2)$ directly:

$E(3X+2) = 5(0.1) + 8(0.3) + 11(0.4) + 14(0.2) = 0.5 + 2.4 + 4.4 + 2.8 = 10.1,$

which matches $3E(X) + 2 = 3(2.7) + 2 = 10.1$ . And $\text{Var}(3X+2) = 3^2(0.81) = 7.29$ .
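The same check can be sketched in code by building the distribution of $aX + b$ explicitly and comparing its mean and variance with the rules in the table. The helper names are hypothetical, and `transform` assumes $a \neq 0$ so that distinct values of $X$ stay distinct.

```python
from fractions import Fraction

# Running example distribution (exact fractions).
pmf = {1: Fraction("0.1"), 2: Fraction("0.3"), 3: Fraction("0.4"), 4: Fraction("0.2")}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    return sum(x ** 2 * p for x, p in pmf.items()) - mean(pmf) ** 2

def transform(pmf, a, b):
    """Distribution of aX + b (a != 0): move each value, keep its probability."""
    return {a * x + b: p for x, p in pmf.items()}

y = transform(pmf, 3, 2)           # Y = 3X + 2 takes the values 5, 8, 11, 14
print(float(mean(y)))              # 10.1
print(float(variance(y)))          # 7.29
print(variance(y) == 3 ** 2 * variance(pmf))  # True
```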

Sums and combinations of independent variables

A great deal of Further Statistics — and almost every multi-source modelling problem — rests on how $E$ and $\text{Var}$ behave when variables are combined. Expectation is always additive, whatever the dependence between the variables:

$E(X + Y) = E(X) + E(Y), \qquad E(aX + bY) = aE(X) + bE(Y).$

This holds because expectation is a sum (an integral in the continuous case), and sums distribute over addition regardless of any relationship between $X$ and $Y$ . Variance is more delicate. For independent $X$ and $Y$ ,

$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y), \qquad \text{Var}(X - Y) = \text{Var}(X) + \text{Var}(Y).$

Note the second identity carefully: even for a difference, the variances add. Intuitively, subtracting an uncertain quantity injects just as much uncertainty as adding it, so for independent variables the spread of $X - Y$ is exactly the same as the spread of $X + Y$ . More generally, for independent variables and constants $a, b$ ,

$\text{Var}(aX + bY) = a^2\text{Var}(X) + b^2\text{Var}(Y),$

where each coefficient is squared, exactly as in the single-variable rule $\text{Var}(aX) = a^2\text{Var}(X)$ . A frequent source of confusion deserves emphasis: the doubled variable $2X$ and the sum $X_1 + X_2$ of two independent copies of $X$ are not the same. The first scales a single outcome,

$\text{Var}(2X) = 2^2\,\text{Var}(X) = 4\,\text{Var}(X),$

while the second adds two independent outcomes,

$\text{Var}(X_1 + X_2) = \text{Var}(X) + \text{Var}(X) = 2\,\text{Var}(X).$

The means agree ( $E(2X) = E(X_1 + X_2) = 2E(X)$ ), but the variances differ by a factor of two — averaging independent measurements reduces relative spread, which is precisely why repeating an experiment and taking a mean improves precision.
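The distinction between $2X$ and $X_1 + X_2$ can be made concrete by constructing both distributions from the running example. In this sketch the function `sum_of_independent` is a name invented here. It multiplies probabilities across every pair of values, which is valid only under the independence assumption.

```python
from collections import defaultdict
from fractions import Fraction

# Running example distribution (exact fractions).
pmf = {1: Fraction("0.1"), 2: Fraction("0.3"), 3: Fraction("0.4"), 4: Fraction("0.2")}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    return sum(x ** 2 * p for x, p in pmf.items()) - mean(pmf) ** 2

def sum_of_independent(pmf_x, pmf_y):
    """Distribution of X + Y, assuming X and Y are independent."""
    out = defaultdict(Fraction)          # missing keys start at 0
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x + y] += px * py        # independence: P(X=x, Y=y) = P(X=x)P(Y=y)
    return dict(out)

doubled = {2 * x: p for x, p in pmf.items()}   # one outcome, doubled
two_copies = sum_of_independent(pmf, pmf)      # two independent outcomes, added

print(mean(doubled), mean(two_copies))   # 27/5 27/5
print(variance(doubled))                 # 81/25  (= 4 Var(X))
print(variance(two_copies))              # 81/50  (= 2 Var(X))
```

The doubled variable can only take the even values 2, 4, 6, 8, whereas the sum of two copies can land on any integer from 2 to 8, which is one way to see that the two distributions are genuinely different.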

Worked illustration. Suppose $X$ (the running example) and an independent $Y$ have $E(X) = 2.7,\ \text{Var}(X) = 0.81$ and $E(Y) = 4,\ \text{Var}(Y) = 2$ . Then $E(X + Y) = 6.7$ and, by independence, $\text{Var}(X + Y) = 0.81 + 2 = 2.81$ . For $T = 2X - 3Y$ : $E(T) = 2(2.7) - 3(4) = -6.6$ and $\text{Var}(T) = 2^2(0.81) + 3^2(2) = 3.24 + 18 = 21.24$ — note the minus sign in $T$ has no effect on the variance, which uses $(-3)^2 = 9$ .

These combination rules reappear throughout 3S: the additivity of the Poisson (Lesson 2), the variance of a sample mean, and the chained approximations of Lesson 3 are all consequences of them.

When variables are not independent: covariance

The independence assumption is doing real work in the variance rules, and it is worth seeing what happens without it. For any two variables,

$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y), \qquad \text{Cov}(X, Y) = E(XY) - E(X)E(Y).$

The covariance measures how the two variables move together: it is positive when high $X$ tends to accompany high $Y$ , negative when high $X$ accompanies low $Y$ , and zero when there is no linear association. Independence forces $E(XY) = E(X)E(Y)$ , hence $\text{Cov}(X, Y) = 0$ , which is exactly why the cross-term vanishes and the variances simply add for independent variables.

A concrete computation makes this tangible. Suppose $X$ and $Y$ each take the values $0$ and $1$ , with the joint distribution

	$Y = 0$	$Y = 1$
$X = 0$	$0.1$	$0.3$
$X = 1$	$0.4$	$0.2$

The marginal for $X$ is $P(X=0) = 0.4,\ P(X=1) = 0.6$ , so $E(X) = 0.6$ ; the marginal for $Y$ is $P(Y=0) = 0.5,\ P(Y=1) = 0.5$ , so $E(Y) = 0.5$ . The only term contributing to $E(XY)$ is the cell where both equal 1: $E(XY) = 1\cdot 1\cdot 0.2 = 0.2$ . Therefore

$\text{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0.2 - (0.6)(0.5) = 0.2 - 0.3 = -0.1.$

The negative covariance says that, in this joint distribution, $X = 1$ tends to coincide with $Y = 0$ — and indeed the largest probability, $0.4$ , sits in exactly that cell. Because the covariance is non-zero, $X$ and $Y$ are not independent, and you could not add their variances directly: you would need the $2\,\text{Cov}(X,Y) = -0.2$ correction. (A cleaner check of independence: under independence the top-left cell would be $P(X=0)P(Y=0) = 0.4\times 0.5 = 0.2$ , but the table gives $0.1$ — they differ, confirming dependence.) Although full two-variable distributions are at the edge of the A-Level Further specification, recognising when the additive variance rule applies — and being able to articulate that it requires independence — is squarely examinable and is precisely the reasoning the variance algebra is built on.
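The covariance calculation above can be sketched directly from the joint table. The dictionary layout and the helper `expect` are assumptions made for illustration. The last lines compare $\text{Var}(X+Y)$ computed from the distribution of $X+Y$ with the formula that includes the covariance correction.

```python
from fractions import Fraction

# Joint distribution from the table: (x, y) -> P(X = x, Y = y).
joint = {(0, 0): Fraction("0.1"), (0, 1): Fraction("0.3"),
         (1, 0): Fraction("0.4"), (1, 1): Fraction("0.2")}

def expect(g, joint):
    """E(g(X, Y)) = sum of g(x, y) * P(X = x, Y = y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x, joint)
e_y = expect(lambda x, y: y, joint)
e_xy = expect(lambda x, y: x * y, joint)
cov = e_xy - e_x * e_y

var_x = expect(lambda x, y: x ** 2, joint) - e_x ** 2
var_y = expect(lambda x, y: y ** 2, joint) - e_y ** 2

# Var(X + Y) computed directly, then via the covariance formula.
var_sum_direct = expect(lambda x, y: (x + y) ** 2, joint) - (e_x + e_y) ** 2
var_sum_formula = var_x + var_y + 2 * cov

print(cov)                               # -1/10
print(var_sum_direct, var_sum_formula)   # 29/100 29/100
print(var_x + var_y)                     # 49/100, the (wrong) answer without the correction
```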

Mode, median and the discrete CDF

Two further summaries round out the description of a discrete distribution. The mode is the value of $x$ carrying the largest probability; a distribution is bimodal (or multimodal) when two or more values tie for the maximum. The median is a value $m$ with $P(X \leq m) \geq 0.5$ and $P(X \geq m) \geq 0.5$ , that is, the "middle" value once the probability is accumulated. For the distribution of Example 1 in Section 3 (where $k = 0.1$ , so the values $0,1,2,3$ carry probabilities $0.1, 0.2, 0.3, 0.4$ ), the mode is $x = 3$ (probability $0.4$ ). Accumulating probability gives $P(X \leq 0) = 0.1,\ P(X \leq 1) = 0.3,\ P(X \leq 2) = 0.6$ , so the median is $m = 2$ , the first value at which the running total reaches or passes $0.5$ .

That running total is the cumulative distribution function (CDF),

$F(x) = P(X \leq x) = \sum_{x_i \leq x} P(X = x_i),$

a step function that climbs from 0 to 1, jumping at each value of $X$ by the size of its probability. The CDF and the probability mass function carry the same information: you recover an individual probability as the size of the jump,

$P(X = x) = F(x) - F(x^-),$

where $F(x^-)$ is the value of $F$ just to the left of $x$ . For the example, $F$ jumps by $0.1, 0.2, 0.3, 0.4$ at $x = 0, 1, 2, 3$ respectively, reaching exactly $1$ at $x = 3$ . The CDF is the natural tool for "at most" and "more than" questions — $P(X > 2) = 1 - F(2) = 1 - 0.6 = 0.4$ — and it is the discrete shadow of the continuous CDF that dominates the second half of this course.
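The running-total idea translates almost word for word into code. The sketch below is one possible way to tabulate the CDF for the values $0,1,2,3$ with probabilities $0.1, 0.2, 0.3, 0.4$ and read off the mode, median and a tail probability. All variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

# Values 0..3 with probabilities 0.1, 0.2, 0.3, 0.4 (exact fractions).
pmf = {0: Fraction("0.1"), 1: Fraction("0.2"), 2: Fraction("0.3"), 3: Fraction("0.4")}

values = sorted(pmf)
# F(x) = P(X <= x): running total of the probabilities in increasing x.
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

mode = max(pmf, key=pmf.get)   # with ties, this returns only the first maximiser
median = next(x for x in values if cdf[x] >= Fraction(1, 2))
p_more_than_2 = 1 - cdf[2]

print([float(cdf[x]) for x in values])  # [0.1, 0.3, 0.6, 1.0]
print(mode, median)                     # 3 2
print(p_more_than_2)                    # 2/5
```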

3. Worked examples with M1/A1 mark schemes

Example 1 — full table analysis

A random variable $X$ has the distribution below, where $k$ is constant.

$x$	0	1	2	3
$P(X = x)$	$k$	$2k$	$3k$	$4k$

Find $k$ , $E(X)$ , $\text{Var}(X)$ and $E(2X^2 - 3X + 1)$ .

$k + 2k + 3k + 4k = 1 \implies 10k = 1 \implies k = 0.1.$ (M1 set sum of probabilities $=1$ ; A1 $k = 0.1$ .)

$E(X) = 0(0.1) + 1(0.2) + 2(0.3) + 3(0.4) = 2.0.$ (M1 $\sum x\,P(X=x)$ ; A1 $E(X)=2.0$ .)

$E(X^2) = 0(0.1) + 1(0.2) + 4(0.3) + 9(0.4) = 0 + 0.2 + 1.2 + 3.6 = 5.0.$ (M1 $\sum x^2 P(X=x)$ ; A1 $E(X^2)=5.0$ .)

$\text{Var}(X) = 5.0 - 2.0^2 = 1.0.$ (M1 apply $E(X^2)-(E(X))^2$ ; A1 $\text{Var}(X)=1.0$ .)

$E(2X^2 - 3X + 1) = 2E(X^2) - 3E(X) + 1 = 2(5.0) - 3(2.0) + 1 = 5.$ (M1 expand using linearity; A1 $=5$ .)
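For self-checking, Example 1 can be replayed in a few lines. This is only a sketch under the assumption that the probabilities are stored as multiples of $k$ . The names `weights` and `expect` are invented here. It also evaluates $E(2X^2 - 3X + 1)$ directly, term by term, to confirm the linearity shortcut.

```python
from fractions import Fraction

# P(X = x) = weight * k for x = 0, 1, 2, 3.
weights = {0: 1, 1: 2, 2: 3, 3: 4}
k = Fraction(1, sum(weights.values()))       # normalisation: 10k = 1
pmf = {x: w * k for x, w in weights.items()}

def expect(g):
    """E(g(X)) for the distribution above."""
    return sum(g(x) * p for x, p in pmf.items())

e_x = expect(lambda x: x)
e_x2 = expect(lambda x: x ** 2)
var_x = e_x2 - e_x ** 2
direct = expect(lambda x: 2 * x ** 2 - 3 * x + 1)   # apply g to each value
via_linearity = 2 * e_x2 - 3 * e_x + 1

print(k, e_x, e_x2, var_x)      # 1/10 2 5 1
print(direct, via_linearity)    # 5 5
```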

Example 2 — the variance algebra under transformation

The number of faults $X$ on a metre of cable has $E(X) = 1.5$ and $\text{Var}(X) = 0.75$ . The inspection cost in pounds is $C = 4X + 10$ . Find $E(C)$ , $\text{Var}(C)$ and the standard deviation of $C$ .

$E(C) = 4E(X) + 10 = 4(1.5) + 10 = 16.$ (M1 $E(aX+b)=aE(X)+b$ ; A1 $\pounds 16$ .)

$\text{Var}(C) = 4^2 \, \text{Var}(X) = 16(0.75) = 12.$ (M1 $\text{Var}(aX+b)=a^2\text{Var}(X)$ , with the $+10$ ignored; A1 $12$ .)

$\text{SD}(C) = \sqrt{12} = 2\sqrt{3} \approx 3.46.$ (A1 $\approx \pounds 3.46$ .)

Example 3 — recovering a distribution from its moments

A variable $X$ takes the values $0, 1, 2$ with $P(X=1) = 0.5$ . Given $E(X) = 1$ , find the full distribution and $\text{Var}(X)$ .

Let $P(X=0) = a$ and $P(X=2) = c$ . Then

$a + 0.5 + c = 1 \implies a + c = 0.5.$ (M1 normalisation equation.)

$E(X) = 0\cdot a + 1(0.5) + 2c = 1 \implies 0.5 + 2c = 1 \implies c = 0.25.$ (M1 expectation equation; A1 $c = 0.25$ , hence $a = 0.25$ .)

So $P(X=0) = 0.25,\ P(X=1) = 0.5,\ P(X=2) = 0.25$ . Then

$E(X^2) = 0 + 1(0.5) + 4(0.25) = 1.5, \qquad \text{Var}(X) = 1.5 - 1^2 = 0.5.$ (M1 $E(X^2)$ ; A1 $\text{Var}(X) = 0.5$ .)

4. Specimen-style exam question

(Specimen-style — not from any past paper.) The discrete random variable $X$ has probability distribution

$x$	$-1$	0	1	2
$P(X=x)$	$0.2$	$a$	$0.3$	$b$

Given that $E(X) = 0.6$ : (a) show that $a + b = 0.5$ and find a second equation in $a$ and $b$ ; hence find $a$ and $b$ ; (b) find $\text{Var}(X)$ ; (c) find $\text{Var}(5 - 2X)$ .

Model solution.

(a) Probabilities sum to 1: $0.2 + a + 0.3 + b = 1 \Rightarrow a + b = 0.5$ . Expectation:

$E(X) = (-1)(0.2) + 0\cdot a + 1(0.3) + 2b = 0.1 + 2b = 0.6 \implies b = 0.25,$

so $a = 0.25$ . Both lie in $[0,1]$ , so the distribution is valid.

(b) $E(X^2) = (-1)^2(0.2) + 0 + 1^2(0.3) + 2^2(0.25) = 0.2 + 0.3 + 1.0 = 1.5$ , so

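$\text{Var}(X) = 1.5 - 0.6^2 = 1.5 - 0.36 = 1.14.$

(c) The shift $5$ has no effect on the variance and the scale $-2$ is squared, so $\text{Var}(5 - 2X) = (-2)^2\,\text{Var}(X) = 4(1.14) = 4.56.$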

5. Synoptic links

Poisson and binomial (this module): both are discrete random variables; everything here — $E$ , $\text{Var}$ , the $E(X^2)$ identity — specialises to them. For $\text{Po}(\lambda)$ the headline fact is $E(X) = \text{Var}(X) = \lambda$ .
Continuous random variables (later lessons): every formula has a continuous analogue with $\sum$ replaced by $\int$ : $E(X) = \int x f(x)\,dx$ , $\text{Var}(X) = E(X^2) - (E(X))^2$ unchanged.
Moment generating functions (later): $M_X(t) = E(e^{tX})$ packages every moment; $E(X) = M_X'(0)$ and $E(X^2) = M_X''(0)$ regenerate the quantities above.
A-Level Maths — algebra of series and binomial expansion: the $\sum$ manipulations and $\sum p_i = 1$ reasoning mirror summing a probability series.
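Linear coding (A-Level statistics): the coding $Y = \frac{X - c}{d}$ used to simplify mean/variance calculations is a linear map of $X$ with scale $\frac{1}{d}$ and shift $-\frac{c}{d}$ , so $E(Y) = \frac{E(X) - c}{d}$ and $\text{Var}(Y) = \frac{\text{Var}(X)}{d^2}$ : the $E(aX+b)$ / $\text{Var}(aX+b)$ rules in disguise.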

6. Mark-scheme literacy

"Show that": the target value is printed, so you are paid for the method and communication, not the number. Display the full sum $\sum x P(X=x)$ and the substitution — landing on the given value with no working scores little.
"Hence": you must use the previous part. If part (a) gave $E(X^2)$ , a "hence find $\text{Var}(X)$ " expects $E(X^2) - (E(X))^2$ , not a fresh calculation.
Follow-through (ft): if you find a wrong $E(X)$ but then apply $\text{Var}(X) = E(X^2) - (E(X))^2$ correctly, the variance A-mark is usually awarded "ft" on your value. So always finish the method, even after a slip.
Exact vs decimal: give exact fractions or full decimals; a variance such as $0.81$ should not be rounded to $0.8$ unless asked. Premature rounding of $E(X)$ before squaring is a classic accuracy-mark loss.
Units and interpretation: AO2 marks attach to a sentence — "the expected cost is £16" — not a bare number.

7. Grade-band model answers

Question. The random variable $X$ has $E(X) = 4$ and $\text{Var}(X) = 9$ . Find $E(X^2)$ and $\text{Var}(2X - 1)$ , and explain why $\text{Var}(2X-1) \neq 2\text{Var}(X)$ .

Mid-band response. $E(X^2) = 9 + 16 = 25$ . $\text{Var}(2X-1) = 4 \times 9 = 36$ . It is not $2 \times 9$ because you square the 2.

Examiner-style commentary: The two numerical answers are correct and would earn the method and accuracy marks. The explanation is too thin for the AO2 mark — "you square the 2" states the rule rather than the reason, and the $-1$ is not addressed.

Stronger response. Rearranging $\text{Var}(X) = E(X^2) - (E(X))^2$ gives $E(X^2) = \text{Var}(X) + (E(X))^2 = 9 + 16 = 25$ . For the transformation, $\text{Var}(2X - 1) = 2^2\,\text{Var}(X) = 4(9) = 36$ ; the $-1$ is a shift and does not affect spread. It is not $2\text{Var}(X)$ because variance scales with the square of the multiplier.

Examiner-style commentary: Fully correct with the identity quoted and rearranged, and the shift correctly dismissed. The final sentence earns the reasoning mark. To reach the top band the explanation could connect the $a^2$ factor to squared deviations.

Top-band response. From $\text{Var}(X) = E(X^2) - (E(X))^2$ , $E(X^2) = 9 + 4^2 = 25$ . Writing $Y = 2X - 1$ , $\text{Var}(Y) = E((Y - E(Y))^2)$ ; since $Y - E(Y) = 2(X - E(X))$ , the deviation is scaled by 2 and the squared deviation by $2^2 = 4$ , giving $\text{Var}(Y) = 4\,\text{Var}(X) = 36$ . The additive constant $-1$ cancels in $Y - E(Y)$ , so it cannot change the variance. Hence $\text{Var}(2X-1) = 36 \neq 18 = 2\text{Var}(X)$ : variance is quadratic in the scale factor, not linear.

Examiner-style commentary: This is exemplary. The candidate derives the $a^2$ factor from first principles via $Y - E(Y) = 2(X - E(X))$ , explains why the shift cancels, and states the general principle. Every available AO1 and AO2 mark is secured.

8. Common misconceptions

" $E(X^2) = (E(X))^2$ ." False except for a constant. The gap between them is the variance: $E(X^2) - (E(X))^2 = \text{Var}(X) \geq 0$ .
" $\text{Var}(aX+b) = a\,\text{Var}(X) + b$ ." No — variance squares the scale and ignores the shift: $\text{Var}(aX+b) = a^2\text{Var}(X)$ .
" $\text{Var}(2X) = \text{Var}(X_1+X_2)$ ." These are different. $2X$ is one variable doubled, giving $4\text{Var}(X)$ ; the sum $X_1 + X_2$ of two independent copies gives $2\text{Var}(X)$ . The trap is writing $X + X$ for two separate observations: $X + X$ is literally $2X$ .
"Expectation must be a possible value." $E(X) = 2.7$ is fine even though $X$ only takes integer values.
" $E(1/X) = 1/E(X)$ " or " $E(\sqrt{X}) = \sqrt{E(X)}$ ." Only linear functions pass through $E$ . For non-linear $g$ , compute $E(g(X)) = \sum g(x)P(X=x)$ .
"Variance can be negative." Impossible: it is an expected square. A negative answer signals an arithmetic slip, often $E(X^2)$ computed without squaring the values, or the subtraction carried out in the wrong order.
" $\text{Var}(X+Y) = \text{Var}(X)+\text{Var}(Y)$ always." Only when $X, Y$ are independent; otherwise add $2\,\text{Cov}(X,Y)$ .

9. Common errors

Forgetting to square inside $E(X^2)$ : computing $\sum xP$ again instead of $\sum x^2 P$ .
Rounding $E(X)$ before squaring it in $\text{Var}(X) = E(X^2) - (E(X))^2$ , introducing rounding error into the variance.
Dropping a term when a value is $0$ or negative — e.g. omitting the $x=0$ row (it contributes 0 to $E(X)$ but its probability still matters for normalisation) or mishandling $(-1)^2 = 1$ .
Keeping the additive constant in a variance: writing $\text{Var}(3X+2) = 9\text{Var}(X) + 2$ .
Sign slip in $\text{Var}(a - bX)$ : the answer uses $b^2$ , so the minus sign disappears.
Probabilities not summing to 1 because an unknown $k$ was found but never substituted back to check.

10. Going further (STEP / MAT / Oxbridge)

The identity $\text{Var}(X) = E(X^2) - (E(X))^2$ is the second instance of a deeper pattern. Define the moments $\mu_k = E(X^k)$ and the central moments $\nu_k = E\big((X-\mu)^k\big)$ . Then $\nu_2$ is the variance, $\nu_3$ governs skewness and $\nu_4$ governs kurtosis, and each $\nu_k$ expands into the raw moments by the binomial theorem:

$\nu_k = E\big((X-\mu)^k\big) = \sum_{j=0}^{k} \binom{k}{j} (-\mu)^{k-j} \mu_j.$

Setting $k=2$ recovers exactly $\nu_2 = \mu_2 - \mu^2$ . A short STEP-flavoured challenge: show that the central third moment is

$E\big((X-\mu)^3\big) = E(X^3) - 3\mu E(X^2) + 2\mu^3,$

and verify it on the running example (where $\mu = 2.7$ ). A second, genuinely useful result is the variance of a sum: for any $X, Y$ ,

$\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X,Y), \quad \text{Cov}(X,Y) = E(XY) - E(X)E(Y),$

which collapses to additive variances precisely when $\text{Cov}(X,Y) = 0$ — guaranteed by independence, since then $E(XY) = E(X)E(Y)$ . The converse is false: uncorrelated does not imply independent, a favourite Oxbridge interview trap.
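Returning to the central-moment expansion earlier in this section, a numerical check on the running example is a useful sanity test before attempting the algebra. The sketch below is illustrative: the function names are assumptions, and `central_moment_from_raw` implements the binomial expansion exactly as written above, with $\mu_0 = 1$ and $\mu_1 = \mu$ .

```python
from fractions import Fraction
from math import comb

# Running example distribution (exact fractions).
pmf = {1: Fraction("0.1"), 2: Fraction("0.3"), 3: Fraction("0.4"), 4: Fraction("0.2")}

def raw_moment(k, pmf):
    """mu_k = E(X^k)."""
    return sum(x ** k * p for x, p in pmf.items())

def central_moment_direct(k, pmf):
    """nu_k = E((X - mu)^k), straight from the definition."""
    mu = raw_moment(1, pmf)
    return sum((x - mu) ** k * p for x, p in pmf.items())

def central_moment_from_raw(k, pmf):
    """nu_k via the binomial expansion in raw moments."""
    mu = raw_moment(1, pmf)
    return sum(comb(k, j) * (-mu) ** (k - j) * raw_moment(j, pmf)
               for j in range(k + 1))

mu = raw_moment(1, pmf)
third_by_identity = raw_moment(3, pmf) - 3 * mu * raw_moment(2, pmf) + 2 * mu ** 3

print(central_moment_direct(2, pmf))    # 81/100
print(central_moment_direct(3, pmf))    # -18/125
print(float(third_by_identity))         # -0.144
```

The negative third central moment is consistent with the running example having its longer tail on the left of the mean.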

11. Additional A* practice (with worked answers)

P1. $X$ has $P(X=x) = \dfrac{x}{15}$ for $x = 1,2,3,4,5$ . Find $E(X)$ and $\text{Var}(X)$ .

$E(X) = \frac{1}{15}(1+4+9+16+25) = \frac{55}{15} = \frac{11}{3} \approx 3.667.$ $E(X^2) = \frac{1}{15}(1+8+27+64+125) = \frac{225}{15} = 15.$ $\text{Var}(X) = 15 - \left(\frac{11}{3}\right)^2 = 15 - \frac{121}{9} = \frac{14}{9} \approx 1.556.$

P2. Given $E(X) = 3$ and $E(X^2) = 13$ , find (a) $\text{Var}(X)$ ; (b) $E\big((X-3)^2\big)$ ; (c) $\text{Var}(4 - 3X)$ .

(a) $13 - 9 = 4.$ (b) $E((X-\mu)^2) = \text{Var}(X) = 4.$ (c) $(-3)^2(4) = 36.$

P3. A fair four-sided die shows $1,2,3,4$ . Let $X$ be the score and $Y = X^2$ . Find $E(Y)$ and $\text{Var}(Y)$ .

$E(Y) = E(X^2) = \frac14(1+4+9+16) = \frac{30}{4} = 7.5.$ $E(Y^2) = E(X^4) = \frac14(1+16+81+256) = \frac{354}{4} = 88.5.$ $\text{Var}(Y) = 88.5 - 7.5^2 = 88.5 - 56.25 = 32.25.$

P4. $X$ takes values $1, 2, 3$ with probabilities $p, q, p$ (symmetric). Given $\text{Var}(X) = 0.5$ , find $p$ and $q$ .

By symmetry $E(X) = 2$ . Normalisation: $2p + q = 1$ . $E(X^2) = p + 4q + 9p = 10p + 4q$ . Then $\text{Var}(X) = 10p + 4q - 4 = 0.5$ , so $10p + 4q = 4.5$ . Subtract $4(2p+q)=4$ : $2p = 0.5 \Rightarrow p = 0.25$ , $q = 0.5$ .

P5. The number of heads $X$ in two tosses of a biased coin ( $P(\text{head}) = 0.6$ ) has distribution $P(0)=0.16,\ P(1)=0.48,\ P(2)=0.36$ . Verify $E(X) = 1.2$ and find $\text{Var}(X)$ ; compare with $np$ and $np(1-p)$ .

$E(X) = 0(0.16) + 1(0.48) + 2(0.36) = 1.2 = np = 2(0.6).$ $E(X^2) = 0 + 0.48 + 4(0.36) = 1.92.$ $\text{Var}(X) = 1.92 - 1.44 = 0.48 = np(1-p) = 2(0.6)(0.4).$ The table reproduces the binomial moments exactly.
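The comparison in P5 can also be sketched by generating the table from the binomial formula rather than typing it in. The variable names below are illustrative, and the code assumes the standard binomial probability $\binom{n}{r}p^r(1-p)^{n-r}$ .

```python
from fractions import Fraction
from math import comb

n, p = 2, Fraction(3, 5)    # two tosses, P(head) = 0.6

# Binomial probabilities P(X = r) for r = 0, 1, ..., n.
pmf = {r: comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)}

mean = sum(r * pr for r, pr in pmf.items())
variance = sum(r ** 2 * pr for r, pr in pmf.items()) - mean ** 2

print([float(pr) for pr in pmf.values()])   # [0.16, 0.48, 0.36]
print(mean == n * p)                        # True
print(variance == n * p * (1 - p))          # True
```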

12. Board-alignment footer

Aligned to AQA A-Level Further Mathematics 7367, Paper 3 Statistics (7367/3S): discrete random variables, $E(X)$ , $E(g(X))$ , $\text{Var}(X) = E(X^2) - (E(X))^2$ , and the linear-transformation algebra. The same content appears in Edexcel Further Statistics 1 (discrete random variables, $E(X)$ / $\text{Var}(X)$ of linear functions) and OCR(A)/OCR(MEI) Statistics modules; the formulae and conventions are identical across boards.

13. Visual summary

$\boxed{\;E(X) = \sum_x x\,P(X=x) \quad\bullet\quad \text{Var}(X) = E(X^2) - \big(E(X)\big)^2 \quad\bullet\quad \text{Var}(aX+b) = a^2\text{Var}(X)\;}$

Quantity	Discrete formula	Key behaviour under $aX+b$
$E(X)$	$\sum x\,P(X=x)$	$aE(X)+b$ (linear)
$E(g(X))$	$\sum g(x)\,P(X=x)$	apply $g$ then weight
$\text{Var}(X)$	$E(X^2)-(E(X))^2$	$a^2\text{Var}(X)$ (shift ignored)
$\sigma$	$\sqrt{\text{Var}(X)}$	$\lvert a\rvert\,\sigma$

graph LR
  A["Distribution table x, P(X=x)"] --> B["E(X) = sum x P"]
  A --> C["E(X^2) = sum x^2 P"]
  B --> D["Var(X) = E(X^2) - (E(X))^2"]
  C --> D
  D --> E["SD = sqrt(Var)"]
  B --> F["Transform: E(aX+b)=aE(X)+b"]
  D --> G["Transform: Var(aX+b)=a^2 Var(X)"]

Recap: build a clean table, compute $E(X)$ and $E(X^2)$ , then $\text{Var}(X) = E(X^2) - (E(X))^2$ . Expectation is linear; variance squares the scale and ignores shifts. Master this and every distribution in 7367/3S becomes a special case.
