This lesson turns the PDF into a full toolkit: from a density f(x) you will compute the mean, variance, mode, median, quartiles and percentiles, exploit symmetry, and apply linear transformations. These are the continuous analogues of the discrete summaries from Lesson 1, and they are the workhorse calculations of every 7367/3S continuous-distribution question.
This is Paper 3 Statistics option (7367/3S) content (AO weighting AO1 40% / AO2 25% / AO3 35%). Evaluating $\int x f\,dx$ and $\int x^2 f\,dx$ is AO1; solving $\int_a^m f\,dx=0.5$ for a median, or maximising $f$ for a mode, is AO3; comparing mean/median/mode to describe skew is AO2. The prerequisite is A-Level Mathematics integration and differentiation, plus the discrete E/Var algebra (Lesson 1) and the PDF definition (Lesson 4).
For a continuous X with PDF f,
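$$E(X)=\mu=\int_{-\infty}^{\infty} x f(x)\,dx, \qquad E(g(X))=\int_{-\infty}^{\infty} g(x) f(x)\,dx.$$
As in the discrete case, $E(g(X))$ does not require the distribution of $g(X)$: multiply $g(x)$ by the density and integrate. The most important case is $g(x)=x^2$.
$$\operatorname{Var}(X)=E(X^2)-(E(X))^2=\int x^2 f\,dx-\left(\int x f\,dx\right)^2, \qquad \operatorname{SD}(X)=\sqrt{\operatorname{Var}(X)}.$$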
If f is symmetric about x=c, then E(X)=c and the median =c without integrating — a major time-saver. (If f is also single-peaked, the mode is c too.)
For Y=aX+b:
| Quantity | Formula |
|---|---|
| E(Y) | aE(X)+b |
| Var(Y) | $a^2\operatorname{Var}(X)$ |
| SD(Y) | $\lvert a\rvert\operatorname{SD}(X)$ |
Identical to the discrete algebra of Lesson 1 — expectation is linear, variance squares the scale and ignores the shift.
It is instructive to compute every summary measure for a single PDF, because exam questions often chain them and because seeing all the numbers together builds intuition for how mean, median and mode relate. Take
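$$f(x)=\tfrac{1}{9}x^2 \ \text{for } 0\le x\le 3, \qquad f(x)=0 \ \text{otherwise}.$$
Validity. $f\ge 0$ on $[0,3]$, and $\int_0^3 \tfrac{1}{9}x^2\,dx=\tfrac{1}{9}\left[\tfrac{x^3}{3}\right]_0^3=\tfrac{1}{9}\cdot 9=1$. Valid.
Mean. $E(X)=\tfrac{1}{9}\int_0^3 x^3\,dx=\tfrac{1}{9}\left[\tfrac{x^4}{4}\right]_0^3=\tfrac{1}{9}\cdot\tfrac{81}{4}=\tfrac{9}{4}=2.25$.
Second moment and variance. $E(X^2)=\tfrac{1}{9}\int_0^3 x^4\,dx=\tfrac{1}{9}\left[\tfrac{x^5}{5}\right]_0^3=\tfrac{1}{9}\cdot\tfrac{243}{5}=\tfrac{27}{5}=5.4$. Hence $\operatorname{Var}(X)=5.4-2.25^2=5.4-5.0625=0.3375$ and $\operatorname{SD}(X)=\sqrt{0.3375}\approx 0.581$.
Mode. $f'(x)=\tfrac{2}{9}x>0$ on $(0,3]$, so $f$ increases throughout and the mode is the right endpoint, $x=3$.
Median. $\int_0^m \tfrac{1}{9}x^2\,dx=\tfrac{m^3}{27}=0.5 \Rightarrow m^3=13.5 \Rightarrow m=\sqrt[3]{13.5}\approx 2.381$.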
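Quartiles. $\tfrac{Q_1^3}{27}=0.25 \Rightarrow Q_1=\sqrt[3]{6.75}\approx 1.890$; $\tfrac{Q_3^3}{27}=0.75 \Rightarrow Q_3=\sqrt[3]{20.25}\approx 2.726$; so $\text{IQR}=Q_3-Q_1\approx 2.726-1.890=0.836$.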
Skew. Mean 2.25 < median 2.381 < mode 3: the mass piles toward the upper end with a longer left tail, so the distribution is negatively (left) skewed. This is the same qualitative picture as $\tfrac{3}{8}x^2$ on $[0,2]$, as you would expect from the identical rising-parabola shape.
A transformation. If a related quantity is $Y=10-2X$, then $E(Y)=10-2(2.25)=5.5$ and $\operatorname{Var}(Y)=(-2)^2(0.3375)=1.35$. The reflection caused by the negative coefficient turns the negative skew of $X$ into a positive skew of $Y$, while the variance is multiplied by $(-2)^2=4$ and is unaffected by the shift of 10. Working a single density through to all of its summaries like this is excellent revision and mirrors the structure of the longer exam questions.
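The summaries of this worked example can be sanity-checked numerically. The sketch below is only an illustration, not part of the exam method: it assumes a hand-written composite Simpson's rule (the helper name `simpson` and the step count are choices made here), and it re-derives the area, mean, variance and the transformed values for $Y=10-2X$.

```python
import math


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule on [a, b] using n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        # Odd interior nodes get weight 4, even interior nodes weight 2.
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def f(x):
    """Density f(x) = x^2 / 9 on the support [0, 3]."""
    return x * x / 9


area = simpson(f, 0, 3)                              # should be close to 1
mean = simpson(lambda x: x * f(x), 0, 3)             # E(X)
second_moment = simpson(lambda x: x * x * f(x), 0, 3)  # E(X^2)
variance = second_moment - mean ** 2
sd = math.sqrt(variance)

# Linear transformation Y = aX + b with a = -2, b = 10.
a, b = -2, 10
mean_y = a * mean + b
var_y = a ** 2 * variance

print(f"area={area:.4f} mean={mean:.4f} var={variance:.4f} sd={sd:.4f}")
print(f"E(Y)={mean_y:.4f} Var(Y)={var_y:.4f}")
```

Swapping in a different `f` and support reuses the same checks for any other density in this lesson.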
The median and quartiles are special cases of percentiles, and a common exam phrasing asks for "the value xp below which a given proportion of the distribution lies". The defining equation is always the same area condition,
$$\int_{-\infty}^{x_p} f(x)\,dx=\frac{p}{100},$$
so the median is the 50th percentile, Q1 the 25th and Q3 the 75th. The technique is to integrate up to the unknown limit, set the result equal to the required proportion, and solve.
For the density $f(x)=\tfrac{1}{9}x^2$ on $[0,3]$ used above, the accumulated area to $x$ is $\tfrac{x^3}{27}$, so the $p$-th percentile solves $\tfrac{x_p^3}{27}=\tfrac{p}{100}$, giving $x_p=3\sqrt[3]{p/100}$. The 10th percentile is therefore $x_{10}=3\sqrt[3]{0.1}\approx 3(0.46416)\approx 1.392$: only 10% of the probability lies below $x\approx 1.39$, reflecting how the rising density starves the left of the interval. The 90th percentile is $x_{90}=3\sqrt[3]{0.9}\approx 3(0.96549)\approx 2.896$, very close to the upper limit 3, which is consistent with the heavy concentration of probability near the right end.
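The "integrate up to the unknown limit and solve" technique can be sketched in code. The example below assumes the same density on $[0,3]$ and uses a simple bisection search (a choice made here for illustration; the function names `cdf` and `percentile` are hypothetical) so that it would still work when no closed-form solution is available.

```python
def cdf(x):
    """Accumulated area of f(x) = x^2 / 9, i.e. x^3 / 27 on [0, 3]."""
    if x <= 0:
        return 0.0
    if x >= 3:
        return 1.0
    return x ** 3 / 27


def percentile(p, lo=0.0, hi=3.0, tol=1e-12):
    """Solve cdf(x) = p / 100 by bisection over the support [lo, hi]."""
    target = p / 100
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < target:
            lo = mid   # not enough area yet: move right
        else:
            hi = mid   # enough area: move left
    return (lo + hi) / 2


for p in (10, 25, 50, 75, 90):
    closed_form = 3 * (p / 100) ** (1 / 3)  # x_p = 3 * cbrt(p / 100)
    print(p, round(percentile(p), 4), round(closed_form, 4))
```

The two printed columns should agree, which confirms that the numerical search and the algebraic solution describe the same area condition.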
A subtle reading point: "the value exceeded by 20% of the distribution" is the 80th percentile, not the 20th, because 80% lies below it. Phrases such as "the top 5%", "the slowest 10%" or "the threshold beaten by a quarter of candidates" all translate into a percentile via "what fraction lies below?". Mistranslating the direction is a frequent and avoidable error; a quick sketch with the shaded region labelled as a percentage removes any ambiguity.
The densities above had their modes at an endpoint because they were monotonic. A density with an interior maximum needs the full stationary-point method, and it is worth a careful worked example. Consider the cubic density
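$$f(x)=\tfrac{3}{4}x^2(2-x) \ \text{for } 0\le x\le 2,$$
which is a valid PDF, since $\int_0^2 \tfrac{3}{4}x^2(2-x)\,dx=\tfrac{3}{4}\left[\tfrac{2}{3}x^3-\tfrac{x^4}{4}\right]_0^2=\tfrac{3}{4}\left(\tfrac{16}{3}-4\right)=\tfrac{3}{4}\cdot\tfrac{4}{3}=1$, and which rises from zero, peaks inside the interval, and falls back to zero at $x=2$. To locate the mode, differentiate:
$$f(x)=\tfrac{3}{4}(2x^2-x^3), \qquad f'(x)=\tfrac{3}{4}(4x-3x^2)=\tfrac{3}{4}x(4-3x).$$
Setting $f'(x)=0$ on $(0,2)$ gives $x=\tfrac{4}{3}$ (rejecting $x=0$, the left endpoint where $f=0$). The second derivative $f''(x)=\tfrac{3}{4}(4-6x)$ is negative at $x=\tfrac{4}{3}$ (since $4-8=-4<0$), confirming a maximum: the mode is $x=\tfrac{4}{3}\approx 1.333$. Crucially, one must still check the endpoints, but here $f(0)=0$ and $f(2)=0$, both below the interior peak $f\left(\tfrac{4}{3}\right)=\tfrac{3}{4}\cdot\tfrac{16}{9}\cdot\tfrac{2}{3}=\tfrac{8}{9}\approx 0.89$, so the interior stationary point genuinely wins.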
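The stationary-point-plus-endpoints routine can be mirrored in a short sketch. It assumes the cubic density above and locates the zero of $f'$ by bisection on a bracket chosen by inspection (the bracket `[0.5, 2]` and the function names are assumptions made for this illustration), then compares the candidate with both endpoints.

```python
def f(x):
    """Density f(x) = (3/4) x^2 (2 - x) on [0, 2]."""
    return 0.75 * x * x * (2 - x)


def f_prime(x):
    """Derivative f'(x) = (3/4)(4x - 3x^2)."""
    return 0.75 * (4 * x - 3 * x * x)


def stationary_point(lo=0.5, hi=2.0, tol=1e-12):
    """Bisection on f'(x), assuming f' > 0 at lo and f' < 0 at hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f_prime(mid) > 0:
            lo = mid   # still rising: the peak is to the right
        else:
            hi = mid   # falling: the peak is to the left
    return (lo + hi) / 2


x_star = stationary_point()

# The mode is the candidate with the largest density, endpoints included.
candidates = [0.0, x_star, 2.0]
mode = max(candidates, key=f)

print(f"stationary point = {x_star:.6f}, mode = {mode:.6f}, f(mode) = {f(mode):.6f}")
```

Keeping the endpoints in `candidates` is the code equivalent of the endpoint check: for a monotonic density the same comparison would return an endpoint instead.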
This example also underlines why mean, median and mode usually differ. They answer three different questions — the balance point (mean), the area-splitting value (median), and the peak (mode) — and they coincide only for a symmetric, single-peaked distribution, where all three sit at the centre of symmetry. The normal distribution is the canonical case: its perfect bell shape forces mean = median = mode. For any skewed density, the three separate, and their order encodes the direction of skew, as established earlier. Being able to compute all three and then explain their relationship is exactly the synthesis the longer 3S questions are designed to reward.
When a density is symmetric, the savings are dramatic, and it is worth seeing them deployed end to end. Consider
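$$f(x)=\tfrac{3}{4}\left(1-(x-2)^2\right) \ \text{for } 1\le x\le 3, \qquad f(x)=0 \ \text{otherwise},$$
an inverted parabola centred on $x=2$. First confirm it is a valid PDF. Substituting $u=x-2$ (so $du=dx$, limits $-1$ to $1$),
$$\int_1^3 \tfrac{3}{4}\left(1-(x-2)^2\right)dx=\tfrac{3}{4}\int_{-1}^{1}(1-u^2)\,du=\tfrac{3}{4}\left[u-\tfrac{u^3}{3}\right]_{-1}^{1}=\tfrac{3}{4}\cdot\tfrac{4}{3}=1,$$
using the symmetry of $1-u^2$ to double the integral over $[0,1]$. Because $f$ is symmetric about $x=2$, the mean and median are both 2 with no integration at all, and since the curve has a single peak there, the mode is also 2: all three averages coincide, signalling a symmetric distribution. Only the variance requires work, and even that simplifies: $\operatorname{Var}(X)=E\left((X-2)^2\right)$ because the mean is 2, so
$$\operatorname{Var}(X)=\tfrac{3}{4}\int_{-1}^{1}u^2(1-u^2)\,du=\tfrac{3}{4}\cdot 2\int_0^1 (u^2-u^4)\,du=\tfrac{3}{2}\left(\tfrac{1}{3}-\tfrac{1}{5}\right)=\tfrac{3}{2}\cdot\tfrac{2}{15}=\tfrac{1}{5}=0.2.$$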
Without the symmetry insight you would have computed $E(X)$ and $E(X^2)$ by two separate integrals and then subtracted; recognising the centre of symmetry collapses the mean, median and mode to a single observation and reframes the variance as a central second moment, halving the labour and the error risk. Spotting symmetry (an $(x-c)^2$ term, an even function after shifting, a mirror-image graph) is one of the highest-value habits in the continuous-distribution toolkit.
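Both routes to the variance can be compared numerically. The sketch below assumes the inverted-parabola density on $[1,3]$ and a hand-written Simpson's rule (the helper `simpson` is an illustrative choice, not a library call); it computes the variance once as the central second moment about 2 and once as $E(X^2)-(E(X))^2$.

```python
def simpson(func, a, b, n=1000):
    """Composite Simpson's rule on [a, b] using n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def f(x):
    """Density f(x) = (3/4)(1 - (x - 2)^2) on [1, 3], symmetric about x = 2."""
    return 0.75 * (1 - (x - 2) ** 2)


# Route 1: use symmetry. The mean is the centre, so integrate (x - 2)^2 f(x).
centre = 2
var_central = simpson(lambda x: (x - centre) ** 2 * f(x), 1, 3)

# Route 2: the long way, via E(X) and E(X^2).
mean = simpson(lambda x: x * f(x), 1, 3)
second_moment = simpson(lambda x: x * x * f(x), 1, 3)
var_raw = second_moment - mean ** 2

print(f"mean={mean:.6f} var_central={var_central:.6f} var_raw={var_raw:.6f}")
```

The two variances should match, and the computed mean should sit at the centre of symmetry.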
Quartiles are found by exactly the median technique applied at 0.25 and 0.75, and a clean worked example fixes the method. Let $f(x)=\tfrac{1}{2}x$ for $0\le x\le 2$, a rising linear density (a triangle). The accumulated area to $x$ is
$$\int_0^x \tfrac{1}{2}t\,dt=\left[\tfrac{t^2}{4}\right]_0^x=\tfrac{x^2}{4},$$
so the general quantile equation is $\tfrac{x^2}{4}=p$, giving $x=2\sqrt{p}$. Hence the lower quartile solves $\tfrac{Q_1^2}{4}=0.25$, so $Q_1^2=1$ and $Q_1=1$; the median solves $\tfrac{m^2}{4}=0.5$, so $m=\sqrt{2}\approx 1.414$; and the upper quartile solves $\tfrac{Q_3^2}{4}=0.75$, so $Q_3=\sqrt{3}\approx 1.732$. The interquartile range is therefore
$$\text{IQR}=Q_3-Q_1=\sqrt{3}-1\approx 0.732.$$
Two checks are worth internalising. First, the quartiles are correctly ordered, $Q_1<m<Q_3$, and all lie within the support $[0,2]$; if a "quartile" ever fell outside the support, a slip has occurred. Second, the spacing is uneven: $Q_1=1$ sits a full unit from the left end but $Q_3\approx 1.732$ is only 0.27 from the right end, reflecting how the rising density crowds probability toward the upper values (the same negative-skew signature as the parabolic densities). Comparing the mean ($E(X)=\tfrac{1}{2}\int_0^2 x^2\,dx=\tfrac{1}{2}\cdot\tfrac{8}{3}=\tfrac{4}{3}\approx 1.333$) with the median 1.414 confirms mean < median, the expected ordering for a left-skewed distribution. Setting up the single quantile equation $\tfrac{x^2}{4}=p$ once, then substituting $p=0.25, 0.5, 0.75$, is the efficient way to handle a question that asks for several quantiles at once.
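The "set up the quantile equation once, then substitute" approach translates directly into a small function. This is a minimal sketch assuming the triangular density above; the function name `quantile` and the explicit range check are choices made for this illustration.

```python
import math


def quantile(p):
    """Solve x^2 / 4 = p for the density f(x) = x / 2 on [0, 2]."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    return 2 * math.sqrt(p)


q1, median, q3 = (quantile(p) for p in (0.25, 0.5, 0.75))
iqr = q3 - q1
mean = 4 / 3  # E(X) from the worked integral in the text

# The two checks from the text: ordering within the support, and skew direction.
ordered_in_support = 0 <= q1 < median < q3 <= 2
left_skewed = mean < median

print(f"Q1={q1:.3f} median={median:.3f} Q3={q3:.3f} IQR={iqr:.3f}")
print(f"ordered within support: {ordered_in_support}, mean < median: {left_skewed}")
```

Because every quantile comes from the same function, a request for deciles or any other percentile needs only a different value of `p`.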
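Let $f(x)=\tfrac{3}{8}x^2$ for $0\le x\le 2$.
$E(X)=\int_0^2 x\cdot\tfrac{3}{8}x^2\,dx=\tfrac{3}{8}\left[\tfrac{x^4}{4}\right]_0^2=\tfrac{3}{8}\times 4=\tfrac{3}{2}=1.5$. (M1 set up $\int x f\,dx$; A1 $E(X)=1.5$.)
$E(X^2)=\int_0^2 x^2\cdot\tfrac{3}{8}x^2\,dx=\tfrac{3}{8}\left[\tfrac{x^5}{5}\right]_0^2=\tfrac{3}{8}\times\tfrac{32}{5}=\tfrac{12}{5}=2.4$. (M1 $\int x^2 f\,dx$; A1 $E(X^2)=2.4$.)
$\operatorname{Var}(X)=2.4-1.5^2=2.4-2.25=0.15$, $\operatorname{SD}(X)=\sqrt{0.15}\approx 0.387$. (M1 apply $E(X^2)-(E(X))^2$; A1 $\operatorname{Var}(X)=0.15$.)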
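Because this density is a single power of $x$, each moment can be checked exactly with rational arithmetic rather than decimals. The sketch below assumes a density of the form $c\,x^k$ on $[0,b]$ and uses the power rule directly; the function name `moment` and its parameters are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction


def moment(n, coeff=Fraction(3, 8), power=2, upper=2):
    """Exact value of the integral of x^n * coeff * x^power over [0, upper].

    Uses the power rule: the antiderivative of x^(n + power) is
    x^(n + power + 1) / (n + power + 1), and the lower limit 0 contributes nothing.
    """
    exponent = n + power + 1
    return coeff * Fraction(upper) ** exponent / exponent


total = moment(0)            # total probability
mean = moment(1)             # E(X)
second_moment = moment(2)    # E(X^2)
variance = second_moment - mean ** 2

print(total, mean, second_moment, variance)
```

Working in fractions avoids rounding slips in the subtraction $E(X^2)-(E(X))^2$, which is where accuracy marks are most often lost.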
For the same $f(x)=\tfrac{3}{8}x^2$ on $[0,2]$: