There is a surprising, almost magical, fact that ties together everything in this course: every square matrix satisfies its own characteristic equation. You compute the characteristic polynomial $p(\lambda)=\det(A-\lambda I)$, the very polynomial whose roots are the eigenvalues, and then, astonishingly, if you substitute the matrix $A$ in place of the scalar $\lambda$, you get the zero matrix: $p(A)=O$. This is the Cayley–Hamilton theorem, and it is far more than a curiosity. It gives a slick way to find the inverse of a matrix, a powerful method for reducing high powers $A^n$ to low-degree expressions in $A$, and a route to $A^n$ that works even for matrices that cannot be diagonalised (Lesson 9's defective case). The theorem is the natural capstone of the matrices strand: it links the determinant, the characteristic polynomial, eigenvalues, inverses and powers into a single elegant identity. This lesson states it carefully, derives the indispensable $2\times 2$ form, and shows its three big applications (inverses, powers, and polynomial reduction) with fully verified worked examples and the exam technique examiners reward.
Cayley–Hamilton is compulsory pure content for Papers 1 and 2 and the final topic of the matrices strand. Stating and verifying the theorem for a given matrix is AO1/AO2; using it to find $A^{-1}$ or to reduce a power is AO1; and the multi-step "express $A^n$ in the form $\alpha A+\beta I$" or "use Cayley–Hamilton to evaluate a matrix polynomial" problems are AO3. The topic rests on the characteristic polynomial (Lesson 8), determinants (Lesson 3), inverses (Lesson 4) and powers/diagonalisation (Lesson 9), so it is genuinely synoptic within the strand.
Let A be an n×n matrix with characteristic polynomial
$$p(\lambda)=\det(A-\lambda I)=(-1)^n\lambda^n+c_{n-1}\lambda^{n-1}+\cdots+c_1\lambda+c_0.$$
The Cayley–Hamilton theorem states:
$$p(A)=(-1)^nA^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I=O.$$
In words: substitute the matrix $A$ for $\lambda$ throughout the characteristic polynomial, interpreting the constant term $c_0$ as $c_0I$, and the result is the zero matrix. The single most common error is to forget that the constant becomes $c_0I$ (not the scalar $c_0$): you are building a matrix equation, so every term must be a matrix.
A warning about the "obvious" wrong proof. It is tempting to argue "p(λ)=det(A−λI), so p(A)=det(A−A)=det(O)=0." This is nonsense: p(A) is an n×n matrix, not the scalar det(A−AI), and the substitution of a matrix into a polynomial is a completely different operation from evaluating a determinant. The genuine theorem is much deeper than this fake one-liner — see §12.
For a 2×2 matrix A, the characteristic polynomial is (Lesson 8)
$$p(\lambda)=\lambda^2-(\operatorname{tr}A)\lambda+\det A.$$
Cayley–Hamilton therefore gives the identity every Further-Maths student should have at their fingertips:
$$A^2-(\operatorname{tr}A)A+(\det A)I=O,\qquad\text{equivalently}\qquad A^2=(\operatorname{tr}A)A-(\det A)I.$$
The rearranged form is the workhorse: it expresses $A^2$ as a linear combination of $A$ and $I$, and, by repeated substitution, the same is true of every higher power. That single fact powers both the inverse trick and the power-reduction method below.
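For a 2×2 matrix the identity can also be confirmed by direct multiplication, without appealing to the general theorem. The sketch below uses a generic matrix whose entries are labelled a, b, c, d; these labels are introduced here purely for illustration.

```latex
\begin{align*}
A &= \begin{pmatrix} a & b \\ c & d \end{pmatrix},
  \qquad \operatorname{tr}A = a+d, \qquad \det A = ad-bc, \\[4pt]
A^2 &= \begin{pmatrix} a^2+bc & ab+bd \\ ca+dc & cb+d^2 \end{pmatrix}, \\[4pt]
(\operatorname{tr}A)\,A &= \begin{pmatrix} a^2+ad & ab+bd \\ ac+dc & ad+d^2 \end{pmatrix}, \\[4pt]
A^2-(\operatorname{tr}A)\,A &= \begin{pmatrix} bc-ad & 0 \\ 0 & bc-ad \end{pmatrix}
  = -(\det A)\,I .
\end{align*}
```

Moving the last term across gives the 2×2 identity with the constant appearing as a multiple of the identity matrix, exactly as the theorem requires.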
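Verify the theorem for $A=\begin{pmatrix}1&2\\3&4\end{pmatrix}$.
$\operatorname{tr}A=5$, $\det A=4-6=-2$, so the characteristic equation is $\lambda^2-5\lambda-2=0$ and Cayley–Hamilton claims $A^2-5A-2I=O$. Compute:
$$A^2=\begin{pmatrix}1&2\\3&4\end{pmatrix}\begin{pmatrix}1&2\\3&4\end{pmatrix}=\begin{pmatrix}7&10\\15&22\end{pmatrix},\qquad 5A=\begin{pmatrix}5&10\\15&20\end{pmatrix},\qquad 2I=\begin{pmatrix}2&0\\0&2\end{pmatrix}.$$
$$A^2-5A-2I=\begin{pmatrix}7-5-2&10-10-0\\15-15-0&22-20-2\end{pmatrix}=\begin{pmatrix}0&0\\0&0\end{pmatrix}=O.$$ ✓
Mark scheme: M1 find $\operatorname{tr}A$ and $\det A$ / state characteristic equation; M1 compute $A^2$; M1 form $A^2-5A-2I$ (constant term as $-2I$); A1 obtain $O$. Treating the constant as the scalar $-2$ instead of $-2I$ is the standard error and loses the structural mark.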
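The same verification can be reproduced numerically. The following is a minimal sketch in plain Python, assuming matrices are stored as lists of rows; the helper names `mat_mul` and `cayley_hamilton_residual_2x2` are illustrative and not part of the lesson.

```python
# Minimal sketch: check A^2 - (tr A) A + (det A) I = O for a 2x2 matrix.
# Matrices are lists of rows; helper names are illustrative only.

def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cayley_hamilton_residual_2x2(A):
    """Return A^2 - (tr A) A + (det A) I, which should be the zero matrix."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A2 = mat_mul(A, A)
    identity = [[1, 0], [0, 1]]
    # The constant term enters as det * I, never as a bare scalar.
    return [[A2[i][j] - tr * A[i][j] + det * identity[i][j] for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
print(mat_mul(A, A))                    # [[7, 10], [15, 22]]
print(cayley_hamilton_residual_2x2(A))  # [[0, 0], [0, 0]]
```

Because the entries are integers the arithmetic is exact, so a zero residual is a genuine check rather than a floating-point approximation.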
Start from the 2×2 identity and isolate $I$. From $A^2-(\operatorname{tr}A)A+(\det A)I=O$,
$$(\det A)I=(\operatorname{tr}A)A-A^2=A\big((\operatorname{tr}A)I-A\big).$$
If $\det A\neq 0$, divide by $\det A$ and read off the inverse:
$$A^{-1}=\frac{(\operatorname{tr}A)I-A}{\det A}.$$
This is a genuinely useful alternative to the cofactor method, and it generalises: for an $n\times n$ matrix, Cayley–Hamilton expresses $A^{-1}$ as a polynomial in $A$ of degree $n-1$.
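To make the general claim concrete, here is a sketch of the same rearrangement for an n×n matrix, using the coefficient names from the statement of the theorem. It relies on the standard fact that the constant term of the characteristic polynomial is its value at zero, which is the determinant of A.

```latex
\begin{align*}
O &= (-1)^n A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I, \\[4pt]
-c_0 I &= A\left[(-1)^n A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I\right], \\[4pt]
A^{-1} &= -\frac{1}{c_0}\left[(-1)^n A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I\right],
  \qquad c_0 = p(0) = \det A \neq 0 .
\end{align*}
```

Setting n = 2 with c_1 equal to minus the trace and c_0 equal to the determinant recovers the 2×2 formula.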
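Use Cayley–Hamilton to find $A^{-1}$ for $A=\begin{pmatrix}1&2\\3&4\end{pmatrix}$.
$\operatorname{tr}A=5$, $\det A=-2$, so
$$A^{-1}=\frac{5I-A}{-2}=-\frac12\begin{pmatrix}5-1&0-2\\0-3&5-4\end{pmatrix}=-\frac12\begin{pmatrix}4&-2\\-3&1\end{pmatrix}=\begin{pmatrix}-2&1\\ \tfrac32&-\tfrac12\end{pmatrix}.$$
Check:
$$AA^{-1}=\begin{pmatrix}1&2\\3&4\end{pmatrix}\begin{pmatrix}-2&1\\ \tfrac32&-\tfrac12\end{pmatrix}=\begin{pmatrix}-2+3&1-1\\-6+6&3-2\end{pmatrix}=\begin{pmatrix}1&0\\0&1\end{pmatrix}.$$ ✓
Mark scheme: M1 quote/derive $A^{-1}=\big((\operatorname{tr}A)I-A\big)/\det A$; M1 substitute $\operatorname{tr}A=5$, $\det A=-2$; A1 form $5I-A$; A1 $A^{-1}=\begin{pmatrix}-2&1\\ \tfrac32&-\tfrac12\end{pmatrix}$. The $AA^{-1}=I$ check secures the accuracy mark.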
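The inverse formula translates directly into a few lines of code. This is a minimal sketch using exact rational arithmetic from the standard library; the function name `inverse_2x2_cayley_hamilton` is a hypothetical helper, not something defined in the lesson.

```python
from fractions import Fraction

def inverse_2x2_cayley_hamilton(A):
    """Return ((tr A) I - A) / det A for a 2x2 matrix given as a list of rows."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        # The division step is only valid for an invertible matrix.
        raise ValueError("matrix is singular: det A = 0")
    identity = [[1, 0], [0, 1]]
    return [[Fraction(tr * identity[i][j] - A[i][j], det) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
A_inv = inverse_2x2_cayley_hamilton(A)

# Check A * A_inv = I entry by entry.
product = [[sum(A[i][k] * A_inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
print(product == [[1, 0], [0, 1]])  # True
```

Using `Fraction` keeps entries such as 3/2 exact, so the final comparison with the identity matrix is reliable.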
Because $A^2=(\operatorname{tr}A)A-(\det A)I$ is a linear combination of $A$ and $I$, so is $A^3=A\cdot A^2$, and so on: every power of a 2×2 matrix can be written as $\alpha A+\beta I$ for some scalars. To climb from one power to the next, multiply the current expression by $A$ and use the relation again to eliminate the $A^2$ that appears.
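Express $A^3$ for $A=\begin{pmatrix}1&2\\3&4\end{pmatrix}$ in the form $\alpha A+\beta I$, and hence evaluate it.
From Worked Example 1, $A^2=5A+2I$. Then
$$A^3=A\cdot A^2=A(5A+2I)=5A^2+2A=5(5A+2I)+2A=27A+10I.$$
So $\alpha=27$, $\beta=10$, and
$$A^3=27\begin{pmatrix}1&2\\3&4\end{pmatrix}+10\begin{pmatrix}1&0\\0&1\end{pmatrix}=\begin{pmatrix}27+10&54\\81&108+10\end{pmatrix}=\begin{pmatrix}37&54\\81&118\end{pmatrix}.$$
Check by direct multiplication:
$$A^3=A^2A=\begin{pmatrix}7&10\\15&22\end{pmatrix}\begin{pmatrix}1&2\\3&4\end{pmatrix}=\begin{pmatrix}37&54\\81&118\end{pmatrix}.$$ ✓
Mark scheme: M1 use $A^2=5A+2I$; M1 substitute to eliminate $A^2$ from $A^3$; A1 $A^3=27A+10I$; A1 evaluate to $\begin{pmatrix}37&54\\81&118\end{pmatrix}$. The reduction $A^3=\alpha A+\beta I$ is the examinable skill; the numerical check protects the final mark.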
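The "multiply by A, then eliminate the square" step is a two-term recurrence on the coefficients, which makes it easy to automate. The sketch below assumes only the trace and determinant of a 2×2 matrix are known; `power_coefficients` is an illustrative name.

```python
def power_coefficients(tr, det, n):
    """Return (alpha, beta) such that A^n = alpha*A + beta*I for a 2x2 matrix.

    Uses A^2 = tr*A - det*I. Valid for n >= 1.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    alpha, beta = 1, 0  # A^1 = 1*A + 0*I
    for _ in range(n - 1):
        # A*(alpha*A + beta*I) = alpha*A^2 + beta*A
        #                      = (alpha*tr + beta)*A - (alpha*det)*I
        alpha, beta = alpha * tr + beta, -alpha * det
    return alpha, beta

# For A = [[1, 2], [3, 4]]: tr A = 5, det A = -2.
print(power_coefficients(5, -2, 2))  # (5, 2)
print(power_coefficients(5, -2, 3))  # (27, 10)
```

Each pass through the loop is one application of the worked method above, so the third power reproduces 27A + 10I.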
A faster route for a specific high power uses the eigenvalues directly. Since $A^n=\alpha_nA+\beta_nI$, the same scalars satisfy the scalar equation $\lambda^n=\alpha_n\lambda+\beta_n$ at each eigenvalue $\lambda$ (because the eigenvalues satisfy the same characteristic relation). With two distinct eigenvalues you get two equations for $\alpha_n$, $\beta_n$; solve and you have $A^n$ without iterating. (For a repeated eigenvalue, differentiate $\lambda^n=\alpha_n\lambda+\beta_n$ with respect to $\lambda$ to get the second equation, $n\lambda^{n-1}=\alpha_n$.)
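As a sketch of the eigenvalue route, the two scalar equations can be solved in closed form. The function name `coefficients_from_eigenvalues` is hypothetical, and the example assumes a 2×2 matrix with eigenvalues 1 and 3 (for instance the matrix with rows 2, 1 and 1, 2, chosen here only for illustration).

```python
from fractions import Fraction

def coefficients_from_eigenvalues(lam1, lam2, n):
    """Return (alpha, beta) with A^n = alpha*A + beta*I for a 2x2 matrix
    whose eigenvalues are lam1 and lam2 (rational values, n >= 1)."""
    if lam1 != lam2:
        # Subtract lam2^n = alpha*lam2 + beta from lam1^n = alpha*lam1 + beta.
        alpha = Fraction(lam1**n - lam2**n, lam1 - lam2)
    else:
        # Repeated eigenvalue: differentiating gives n*lam^(n-1) = alpha.
        alpha = Fraction(n * lam1**(n - 1))
    beta = lam1**n - alpha * lam1
    return alpha, beta

alpha, beta = coefficients_from_eigenvalues(1, 3, 3)
print(alpha, beta)  # 13 -12
```

For eigenvalues 1 and 3 and n = 3 this gives 13A − 12I, which matches the result of reducing step by step with A² = 4A − 3I.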
The same idea evaluates any polynomial in $A$. Asked for $f(A)$ where $f$ has degree $\geq 2$ (for a 2×2), use the characteristic relation to keep replacing $A^2$ until only $A$ and $I$ remain: every polynomial in a 2×2 matrix collapses to $\alpha A+\beta I$. Concretely, treat the polynomial as an ordinary polynomial in $\lambda$, divide it by the characteristic polynomial, and the remainder (degree $<2$) is exactly $\alpha\lambda+\beta$; then $f(A)=\alpha A+\beta I$. This polynomial-division viewpoint is the cleanest way to handle a high-degree request such as "find $A^7-3A^5+2I$".
As a quick illustration, suppose $A$ satisfies $A^2=4A-3I$ (eigenvalues 1 and 3) and you must evaluate $f(A)=A^3-2A^2+A$. The eigenvalue shortcut is fastest: $f(A)=\gamma A+\delta I$ where $f(\lambda)=\gamma\lambda+\delta$ holds at $\lambda=1$ and $\lambda=3$. At $\lambda=1$: $1-2+1=0=\gamma+\delta$; at $\lambda=3$: $27-18+3=12=3\gamma+\delta$. Subtracting, $2\gamma=12\Rightarrow\gamma=6$, $\delta=-6$, so $f(A)=6A-6I$. No matrix multiplication is needed at all: the eigenvalues do the work. (You could equally reduce step by step: $A^3=A\cdot A^2=4A^2-3A=4(4A-3I)-3A=13A-12I$, so $f(A)=(13A-12I)-2(4A-3I)+A=6A-6I$ ✓, agreeing exactly.)
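The polynomial-division viewpoint can be sketched as a short long-division routine. Polynomials are represented here as coefficient lists with the highest degree first, and the divisor is assumed to be monic (leading coefficient 1); `poly_remainder` is an illustrative name.

```python
def poly_remainder(f, p):
    """Remainder of f divided by a monic polynomial p.

    Both are coefficient lists, highest degree first,
    e.g. lambda^2 - 4*lambda + 3 is [1, -4, 3].
    """
    if p[0] != 1:
        raise ValueError("divisor must be monic")
    r = list(f)
    while len(r) >= len(p):
        lead = r[0]
        # Subtract lead * p aligned with the leading term of r.
        for i in range(len(p)):
            r[i] -= lead * p[i]
        r.pop(0)  # leading coefficient is now zero, so drop it
    return r

# f(lambda) = lambda^3 - 2*lambda^2 + lambda, p(lambda) = lambda^2 - 4*lambda + 3
print(poly_remainder([1, -2, 1, 0], [1, -4, 3]))  # [6, -6]
```

The remainder 6λ − 6 corresponds to f(A) = 6A − 6I, in agreement with the eigenvalue shortcut. The same routine applied to λ³ with the divisor λ² − 5λ − 2 returns 27λ + 10, matching the earlier power reduction.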
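For a 3×3 matrix the characteristic polynomial is a cubic, $\lambda^3+a\lambda^2+b\lambda+c$ (after fixing signs), and Cayley–Hamilton gives a relation between $A^3$, $A^2$, $A$ and $I$:
$$A^3+aA^2+bA+cI=O.$$
Verify the theorem for $A=\begin{pmatrix}2&0&1\\0&1&0\\1&0&2\end{pmatrix}$.
Expand $\det(A-\lambda I)$ along the middle row (only one non-zero entry, $1-\lambda$):
$$\det(A-\lambda I)=(1-\lambda)\begin{vmatrix}2-\lambda&1\\1&2-\lambda\end{vmatrix}=(1-\lambda)\left[(2-\lambda)^2-1\right].$$
Now $(2-\lambda)^2-1=\lambda^2-4\lambda+3=(\lambda-1)(\lambda-3)$, so
$$\det(A-\lambda I)=(1-\lambda)(\lambda-1)(\lambda-3)=-(\lambda-1)^2(\lambda-3)=-\lambda^3+5\lambda^2-7\lambda+3.$$
(Check: eigenvalues $1,1,3$ give sum $5=\operatorname{tr}A$ ✓ and product $3=\det A$ ✓.) Cayley–Hamilton then asserts $-A^3+5A^2-7A+3I=O$, i.e. $A^3=5A^2-7A+3I$. Computing the powers,
$$A^2=\begin{pmatrix}5&0&4\\0&1&0\\4&0&5\end{pmatrix},\qquad A^3=\begin{pmatrix}14&0&13\\0&1&0\\13&0&14\end{pmatrix};$$
then
$$5A^2-7A+3I=\begin{pmatrix}25-14+3&0&20-7\\0&5-7+3&0\\20-7&0&25-14+3\end{pmatrix}=\begin{pmatrix}14&0&13\\0&1&0\\13&0&14\end{pmatrix}=A^3.$$ ✓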
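The 3×3 check is tedious by hand, so a short script is a useful safeguard. The sketch below verifies the cubic relation for this matrix and then, as an illustration of the earlier remark that the inverse is a polynomial in A of degree n − 1, rearranges the relation to get A⁻¹ = (A² − 5A + 7I)/3. The helper names `mat_mul` and `combine` are illustrative.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combine(terms):
    """Return the sum of coeff * matrix over a list of (coeff, matrix) pairs."""
    n = len(terms[0][1])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)]
            for i in range(n)]

A = [[2, 0, 1], [0, 1, 0], [1, 0, 2]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A2 = mat_mul(A, A)
A3 = mat_mul(A2, A)

# Cayley-Hamilton: -A^3 + 5A^2 - 7A + 3I should be the zero matrix.
residual = combine([(-1, A3), (5, A2), (-7, A), (3, I3)])
print(residual)  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

# Rearranging: 3I = A(A^2 - 5A + 7I), so A^{-1} = (A^2 - 5A + 7I) / 3.
third = Fraction(1, 3)
A_inv = combine([(third, A2), (-5 * third, A), (7 * third, I3)])
print(mat_mul(A, A_inv) == I3)  # True
```

The inverse step is not part of the worked example above; it is included only to show that the same rearrangement used in the 2×2 case carries over.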