The Cayley-Hamilton Theorem

There is a surprising, almost magical, fact that ties together everything in this course: every square matrix satisfies its own characteristic equation. You compute the characteristic polynomial $p(\lambda) = \det(A-\lambda I)$ — the very polynomial whose roots are the eigenvalues — and then, astonishingly, if you substitute the matrix $A$ in place of the scalar $\lambda$ , you get the zero matrix: $p(A) = O$ . This is the Cayley–Hamilton theorem, and it is far more than a curiosity. It gives a slick way to find the inverse of a matrix, a powerful method for reducing high powers $A^n$ to low-degree expressions in $A$ , and a route to $A^n$ that works even for matrices that cannot be diagonalised (Lesson 9's defective case). The theorem is the natural capstone of the matrices strand: it links the determinant, the characteristic polynomial, eigenvalues, inverses and powers into a single elegant identity. This lesson states it carefully, derives the indispensable $2\times2$ form, and shows its three big applications — inverses, powers, and polynomial reduction — with fully verified worked examples and the exam technique examiners reward.

1. Where this sits in AQA 7367

Cayley–Hamilton is compulsory pure content for Papers 1 and 2 and the final topic of the matrices strand. Stating and verifying the theorem for a given matrix is AO1/AO2; using it to find $A^{-1}$ or to reduce a power is AO1; and the multi-step "express $A^n$ in the form $\alpha A + \beta I$ " or "use Cayley–Hamilton to evaluate a matrix polynomial" problems are AO3. The topic rests on the characteristic polynomial (Lesson 8), determinants (Lesson 3), inverses (Lesson 4) and powers/diagonalisation (Lesson 9) — it is genuinely synoptic within the strand.

2. Statement of the theorem

Let $A$ be an $n\times n$ matrix with characteristic polynomial

p(\lambda) = \det(A - \lambda I) = (-1)^n\lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0.

The Cayley–Hamilton theorem states:

\boxed{\;p(A) = (-1)^n A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I = O.\;}

In words: substitute the matrix $A$ for $\lambda$ throughout the characteristic polynomial — interpreting the constant term $c_0$ as $c_0 I$ — and the result is the zero matrix. The single most common error is to forget that the constant becomes $c_0 I$ (not the scalar $c_0$ ): you are building a matrix equation, so every term must be a matrix.

A warning about the "obvious" wrong proof. It is tempting to argue " $p(\lambda) = \det(A-\lambda I)$ , so $p(A) = \det(A - A) = \det(O) = 0$ ." This is nonsense: $p(A)$ is an $n\times n$ matrix, not the scalar $\det(A-AI)$ , and the substitution of a matrix into a polynomial is a completely different operation from evaluating a determinant. The genuine theorem is much deeper than this fake one-liner — see §12.
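The statement can be checked numerically for any square matrix. The following is a minimal sketch, assuming NumPy is available; the helper name `char_poly_at_matrix` and the test matrix are illustrative choices, not part of the course material. Note that `np.poly` returns the coefficients of $\det(\lambda I - A)$, which differs from $\det(A - \lambda I)$ only by the factor $(-1)^n$, so it does not change whether $p(A)$ is the zero matrix.

```python
import numpy as np

def char_poly_at_matrix(A):
    """Evaluate the characteristic polynomial of A at A itself (Horner's scheme)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Coefficients of det(lambda*I - A), highest power first.
    coeffs = np.poly(A)
    result = np.zeros((n, n))
    for c in coeffs:
        # Each coefficient enters as c * I, never as a bare scalar.
        result = result @ A + c * np.eye(n)
    return result

# Illustrative 3x3 matrix (an arbitrary choice for this sketch).
M = np.array([[2, 1, 0],
              [1, 3, 1],
              [0, 1, 4]])
is_zero = np.allclose(char_poly_at_matrix(M), 0)
print(is_zero)
```

Because the coefficients come from floating-point eigenvalues, the result is zero only up to rounding error, which is why the check uses `np.allclose` rather than exact equality.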

3. The 2×2 form (derived) — the one to know cold

For a $2\times2$ matrix $A$ , the characteristic polynomial is (Lesson 8)

p(\lambda) = \lambda^2 - (\operatorname{tr}A)\lambda + \det A.

Cayley–Hamilton therefore gives the identity every Further-Maths student should have at their fingertips:

\boxed{\;A^2 - (\operatorname{tr}A)\,A + (\det A)\,I = O,\qquad\text{equivalently}\qquad A^2 = (\operatorname{tr}A)\,A - (\det A)\,I.\;}

The rearranged form is the workhorse: it expresses $A^2$ as a linear combination of $A$ and $I$, and, by repeated substitution, every higher power can be written the same way. That single fact powers both the inverse trick and the power-reduction method below.

Worked Example 1 — verify Cayley–Hamilton for a 2×2 (with mark scheme)

Verify the theorem for $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ .

$\operatorname{tr}A = 5$ , $\det A = 4 - 6 = -2$ , so the characteristic equation is $\lambda^2 - 5\lambda - 2 = 0$ and Cayley–Hamilton claims $A^2 - 5A - 2I = O$ . Compute:

A^2 = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 7 & 10 \\ 15 & 22 \end{pmatrix},\quad 5A = \begin{pmatrix} 5 & 10 \\ 15 & 20 \end{pmatrix},\quad 2I = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.

A^2 - 5A - 2I = \begin{pmatrix} 7-5-2 & 10-10-0 \\ 15-15-0 & 22-20-2 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = O.\ ✓

Mark scheme: M1 find $\operatorname{tr}A$ and $\det A$ / state characteristic equation; M1 compute $A^2$ ; M1 form $A^2 - 5A - 2I$ (constant term as $-2I$ ); A1 obtain $O$ . Treating the constant as the scalar $-2$ instead of $-2I$ is the standard error and loses the structural mark.
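The standard error is easy to reproduce on a computer, which makes it a useful thing to see once. The sketch below assumes NumPy; the variable names `correct` and `wrong` are illustrative. Subtracting the bare scalar $2$ makes NumPy subtract $2$ from every entry, which is not the same as subtracting $2I$.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
I = np.eye(2, dtype=int)

# Constant term written correctly as the matrix 2I.
correct = A @ A - 5 * A - 2 * I

# Constant term written as the scalar 2: NumPy broadcasts it to every entry.
wrong = A @ A - 5 * A - 2

print(correct)  # the zero matrix
print(wrong)    # off-diagonal entries are -2, not 0
```

Only the first expression is the matrix equation that Cayley–Hamilton asserts.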

4. Application 1 — finding the inverse

Start from the $2\times2$ identity and isolate $I$ . From $A^2 - (\operatorname{tr}A)A + (\det A)I = O$ ,

(\det A)\,I = (\operatorname{tr}A)\,A - A^2 = A\bigl((\operatorname{tr}A)\,I - A\bigr).

If $\det A \neq 0$ , divide by $\det A$ and read off the inverse:

\boxed{\;A^{-1} = \frac{(\operatorname{tr}A)\,I - A}{\det A}.\;}

This is a genuinely useful alternative to the cofactor method, and it generalises: for an invertible $n\times n$ matrix, Cayley–Hamilton expresses $A^{-1}$ as a polynomial in $A$ of degree at most $n-1$.

Worked Example 2 — inverse via Cayley–Hamilton (with mark scheme)

Use Cayley–Hamilton to find $A^{-1}$ for $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ .

$\operatorname{tr}A = 5$ , $\det A = -2$ , so

A^{-1} = \frac{5I - A}{-2} = \frac{1}{-2}\begin{pmatrix} 5-1 & 0-2 \\ 0-3 & 5-4 \end{pmatrix} = \frac{1}{-2}\begin{pmatrix} 4 & -2 \\ -3 & 1 \end{pmatrix} = \begin{pmatrix} -2 & 1 \\ 3/2 & -1/2 \end{pmatrix}.

Check: $AA^{-1} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\begin{pmatrix} -2 & 1 \\ 3/2 & -1/2 \end{pmatrix} = \begin{pmatrix} -2+3 & 1-1 \\ -6+6 & 3-2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.\ ✓$

Mark scheme: M1 quote/derive $A^{-1} = ((\operatorname{tr}A)I - A)/\det A$; M1 substitute $\operatorname{tr}A = 5,\ \det A = -2$; A1 form $5I - A$; A1 $A^{-1} = \begin{pmatrix} -2 & 1 \\ 3/2 & -1/2 \end{pmatrix}$. The $AA^{-1}=I$ check secures the accuracy mark.
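The inverse formula translates directly into a few lines of code. This is a sketch under stated assumptions: the function name `inverse_2x2_cayley_hamilton` is hypothetical, the matrix is passed as nested lists of integers, and exact fractions are used so that the answer matches the hand calculation.

```python
from fractions import Fraction

def inverse_2x2_cayley_hamilton(A):
    """Return A^{-1} = ((tr A) I - A) / det A for a 2x2 integer matrix."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular: det A = 0")
    # (tr A) I - A has entries [[tr - a, -b], [-c, tr - d]].
    return [[Fraction(trace - a, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(trace - d, det)]]

A_inv = inverse_2x2_cayley_hamilton([[1, 2], [3, 4]])
print(A_inv)
```

For the matrix of Worked Example 2 this reproduces the entries $-2,\ 1,\ 3/2,\ -1/2$ found by hand.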

5. Application 2 — computing higher powers

Because $A^2 = (\operatorname{tr}A)A - (\det A)I$ is a linear combination of $A$ and $I$ , so is $A^3 = A\cdot A^2$ , and so on: every power of a $2\times2$ matrix can be written as $\alpha A + \beta I$ for some scalars. To climb from one power to the next, multiply the current expression by $A$ and use the relation again to eliminate the $A^2$ that appears.

Worked Example 3 — reduce $A^3$ to $\alpha A + \beta I$ (with mark scheme)

Express $A^3$ for $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ in the form $\alpha A + \beta I$ , and hence evaluate it.

From Worked Example 1, $A^2 = 5A + 2I$ . Then

A^3 = A\cdot A^2 = A(5A + 2I) = 5A^2 + 2A = 5(5A + 2I) + 2A = 27A + 10I.

So $\alpha = 27,\ \beta = 10$ , and

A^3 = 27\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} + 10\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 27+10 & 54 \\ 81 & 108+10 \end{pmatrix} = \begin{pmatrix} 37 & 54 \\ 81 & 118 \end{pmatrix}.

Check by direct multiplication: $A^3 = A^2 A = \begin{pmatrix} 7 & 10 \\ 15 & 22 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 37 & 54 \\ 81 & 118 \end{pmatrix}$ ✓.

Mark scheme: M1 use $A^2 = 5A + 2I$; M1 substitute to eliminate $A^2$ from $A^3$; A1 $A^3 = 27A + 10I$; A1 evaluate to $\begin{pmatrix} 37 & 54 \\ 81 & 118 \end{pmatrix}$. The reduction $A^3 = \alpha A + \beta I$ is the examinable skill; the numerical check protects the final mark.
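The climb from one power to the next is a simple recurrence. If $A^k = \alpha A + \beta I$, then $A^{k+1} = \alpha A^2 + \beta A = (\alpha\operatorname{tr}A + \beta)A - (\alpha\det A)I$. The sketch below iterates that rule; the function name `power_coefficients` is an illustrative choice.

```python
def power_coefficients(trace, det, n):
    """Return (alpha, beta) with A**n = alpha*A + beta*I for a 2x2 matrix, n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    alpha, beta = 1, 0  # A**1 = 1*A + 0*I
    for _ in range(n - 1):
        # Multiply alpha*A + beta*I by A, then replace A**2 by trace*A - det*I.
        alpha, beta = alpha * trace + beta, -alpha * det
    return alpha, beta

# Matrix of Worked Example 3: tr A = 5, det A = -2.
print(power_coefficients(5, -2, 3))  # (27, 10)
```

Only the trace and determinant are needed, so no matrix multiplication takes place until the final step of forming $\alpha A + \beta I$.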

A faster route for a specific high power uses the eigenvalues directly. Since $A^n = \alpha_n A + \beta_n I$, the same scalars satisfy the scalar equation $\lambda^n = \alpha_n\lambda + \beta_n$ at each eigenvalue $\lambda$. To see why, divide $\lambda^n$ by the characteristic polynomial: $\lambda^n = q(\lambda)p(\lambda) + \alpha_n\lambda + \beta_n$. Substituting $A$ gives $A^n = \alpha_n A + \beta_n I$ because $p(A) = O$, and substituting an eigenvalue gives the scalar equation because $p(\lambda) = 0$ there. With two distinct eigenvalues you get two equations for $\alpha_n,\beta_n$; solve and you have $A^n$ without iterating. (For a repeated eigenvalue, differentiate $\lambda^n = \alpha_n\lambda + \beta_n$ with respect to $\lambda$ to get the second equation, $n\lambda^{n-1} = \alpha_n$.)
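Solving the two scalar equations by hand gives $\alpha_n = (\lambda_1^n - \lambda_2^n)/(\lambda_1 - \lambda_2)$ and $\beta_n = \lambda_1^n - \alpha_n\lambda_1$ for distinct eigenvalues. The sketch below codes that up, together with the repeated-eigenvalue case; the function name is hypothetical, and it assumes integer eigenvalues so that exact arithmetic can be used.

```python
from fractions import Fraction

def power_coefficients_from_eigenvalues(lam1, lam2, n):
    """Return (alpha, beta) with A**n = alpha*A + beta*I, n >= 1.

    lam1 and lam2 are the (integer) eigenvalues of a 2x2 matrix A.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if lam1 == lam2:
        # Repeated eigenvalue: the differentiated equation gives alpha directly.
        alpha = Fraction(n * lam1 ** (n - 1))
    else:
        # Subtract the two scalar equations to eliminate beta.
        alpha = Fraction(lam1 ** n - lam2 ** n, lam1 - lam2)
    beta = lam1 ** n - alpha * lam1
    return alpha, beta

# Eigenvalues 1 and 3 (trace 4, determinant 3), third power.
print(power_coefficients_from_eigenvalues(1, 3, 3))
```

For eigenvalues $1$ and $3$ with $n = 3$ this gives $\alpha_3 = 13$ and $\beta_3 = -12$, that is, $A^3 = 13A - 12I$.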

6. Application 3 — reducing polynomial expressions

The same idea evaluates any polynomial in $A$ . Asked for $f(A)$ where $f$ has degree $\ge 2$ (for a $2\times2$ ), use the characteristic relation to keep replacing $A^2$ until only $A$ and $I$ remain: every polynomial in a $2\times2$ matrix collapses to $\alpha A + \beta I$ . Concretely, treat the polynomial as an ordinary polynomial in $\lambda$ , divide it by the characteristic polynomial, and the remainder (degree $< 2$ ) is exactly $\alpha\lambda + \beta$ ; then $f(A) = \alpha A + \beta I$ . This polynomial-division viewpoint is the cleanest way to handle a high-degree request such as "find $A^7 - 3A^5 + 2I$ ".

As a quick illustration, suppose $A$ satisfies $A^2 = 4A - 3I$ (eigenvalues $1$ and $3$) and you must evaluate $f(A) = A^3 - 2A^2 + A$. The eigenvalue shortcut is fastest: $f(A) = \gamma A + \delta I$ where $f(\lambda) = \gamma\lambda + \delta$ holds at $\lambda = 1$ and $\lambda = 3$. At $\lambda=1$: $1 - 2 + 1 = 0 = \gamma + \delta$; at $\lambda=3$: $27 - 18 + 3 = 12 = 3\gamma + \delta$. Subtracting, $2\gamma = 12\Rightarrow\gamma = 6$, $\delta = -6$, so $f(A) = 6A - 6I$. No matrix multiplication is needed at all: the eigenvalues do the work. (You could equally reduce step by step: $A^3 = A\cdot A^2 = 4A^2 - 3A = 4(4A - 3I) - 3A = 13A - 12I$, so $f(A) = (13A - 12I) - 2(4A - 3I) + A = 6A - 6I$ ✓, agreeing exactly.)
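The polynomial-division viewpoint can be carried out mechanically. The following sketch assumes NumPy and uses `np.polydiv` to find the remainder on dividing $f(\lambda)$ by $\lambda^2 - (\operatorname{tr}A)\lambda + \det A$. The function name `reduce_polynomial` is hypothetical, and the matrix $\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ is an illustrative choice with trace $4$ and determinant $3$, so that it satisfies $A^2 = 4A - 3I$.

```python
import numpy as np

def reduce_polynomial(f_coeffs, trace, det):
    """Return (gamma, delta) with f(A) = gamma*A + delta*I for a 2x2 matrix A.

    f_coeffs lists the coefficients of f, highest power first.
    """
    p = [1, -trace, det]  # characteristic polynomial, highest power first
    _, remainder = np.polydiv(f_coeffs, p)
    # polydiv trims leading zeros, so pad the remainder back to length 2.
    remainder = np.concatenate([np.zeros(2 - len(remainder)), remainder])
    return remainder[0], remainder[1]

# f(lambda) = lambda^3 - 2 lambda^2 + lambda, with tr A = 4 and det A = 3.
gamma, delta = reduce_polynomial([1, -2, 1, 0], 4, 3)

# Cross-check against direct evaluation for one matrix with that trace and determinant.
A = np.array([[2, 1],
              [1, 2]])
direct = A @ A @ A - 2 * (A @ A) + A
reduced = gamma * A + delta * np.eye(2)
print(gamma, delta)
print(np.allclose(direct, reduced))
```

The remainder has coefficients $6$ and $-6$, matching $f(A) = 6A - 6I$ from the eigenvalue shortcut.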

7. The 3×3 case

For a $3\times3$ matrix the characteristic polynomial is a cubic. Multiplying $\det(A-\lambda I)$ by $-1$ so that the leading coefficient is $+1$, it can be written $\lambda^3 + a\lambda^2 + b\lambda + c$, and Cayley–Hamilton gives a relation between $A^3,A^2,A$ and $I$:

A^3 + aA^2 + bA + cI = O.

Worked Example 4 — verify Cayley–Hamilton for a 3×3 (with mark scheme)

Verify the theorem for $A = \begin{pmatrix} 2 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 2 \end{pmatrix}$ .

Expand $\det(A-\lambda I)$ along the middle row (only one non-zero entry, $1-\lambda$ ):

\det(A-\lambda I) = (1-\lambda)\begin{vmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{vmatrix} = (1-\lambda)\bigl[(2-\lambda)^2 - 1\bigr].

Now $(2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda-1)(\lambda-3)$ , so

\det(A-\lambda I) = (1-\lambda)(\lambda-1)(\lambda-3) = -(\lambda-1)^2(\lambda-3) = -\lambda^3 + 5\lambda^2 - 7\lambda + 3.

(Check: eigenvalues $1,1,3$ give sum $5 = \operatorname{tr}A$ ✓ and product $3 = \det A$ ✓.) Cayley–Hamilton then asserts $-A^3 + 5A^2 - 7A + 3I = O$ , i.e. $A^3 = 5A^2 - 7A + 3I$ . Computing the powers, $A^2 = \begin{pmatrix} 5 & 0 & 4 \\ 0 & 1 & 0 \\ 4 & 0 & 5 \end{pmatrix}$ and $A^3 = \begin{pmatrix} 14 & 0 & 13 \\ 0 & 1 & 0 \\ 13 & 0 & 14 \end{pmatrix}$ ; then $5A^2 - 7A + 3I = \begin{pmatrix} 25-14+3 & 0 & 20-7 \\ 0 & 5-7+3 & 0 \\ 20-7 & 0 & 25-14+3 \end{pmatrix} = \begin{pmatrix} 14 & 0 & 13 \\ 0 & 1 & 0 \\ 13 & 0 & 14 \end{pmatrix} = A^3$ ✓.
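The same verification can be run in code, and the relation also yields the inverse. Rearranging $-A^3 + 5A^2 - 7A + 3I = O$ gives $3I = A(A^2 - 5A + 7I)$, so $A^{-1} = \tfrac13(A^2 - 5A + 7I)$. The sketch below assumes NumPy; the variable names are illustrative.

```python
import numpy as np

A = np.array([[2, 0, 1],
              [0, 1, 0],
              [1, 0, 2]])
I = np.eye(3, dtype=int)
A2 = A @ A
A3 = A2 @ A

# Cayley-Hamilton residual: should be the zero matrix.
residual = -A3 + 5 * A2 - 7 * A + 3 * I

# Inverse read off from 3I = A(A^2 - 5A + 7I).
A_inv = (A2 - 5 * A + 7 * I) / 3

print(residual)
print(A @ A_inv)
```

Integer arithmetic keeps the residual exactly zero; the inverse involves division by $3$, so its check is best done with `np.allclose`.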
