Optimisation and Heuristics

A great many problems in computing are optimisation problems: they ask not merely for a solution but for the best one — the shortest route, the lowest cost, the maximum profit, the tightest packing. For small inputs we can simply examine every possibility and pick the best. But for many important optimisation problems the number of possibilities explodes so violently with input size that exhaustive search becomes physically impossible — not slow, but impossible, on any conceivable hardware, for inputs of quite modest size. Such problems are intractable. When faced with one, we abandon the demand for a provably optimal answer and instead use a heuristic: a strategy that finds a good enough solution in a reasonable time. This lesson explains the dividing line between tractable and intractable problems, why brute force fails on the wrong side of that line, and the main families of heuristic methods used to cope — with the travelling salesman problem as the motivating example throughout.

Spec Mapping

This lesson addresses AQA A-Level Computer Science (7517), Section 4.4 Theory of computation, specifically 4.4.4 Classification of algorithms (tractable vs intractable problems, and the use of heuristic methods when exact solutions are infeasible). It also connects to 4.3.6 Optimisation algorithms, since Dijkstra's and A* are optimisation algorithms and A*'s heuristic is a worked instance of the heuristic idea. It depends on the Big-O complexity-class vocabulary of 4.4.4 (polynomial vs exponential vs factorial) and links to the limits-of-computation material (computable vs non-computable problems) in the same section.

What Is an Optimisation Problem?

An optimisation problem is defined by three ingredients:

A search space — the set of all candidate solutions.
An objective function — a rule assigning a numerical value (cost, length, profit, …) to each candidate.
A goal — to minimise or maximise the objective function over the search space.

The Travelling Salesman Problem (TSP)

The TSP is the textbook optimisation problem and recurs throughout this lesson:

Input: a set of cities and the distance between every pair.
Objective: the total length of a tour — a route that starts at one city, visits every city exactly once, and returns to the start.
Goal: find the tour of minimum total length.
Search space: every possible ordering of the cities. Fixing the start city, there are $(n-1)!$ distinct tours.

The trouble is the size of that search space. For $n = 20$ cities there are roughly $20! \approx 2.4 \times 10^{18}$ orderings. A computer checking one billion ( $10^9$ ) tours per second would need

$\frac{2.4 \times 10^{18}}{10^{9}} = 2.4 \times 10^{9}\text{ seconds} \approx 76\text{ years}$

to examine them all — for just twenty cities. Add a handful more and the time exceeds the age of the universe. This is the face of intractability.

Tractable, Intractable, Computable and Non-computable

The specification draws careful distinctions between four classifications. Keeping them separate is a frequent source of marks.

Classification	Definition	Example
Tractable	A problem for which an algorithm exists that runs in polynomial time — $O(n^k)$ for some constant $k$	Sorting, $O(n\log n)$ ; searching, $O(\log n)$
Intractable	A problem that is solvable in principle, but for which no polynomial-time algorithm is known — the best known are exponential or factorial	TSP, $O(n!)$ by brute force
Computable	A problem that can be solved by an algorithm at all (it may take impractically long, but it terminates with the right answer)	TSP is computable — brute force does terminate, eventually
Non-computable	A problem that no algorithm can ever solve, however much time is allowed	The Halting Problem

Two relationships are worth stressing. First, intractable problems are still computable — the TSP can be solved exactly, just not quickly; intractability is about time, not possibility. Non-computability is the stronger barrier: the Halting Problem cannot be solved by any algorithm, no matter how slow. Second, tractability is about the existence of a polynomial-time algorithm, not about ease: a problem is formally tractable even if its best algorithm is $O(n^{100})$ , which would be useless in practice. The classification is a theoretical line, and you should be ready to say so.

Exam Tip: "Tractable" does not mean "easy" or "fast" — it means "solvable in polynomial time". $O(n^{100})$ is technically tractable yet hopelessly impractical, while $O(2^n)$ is intractable yet fine for $n=10$ . In practice we treat $O(n^2)$ or $O(n^3)$ as "reasonable" and exponential/factorial as "infeasible", but the definition is the polynomial boundary.

Why Brute Force Fails for Intractable Problems

The reason brute force collapses is the chasm between polynomial and exponential/factorial growth. The table below shows the number of basic operations each order demands as $n$ grows:

$n$	$n^2$ (polynomial)	$2^n$ (exponential)	$n!$ (factorial)
10	100	1,024	3,628,800
20	400	$\approx 1.05 \times 10^{6}$	$\approx 2.4 \times 10^{18}$
50	2,500	$\approx 1.1 \times 10^{15}$	$\approx 3 \times 10^{64}$
100	10,000	$\approx 1.3 \times 10^{30}$	$\approx 9.3 \times 10^{157}$

At $n=100$ the polynomial column is a comfortable ten thousand, while the factorial column ( $\approx 10^{157}$ ) exceeds the estimated number of atoms in the observable universe ( $\approx 10^{80}$ ) by seventy-seven orders of magnitude. No amount of faster hardware rescues this: recall from the complexity lesson that a thousand-fold speed-up lets a linear algorithm handle a thousand times more data but lets an exponential algorithm handle only about ten more elements. When a problem's only known algorithms are exponential or factorial and $n$ is more than tiny, brute force is not slow — it is impossible. This impossibility is the entire motivation for heuristics.

What Is a Heuristic?

A heuristic is a problem-solving strategy that deliberately trades the guarantee of optimality for speed: it produces a good-enough solution in a reasonable (polynomial) time, accepting that the answer may not be the absolute best.

Properties of heuristics

Property	Detail
Not guaranteed optimal	The solution is usually close to the best, but there is no proof it is the best
Much faster	Typically runs in polynomial time ( $O(n^2)$ , say) instead of exponential/factorial
Problem-specific	A heuristic encodes domain knowledge; a good TSP heuristic is useless for, say, timetabling
A genuine trade-off	Solution quality is exchanged for computation time, and the balance can often be tuned

A heuristic is, in essence, an educated "rule of thumb". The art lies in choosing a rule that is cheap to apply yet usually lands close to optimal — and in being honest that "usually" and "close" are doing real work, because a heuristic can occasionally produce a poor answer.

Common Heuristic Approaches

1. Greedy algorithms

A greedy algorithm builds a solution by repeatedly making the choice that looks best right now (the locally optimal choice), never reconsidering. It is the simplest heuristic strategy and often surprisingly effective, though it can be led astray because a locally optimal choice need not lead to a globally optimal result.

Nearest-neighbour heuristic for the TSP:

Start at any city.
Travel to the nearest unvisited city.
Repeat until every city has been visited.
Return to the start city.

This runs in $O(n^2)$ — for each of $n$ steps it scans the remaining cities to find the closest — which is dramatically faster than the $O(n!)$ brute force, but it gives no guarantee of the shortest tour.

Worked example — where greedy goes wrong. Consider four cities with these symmetric distances:

graph LR
  A["A"] ---|1| B["B"]
  B ---|1| C["C"]
  C ---|1| D["D"]
  A ---|2| C["C"]
  B ---|2| D["D"]
  A ---|9| D["D"]

Distances: A–B = 1, B–C = 1, C–D = 1, A–C = 2, B–D = 2, A–D = 9.

Greedy (nearest neighbour) from A: the nearest city to A is B (1). From B, the unvisited nearest is C (1). From C, the only unvisited city is D (1). Then return D→A (9). Tour A→B→C→D→A has length $1+1+1+9 = 12$ .

A better tour: A→C→D→B→A has length $2 + 1 + 2 + 1 = 6$ — half the greedy tour's length. The greedy choice of B→C→D marched the route into the corner at D, from which the only way home was the expensive 9-length edge. This concretely demonstrates the central weakness of greedy methods: an irrevocable locally optimal choice (always step to the nearest city) can force a globally poor outcome. The heuristic is fast and often good, but here it is twice the optimal — exactly the kind of failure that motivates the more sophisticated methods below.

2. Simulated annealing

Inspired by the metallurgical process of slowly cooling a metal so its atoms settle into a low-energy crystal:

Start with a random candidate solution.
Make a small random change to it.
If the change improves the objective, accept it.
If the change makes it worse, accept it anyway with a probability that decreases over time as a "temperature" parameter cools.
Repeat, lowering the temperature, until it reaches (near) zero.

The crucial idea is that occasionally accepting a worse solution — especially early, when the temperature is high — lets the search climb out of a local optimum (a solution better than all its near neighbours but not the global best). A purely greedy search gets permanently stuck in the first local optimum it reaches; simulated annealing's controlled randomness gives it a route out, gradually settling as the temperature falls.

3. Genetic algorithms

Inspired by biological evolution and natural selection:

Generate a population of random candidate solutions.
Evaluate each candidate's fitness (how good its objective value is).
Select the fittest candidates to act as parents.
Produce a new generation by crossover (combining parts of two parents) and mutation (small random changes).
Replace the old population with the new generation.
Repeat for many generations.

Over successive generations the population tends to evolve towards better solutions, because fitter candidates are more likely to pass their features on. Genetic algorithms suit very large, rugged search spaces and parallelise naturally (many candidates can be evaluated at once).

Heuristics in A* — a Worked Instance

You have already met a heuristic at work: the function $h(n)$ in A* search estimates the remaining cost to the goal. A* is a useful bridge between this lesson and the optimisation algorithms, because here the heuristic accelerates an exact search rather than approximating an answer — and its quality directly governs performance:

Heuristic quality	Effect on A*
$h(n) = 0$	A* degenerates to Dijkstra's — no guidance, explores in all directions
$h(n)$ = exact remaining cost	A* walks straight to the goal — perfect guidance, no wasted exploration
$h(n)$ slightly underestimates (admissible)	A* is fast and still finds the optimal path
$h(n)$ overestimates (inadmissible)	A* is faster still but may miss the optimal path

The pattern mirrors the general heuristic trade-off: a more aggressive (tighter) estimate speeds the search, but pushing it past the point of admissibility sacrifices the guarantee of an optimal answer — quality traded for speed.

Comparison of Approaches

Approach	Guarantees optimal?	Time	When to use
Brute force	Yes	$O(n!)$ or $O(2^n)$	Only when $n$ is very small
Greedy heuristic	No	$O(n^2)$ or similar	A quick approximation is needed and occasional poor answers are tolerable
Simulated annealing	No (but often near-optimal)	Depends on the cooling schedule	Rugged landscapes with many local optima
Genetic algorithm	No (but often near-optimal)	Depends on population size and generations	Very large search spaces; parallelisable

The single thread running through the table is the quality-versus-time trade-off: the only method guaranteeing optimality (brute force) is the only one that is infeasible at scale, and every practical method buys its speed by relaxing the optimality guarantee.

Synoptic Links

Classification of algorithms (4.4.4): the tractable/intractable boundary, and the use of heuristics for intractable problems, is this section; the polynomial-vs-exponential reasoning comes straight from the complexity lesson.
A* search (4.3.6): A*'s $h(n)$ is a concrete heuristic; the admissibility/optimality trade-off here mirrors the speed/quality trade-off of heuristics in general.
Dijkstra's algorithm (4.3.6): an exact optimisation algorithm — the polynomial-time, optimality-guaranteed counterpoint to the heuristics needed for intractable problems.
Limits of computation (4.4.1–4.4.3): the computable/non-computable distinction (the Halting Problem) sits beside tractable/intractable as a stronger barrier than mere running time.
Big-O notation (4.4.4): the "why brute force fails" table is a direct application of comparing growth rates.

Common Misconceptions

"Intractable means unsolvable." No — intractable problems are still computable (they can be solved, e.g. TSP by brute force); they just have no known polynomial-time algorithm. Unsolvable (non-computable) is the stronger property of the Halting Problem.
"Tractable means fast / easy." It means polynomial-time solvable. An $O(n^{100})$ algorithm is tractable by definition yet useless in practice.
"A heuristic gives the wrong answer." A heuristic gives a valid answer that may not be optimal. It is not wrong — it is unoptimised, and is often very close to the best.
"Greedy always gives a good result." Greedy makes irrevocable locally optimal choices and can be badly misled (see the TSP example where it produced twice the optimal length).
"A faster computer would make TSP tractable." Tractability is a property of the problem and its algorithms, not the hardware. A faster machine shifts the feasible $n$ only slightly for factorial growth — it does not change the classification.
"Heuristics turn an intractable problem into a tractable one." They do not solve the original (find-the-optimum) problem in polynomial time; they solve a different, weaker problem (find a good-enough solution) quickly.

Specimen Question

A courier firm must plan a daily round that visits all of its drop-off points and returns to the depot, minimising total distance.

(a) The firm has 15 drop-off points. Explain why finding the guaranteed shortest round by brute force is impractical. (3 marks)

Optimisation and Heuristics

Optimisation and Heuristics

Spec Mapping

What Is an Optimisation Problem?

The Travelling Salesman Problem (TSP)

Tractable, Intractable, Computable and Non-computable

Why Brute Force Fails for Intractable Problems

What Is a Heuristic?

Properties of heuristics

Common Heuristic Approaches

1. Greedy algorithms

2. Simulated annealing

3. Genetic algorithms

Heuristics in A* — a Worked Instance

Comparison of Approaches

Synoptic Links

Common Misconceptions

Specimen Question

More in Computer Science