You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
How do you say, precisely, that one algorithm is "better" than another — without timing it on a particular laptop, with a particular compiler, on a particular day? You measure how its cost grows as the input gets bigger, and you express that growth with Big-O notation. Big-O is the universal language of algorithm efficiency: it strips away hardware, constants and clutter to capture the one thing that ultimately decides whether an algorithm is usable at scale — its growth rate. Every other lesson in this unit has gestured at it ("linear search is O(n), binary search is O(logn)"); this is the lesson where it is made rigorous.
This is the complexity reference lesson for the whole course. You will learn what Big-O really means, the rules for deriving it from code, the standard complexity classes from O(1) to O(n!) with a growth-rate comparison, the distinction between time and space complexity, and the difference between best, average and worst case. Treat the tables here as the master reference you return to whenever you analyse any data structure, search or sort.
Why does this matter beyond the exam? Because the difference between complexity classes is not a matter of a few percent — it is the difference between software that works and software that does not. A web service that sorts user records with an O(n2) algorithm will be perfectly snappy in testing with 50 records and will fall over completely in production with 50,000; the same service using an O(nlogn) sort will not notice the difference. Professional engineers reach for Big-O reasoning before writing code, precisely to avoid building something that cannot scale. The notation is, in that sense, one of the most practical ideas in the entire specification.
This lesson supports the OCR H446 section 2.3 content on algorithm complexity and Big-O notation, with connections to every algorithm and data structure in the course. Paraphrased, the assessable ideas are:
This is a paraphrase; check the exact wording against the current OCR H446 specification.
Two algorithms can solve the same problem yet differ enormously in cost. Big-O lets us compare them objectively, predict how each scales, and choose the right one. Informally:
Big-O notation describes an upper bound on the growth rate of an algorithm's resource usage (time or space) as the input size n→∞.
When we write T(n)=O(f(n)) we mean that, for large enough n, the algorithm's cost T(n) is at most a constant multiple of f(n). Formally:
T(n)=O(f(n))⟺∃c>0,n0>0such thatT(n)≤c⋅f(n)for alln≥n0
You do not need to manipulate this definition in the exam, but understanding it explains the two simplification rules that do matter, because both fall straight out of it:
The reason we are allowed to be this cavalier is that Big-O is a statement about asymptotic behaviour — what happens as n becomes large — and at scale the dominant term and growth class are all that survive. A faster constant factor can make one O(n2) program beat another, but no constant factor lets an O(n2) algorithm keep pace with an O(nlogn) one once n is large enough. That is the whole point of the notation.
It helps to be clear about why we deliberately throw information away. We could, in principle, measure an algorithm by counting its exact number of operations — say 3n2+7n+12 — but that figure is fragile and unhelpful: the "3" depends on how many machine instructions one line compiles to, the "7" and "12" vanish into insignificance for large n, and none of it transfers to a different language or processor. What does transfer, and what actually determines whether the algorithm copes with a million inputs, is the shape of the growth — quadratic. Big-O is the engineering decision to keep that durable, decision-relevant information (n2) and discard the fragile, hardware-specific clutter (3, 7n, 12). Far from being imprecise, it is precision about the right thing.
A common confusion is that "O" means "exactly" — it does not. O(f(n)) is an upper bound: an algorithm that is O(n) is also, technically, O(n2) and O(2n), because those are all valid upper bounds. In practice, and in exams, we always quote the tightest upper bound we can — saying linear search is O(n), not the technically-true-but-useless O(n2). When you want to say a bound is tight (both upper and lower), the correct notation is Big-Theta, Θ(f(n)), mentioned in Going Further.
| Big-O | Name | Representative algorithm | Verdict |
|---|---|---|---|
| O(1) | Constant | Array access by index; hash-table lookup (average) | Excellent |
| O(logn) | Logarithmic | Binary search; balanced-BST operations | Very good |
| O(n) | Linear | Linear search; one pass over a list | Good |
| O(nlogn) | Linearithmic | Merge sort; quicksort (average) | The practical limit for comparison sorts |
| O(n2) | Quadratic | Bubble sort; insertion sort (worst) | Poor for large n |
| O(2n) | Exponential | Naive recursive Fibonacci; brute-force subsets | Infeasible beyond small n |
| O(n!) | Factorial | Brute-force travelling salesman | Hopeless beyond tiny n |
These are listed in increasing order of growth, so this is also the order of preference: an algorithm higher in the table is, asymptotically, always better than one lower down. Memorising this ordering — and one representative algorithm per row — is the single most useful piece of factual recall in the whole unit.
It is worth registering just how much worse the bottom two rows are than everything above them. The top five classes — constant through quadratic — are all polynomial (or sub-polynomial), and although they differ in steepness they all scale in a manageable way: doubling n at most quadruples the cost of the worst of them. The bottom two — O(2n) and O(n!) — are super-polynomial, and they behave qualitatively differently: O(2n) doubles with each extra input, and O(n!) is worse still, multiplying by n each time (since n!=n×(n−1)!). To feel the gap, at n=20 we have 220≈106 but 20!≈2.4×1018 — a factor of more than a trillion between the two "infeasible" classes themselves. This polynomial-versus-super-polynomial divide is the precise boundary the tractability lesson calls tractable versus intractable, so this table is, in effect, your first map of that territory.
The classes do not just differ a little — they diverge violently. The chart below sketches the shape of each curve as n increases (cost on the vertical axis):
flowchart LR
subgraph Growth["Cost as n increases (steepness = how fast cost explodes)"]
direction TB
A["O(1) — flat: cost never changes"]
B["O(log n) — almost flat: barely rises"]
C["O(n) — straight diagonal"]
D["O(n log n) — slightly steeper than linear"]
E["O(n^2) — curves sharply upward"]
F["O(2^n) / O(n!) — near-vertical wall almost immediately"]
end
A --> B --> C --> D --> E --> F
Putting numbers on it makes the divergence vivid. The table shows roughly how many basic operations each class implies:
| n | O(1) | O(logn) | O(n) | O(nlogn) | O(n2) | O(2n) |
|---|---|---|---|---|---|---|
| 10 | 1 | ~3 | 10 | ~33 | 100 | 1,024 |
| 100 | 1 | ~7 | 100 | ~664 | 10,000 | ≈1.3×1030 |
| 1,000 | 1 | ~10 | 1,000 | ~9,966 | 1,000,000 | astronomically large |
| 1,000,000 | 1 | ~20 | 1,000,000 | ~20,000,000 | 1012 | beyond comprehension |
Read across the n=1,000,000 row: a logarithmic algorithm finishes in about 20 steps, a quadratic one needs a trillion, and an exponential one could not finish before the heat death of the universe. This single row is why algorithm choice dominates raw hardware speed: no processor upgrade rescues O(n2) on large data, whereas switching to O(nlogn) turns "runs overnight" into "runs in a blink".
To dramatise the exponential row specifically: suppose a computer performs a billion (109) operations per second. An O(2n) algorithm at n=60 needs about 260≈1.15×1018 operations, which at a billion per second takes roughly 36 years. Add just ten more inputs and 270 takes over 37,000 years — every extra input doubles the time. This is the qualitative cliff that separates the polynomial classes (which scale gracefully) from the exponential ones (which hit a wall almost immediately), and it is the entire motivation for the tractability lesson that follows. Polynomial growth, however steep, is a hill; exponential growth is a cliff.
Big-O is an asymptotic statement about large n, and it is honest to acknowledge what it hides. For small inputs the discarded constant factors can dominate, so a "worse" complexity class can be faster in practice. Insertion sort (O(n2)) routinely beats merge sort (O(nlogn)) for arrays of, say, fewer than 20 elements, because its constant factors and lack of recursion overhead win when n is tiny — which is exactly why high-performance sort libraries switch to insertion sort for small sub-arrays. The lesson is not that Big-O is wrong, but that it answers one specific question ("how does this scale?") and you must remember it is silent on others ("which is faster for n=12?"). A complete answer states the complexity and, where relevant, notes the small-n caveat.
The exam will give you code and ask for its complexity. A reliable method: identify the dominant operation, count how many times it runs as a function of n, then simplify.
| Code shape | Complexity | Why |
|---|---|---|
| A fixed sequence of statements | O(1) | Work independent of n |
One loop for i = 0 to n-1 | O(n) | Body runs n times |
| Two nested loops, each to n | O(n2) | Inner body runs n×n times |
| Loop that halves the range each step | O(logn) | Range n→n/2→n/4→⋯→1 takes log2n steps |
| A halving loop inside an n-loop | O(nlogn) | n iterations, each doing logn work |
The combination rules mirror the structure of the code: sequence adds complexities (then keep the dominant), while nesting multiplies them.
# OCR-style pseudocode
function example(n)
total = 0 // O(1)
for i = 0 to n - 1 // outer: n iterations
for j = 0 to n - 1 // inner: n iterations each
total = total + 1 // O(1) body
next j
next i
return total // O(1)
endfunction
The body total = total + 1 runs n×n=n2 times. Adding the surrounding O(1) work: O(1)+n2⋅O(1)+O(1)=O(n2). The nested loops multiply; the constant-time statements are dropped as lower-order.
# OCR-style pseudocode
function halver(n)
i = n
count = 0
while i > 1 // how many times can we halve n?
i = i DIV 2 // i: n -> n/2 -> n/4 -> ... -> 1
count = count + 1
endwhile
return count
endfunction
Each iteration halves i, so the loop runs about log2n times — hence O(logn). This is exactly why binary search is logarithmic: halving the search space each step.
Students often accept "O(logn)" without really seeing why a logarithm. The link is simple: a logarithm answers the question "how many times can I halve n before I reach 1?" If you halve n repeatedly — n,2n,4n,…,1 — the number of halvings is log2n, because 2(number of halvings)=n. So any algorithm that repeatedly discards a constant fraction of its remaining work is logarithmic: binary search (discard half the array), balanced-tree descent (drop to one child), exponentiation by squaring (halve the exponent). Conversely, a logarithm grows agonisingly slowly — log2 of a billion is only about 30 — which is why logarithmic algorithms feel almost free.
A point that puzzles many candidates: Big-O writes O(logn) without specifying a base. That is deliberate and correct, because changing the base of a logarithm only multiplies it by a constant: log2n=log102log10n, and log1021 is just a constant factor that Big-O discards. So O(log2n), O(log10n) and O(lnn) are the same complexity class — we simply write O(logn). (In computing the base is almost always 2, because we keep halving things, but it makes no difference to the Big-O.)
# Two separate loops (sequence) — NOT nested
for i = 0 to n - 1: process(i) // O(n)
for j = 0 to n - 1: process(j) // O(n)
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.