Merge Sort and Quick Sort

Merge sort and quick sort are efficient comparison-based sorting algorithms with average-case time complexity of $O(n \log n)$ , making them vastly superior to the $O(n^2)$ simple sorts for large datasets. Both rest on the divide-and-conquer paradigm: break the problem into smaller sub-problems, solve those, and combine the results. Merge sort is the second sort named explicitly in the AQA specification (alongside bubble sort), so you must be able to trace it precisely; quick sort is studied for comparison and to illustrate why a faster average case can still hide a dangerous worst case.

Spec Mapping

This lesson addresses AQA A-Level Computer Science (7517), Section 4.3 Fundamentals of algorithms, specifically 4.3.5 Sorting algorithms, where merge sort is the named $O(n \log n)$ sort. It also draws on 4.3.6 (recursion / divide-and-conquer) for the recursive structure, and on 4.4.4 (classification of algorithms) for the complexity analysis, including the distinction between average-case and worst-case orders that separates the two algorithms.

Divide and Conquer

Both algorithms follow the three-stage divide-and-conquer template:

Divide the input into smaller sub-problems.
Conquer each sub-problem, usually by recursion, down to a trivial base case.
Combine the sub-solutions into a solution for the original problem.

The crucial difference is where the work happens. Merge sort does trivial work to divide (split in the middle) and the real work to combine (the merge). Quick sort does the real work to divide (the partition) and trivial work to combine (nothing — the parts are already in order). This single design choice explains almost every difference in their behaviour.

Divide and conquer is powerful because it converts a problem of size $n$ into several problems of size $n/2$ , and the halving is what introduces the logarithmic factor that beats the quadratic simple sorts. Recall from the simple-sorts lesson that $O(n^2)$ arises from comparing essentially every pair of elements. Divide and conquer's trick is to avoid most of those pairwise comparisons: once the list is split, an element in the left half is never directly compared with most elements in the right half — they are only reconciled in bulk during the merge or implied by the partition. By never doing the wasteful all-pairs work, both algorithms break through the $O(n^2)$ ceiling to reach $O(n \log n)$ . The same divide-and-conquer pattern recurs across the course — in binary search, in the recursive structure of many tree algorithms, and as a general problem-solving strategy in 4.3.6 — so it is worth recognising as a reusable idea, not just a sorting trick.

Merge Sort

How It Works

Merge sort uses divide and conquer as follows:

Divide: recursively split the list into halves until each sub-list contains a single element.
Conquer / Combine (Merge): repeatedly merge pairs of sorted sub-lists back together in order.

A single-element list is, by definition, already sorted — this is the base case that stops the recursion. The genius of merge sort is that combining two already-sorted lists is easy: you never have to search, only to compare the two front elements and take the smaller. So the algorithm pushes all the difficulty down to the trivial base case and recovers a sorted whole purely through repeated easy merges. Read bottom-up, it first sorts pairs, then merges pairs into fours, fours into eights, and so on until the entire list is reassembled in order — which is exactly the level-by-level structure that yields the $\log n$ factor.

Pseudocode

FUNCTION mergeSort(arr)
    IF LENGTH(arr) <= 1 THEN
        RETURN arr
    END IF
    mid = LENGTH(arr) DIV 2
    left = mergeSort(arr[0..mid-1])
    right = mergeSort(arr[mid..end])
    RETURN merge(left, right)
END FUNCTION

FUNCTION merge(left, right)
    result = []
    i = 0
    j = 0
    WHILE i < LENGTH(left) AND j < LENGTH(right)
        IF left[i] <= right[j] THEN
            APPEND left[i] TO result
            i = i + 1
        ELSE
            APPEND right[j] TO result
            j = j + 1
        END IF
    END WHILE
    APPEND remaining elements of left to result
    APPEND remaining elements of right to result
    RETURN result
END FUNCTION

The merge step is the heart of the algorithm. Because both inputs are already sorted, it walks two pointers along them, repeatedly taking the smaller front element — a single linear pass that is $O(n)$ in the combined length.

The same algorithm in Python makes the recursion and the two-pointer merge explicit:

def merge_sort(arr):
    if len(arr) <= 1:
        return arr                      # base case: already sorted
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])        # recurse on left half
    right = merge_sort(arr[mid:])       # recurse on right half
    return merge(left, right)

def merge(left, right):
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:         # <= keeps the sort stable
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    result.extend(left[i:])             # append whichever side remains
    result.extend(right[j:])
    return result

The use of <= rather than < in the comparison is what makes merge sort stable: when two elements are equal, the one from the left (earlier) sub-list is taken first, preserving the original relative order. Switching it to < would break stability without affecting correctness — a subtle but examinable detail.

Trace Example — Split Phase

Sorting [38, 27, 43, 3, 9, 82, 10]:

graph TD
  N0["38, 27, 43, 3, 9, 82, 10"] --> N1["38, 27, 43"]
  N0 --> N2["3, 9, 82, 10"]
  N1 --> N3["38"]
  N1 --> N4["27, 43"]
  N2 --> N5["3, 9"]
  N2 --> N6["82, 10"]
  N4 --> N7["27"]
  N4 --> N8["43"]
  N5 --> N9["3"]
  N5 --> N10["9"]
  N6 --> N11["82"]
  N6 --> N12["10"]

Trace Example — Merge Phase

Sub-lists are merged back up the tree, each merge producing a sorted list:

Merge inputs	Result
[27] and [43]	[27, 43]
[38] and [27, 43]	[27, 38, 43]
[3] and [9]	[3, 9]
[82] and [10]	[10, 82]
[3, 9] and [10, 82]	[3, 9, 10, 82]
[27, 38, 43] and [3, 9, 10, 82]	[3, 9, 10, 27, 38, 43, 82]

To make the merge mechanism explicit, here is the final merge of [27, 38, 43] with [3, 9, 10, 82], tracing the two pointers $i$ (left) and $j$ (right):

Compare	Smaller taken	result so far
27 vs 3	3 (right)	[3]
27 vs 9	9 (right)	[3, 9]
27 vs 10	10 (right)	[3, 9, 10]
27 vs 82	27 (left)	[3, 9, 10, 27]
38 vs 82	38 (left)	[3, 9, 10, 27, 38]
43 vs 82	43 (left)	[3, 9, 10, 27, 38, 43]
left exhausted	append 82	[3, 9, 10, 27, 38, 43, 82]

Complexity

Case	Time	Space
Best	$O(n \log n)$	$O(n)$
Worst	$O(n \log n)$	$O(n)$
Average	$O(n \log n)$	$O(n)$

Why $O(n \log n)$ ? The splitting halves the list $\log_2 n$ times, so the recursion tree has $\log_2 n$ levels. At every level, the merges across all sub-lists touch each of the $n$ elements exactly once, costing $O(n)$ per level. Multiplying levels by per-level cost gives $O(n) \times O(\log n) = O(n \log n)$ . Because this reasoning does not depend on the input order, merge sort's best, worst and average cases are all $O(n \log n)$ — it is, in fact, $\Theta(n \log n)$ .

Why $O(n)$ space? The merge step builds new arrays rather than sorting in place; the total extra storage in use at any one time is proportional to $n$ .

Visualising the Two Costs

The complexity is easiest to see by thinking about the recursion as a tree of levels:

graph TD
  L0["Level 0: 1 list of size n — merge cost n"] --> L1["Level 1: 2 lists of size n/2 — total merge cost n"]
  L1 --> L2["Level 2: 4 lists of size n/4 — total merge cost n"]
  L2 --> L3["... continues for log n levels ..."]
  L3 --> L4["Bottom: n lists of size 1 — base case"]

There are two independent facts to combine. Vertically, the list halves at each level, so the tree has $1 + \log_2 n$ levels — this is where the $\log n$ factor comes from. Horizontally, at every single level the merges collectively touch all $n$ elements exactly once, so each level costs $O(n)$ regardless of how many sub-lists it is split across (twice as many lists, each half the size, still sums to $n$ ). Multiplying the per-level cost by the number of levels gives the total: $n \times \log_2 n = O(n \log n)$ . Because this argument never assumed anything about the input order, merge sort's best, worst and average cases are identical — a guarantee no other sort on the specification offers.

Quick Sort

How It Works

Quick sort also uses divide and conquer, but instead of splitting blindly in the middle it chooses a pivot and partitions the list so that:

elements less than or equal to the pivot come before it, and
elements greater than the pivot come after it.

After partitioning, the pivot is in its final sorted position. The algorithm then recursively sorts the sub-lists either side of it. No combine step is needed — once both sides are sorted, the whole list is sorted.

The single most important fact about a partition is that, once complete, the pivot will never move again: everything smaller is to its left and everything larger to its right, so its position is correct no matter how the two sides are later rearranged. Each partition therefore permanently places exactly one element. When the pivots happen to fall near the median each time, the list is roughly halved at every level and the recursion is about $\log_2 n$ deep — the good case. When the pivots are consistently extreme (the smallest or largest element), the split is maximally lopsided and the recursion degenerates to $n$ levels — the bad case traced below. Everything about quick sort's performance flows from how balanced the partitions turn out to be, which is why pivot choice is so consequential.

Pseudocode

FUNCTION quickSort(arr, low, high)
    IF low < high THEN
        pivotIndex = partition(arr, low, high)
        quickSort(arr, low, pivotIndex - 1)
        quickSort(arr, pivotIndex + 1, high)
    END IF
END FUNCTION

FUNCTION partition(arr, low, high)
    pivot = arr[high]
    i = low - 1
    FOR j = low TO high - 1
        IF arr[j] <= pivot THEN
            i = i + 1
            SWAP arr[i], arr[j]
        END IF
    END FOR
    SWAP arr[i + 1], arr[high]
    RETURN i + 1
END FUNCTION

Trace Example — Partition Detail

Sorting [10, 7, 8, 9, 1, 5] with the last element as pivot. The first partition (pivot = 5) is shown in full so the pointer mechanism is visible. i tracks the boundary of the "≤ pivot" region:

j	arr[j]	arr[j] ≤ 5?	Action	Array (boundary i)
0	10	No	—	[10, 7, 8, 9, 1, 5], i = -1
1	7	No	—	[10, 7, 8, 9, 1, 5], i = -1
2	8	No	—	[10, 7, 8, 9, 1, 5], i = -1
3	9	No	—	[10, 7, 8, 9, 1, 5], i = -1
4	1	Yes	i=0; swap arr[0]↔arr[4]	[1, 7, 8, 9, 10, 5], i = 0
end	—	—	swap arr[1]↔arr[5] (pivot)	[1, 5, 8, 9, 10, 7]

The pivot 5 lands at index 1 — its final position. Everything to its left (just [1]) is smaller; everything to its right is larger. Recursion then continues on the two sides:

Sub-list	Pivot	Result of partition
[8, 9, 10, 7] (indices 2–5)	7	7 placed → [7] [8, 9, 10]
[8, 9, 10]	10	10 placed → [8, 9] [10]
[8, 9]	9	9 placed → [8] [9]

Final sorted array: [1, 5, 7, 8, 9, 10].

Complexity

Case	Time	Space
Best	$O(n \log n)$	$O(\log n)$
Worst	$O(n^2)$	$O(n)$
Average	$O(n \log n)$	$O(\log n)$

Why $O(n^2)$ in the worst case? If the pivot is always the smallest or largest element — which happens with already-sorted (or reverse-sorted) data when the last element is chosen as pivot — then one partition is empty and the other holds all the remaining $n-1$ elements. The recursion depth becomes $n$ rather than $\log n$ , and each level still does an $O(n)$ partition, giving $n \times n = O(n^2)$ . Randomising the pivot, or using a median-of-three rule, makes this worst case vanishingly unlikely in practice.

Why $O(\log n)$ space? Quick sort partitions in place (no new arrays), so the only extra memory is the recursion stack, whose depth is $O(\log n)$ on average — though it degrades to $O(n)$ in the unbalanced worst case.

Merge Sort and Quick Sort

Merge Sort and Quick Sort

Spec Mapping

Divide and Conquer

Merge Sort

How It Works

Pseudocode

Trace Example — Split Phase

Trace Example — Merge Phase

Complexity

Visualising the Two Costs

Quick Sort

How It Works

Pseudocode

Trace Example — Partition Detail

Complexity

Seeing the Worst Case

More in Computer Science