You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
Searching is the second most common thing computers do after arithmetic: every database query, every "find" in a document, every lookup in a contacts app is a search. This lesson covers the two searching algorithms that the OCR H446 specification names — linear search and binary search — and treats them the way the exam does: trace them step by step, state their complexity precisely, identify their preconditions, and justify which to use for a given scenario. The contrast between them is the cleanest possible illustration of the central theme from the previous lesson: two algorithms can solve the same problem yet differ enormously in efficiency, and the "better" choice depends on what you can assume about the data. Master this pair thoroughly — almost every later efficiency comparison in the course echoes the linear-versus-binary story, and the tracing skills you sharpen here are exactly those the exam tests across every algorithm.
This lesson supports the searching algorithms content of OCR H446 section 2.3 (algorithm design and analysis), drawing on data-structure ideas from 1.4. Paraphrased, the assessable points are:
As always we paraphrase; check the precise wording against the current OCR H446 specification.
Searching is the process of finding a specific item — the target or key — within a collection of data, and reporting either where it is (its index) or whether it is present at all. It is one of the most frequently executed operations in real software: a search engine answering a query, a spell-checker testing whether a word is in its dictionary, an operating system locating a file, a game checking whether a player already holds an item — all are searches. Because searches happen so often and on such large data, choosing an efficient search algorithm can be the difference between a responsive system and an unusable one.
The choice of search algorithm depends overwhelmingly on one question: is the data sorted?
| Algorithm | Strategy | Precondition | Typical complexity |
|---|---|---|---|
| Linear search | Check each item in turn | None — works on any data | O(n) |
| Binary search | Repeatedly halve the search space | Data must be sorted | O(logn) |
We will also refer to a third option — hashing — which can search in average O(1) time but needs no ordering at all; it is covered fully in the data-structures course and noted here only for context.
The data being searched lives in a data structure — usually an array. Arrays, and the random-access property (the ability to jump straight to element i in one step) that makes binary search possible, are taught in the parallel data-structures course; here we concentrate on the search algorithms themselves and assume the array is given. Whenever you reason about a search you should ask two questions about its data structure: is it ordered? and does it support random access? The answers determine which algorithm is even available to you.
Linear search (also called sequential search) checks each element one at a time, from the first to the last, until the target is found or the end of the collection is reached. It is the search you perform when flicking through an unsorted pile of papers looking for a name.
# OCR-style pseudocode
function linearSearch(data, target, size)
for i = 0 to size - 1
if data[i] == target then
return i
endif
next i
return -1 // sentinel: not found
endfunction
def linear_search(data, target):
for i in range(len(data)):
if data[i] == target:
return i
return -1
numbers = [7, 2, 9, 4, 6, 1, 8, 3, 5]
print(linear_search(numbers, 6)) # 4 (index of 6)
print(linear_search(numbers, 99)) # -1 (absent)
Searching for 6 in [7, 2, 9, 4, 6, 1, 8, 3, 5]:
| Step (i) | data[i] | data[i] == 6? | Action |
|---|---|---|---|
| 0 | 7 | No | Continue |
| 1 | 2 | No | Continue |
| 2 | 9 | No | Continue |
| 3 | 4 | No | Continue |
| 4 | 6 | Yes | Return 4 |
Found at index 4 after 5 comparisons. Had we searched for 99, the loop would run all nine iterations and then return −1 — the worst case.
| Case | Comparisons | Complexity |
|---|---|---|
| Best | 1 (target is the first element) | O(1) |
| Average | about n/2 | O(n) |
| Worst | n (target is last, or absent) | O(n) |
Because constant factors are dropped in Big-O, n/2 and n both reduce to O(n): the growth is linear. Linear search does up to n comparisons, so doubling the data doubles the worst-case work.
It is worth seeing where the "n/2" figure comes from, because the exam sometimes asks you to justify it. Suppose the target is present and is equally likely to be at any of the n positions. If it is at index 0 the search makes 1 comparison; at index 1, two comparisons; and so on up to n comparisons if it is at the last position. The average number of comparisons is therefore the mean of 1,2,3,…,n, which is:
n1+2+⋯+n=nn(n+1)/2=2n+1
For large n this is approximately n/2, and since Big-O ignores the constant factor of one-half it is reported as O(n). This little derivation is a good example of the difference between an exact operation count and an asymptotic complexity class.
If the data happens to be sorted, linear search can stop early once it passes the point where the target would be:
# OCR-style pseudocode: linear search with early exit on sorted ascending data
function linearSearchSorted(data, target, size)
for i = 0 to size - 1
if data[i] == target then
return i
elseif data[i] > target then
return -1 // gone past where it would be; stop early
endif
next i
return -1
endfunction
This still has worst-case complexity O(n) (the target could be the largest element), but it improves the average case for absent values. It also makes an important teaching point: a small change to an algorithm can improve one case without changing its Big-O class — which is exactly why Big-O alone never tells the whole performance story.
| Strength | Limitation |
|---|---|
| Trivially simple to implement and reason about | Slow on large data — O(n) |
| Works on unsorted data | Examines every element in the worst case |
| Works on any structure, including a linked list (no random access needed) | Poor when the same data is searched repeatedly |
| No preprocessing needed — search immediately | Cannot exploit ordering for a fast halving strategy |
Linear search dominates in three practical situations: the data is unsorted and will be searched only once; the data set is tiny (the simplicity outweighs any speed gain); or the structure does not support random access (a singly linked list), so the "jump to the middle" step that binary search needs is itself O(n).
Binary search exploits sorted data. It compares the target with the middle element: if they match it is done; if the target is smaller it discards the entire right half and searches the left; if larger it discards the left half and searches the right. Each comparison throws away half of what remains.
This is the single most important fact about binary search and the most common source of exam error. If the data is not sorted, binary search may report "not found" even when the target is present, because the "go left / go right" decision relies on order. If the data is unsorted you must either sort it first (an O(nlogn) cost, met in the sorting lessons) or fall back to linear search.
To see why unsorted data breaks it, try searching for 5 in the unsorted array [9, 2, 5, 1, 7]. The first mid is index 2, whose value is 5 — here it happens to find it by luck. But search instead for 7: mid = 2 gives 5, and since 7>5 binary search discards indices 0–2 and looks only in [1, 7] — which by chance still contains 7. Now search for 9: mid = 2 gives 5, 9>5, so it discards the left half including index 0 where 9 actually lives, and returns "not found" — a wrong answer. The algorithm did nothing wrong; the precondition was violated. This is why every binary-search question and every binary-search answer must begin by establishing that the data is sorted.
# OCR-style pseudocode
function binarySearch(data, target, size)
low = 0
high = size - 1
while low <= high
mid = (low + high) DIV 2
if data[mid] == target then
return mid
elseif target < data[mid] then
high = mid - 1 // discard right half
else
low = mid + 1 // discard left half
endif
endwhile
return -1
endfunction
def binary_search(data, target):
low = 0
high = len(data) - 1
while low <= high:
mid = (low + high) // 2
if data[mid] == target:
return mid
elif target < data[mid]:
high = mid - 1
else:
low = mid + 1
return -1
numbers = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(binary_search(numbers, 13)) # 6
Searching for 13 in the sorted array [1, 3, 5, 7, 9, 11, 13, 15, 17, 19] (indices 0–9):
| Step | low | high | mid | data[mid] | Comparison | New range |
|---|---|---|---|---|---|---|
| 1 | 0 | 9 | 4 | 9 | 13 > 9 | low = 5 |
| 2 | 5 | 9 | 7 | 15 | 13 < 15 | high = 6 |
| 3 | 5 | 6 | 5 | 11 | 13 > 11 | low = 6 |
| 4 | 6 | 6 | 6 | 13 | 13 == 13 | Found at 6 |
Found at index 6 after 4 comparisons. Linear search would have needed 7. Note how mid = (low + high) DIV 2 uses integer division, and how the range shrinks toward a single element.
Searching for 8 in the same array shows how binary search terminates on failure:
| Step | low | high | mid | data[mid] | Comparison | New range |
|---|---|---|---|---|---|---|
| 1 | 0 | 9 | 4 | 9 | 8 < 9 | high = 3 |
| 2 | 0 | 3 | 1 | 3 | 8 > 3 | low = 2 |
| 3 | 2 | 3 | 2 | 5 | 8 > 5 | low = 3 |
| 4 | 3 | 3 | 3 | 7 | 8 > 7 | low = 4 |
Now low (4) > high (3), so the while condition fails and the function returns −1. The loop must terminate because each iteration strictly narrows high − low.
| Case | Comparisons | Complexity |
|---|---|---|
| Best | 1 (target is the middle element) | O(1) |
| Average | about log2n | O(logn) |
| Worst | ⌊log2n⌋+1 | O(logn) |
Each comparison halves the search space. Starting from n=1024 elements, the remaining count after each comparison is:
1024→512→256→128→64→32→16→8→4→2→1
That is 10 halvings, and log21024=10. In general the maximum number of comparisons is ⌊log2n⌋+1. Doubling the data adds just one extra comparison — which is why binary search scales so gracefully: a million sorted items need at most about 20 comparisons.
The table below makes the contrast with linear search vivid. As the data grows by factors of a thousand, linear search's worst case grows by the same factor, while binary search's barely moves:
| Data size n | Linear worst case | Binary worst case |
|---|---|---|
| 1,000 | 1,000 | 10 |
| 1,000,000 | 1,000,000 | 20 |
| 1,000,000,000 | 1,000,000,000 | 30 |
The decision structure of a single binary-search step can be drawn as a small flow of choices. Each diamond discards half the data, which is the visual root of the logarithm:
flowchart TD
A[Examine data[mid]] --> B{target == data[mid]?}
B -- Yes --> F[Return mid]
B -- No --> C{target < data[mid]?}
C -- Yes --> L[Search LEFT half: high = mid - 1]
C -- No --> R[Search RIGHT half: low = mid + 1]
L --> A
R --> A
A neat way to remember the relationship: linear search asks "how many times can I subtract 1 from n before reaching 0?" — the answer is n. Binary search asks "how many times can I halve n before reaching 1?" — the answer is log2n. Repeated subtraction gives linear growth; repeated halving gives logarithmic growth.
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.