Choosing the Right Data Structure

This lesson is the synthesis of the whole unit: it draws every structure you have met — array, record, linked list, stack, queue, hash table, tree, binary search tree, and graph — into a single decision framework for picking the right one for a given problem. In the written paper, recall of individual structures earns AO1 marks, but the higher AO2/AO3 marks come from justified selection: matching a structure's complexity profile and properties to the specific operations a scenario demands, weighing alternatives, and committing to a recommendation. That comparative reasoning is the skill this lesson builds.

Spec Mapping

This lesson synthesises AQA A-Level Computer Science (7517), section 4.2 (Data structures) as a whole, applying the abstract-data-type and complexity ideas of §4.2.1 across §4.2.3–§4.2.8 and connecting to the algorithm-complexity reasoning of §4.3. It is a comparison-and-decision lesson rather than a new-structure lesson: it consolidates the time/space trade-offs, gives a question-driven framework for choosing, and rehearses the exam technique for "recommend and justify a data structure" questions. All wording below is original; AQA's published specification text is not reproduced.

Why the Choice Matters

The data structure you choose determines, often decisively:

Time complexity — how the cost of each operation grows with data size $n$ (the dominant concern at scale).
Space complexity — how much memory is consumed, including overhead such as pointers.
Code complexity — how hard the solution is to write correctly and maintain.
Scalability — whether a solution that works for hundreds still works for millions.

The same task can be elegant and instant with the right structure and sluggish or unmaintainable with the wrong one. Crucially, there is no universally best structure — each is a bundle of trade-offs, fast at some operations and slow at others. Selection is therefore always relative to the operations the problem performs most. The discipline is to identify the dominant operation(s) in a scenario and choose the structure that makes those cheap, accepting slower performance on operations the scenario rarely uses.

The Master Comparison Table

This single table is the backbone of the lesson. Learn the shape of it — which structure is fast at access, at search, at insert, at ordered traversal — and most decisions follow.

Structure	Access by index/key	Search (by value)	Insert	Delete	Space	Ordered?	Best for
Array	$O(1)$ by index	$O(n)$ linear / $O(\log n)$ binary if sorted	$O(n)$ (shift)	$O(n)$ (shift)	$O(n)$	If kept sorted	Random access, known fixed size
Dynamic array	$O(1)$ by index	$O(n)$ / $O(\log n)$ sorted	$O(1)$ amortised at end	$O(n)$ mid	$O(n)$	If sorted	Indexed access, unknown size
Linked list	$O(n)$	$O(n)$	$O(1)$ at head	$O(1)$ once node found	$O(n)$ + pointers	No	Frequent insert/delete, no index needed
Stack	$O(1)$ top only	$O(n)$	$O(1)$ push	$O(1)$ pop	$O(n)$	No (LIFO)	LIFO: undo, recursion, backtracking
Queue	$O(1)$ front only	$O(n)$	$O(1)$ enqueue	$O(1)$ dequeue	$O(n)$	No (FIFO)	FIFO: scheduling, buffering
Priority queue (heap)	$O(1)$ peek min/max	$O(n)$	$O(\log n)$	$O(\log n)$ extract	$O(n)$	By priority	Always-take-most-important
Hash table	$O(1)$ avg by key	$O(1)$ avg	$O(1)$ avg	$O(1)$ avg	$O(n)$	No	Fast key look-up, dictionaries, sets
BST (balanced)	$O(\log n)$ by key	$O(\log n)$	$O(\log n)$	$O(\log n)$	$O(n)$	Yes (in-order)	Sorted data + dynamic updates + range queries
Graph	$O(1)$ edge (matrix)	$O(V+E)$ BFS/DFS	$O(1)$ edge	$O(1)$ edge	$O(V^2)$ or $O(V+E)$	N/A	Modelling relationships/networks

Two columns deserve special attention because they decide so many questions: the search/access column (is it $O(1)$ , $O(\log n)$ , or $O(n)$ ?) and the ordered? column (can it produce sorted output and range queries, or not?). The classic hash-table-versus-BST decision lives entirely in these two columns.

A Question-Driven Decision Framework

Rather than memorising scenarios, ask the problem a short series of questions; each answer narrows the field.

Ask…	If yes, lean toward…	Because…
Is the maximum size known and fixed?	Array (else dynamic array / linked structure)	static allocation is cheapest when size is known
Do I look things up by index/position?	Array / dynamic array	$O(1)$ indexed access
Do I look things up by key, order irrelevant?	Hash table	$O(1)$ average key look-up
Do I need data sorted, or range / min / max / successor queries?	BST (or sorted array if static)	only ordered structures support these
Is the access pattern last-in-first-out?	Stack	$O(1)$ push/pop at one end
Is the access pattern first-in-first-out?	Queue	$O(1)$ enqueue/dequeue at opposite ends
Must I always take the highest-priority item?	Priority queue (heap)	$O(\log n)$ extract-min/max
Are insertions/deletions frequent and indexing rare?	Linked list / BST	no element shifting
Am I modelling arbitrary relationships / a network?	Graph	vertices + edges, possibly with cycles
Am I modelling a hierarchy (one parent each)?	Tree	parent–child with no cycles

Often two questions point at different structures — e.g. "look up by key and list in order". That tension is deliberate: it is exactly where the AO3 marks live, because you must weigh the competing requirements and justify which dominates.

Worked Scenario Analyses

Each scenario below identifies the dominant operation, names the best structure, justifies it by complexity, and names the runner-up and why it loses — the full shape of a top-band answer.

Scenario 1 — Student records, look-up by ID

Requirement: store ~200 student records; find a student by ID very frequently. Best: a hash table keyed on ID. Why: look-up by key is the dominant operation, and a hash table gives average $O(1)$ ; with only 200 records the load factor is easily kept low. Runner-up: a sorted array with binary search gives $O(\log n)$ search but $O(n)$ insertion (shifting), so it loses whenever records are added. Choose the hash table — unless the system also needed students listed in ID order, which would favour a BST.

Scenario 2 — Text-editor undo

Requirement: undo the most recent change. Best: a stack. Why: "most recent first" is LIFO exactly; push each change, pop to undo, both $O(1)$ . Runner-up: a queue gives the wrong order (oldest first); an array would work but a stack expresses the intent precisely.

Scenario 3 — Print-job scheduler

Requirement: process jobs in submission order. Best: a queue (a circular queue to reuse array space). Why: "first submitted, first printed" is FIFO; enqueue/dequeue are $O(1)$ . Runner-up: a plain (linear) array queue works but wastes space as the front advances — the circular variant fixes that.

Scenario 4 — Hospital triage

Requirement: treat patients by severity, not arrival order. Best: a priority queue (heap). Why: the most-urgent patient must be served next regardless of arrival; a heap gives $O(\log n)$ insert and $O(\log n)$ extract-most-urgent. Runner-up: a sorted list keeps order but costs $O(n)$ to insert each new patient — worse than the heap's $O(\log n)$ .

Scenario 5 — Autocomplete / prefix search

Requirement: find all words starting with a given prefix. Best: a tree — specifically a trie (prefix tree). Why: a trie stores words character by character, so following the prefix's characters lands directly on the subtree of all completions. Runner-up: a BST gives sorted look-up but prefix matching is awkward; a hash table cannot do prefix queries at all (it destroys order).

Scenario 6 — Social network connectivity

Requirement: model users and friendships; test whether two users are connected. Best: an undirected, unweighted graph stored as an adjacency list. Why: friendships are mutual edges between user-vertices; the network is sparse, so the list's $O(V+E)$ space beats a matrix's $O(V^2)$ , and BFS answers connectivity (and shortest "degrees of separation"). Runner-up: an adjacency matrix gives $O(1)$ edge tests but wastes enormous memory on a sparse social graph.

Scenario 7 — Leaderboard with live updates and ranked listing

Requirement: insert/update scores frequently and list players in rank order, and answer "who is between rank X and Y?". Best: a balanced BST keyed on score. Why: it is the only structure giving $O(\log n)$ updates and sorted/range traversal together. Runner-up: a hash table updates in $O(1)$ but cannot rank or range; a sorted array ranks instantly but inserts in $O(n)$ — the BST is the balanced compromise the dual requirement demands.

Scenario 8 — Browser history with back-button

Requirement: revisit pages in reverse order via the back button, while also keeping a chronological list. Best: a stack for the back-button behaviour (LIFO), often paired with a list/array for the full history. Why: "go back to the previous page" is pop-the-most-recent, $O(1)$ on a stack. Runner-up: a queue would replay pages oldest-first — the wrong order entirely. This scenario also illustrates that a real system often combines structures: a stack for one operation and a list for another.

A Complexity Ladder for Search/Look-up

Many "choose a structure" questions reduce to how do you find an item? It helps to see the look-up options as a ladder from slowest to fastest, with the precondition each one demands:

Look-up method	Complexity	Precondition / cost
Linear search of unsorted array or linked list	$O(n)$	none — but slow at scale
Binary search of a sorted array	$O(\log n)$	data must be kept sorted (insertion $O(n)$ )
Search of a balanced BST	$O(\log n)$	self-balancing needed for the guarantee; gains ordered traversal
Hash table look-up by key	$O(1)$ average	good hash + low load factor; loses ordering

Reading the ladder top-down answers most questions directly: if you only ever find items by key and never need them ordered, climb to the hash table for $O(1)$ ; if you also need sorted output or ranges, stop at the balanced BST for $O(\log n)$ with order; if the data is static and rarely changes, a sorted array with binary search is simplest. The decisive trade is almost always the $O(1)$ but unordered hash table versus the $O(\log n)$ but ordered BST — and the tie-breaker is whether the scenario ever needs order. Quoting this ladder, and the precondition each rung demands, is a fast route to full justification marks.

Worked Decision Walk-through

Suppose a scenario says: "store up to 10 million IP addresses seen by a firewall; for each incoming packet, check in real time whether its source IP has been seen before; periodically also export the list of seen IPs in ascending order for an audit." Walk the framework: the dominant, real-time operation is membership by key ("seen before?"), pointing at a hash set for $O(1)$ . But "export in ascending order" needs order, which a hash set cannot give cheaply ( $O(n \log n)$ to sort on export). The audit is periodic, not per-packet, so the rare $O(n \log n)$ export is acceptable, whereas slowing the per-packet check would not be. Conclusion: a hash set for the per-packet membership test, accepting an occasional sort at export time — unless the audit became frequent, when a balanced BST (ordered, $O(\log n)$ membership) would be the better single structure. This is the exact shape of reasoning the exam rewards: name the dominant operation, choose for it, and state the condition under which the choice would change.

The Trade-offs That Decide Most Questions

Tension	Option A	Option B	Decide by
Fast indexed access vs flexible size	Array ( $O(1)$ index)	Linked list (grows freely)	Do you index, or insert/delete a lot?
Fast key look-up vs ordered output	Hash table ( $O(1)$ , unordered)	BST ( $O(\log n)$ , ordered)	Do you ever need sorted/range queries?
Memory frugality vs access speed	Linked list (no wasted slots, but pointers)	Array (contiguous, but may over-allocate)	Is memory or cache speed the constraint?
Simple vs fast	Linear search of an array	Hash table / binary search	Is $n$ small (simple wins) or large (fast wins)?
Worst-case guarantee vs average speed	Balanced BST (guaranteed $O(\log n)$ )	Hash table (average $O(1)$ , worst $O(n)$ )	Can you tolerate occasional worst-case slow-down?

The recurring meta-pattern: constant-time but unordered (hash table) versus logarithmic but ordered (BST); fast-indexed but rigid (array) versus flexible but sequential (linked list). Naming which side of these tensions a scenario sits on is the core of the analysis.

Common Exam Mistakes

Mistake	Correction
Choosing an array when the size is unknown	Use a dynamic array or linked structure that grows at runtime.
Choosing a linked list when fast indexed access is needed	Use an array — a linked list needs $O(n)$ traversal to reach position $k$ .
Using a hash table when sorted order or range queries are needed	Use a BST — hash tables deliberately scatter keys and keep no order.
Quoting hash-table $O(1)$ as if guaranteed	State it is average; the worst case is $O(n)$ under heavy collisions.
Naming a structure but not justifying it	Always link the structure's complexity/properties to the scenario's operations.
Confusing an ADT with its implementation	A stack is an ADT; an array-based stack is one implementation of it.
Ignoring the runner-up	Top answers say why the alternative is worse, not just why the choice is good.

Choosing the Right Data Structure

Choosing the Right Data Structure

Spec Mapping

Why the Choice Matters

The Master Comparison Table

A Question-Driven Decision Framework

Worked Scenario Analyses

Scenario 1 — Student records, look-up by ID

Scenario 2 — Text-editor undo

Scenario 3 — Print-job scheduler

Scenario 4 — Hospital triage

Scenario 5 — Autocomplete / prefix search

Scenario 6 — Social network connectivity

Scenario 7 — Leaderboard with live updates and ranked listing

Scenario 8 — Browser history with back-button

A Complexity Ladder for Search/Look-up

Worked Decision Walk-through

The Trade-offs That Decide Most Questions

Common Exam Mistakes

More in Computer Science