You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
This lesson is the synthesis of the whole unit: it draws every structure you have met — array, record, linked list, stack, queue, hash table, tree, binary search tree, and graph — into a single decision framework for picking the right one for a given problem. In the written paper, recall of individual structures earns AO1 marks, but the higher AO2/AO3 marks come from justified selection: matching a structure's complexity profile and properties to the specific operations a scenario demands, weighing alternatives, and committing to a recommendation. That comparative reasoning is the skill this lesson builds.
This lesson synthesises AQA A-Level Computer Science (7517), section 4.2 (Data structures) as a whole, applying the abstract-data-type and complexity ideas of §4.2.1 across §4.2.3–§4.2.8 and connecting to the algorithm-complexity reasoning of §4.3. It is a comparison-and-decision lesson rather than a new-structure lesson: it consolidates the time/space trade-offs, gives a question-driven framework for choosing, and rehearses the exam technique for "recommend and justify a data structure" questions. All wording below is original; AQA's published specification text is not reproduced.
The data structure you choose determines, often decisively:
The same task can be elegant and instant with the right structure and sluggish or unmaintainable with the wrong one. Crucially, there is no universally best structure — each is a bundle of trade-offs, fast at some operations and slow at others. Selection is therefore always relative to the operations the problem performs most. The discipline is to identify the dominant operation(s) in a scenario and choose the structure that makes those cheap, accepting slower performance on operations the scenario rarely uses.
This single table is the backbone of the lesson. Learn the shape of it — which structure is fast at access, at search, at insert, at ordered traversal — and most decisions follow.
| Structure | Access by index/key | Search (by value) | Insert | Delete | Space | Ordered? | Best for |
|---|---|---|---|---|---|---|---|
| Array | O(1) by index | O(n) linear / O(logn) binary if sorted | O(n) (shift) | O(n) (shift) | O(n) | If kept sorted | Random access, known fixed size |
| Dynamic array | O(1) by index | O(n) / O(logn) sorted | O(1) amortised at end | O(n) mid | O(n) | If sorted | Indexed access, unknown size |
| Linked list | O(n) | O(n) | O(1) at head | O(1) once node found | O(n) + pointers | No | Frequent insert/delete, no index needed |
| Stack | O(1) top only | O(n) | O(1) push | O(1) pop | O(n) | No (LIFO) | LIFO: undo, recursion, backtracking |
| Queue | O(1) front only | O(n) | O(1) enqueue | O(1) dequeue | O(n) | No (FIFO) | FIFO: scheduling, buffering |
| Priority queue (heap) | O(1) peek min/max | O(n) | O(logn) | O(logn) extract | O(n) | By priority | Always-take-most-important |
| Hash table | O(1) avg by key | O(1) avg | O(1) avg | O(1) avg | O(n) | No | Fast key look-up, dictionaries, sets |
| BST (balanced) | O(logn) by key | O(logn) | O(logn) | O(logn) | O(n) | Yes (in-order) | Sorted data + dynamic updates + range queries |
| Graph | O(1) edge (matrix) | O(V+E) BFS/DFS | O(1) edge | O(1) edge | O(V2) or O(V+E) | N/A | Modelling relationships/networks |
Two columns deserve special attention because they decide so many questions: the search/access column (is it O(1), O(logn), or O(n)?) and the ordered? column (can it produce sorted output and range queries, or not?). The classic hash-table-versus-BST decision lives entirely in these two columns.
Rather than memorising scenarios, ask the problem a short series of questions; each answer narrows the field.
| Ask… | If yes, lean toward… | Because… |
|---|---|---|
| Is the maximum size known and fixed? | Array (else dynamic array / linked structure) | static allocation is cheapest when size is known |
| Do I look things up by index/position? | Array / dynamic array | O(1) indexed access |
| Do I look things up by key, order irrelevant? | Hash table | O(1) average key look-up |
| Do I need data sorted, or range / min / max / successor queries? | BST (or sorted array if static) | only ordered structures support these |
| Is the access pattern last-in-first-out? | Stack | O(1) push/pop at one end |
| Is the access pattern first-in-first-out? | Queue | O(1) enqueue/dequeue at opposite ends |
| Must I always take the highest-priority item? | Priority queue (heap) | O(logn) extract-min/max |
| Are insertions/deletions frequent and indexing rare? | Linked list / BST | no element shifting |
| Am I modelling arbitrary relationships / a network? | Graph | vertices + edges, possibly with cycles |
| Am I modelling a hierarchy (one parent each)? | Tree | parent–child with no cycles |
Often two questions point at different structures — e.g. "look up by key and list in order". That tension is deliberate: it is exactly where the AO3 marks live, because you must weigh the competing requirements and justify which dominates.
Each scenario below identifies the dominant operation, names the best structure, justifies it by complexity, and names the runner-up and why it loses — the full shape of a top-band answer.
Requirement: store ~200 student records; find a student by ID very frequently. Best: a hash table keyed on ID. Why: look-up by key is the dominant operation, and a hash table gives average O(1); with only 200 records the load factor is easily kept low. Runner-up: a sorted array with binary search gives O(logn) search but O(n) insertion (shifting), so it loses whenever records are added. Choose the hash table — unless the system also needed students listed in ID order, which would favour a BST.
Requirement: undo the most recent change. Best: a stack. Why: "most recent first" is LIFO exactly; push each change, pop to undo, both O(1). Runner-up: a queue gives the wrong order (oldest first); an array would work but a stack expresses the intent precisely.
Requirement: process jobs in submission order. Best: a queue (a circular queue to reuse array space). Why: "first submitted, first printed" is FIFO; enqueue/dequeue are O(1). Runner-up: a plain (linear) array queue works but wastes space as the front advances — the circular variant fixes that.
Requirement: treat patients by severity, not arrival order. Best: a priority queue (heap). Why: the most-urgent patient must be served next regardless of arrival; a heap gives O(logn) insert and O(logn) extract-most-urgent. Runner-up: a sorted list keeps order but costs O(n) to insert each new patient — worse than the heap's O(logn).
Requirement: find all words starting with a given prefix. Best: a tree — specifically a trie (prefix tree). Why: a trie stores words character by character, so following the prefix's characters lands directly on the subtree of all completions. Runner-up: a BST gives sorted look-up but prefix matching is awkward; a hash table cannot do prefix queries at all (it destroys order).
Requirement: model users and friendships; test whether two users are connected. Best: an undirected, unweighted graph stored as an adjacency list. Why: friendships are mutual edges between user-vertices; the network is sparse, so the list's O(V+E) space beats a matrix's O(V2), and BFS answers connectivity (and shortest "degrees of separation"). Runner-up: an adjacency matrix gives O(1) edge tests but wastes enormous memory on a sparse social graph.
Requirement: insert/update scores frequently and list players in rank order, and answer "who is between rank X and Y?". Best: a balanced BST keyed on score. Why: it is the only structure giving O(logn) updates and sorted/range traversal together. Runner-up: a hash table updates in O(1) but cannot rank or range; a sorted array ranks instantly but inserts in O(n) — the BST is the balanced compromise the dual requirement demands.
Requirement: revisit pages in reverse order via the back button, while also keeping a chronological list. Best: a stack for the back-button behaviour (LIFO), often paired with a list/array for the full history. Why: "go back to the previous page" is pop-the-most-recent, O(1) on a stack. Runner-up: a queue would replay pages oldest-first — the wrong order entirely. This scenario also illustrates that a real system often combines structures: a stack for one operation and a list for another.
Many "choose a structure" questions reduce to how do you find an item? It helps to see the look-up options as a ladder from slowest to fastest, with the precondition each one demands:
| Look-up method | Complexity | Precondition / cost |
|---|---|---|
| Linear search of unsorted array or linked list | O(n) | none — but slow at scale |
| Binary search of a sorted array | O(logn) | data must be kept sorted (insertion O(n)) |
| Search of a balanced BST | O(logn) | self-balancing needed for the guarantee; gains ordered traversal |
| Hash table look-up by key | O(1) average | good hash + low load factor; loses ordering |
Reading the ladder top-down answers most questions directly: if you only ever find items by key and never need them ordered, climb to the hash table for O(1); if you also need sorted output or ranges, stop at the balanced BST for O(logn) with order; if the data is static and rarely changes, a sorted array with binary search is simplest. The decisive trade is almost always the O(1) but unordered hash table versus the O(logn) but ordered BST — and the tie-breaker is whether the scenario ever needs order. Quoting this ladder, and the precondition each rung demands, is a fast route to full justification marks.
Suppose a scenario says: "store up to 10 million IP addresses seen by a firewall; for each incoming packet, check in real time whether its source IP has been seen before; periodically also export the list of seen IPs in ascending order for an audit." Walk the framework: the dominant, real-time operation is membership by key ("seen before?"), pointing at a hash set for O(1). But "export in ascending order" needs order, which a hash set cannot give cheaply (O(nlogn) to sort on export). The audit is periodic, not per-packet, so the rare O(nlogn) export is acceptable, whereas slowing the per-packet check would not be. Conclusion: a hash set for the per-packet membership test, accepting an occasional sort at export time — unless the audit became frequent, when a balanced BST (ordered, O(logn) membership) would be the better single structure. This is the exact shape of reasoning the exam rewards: name the dominant operation, choose for it, and state the condition under which the choice would change.
| Tension | Option A | Option B | Decide by |
|---|---|---|---|
| Fast indexed access vs flexible size | Array (O(1) index) | Linked list (grows freely) | Do you index, or insert/delete a lot? |
| Fast key look-up vs ordered output | Hash table (O(1), unordered) | BST (O(logn), ordered) | Do you ever need sorted/range queries? |
| Memory frugality vs access speed | Linked list (no wasted slots, but pointers) | Array (contiguous, but may over-allocate) | Is memory or cache speed the constraint? |
| Simple vs fast | Linear search of an array | Hash table / binary search | Is n small (simple wins) or large (fast wins)? |
| Worst-case guarantee vs average speed | Balanced BST (guaranteed O(logn)) | Hash table (average O(1), worst O(n)) | Can you tolerate occasional worst-case slow-down? |
The recurring meta-pattern: constant-time but unordered (hash table) versus logarithmic but ordered (BST); fast-indexed but rigid (array) versus flexible but sequential (linked list). Naming which side of these tensions a scenario sits on is the core of the analysis.
| Mistake | Correction |
|---|---|
| Choosing an array when the size is unknown | Use a dynamic array or linked structure that grows at runtime. |
| Choosing a linked list when fast indexed access is needed | Use an array — a linked list needs O(n) traversal to reach position k. |
| Using a hash table when sorted order or range queries are needed | Use a BST — hash tables deliberately scatter keys and keep no order. |
| Quoting hash-table O(1) as if guaranteed | State it is average; the worst case is O(n) under heavy collisions. |
| Naming a structure but not justifying it | Always link the structure's complexity/properties to the scenario's operations. |
| Confusing an ADT with its implementation | A stack is an ADT; an array-based stack is one implementation of it. |
| Ignoring the runner-up | Top answers say why the alternative is worse, not just why the choice is good. |
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.