You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
Arrays and records are the two foundational building blocks of every other structure in this course. An array stores many items of the same type so we can index them; a record groups a few items of different types so we can treat them as one logical thing. Almost everything else — stacks, queues, hash tables, even the nodes of a tree — is ultimately built on top of these two ideas.
This lesson addresses the H446 1.4.2 Data Structures content on arrays and records:
(Phrasing here paraphrases the specification content; it is not a verbatim quote.)
An array is a data structure that stores a fixed-size, ordered collection of elements, all of the same data type, in contiguous memory locations. Each element is reached by an index (a position number); in most languages, and in OCR pseudocode, indexing starts at 0.
The single most important consequence of "same type" plus "contiguous" is that the machine can compute where any element lives without searching for it. If every element is the same size and they sit end-to-end, element i is always a fixed distance from the start. That is what makes indexed access constant time, O(1), and it is the property that all of arrays' strengths and weaknesses flow from.
| Property | Description |
|---|---|
| Homogeneous | All elements must be the same data type (e.g., all integers, all strings). |
| Fixed size | The size is set when the array is created and cannot change. |
| Contiguous | Elements are stored next to each other in one unbroken block of memory. |
| Indexed | Each element has a unique index, allowing direct (random) access. |
| Static | An array is a static data structure — its size is fixed when it is allocated. |
A static structure does not mean the values cannot change — you can overwrite scores[3] freely. It means the size cannot change: you cannot grow a 10-element array into an 11-element array. To "grow" one you must allocate a bigger array and copy everything across, which is exactly what dynamic list types do behind the scenes.
A 1-D array is a simple list of elements arranged in a single row.
array scores[10] // 10 elements, indices 0..9
scores[0] = 85
scores[1] = 92
scores[2] = 78
# A Python list is used as an array (it is dynamic, but we use it as fixed-size here)
scores = [85, 92, 78, 64, 91, 88, 73, 95, 67, 82]
print(scores[0]) # 85 -> first element
print(scores[9]) # 82 -> last element (index = length - 1)
scores[3] = 70 # overwrite element 3
The diagram below shows ten 4-byte integers stored contiguously from a base address of 2000.
graph LR
A["index 0<br/>@2000"] --- B["index 1<br/>@2004"] --- C["index 2<br/>@2008"] --- D["index 3<br/>@2012"] --- E["... index 9<br/>@2036"]
If the base address is B and each element occupies S bytes, the address of the element at index i is:
Address(i)=B+(i×S)
Because this is a single arithmetic expression evaluated in constant time regardless of i, array access is O(1).
An integer array starts at address 2000 and each integer occupies 4 bytes.
| Index | Calculation | Address |
|---|---|---|
| 0 | 2000+(0×4) | 2000 |
| 1 | 2000+(1×4) | 2004 |
| 2 | 2000+(2×4) | 2008 |
| 3 | 2000+(3×4) | 2012 |
| 7 | 2000+(7×4) | 2028 |
Exam Tip: Address questions are reliable marks. Always (1) write the formula, (2) substitute numbers, (3) state the unit-size assumption if it is given. Off-by-one errors usually come from forgetting that index 0 is the first element, so the multiplier is the index, not the index plus one.
A 2-D array is an "array of arrays" — it stores data in rows and columns, forming a grid. It is the natural model for a spreadsheet, a game board, a matrix, or a small image.
# 2-D array as a list of lists: 3 rows, 4 columns
grid = [
[1, 2, 3, 4], # row 0
[5, 6, 7, 8], # row 1
[9, 10, 11, 12], # row 2
]
print(grid[1][2]) # 7 -> row 1, column 2
print(grid[0][3]) # 4 -> row 0, column 3
grid[2][0] = 99 # overwrite the element at row 2, column 0
In OCR pseudocode the same array is declared array grid[3, 4] and accessed as grid[row, col] — the comma form is equivalent to the double-bracket form. The first index is conventionally the row. Reversing row and column is one of the most common slips in the exam.
To visit every element, use nested loops: the outer loop walks the rows, the inner loop walks the columns within a row.
for row in range(3): # 0,1,2
for col in range(4): # 0,1,2,3
print(grid[row][col], end=" ")
print() # newline after each row
Every cell is visited exactly once, so a full traversal of an R×C grid is O(R×C) — i.e. O(n) in the total number of elements n=R×C.
A 2-D array is still stored as a single contiguous 1-D block — the language must choose an order in which to flatten it. The two conventions are:
graph TD
subgraph "Logical grid (2 x 3)"
L["(0,0) (0,1) (0,2)<br/>(1,0) (1,1) (1,2)"]
end
subgraph "Row-major in memory"
R["(0,0)(0,1)(0,2)(1,0)(1,1)(1,2)"]
end
subgraph "Column-major in memory"
C2["(0,0)(1,0)(0,1)(1,1)(0,2)(1,2)"]
end
For a row-major array with C columns, base address B and element size S, the address of element [r][c] is:
Address(r,c)=B+((r×C)+c)×S
The term r×C skips over r complete rows; adding c steps along the current row. For column-major with R rows the roles swap: Address(r,c)=B+((c×R)+r)×S.
A row-major 2-D array has 5 columns, base address 1000, element size 4 bytes. Find the address of [2][3]:
1000+((2×5)+3)×4=1000+(13×4)=1000+52=1052
A 3-D array adds a third axis — think of it as a stack of 2-D grids, or "depth × rows × columns". It models things such as a colour image over time (frames × height × width), a Rubik's-cube state, or temperatures across a 3-D volume.
# 2 layers, each a 2x3 grid
cube = [
[[1, 2, 3], [4, 5, 6]], # layer 0
[[7, 8, 9], [10, 11, 12]], # layer 1
]
print(cube[1][0][2]) # 9 -> layer 1, row 0, column 2
A full traversal needs three nested loops, one per dimension, and is O(D×R×C). The row-major address formula generalises: with R rows and C columns per layer, element [d][r][c] sits at B+((d×R×C)+(r×C)+c)×S — each term skips a whole layer, a whole row, then steps along the row. The key insight for the exam is that however many dimensions there are, the data is ultimately one contiguous line and the index is a weighted sum.
A concrete use-case helps fix the idea. A short colour video clip is naturally a 4-D array — frame × row × column × colour-channel — but even a single still colour image is 3-D: a height-by-width grid where each cell is itself three numbers (red, green, blue). Indexing image[y][x][0] reads the red component of the pixel at row y, column x. The same row-major flattening applies, which is why image-processing code that walks pixels in row order tends to run faster than code that jumps around: it reads memory in the order it is physically stored. You will not be asked to write such code in 1.4.2, but recognising that the dimension count is just bookkeeping over one flat block is exactly the understanding the spec is after, and it stops multi-dimensional arrays from feeling like a separate, scary topic.
A record (or struct) groups together related data items of (possibly) different types under named fields. Where an array answers "give me item number 5", a record answers "give me the grade field of this student".
| Property | Description |
|---|---|
| Heterogeneous | Fields may be different data types (string, integer, real, boolean…). |
| Named fields | Each item is accessed by a field name, not a numeric index. |
| Grouped | Related but dissimilar data is kept together as one logical unit. |
| Fixed structure | The set of fields is fixed at design time (though field values vary). |
record Student
name : string
age : integer
grade : real
endrecord
student1 = new Student()
student1.name = "Alice"
student1.age = 17
student1.grade = 8.5
# A class is the usual Python stand-in for a record
class Student:
def __init__(self, name, age, grade):
self.name = name # string
self.age = age # integer
self.grade = grade # real
student1 = Student("Alice", 17, 8.5)
print(student1.name) # Alice -> access by field name (dot notation)
print(student1.grade) # 8.5
A record, like an array, is normally a single contiguous block — but instead of n identical cells it is a sequence of named fields laid out one after another, each at a fixed offset from the start of the record. The compiler records, for each field, how far into the block it begins. Accessing student1.grade therefore costs the same as array indexing: take the record's base address and add the grade field's fixed offset. So field access, like indexed array access, is O(1) — there is no search.
The crucial difference from an array is that the offsets are not all the same size, because the fields have different types: a string field, an integer field and a real field may each occupy a different number of bytes, so the offsets are fixed but uneven. (In real systems the compiler may also insert padding between fields so each starts on a convenient address boundary — you do not need this detail for the exam, but it explains why a record can occupy slightly more memory than the sum of its raw field sizes.) The mental model to carry into the exam is simple: an array is indexed by a computed position; a record is indexed by a named, fixed offset. Both give direct, constant-time access; what differs is whether the items must share a type.
It is worth stressing why we bother with records at all rather than just remembering "name is in array A at index 3, age is in array B at index 3". Records give us cohesion — the logically-connected fields are bound into one value that can be passed to a function, returned, copied or stored as a unit. This makes code easier to read and far harder to corrupt, because there is no separate index to keep in step across several arrays. When a question describes "an X with several attributes", that cohesion is precisely what a record models, and it is usually what the marks are testing.
The two structures combine constantly: an array of records stores many like-shaped records you can index, while each record bundles its own mixed fields. This is the in-memory shape of a database table — the array is the table, each record is a row, each field is a column.
students = [
Student("Alice", 17, 8.5),
Student("Bob", 18, 7.2),
Student("Charlie", 17, 9.1),
]
print(students[1].name) # Bob -> index the array, then field the record
for s in students: # linear scan: O(n)
if s.age == 17:
print(s.name) # Alice, Charlie
Exam Tip: When a scenario says "store details about many X" you almost always want an array of records: design the record (each field with a name and a type) first, then say it is held in an array. Marks are lost when candidates use parallel arrays (one array of names, one of ages) instead — that loses the grouping the question is testing.
Arrays and records are static structures. The contrast with dynamic structures (covered in the linked-lists lesson) is a recurring exam theme.
| Feature | Static (arrays, records) | Dynamic (linked lists, trees) |
|---|---|---|
| Size | Fixed at allocation | Grows / shrinks at runtime |
| Memory | One contiguous block | Scattered nodes joined by pointers |
| Access | O(1) random access by index | O(n) — must follow pointers |
| Insert / delete | O(n) — shift elements | O(1) at a known node — relink only |
| Overhead | None per element | A pointer (or two) per element |
| Waste | Wastes space if under-filled | Uses only what is needed |
| Scenario | Best choice | Reason |
|---|---|---|
| Known, fixed count | Array | No waste, O(1) access |
| Mixed data about one thing | Record | Groups heterogeneous fields |
| Frequent middle inserts/deletes | Linked list | No shifting; relink in O(1) |
| Unpredictable size | Dynamic structure | Grows/shrinks on demand |
| Fast lookup by position | Array | Direct indexed access |
The trade-off behind this table is worth stating in words, because it is the comparison examiners return to most often. A static array buys its O(1) random access by committing to a fixed, contiguous block — and that very commitment is what makes resizing and middle-insertion expensive. To make a full array bigger you cannot simply tack cells onto the end (the memory immediately after it may already belong to something else); you must allocate a brand-new, larger block and copy every existing element across, an O(n) operation. Dynamic structures invert the bargain: a linked list never copies the whole collection to grow and can splice a node into the middle in O(1), but it gives up direct indexing and pays a pointer of overhead for every element. There is no universally "best" structure — the right choice depends on whether your workload is dominated by reads at known positions (favouring arrays) or by insertions and deletions of unknown count (favouring dynamic structures). Keeping that single sentence in mind lets you justify a choice in an exam rather than merely asserting one.
Because an array is a fixed, contiguous block, adding or removing in the middle is not free — the surrounding elements have to move to keep the block gap-free and in order. Tracing these byte-by-byte is the clearest way to see why they cost O(n), and it is exactly the kind of trace OCR asks you to complete in a table.
We scan from index 0 until we find the target or run off the end. There is no random shortcut because the array is unsorted.
FUNCTION linearSearch(arr, n, target)
FOR i = 0 TO n - 1
IF arr[i] = target THEN
RETURN i // found: report the index
ENDIF
NEXT i
RETURN -1 // not found
ENDFUNCTION
Trace on arr = [85, 92, 78, 64, 91], target = 78:
| i | arr[i] | arr[i] = 78? | Action |
|---|---|---|---|
| 0 | 85 | no | continue |
| 1 | 92 | no | continue |
| 2 | 78 | yes | return 2 |
In the worst case (target absent or last) every element is compared, so linear search is O(n); the best case is O(1) when the target is first.
To insert at index pos while preserving order, every element from the end down to pos slides one place right to open a gap, then the new value drops in. We must copy from the right-hand end inwards, otherwise we would overwrite values we still need.
PROCEDURE insertAt(arr, n, pos, value)
// arr has capacity for at least n + 1 elements
FOR i = n - 1 DOWNTO pos // walk backwards
arr[i + 1] = arr[i] // shift this element one slot right
NEXT i
arr[pos] = value // drop the new value into the gap
// logical length is now n + 1
ENDPROCEDURE
Trace: insert value = 50 at pos = 1 into arr = [10, 20, 30, 40, _] (n = 4, one spare slot):
| Step | Action | Array state |
|---|---|---|
| start | — | [10, 20, 30, 40, _] |
| i = 3 | arr[4] = arr[3] | [10, 20, 30, 40, 40] |
| i = 2 | arr[3] = arr[2] | [10, 20, 30, 30, 40] |
| i = 1 | arr[2] = arr[1] | [10, 20, 20, 30, 40] |
| place | arr[1] = 50 | [10, 50, 20, 30, 40] |
Inserting at the front shifts all n elements, so a worst-case insert is O(n). Inserting at the end shifts nothing and is O(1) — which is why appending is cheap but middle-insertion is not.
Deletion is the mirror image: remove the element at pos, then every element to its right slides one place left to close the gap. Here we copy left-to-right so we never overwrite a value we still need.
PROCEDURE deleteAt(arr, n, pos)
FOR i = pos TO n - 2 // walk forwards
arr[i] = arr[i + 1] // pull the next element left
NEXT i
// logical length is now n - 1; the old last slot is stale
ENDPROCEDURE
Trace: delete pos = 1 from arr = [10, 50, 20, 30, 40] (n = 5):
| Step | Action | Array state |
|---|---|---|
| start | — | [10, 50, 20, 30, 40] |
| i = 1 | arr[1] = arr[2] | [10, 20, 20, 30, 40] |
| i = 2 | arr[2] = arr[3] | [10, 20, 30, 30, 40] |
| i = 3 | arr[3] = arr[4] | [10, 20, 30, 40, 40] |
| done | length → 4 | [10, 20, 30, 40] (last slot stale) |
Deleting from the front is again O(n); the closing-up of the gap is the unavoidable cost of keeping the data contiguous and ordered. This single observation — static arrays make you pay O(n) to keep order when inserting/deleting in the middle — is the motivation for the entire linked-list lesson that follows.
| Operation | Description | Complexity |
|---|---|---|
| Access | Read/write element at index i | O(1) |
| Linear search | Scan for a value | O(n) |
| Binary search | Find in a sorted array | O(logn) |
| Insert at end | Place at next free slot | O(1) |
| Insert at position | Shift right, then place | O(n) |
| Delete at position | Remove, shift left to close gap | O(n) |
| Traverse | Visit every element | O(n) |
The expensive operations are the ones that disturb the contiguity the array depends on: inserting or deleting in the middle forces every later element to slide along, which is O(n). This is precisely the weakness that linked lists are designed to remove.
grid[r][c] is row-then-column. Writing grid[c][r] transposes your access and is a classic 2-D slip.A college stores results for its students. Each student has a name, a candidate number and a percentage mark. Marks for one class are also held in a 2-D array grades[30, 5] (30 students × 5 subjects). (Total: 9 marks)
(a) Define a suitable record structure for a single student, naming each field and stating its data type. [3]
(b) State how the 30 student records would be stored together, and explain one advantage of this over using three separate parallel arrays. [2]
(c) The grades array is stored in row-major order, has base address 4000 and each element occupies 4 bytes. Calculate the memory address of element grades[6, 2], showing your working. [3]
(d) State the time complexity of accessing one element of the grades array by its indices, and justify it. [1]
| Mark | AO | Awarded for |
|---|---|---|
| 1 | AO1 | (a) Record with three appropriately named fields |
| 2 | AO1 | (a) Correct type for the textual/identifier fields (string) |
| 3 | AO1 | (a) Correct numeric type for the mark (integer or real) |
| 4 | AO1 | (b) Stating an array (of records) is used |
| 5 | AO2 | (b) Valid advantage — grouping keeps related fields together / cannot desync |
| 6 | AO1 | (c) Correct formula stated/used: B+((r×C)+c)×S |
| 7 | AO2 | (c) Correct substitution with C=5 |
| 8 | AO2 | (c) Correct final address 4128 |
| 9 | AO2 | (d) O(1) with justification (address computed directly) |
AO split: AO1 = 4, AO2 = 5.
(a)
record Student
name : string
candidateNumber : string
mark : integer
endrecord
(b) The 30 students are stored in an array of Student records. This is better than three separate arrays because everything about one student is in one place. (c) Address = base + ((row × columns) + col) × size = 4000 + ((6 × 5) + 2) × 4 = 4000 + 32 × 4 = 4000 + 128 = 4128. (d) Accessing by index is O(1).
Examiner-style commentary: Strong on the calculable parts — the record (marks 1-3) is well formed and the address arithmetic in (c) is fully correct (marks 6-8). Mark 4 is secured in (b). Mark 5 is partly there: "everything in one place" gestures at the right idea but does not pin down the real advantage (the fields cannot fall out of step the way parallel arrays can if one is sorted or an entry is deleted), so an examiner might be generous and award it or hold it. Mark 9 is not awarded: (d) states O(1) but gives no justification, and this mark is explicitly for the reason. Around 8/9.
(a)
record Student
name : string
candidateNumber : string // identifier, kept as string (may have leading zeros)
mark : integer // whole-number percentage 0..100
endrecord
(b) The 30 records are held together in a one-dimensional array of Student records, e.g. array class[30], so class[i].mark reads the i-th student's mark. The advantage over three parallel arrays (one of names, one of numbers, one of marks) is cohesion and integrity: each student's three fields are bound into a single record, so they cannot desynchronise. With parallel arrays, sorting one array, or deleting one entry without updating the others, silently corrupts the correspondence between name, number and mark; with an array of records the unit moves as a whole.
(c) Using the row-major address formula with base B=4000, columns per row C=5 and element size S=4: Address(6,2)=4000+((6×5)+2)×4=4000+(30+2)×4=4000+(32×4)=4000+128=4128. The 6×5 term skips the six complete rows before row 6; adding 2 steps two elements along that row.
(d) Access by index is O(1) — constant time. Because every element is the same size and the block is contiguous, the address of any grades[r, c] is found by a single arithmetic expression (the formula above), independent of r, c and the array's size. No scanning is required, so the cost does not grow with n.
Examiner-style commentary: Full 9/9. The discriminators above the mid-band script are: in (b) the answer names the specific failure mode of parallel arrays (desynchronisation under sort/delete) rather than asserting a vague benefit, securing mark 5 cleanly; and in (d) it earns mark 9 by tying O(1) to the mechanism — one address computation independent of size — instead of merely stating the order. The annotated working in (c) also makes the marker's job trivial.
vector (covered later in this course) amortises resizing — it doubles its backing array when full, giving amortised O(1) append despite occasional O(n) copies.Exam Tip: The reliable arrays-and-records marks are (1) a clean record definition with named, typed fields, (2) an array-of-records for "many entities", and (3) a row-major address calculation with the formula written out. Practise these three to fluency — they recur every series and are quick to bank.