Arrays and Records

Arrays and records are the two foundational building blocks of every other structure in this course. An array stores many items of the same type so we can index them; a record groups a few items of different types so we can treat them as one logical thing. Almost everything else — stacks, queues, hash tables, even the nodes of a tree — is ultimately built on top of these two ideas.

Spec Mapping

This lesson addresses the H446 1.4.2 Data Structures content on arrays and records:

Describe, declare and access one-, two- and three-dimensional arrays, understanding that an array is an ordered, fixed-size collection of elements of a single type.
Explain how arrays are laid out in contiguous memory, including row-major and column-major ordering for multi-dimensional arrays, and calculate the address of any element.
Define a record as a grouping of related fields of (possibly) different types, and contrast records with arrays.
Combine the two — arrays of records — and reason about when a static array/record is the right choice versus a dynamic structure such as a linked list.

(Phrasing here paraphrases the specification content; it is not a verbatim quote.)

What is an Array?

An array is a data structure that stores a fixed-size, ordered collection of elements, all of the same data type, in contiguous memory locations. Each element is reached by an index (a position number); in most languages, and in OCR pseudocode, indexing starts at 0.

The single most important consequence of "same type" plus "contiguous" is that the machine can compute where any element lives without searching for it. If every element is the same size and they sit end-to-end, element i is always a fixed distance from the start. That is what makes indexed access constant time, $O(1)$ , and it is the property that all of arrays' strengths and weaknesses flow from.

Key Properties

Property	Description
Homogeneous	All elements must be the same data type (e.g., all integers, all strings).
Fixed size	The size is set when the array is created and cannot change.
Contiguous	Elements are stored next to each other in one unbroken block of memory.
Indexed	Each element has a unique index, allowing direct (random) access.
Static	An array is a static data structure — its size is fixed when it is allocated.

A static structure does not mean the values cannot change — you can overwrite scores[3] freely. It means the size cannot change: you cannot grow a 10-element array into an 11-element array. To "grow" one you must allocate a bigger array and copy everything across, which is exactly what dynamic list types do behind the scenes.

1-D Arrays (One-Dimensional)

A 1-D array is a simple list of elements arranged in a single row.

Declaration (OCR-style pseudocode)

array scores[10]      // 10 elements, indices 0..9
scores[0] = 85
scores[1] = 92
scores[2] = 78

Declaration (Python)

# A Python list is used as an array (it is dynamic, but we use it as fixed-size here)
scores = [85, 92, 78, 64, 91, 88, 73, 95, 67, 82]

print(scores[0])    # 85  -> first element
print(scores[9])    # 82  -> last element (index = length - 1)
scores[3] = 70      # overwrite element 3

How a 1-D Array is Stored in Memory

The diagram below shows ten 4-byte integers stored contiguously from a base address of 2000.

graph LR
    A["index 0<br/>@2000"] --- B["index 1<br/>@2004"] --- C["index 2<br/>@2008"] --- D["index 3<br/>@2012"] --- E["... index 9<br/>@2036"]

If the base address is $B$ and each element occupies $S$ bytes, the address of the element at index $i$ is:

$\text{Address}(i) = B + (i \times S)$

Because this is a single arithmetic expression evaluated in constant time regardless of $i$ , array access is $O(1)$ .

Worked Example: Address Calculation

An integer array starts at address 2000 and each integer occupies 4 bytes.

Index	Calculation	Address
0	$2000 + (0 \times 4)$	2000
1	$2000 + (1 \times 4)$	2004
2	$2000 + (2 \times 4)$	2008
3	$2000 + (3 \times 4)$	2012
7	$2000 + (7 \times 4)$	2028

Exam Tip: Address questions are reliable marks. Always (1) write the formula, (2) substitute numbers, (3) state the unit-size assumption if it is given. Off-by-one errors usually come from forgetting that index 0 is the first element, so the multiplier is the index, not the index plus one.

2-D Arrays (Two-Dimensional)

A 2-D array is an "array of arrays" — it stores data in rows and columns, forming a grid. It is the natural model for a spreadsheet, a game board, a matrix, or a small image.

Declaration and Access

# 2-D array as a list of lists: 3 rows, 4 columns
grid = [
    [1,  2,  3,  4],    # row 0
    [5,  6,  7,  8],    # row 1
    [9, 10, 11, 12],    # row 2
]

print(grid[1][2])   # 7  -> row 1, column 2
print(grid[0][3])   # 4  -> row 0, column 3
grid[2][0] = 99     # overwrite the element at row 2, column 0

In OCR pseudocode the same array is declared array grid[3, 4] and accessed as grid[row, col] — the comma form is equivalent to the double-bracket form. The first index is conventionally the row. Reversing row and column is one of the most common slips in the exam.

Traversing a 2-D Array

To visit every element, use nested loops: the outer loop walks the rows, the inner loop walks the columns within a row.

for row in range(3):          # 0,1,2
    for col in range(4):      # 0,1,2,3
        print(grid[row][col], end=" ")
    print()                   # newline after each row

Every cell is visited exactly once, so a full traversal of an $R \times C$ grid is $O(R \times C)$ — i.e. $O(n)$ in the total number of elements $n = R \times C$ .

Memory Layout: Row-Major vs Column-Major

A 2-D array is still stored as a single contiguous 1-D block — the language must choose an order in which to flatten it. The two conventions are:

Row-major order (C, C++, Python lists, Java): store all of row 0, then all of row 1, then row 2. Elements in the same row are adjacent in memory.
Column-major order (Fortran, MATLAB, R): store all of column 0, then all of column 1. Elements in the same column are adjacent.

graph TD
    subgraph "Logical grid (2 x 3)"
        L["(0,0) (0,1) (0,2)<br/>(1,0) (1,1) (1,2)"]
    end
    subgraph "Row-major in memory"
        R["(0,0)(0,1)(0,2)(1,0)(1,1)(1,2)"]
    end
    subgraph "Column-major in memory"
        C2["(0,0)(1,0)(0,1)(1,1)(0,2)(1,2)"]
    end

For a row-major array with $C$ columns, base address $B$ and element size $S$ , the address of element $[r][c]$ is:

$\text{Address}(r, c) = B + \big((r \times C) + c\big) \times S$

The term $r \times C$ skips over $r$ complete rows; adding $c$ steps along the current row. For column-major with $R$ rows the roles swap: $\text{Address}(r, c) = B + \big((c \times R) + r\big) \times S$ .

Worked Example (Row-Major)

A row-major 2-D array has 5 columns, base address 1000, element size 4 bytes. Find the address of $[2][3]$ :

$1000 + \big((2 \times 5) + 3\big) \times 4 = 1000 + (13 \times 4) = 1000 + 52 = 1052$

3-D Arrays (Three-Dimensional)

A 3-D array adds a third axis — think of it as a stack of 2-D grids, or "depth × rows × columns". It models things such as a colour image over time (frames × height × width), a Rubik's-cube state, or temperatures across a 3-D volume.

# 2 layers, each a 2x3 grid
cube = [
    [[1, 2, 3], [4, 5, 6]],      # layer 0
    [[7, 8, 9], [10, 11, 12]],   # layer 1
]
print(cube[1][0][2])    # 9  -> layer 1, row 0, column 2

A full traversal needs three nested loops, one per dimension, and is $O(D \times R \times C)$ . The row-major address formula generalises: with $R$ rows and $C$ columns per layer, element $[d][r][c]$ sits at $B + \big((d \times R \times C) + (r \times C) + c\big) \times S$ — each term skips a whole layer, a whole row, then steps along the row. The key insight for the exam is that however many dimensions there are, the data is ultimately one contiguous line and the index is a weighted sum.

A concrete use-case helps fix the idea. A short colour video clip is naturally a 4-D array — frame × row × column × colour-channel — but even a single still colour image is 3-D: a height-by-width grid where each cell is itself three numbers (red, green, blue). Indexing image[y][x][0] reads the red component of the pixel at row y, column x. The same row-major flattening applies, which is why image-processing code that walks pixels in row order tends to run faster than code that jumps around: it reads memory in the order it is physically stored. You will not be asked to write such code in 1.4.2, but recognising that the dimension count is just bookkeeping over one flat block is exactly the understanding the spec is after, and it stops multi-dimensional arrays from feeling like a separate, scary topic.

Records (Structs)

A record (or struct) groups together related data items of (possibly) different types under named fields. Where an array answers "give me item number 5", a record answers "give me the grade field of this student".

Key Properties

Property	Description
Heterogeneous	Fields may be different data types (string, integer, real, boolean…).
Named fields	Each item is accessed by a field name, not a numeric index.
Grouped	Related but dissimilar data is kept together as one logical unit.
Fixed structure	The set of fields is fixed at design time (though field values vary).

Declaration (OCR-style pseudocode)

record Student
    name  : string
    age   : integer
    grade : real
endrecord

student1 = new Student()
student1.name  = "Alice"
student1.age   = 17
student1.grade = 8.5

Declaration (Python)

# A class is the usual Python stand-in for a record
class Student:
    def __init__(self, name, age, grade):
        self.name  = name      # string
        self.age   = age       # integer
        self.grade = grade     # real

student1 = Student("Alice", 17, 8.5)
print(student1.name)     # Alice  -> access by field name (dot notation)
print(student1.grade)    # 8.5

How a Record is Stored in Memory

A record, like an array, is normally a single contiguous block — but instead of $n$ identical cells it is a sequence of named fields laid out one after another, each at a fixed offset from the start of the record. The compiler records, for each field, how far into the block it begins. Accessing student1.grade therefore costs the same as array indexing: take the record's base address and add the grade field's fixed offset. So field access, like indexed array access, is $O(1)$ — there is no search.

The crucial difference from an array is that the offsets are not all the same size, because the fields have different types: a string field, an integer field and a real field may each occupy a different number of bytes, so the offsets are fixed but uneven. (In real systems the compiler may also insert padding between fields so each starts on a convenient address boundary — you do not need this detail for the exam, but it explains why a record can occupy slightly more memory than the sum of its raw field sizes.) The mental model to carry into the exam is simple: an array is indexed by a computed position; a record is indexed by a named, fixed offset. Both give direct, constant-time access; what differs is whether the items must share a type.

It is worth stressing why we bother with records at all rather than just remembering "name is in array A at index 3, age is in array B at index 3". Records give us cohesion — the logically-connected fields are bound into one value that can be passed to a function, returned, copied or stored as a unit. This makes code easier to read and far harder to corrupt, because there is no separate index to keep in step across several arrays. When a question describes "an X with several attributes", that cohesion is precisely what a record models, and it is usually what the marks are testing.

Arrays of Records

The two structures combine constantly: an array of records stores many like-shaped records you can index, while each record bundles its own mixed fields. This is the in-memory shape of a database table — the array is the table, each record is a row, each field is a column.

students = [
    Student("Alice",   17, 8.5),
    Student("Bob",     18, 7.2),
    Student("Charlie", 17, 9.1),
]

print(students[1].name)          # Bob -> index the array, then field the record
for s in students:               # linear scan: O(n)
    if s.age == 17:
        print(s.name)            # Alice, Charlie

Exam Tip: When a scenario says "store details about many X" you almost always want an array of records: design the record (each field with a name and a type) first, then say it is held in an array. Marks are lost when candidates use parallel arrays (one array of names, one of ages) instead — that loses the grouping the question is testing.

Static vs Dynamic Structures

Arrays and records are static structures. The contrast with dynamic structures (covered in the linked-lists lesson) is a recurring exam theme.

Feature	Static (arrays, records)	Dynamic (linked lists, trees)
Size	Fixed at allocation	Grows / shrinks at runtime
Memory	One contiguous block	Scattered nodes joined by pointers
Access	$O(1)$ random access by index	$O(n)$ — must follow pointers
Insert / delete	$O(n)$ — shift elements	$O(1)$ at a known node — relink only
Overhead	None per element	A pointer (or two) per element
Waste	Wastes space if under-filled	Uses only what is needed

When to Use Each

Scenario	Best choice	Reason
Known, fixed count	Array	No waste, $O(1)$ access
Mixed data about one thing	Record	Groups heterogeneous fields
Frequent middle inserts/deletes	Linked list	No shifting; relink in $O(1)$
Unpredictable size	Dynamic structure	Grows/shrinks on demand
Fast lookup by position	Array	Direct indexed access

The trade-off behind this table is worth stating in words, because it is the comparison examiners return to most often. A static array buys its $O(1)$ random access by committing to a fixed, contiguous block — and that very commitment is what makes resizing and middle-insertion expensive. To make a full array bigger you cannot simply tack cells onto the end (the memory immediately after it may already belong to something else); you must allocate a brand-new, larger block and copy every existing element across, an $O(n)$ operation. Dynamic structures invert the bargain: a linked list never copies the whole collection to grow and can splice a node into the middle in $O(1)$ , but it gives up direct indexing and pays a pointer of overhead for every element. There is no universally "best" structure — the right choice depends on whether your workload is dominated by reads at known positions (favouring arrays) or by insertions and deletions of unknown count (favouring dynamic structures). Keeping that single sentence in mind lets you justify a choice in an exam rather than merely asserting one.

Traced Array Operations

Because an array is a fixed, contiguous block, adding or removing in the middle is not free — the surrounding elements have to move to keep the block gap-free and in order. Tracing these byte-by-byte is the clearest way to see why they cost $O(n)$ , and it is exactly the kind of trace OCR asks you to complete in a table.

Linear search

We scan from index 0 until we find the target or run off the end. There is no random shortcut because the array is unsorted.

FUNCTION linearSearch(arr, n, target)
    FOR i = 0 TO n - 1
        IF arr[i] = target THEN
            RETURN i              // found: report the index
        ENDIF
    NEXT i
    RETURN -1                     // not found
ENDFUNCTION

Trace on arr = [85, 92, 78, 64, 91], target = 78:

i	`arr[i]`	`arr[i] = 78`?	Action
0	85	no	continue
1	92	no	continue
2	78	yes	return 2

In the worst case (target absent or last) every element is compared, so linear search is $O(n)$ ; the best case is $O(1)$ when the target is first.

Insert at a position (shift right)

To insert at index pos while preserving order, every element from the end down to pos slides one place right to open a gap, then the new value drops in. We must copy from the right-hand end inwards, otherwise we would overwrite values we still need.

PROCEDURE insertAt(arr, n, pos, value)
    // arr has capacity for at least n + 1 elements
    FOR i = n - 1 DOWNTO pos       // walk backwards
        arr[i + 1] = arr[i]        // shift this element one slot right
    NEXT i
    arr[pos] = value               // drop the new value into the gap
    // logical length is now n + 1
ENDPROCEDURE

Trace: insert value = 50 at pos = 1 into arr = [10, 20, 30, 40, _] (n = 4, one spare slot):

Step	Action	Array state
start	—	`[10, 20, 30, 40, _]`
i = 3	`arr[4] = arr[3]`	`[10, 20, 30, 40, 40]`
i = 2	`arr[3] = arr[2]`	`[10, 20, 30, 30, 40]`
i = 1	`arr[2] = arr[1]`	`[10, 20, 20, 30, 40]`
place	`arr[1] = 50`	`[10, 50, 20, 30, 40]`

Inserting at the front shifts all $n$ elements, so a worst-case insert is $O(n)$ . Inserting at the end shifts nothing and is $O(1)$ — which is why appending is cheap but middle-insertion is not.

Delete at a position (shift left)

Deletion is the mirror image: remove the element at pos, then every element to its right slides one place left to close the gap. Here we copy left-to-right so we never overwrite a value we still need.

PROCEDURE deleteAt(arr, n, pos)
    FOR i = pos TO n - 2           // walk forwards
        arr[i] = arr[i + 1]        // pull the next element left
    NEXT i
    // logical length is now n - 1; the old last slot is stale
ENDPROCEDURE

Trace: delete pos = 1 from arr = [10, 50, 20, 30, 40] (n = 5):

Step	Action	Array state
start	—	`[10, 50, 20, 30, 40]`
i = 1	`arr[1] = arr[2]`	`[10, 20, 20, 30, 40]`
i = 2	`arr[2] = arr[3]`	`[10, 20, 30, 30, 40]`
i = 3	`arr[3] = arr[4]`	`[10, 20, 30, 40, 40]`
done	length → 4	`[10, 20, 30, 40]` (last slot stale)

Deleting from the front is again $O(n)$ ; the closing-up of the gap is the unavoidable cost of keeping the data contiguous and ordered. This single observation — static arrays make you pay $O(n)$ to keep order when inserting/deleting in the middle — is the motivation for the entire linked-list lesson that follows.

Common Operations on Arrays

Operation	Description	Complexity
Access	Read/write element at index $i$	$O(1)$
Linear search	Scan for a value	$O(n)$
Binary search	Find in a sorted array	$O(\log n)$
Insert at end	Place at next free slot	$O(1)$
Insert at position	Shift right, then place	$O(n)$
Delete at position	Remove, shift left to close gap	$O(n)$
Traverse	Visit every element	$O(n)$

The expensive operations are the ones that disturb the contiguity the array depends on: inserting or deleting in the middle forces every later element to slide along, which is $O(n)$ . This is precisely the weakness that linked lists are designed to remove.

Synoptic Links

Linked lists (next lesson): the direct dynamic counterpart — trade $O(1)$ indexed access for $O(1)$ middle insertion and runtime resizing. The static-vs-dynamic table above is the bridge.
Stacks, queues, hash tables, vectors: all can be implemented on top of an array. A stack is "an array plus a top pointer"; a hash table is "an array plus a hash function". The array's $O(1)$ indexing is what makes these fast.
Binary search (algorithms course): only works because an array gives $O(1)$ access to the midpoint, enabling the $O(\log n)$ halving. On a linked list the same algorithm degrades to $O(n)$ because there is no random access. The derivation of binary search lives in the algorithms course — this course owns the structure it runs on.
2-D / 3-D arrays ↔ matrices and images: row-major layout explains why iterating an image row-by-row is faster than column-by-column on most machines (cache locality), a link to the memory and processor topics.
Records ↔ databases (1.3 / 1.4.3): an array of records is the in-memory shape of a relational table; fields map to attributes/columns.

Common Errors & A-Level Misconceptions

"Static means the values can't change." No — static refers to the size being fixed. You can overwrite any element; you simply cannot resize the array.
Off-by-one on the last index. An array of length $n$ has indices $0$ to $n-1$ . Accessing index $n$ is out of bounds. Half of address-calculation errors trace back to this.
Swapping row and column. grid[r][c] is row-then-column. Writing grid[c][r] transposes your access and is a classic 2-D slip.
Confusing row-major with column-major. If a question states column-major, you must swap the dimension used in the offset; do not default to the row-major formula.
Using parallel arrays instead of an array of records. Storing names, ages and grades in three separate arrays loses the grouping and is fragile (a sort of one array desynchronises the others). When the data describes one entity, use a record.
Claiming array access is $O(n)$ . Indexed access is $O(1)$ by direct address calculation; it is searching that is $O(n)$ .
Thinking a 2-D array is "two arrays". It is one contiguous block addressed with two indices, not two independent 1-D arrays.

Specimen Question

A college stores results for its students. Each student has a name, a candidate number and a percentage mark. Marks for one class are also held in a 2-D array grades[30, 5] (30 students × 5 subjects). (Total: 9 marks)

(a) Define a suitable record structure for a single student, naming each field and stating its data type. [3]

(b) State how the 30 student records would be stored together, and explain one advantage of this over using three separate parallel arrays. [2]

(c) The grades array is stored in row-major order, has base address 4000 and each element occupies 4 bytes. Calculate the memory address of element grades[6, 2], showing your working. [3]

(d) State the time complexity of accessing one element of the grades array by its indices, and justify it. [1]

AO breakdown

Mark	AO	Awarded for
1	AO1	(a) Record with three appropriately named fields
2	AO1	(a) Correct type for the textual/identifier fields (string)
3	AO1	(a) Correct numeric type for the mark (integer or real)
4	AO1	(b) Stating an array (of records) is used
5	AO2	(b) Valid advantage — grouping keeps related fields together / cannot desync
6	AO1	(c) Correct formula stated/used: $B + ((r \times C) + c) \times S$
7	AO2	(c) Correct substitution with $C = 5$
8	AO2	(c) Correct final address 4128
9	AO2	(d) $O(1)$ with justification (address computed directly)

AO split: AO1 = 4, AO2 = 5.

Mid-band response

(a)

record Student
    name  : string
    candidateNumber : string
    mark  : integer
endrecord

(b) The 30 students are stored in an array of Student records. This is better than three separate arrays because everything about one student is in one place. (c) Address = base + ((row × columns) + col) × size = 4000 + ((6 × 5) + 2) × 4 = 4000 + 32 × 4 = 4000 + 128 = 4128. (d) Accessing by index is $O(1)$ .

Examiner-style commentary: Strong on the calculable parts — the record (marks 1-3) is well formed and the address arithmetic in (c) is fully correct (marks 6-8). Mark 4 is secured in (b). Mark 5 is partly there: "everything in one place" gestures at the right idea but does not pin down the real advantage (the fields cannot fall out of step the way parallel arrays can if one is sorted or an entry is deleted), so an examiner might be generous and award it or hold it. Mark 9 is not awarded: (d) states $O(1)$ but gives no justification, and this mark is explicitly for the reason. Around 8/9.

Top-band response

(a)

record Student
    name            : string
    candidateNumber : string   // identifier, kept as string (may have leading zeros)
    mark            : integer  // whole-number percentage 0..100
endrecord

(b) The 30 records are held together in a one-dimensional array of Student records, e.g. array class[30], so class[i].mark reads the i-th student's mark. The advantage over three parallel arrays (one of names, one of numbers, one of marks) is cohesion and integrity: each student's three fields are bound into a single record, so they cannot desynchronise. With parallel arrays, sorting one array, or deleting one entry without updating the others, silently corrupts the correspondence between name, number and mark; with an array of records the unit moves as a whole.

(c) Using the row-major address formula with base $B = 4000$ , columns per row $C = 5$ and element size $S = 4$ : $\text{Address}(6, 2) = 4000 + \big((6 \times 5) + 2\big) \times 4 = 4000 + (30 + 2) \times 4 = 4000 + (32 \times 4) = 4000 + 128 = 4128.$ The $6 \times 5$ term skips the six complete rows before row 6; adding 2 steps two elements along that row.

(d) Access by index is $O(1)$ — constant time. Because every element is the same size and the block is contiguous, the address of any grades[r, c] is found by a single arithmetic expression (the formula above), independent of $r$ , $c$ and the array's size. No scanning is required, so the cost does not grow with $n$ .

Examiner-style commentary: Full 9/9. The discriminators above the mid-band script are: in (b) the answer names the specific failure mode of parallel arrays (desynchronisation under sort/delete) rather than asserting a vague benefit, securing mark 5 cleanly; and in (d) it earns mark 9 by tying $O(1)$ to the mechanism — one address computation independent of size — instead of merely stating the order. The annotated working in (c) also makes the marker's job trivial.

Going Further

Cache locality: because row-major arrays place a row contiguously, iterating an image or matrix row-by-row reads memory in the order the CPU cache prefetches it, often running several times faster than column-by-column iteration of the same data. This connects arrays to the processor/memory hierarchy.
Jagged vs rectangular arrays: some languages allow "jagged" arrays whose rows have different lengths (an array of independent 1-D arrays), which breaks the single-block address formula. Contrast this with a true rectangular 2-D array.
Dynamic arrays / vectors: investigate how a vector (covered later in this course) amortises resizing — it doubles its backing array when full, giving amortised $O(1)$ append despite occasional $O(n)$ copies.
Records vs tuples vs named tuples: explore how different languages model records, and why named fields are safer than positional tuples for maintainability.

Exam Tip: The reliable arrays-and-records marks are (1) a clean record definition with named, typed fields, (2) an array-of-records for "many entities", and (3) a row-major address calculation with the formula written out. Practise these three to fluency — they recur every series and are quick to bank.

Arrays and Records

Arrays and Records

Spec Mapping

What is an Array?

Key Properties

1-D Arrays (One-Dimensional)

Declaration (OCR-style pseudocode)

Declaration (Python)

How a 1-D Array is Stored in Memory

Worked Example: Address Calculation

2-D Arrays (Two-Dimensional)

Declaration and Access

Traversing a 2-D Array

Memory Layout: Row-Major vs Column-Major

Worked Example (Row-Major)

3-D Arrays (Three-Dimensional)

Records (Structs)

Key Properties

Declaration (OCR-style pseudocode)

Declaration (Python)

How a Record is Stored in Memory

Arrays of Records

Static vs Dynamic Structures

When to Use Each

Traced Array Operations

Linear search

Insert at a position (shift right)

Delete at a position (shift left)

Common Operations on Arrays

Synoptic Links

Common Errors & A-Level Misconceptions

Specimen Question

AO breakdown

Mid-band response

Top-band response

Going Further

More in Computer Science