Hash Tables

This lesson covers hash tables — a data structure that provides fast average-case lookup, insertion, and deletion. Hash tables are a key topic in the OCR A-Level Computer Science (H446) specification, section 1.4.

What is a Hash Table?

A hash table (also called a hash map) is a data structure that stores key-value pairs and uses a hash function to compute an index (position) in an underlying array where the value is stored.

Why Hash Tables?

Operation	Array (unsorted)	Sorted Array	Hash Table
Search	O(n)	O(log n)	O(1) average
Insert	O(1) at end	O(n) shift	O(1) average
Delete	O(n)	O(n) shift	O(1) average

Hash tables achieve O(1) average-case performance for all three operations — making them one of the most efficient data structures for lookup-heavy tasks.

Hash Functions

A hash function takes a key (e.g., a string, number) and converts it into an integer index within the range of the array.

Properties of a Good Hash Function

Property	Description
Deterministic	Same key always produces the same hash
Uniform distribution	Spreads keys evenly across the array
Fast to compute	Should be O(1)
Minimises collisions	Different keys should map to different indices

Common Hash Function: Modulo (MOD)

The simplest hash function uses the modulo operator:

hash(key) = key MOD table_size

Worked Example

Table size = 7. Hash function: h(key) = key MOD 7

Key	Calculation	Index
14	14 MOD 7	0
21	21 MOD 7	0 (collision with 14!)
15	15 MOD 7	1
10	10 MOD 7	3
23	23 MOD 7	2

Hashing Strings

For string keys, a common approach is to sum the ASCII/Unicode values and apply MOD:

h(string) = (sum of character values) MOD table_size

Example: h("CAT") with table size 11

C = 67, A = 65, T = 84
Sum = 67 + 65 + 84 = 216
h("CAT") = 216 MOD 11 = 7

Exam Tip: You must be able to apply a hash function to keys and determine where they are placed in the table. Show your working clearly.

Collisions

A collision occurs when two different keys hash to the same index. Since an array slot can only hold one value, we need a strategy to handle collisions.

Why Collisions Happen

The number of possible keys is usually much larger than the table size.
No hash function can guarantee unique indices for all possible keys (this is the pigeonhole principle).

Collision Handling: Chaining

Chaining (also called closed addressing) stores multiple items at the same index using a linked list.

How It Works

Each slot in the hash table contains a pointer to a linked list. When a collision occurs, the new item is appended to the list at that index.

Index 0: [14] -> [21] -> NULL
Index 1: [15] -> NULL
Index 2: [23] -> NULL
Index 3: [10] -> NULL
Index 4: NULL
Index 5: NULL
Index 6: NULL

Performance with Chaining

Operation	Average Case	Worst Case
Search	O(1 + n/m) where n/m is the load factor	O(n) — all keys in one chain
Insert	O(1)	O(1)
Delete	O(1 + n/m)	O(n)

Collision Handling: Open Addressing (Linear Probing)

Open addressing stores all items directly in the array. When a collision occurs, the algorithm probes for the next available slot.

Linear Probing

If the hash index is occupied, check the next slot (index + 1), then the next, wrapping around if necessary.

Probe sequence: h(key), h(key)+1, h(key)+2, ... (all MOD table_size)

Worked Example

Table size = 7. h(key) = key MOD 7. Insert: 14, 21, 15, 10, 7.

Hash Tables

Hash Tables

What is a Hash Table?

Why Hash Tables?

Hash Functions

Properties of a Good Hash Function

Common Hash Function: Modulo (MOD)

Worked Example

Hashing Strings

Collisions

Why Collisions Happen

Collision Handling: Chaining

How It Works

Performance with Chaining

Collision Handling: Open Addressing (Linear Probing)

Linear Probing

Worked Example

More in Computer Science