Primary Storage

Primary storage is the memory the CPU can reach directly (registers, cache, RAM and ROM), and it is arranged in a carefully engineered hierarchy that trades speed against capacity and cost. This lesson explains how each type works, why the hierarchy is shaped the way it is, and how it underpins every step of the fetch-decode-execute (FDE) cycle.

Spec Mapping

This lesson develops OCR H446 section 1.1.3 (input, output and storage), specifically RAM and ROM, as it concerns primary (main) storage and the memory hierarchy. It covers RAM (contrasting DRAM and SRAM), ROM and its uses (firmware, BIOS/UEFI), the registers and the cache hierarchy (L1/L2/L3), the meaning of volatility, and the rationale for the memory hierarchy itself. It links tightly to the FDE cycle (1.1.1), where registers and cache determine instruction throughput, and cross-links to virtual memory, which is treated fully in the software-systems memory management lesson rather than here.

What Is Primary Storage?

Primary storage (main memory) is storage the CPU can address directly over the system bus, without going through an I/O controller. It spans the fast inner tiers — the CPU's own registers and cache — and the larger RAM and ROM chips on the motherboard. It is characterised by fast access but, for the RAM portion, volatility and limited capacity compared with secondary storage. The defining contrast with secondary storage (HDD/SSD) is directness: the CPU fetches an instruction from RAM by placing its address on the bus, whereas reaching a file on disk means asking a separate controller to retrieve a block.

Volatility is the key property to fix early:

Term	Meaning	Examples
Volatile	Loses all contents when power is removed	Registers, cache, RAM (DRAM/SRAM)
Non-volatile	Retains contents without power	ROM, flash, all secondary storage

RAM — Random Access Memory

RAM is volatile working memory: it holds the operating system, the currently running programs and their data while the machine is on, and is wiped when power is lost. "Random access" means any address takes the same time to reach — unlike a spinning disk, where physical position matters. RAM comes in two fundamentally different technologies.

DRAM (Dynamic RAM)

Feature	Detail
How it works	Each bit is one capacitor (plus one transistor). A charged capacitor = 1, discharged = 0. Charge leaks away, so every cell must be refreshed (read and rewritten) many times a second, typically once every 64 ms or so
Density	High — just 1 transistor + 1 capacitor per bit, so billions of cells fit on a chip
Speed	Slower — refresh cycles and the time to sense a tiny capacitor charge add latency
Cost	Cheap per bit
Use	Main memory in PCs, laptops, phones, servers

SRAM (Static RAM)

Feature	Detail
How it works	Each bit is a flip-flop of typically 6 transistors that holds its state as long as power is supplied — no refresh needed
Density	Low — many transistors per bit, so far fewer bits per unit area
Speed	Very fast — no refresh delay; state is read directly
Cost	Expensive per bit
Use	CPU cache (L1/L2/L3) and registers

DRAM vs SRAM

Feature	DRAM	SRAM
Storage mechanism	Capacitor (charge)	Flip-flop (6 transistors)
Needs refreshing?	Yes	No
Speed	Slower	Faster
Density	Higher	Lower
Cost per bit	Cheaper	More expensive
Used for	Main memory	Cache, registers

Exam Tip: "Why does cache use SRAM, not DRAM?" SRAM is faster because its flip-flop cells need no refresh and can be read directly, so a cache hit completes in very few clock cycles. SRAM's higher cost and lower density are acceptable because cache is small; those same drawbacks are why all of main memory cannot be built from SRAM.

How RAM Connects to the CPU — Addresses, Buses and Word Size

RAM is "random access" precisely because of how it is wired to the processor, which ties straight back to the FDE cycle. Main memory is an array of numbered locations, and the CPU reaches any of them through the system buses met in the architecture lesson:

The address bus carries the location number the CPU wants. Its width fixes how much memory can be addressed: with $n$ address lines there are $2^{n}$ distinct addresses. A 32-bit address bus can address $2^{32}$ locations (4 GiB when each location holds one byte), which is why 32-bit systems hit a 4 GiB memory ceiling; 64-bit addresses raise the limit to $2^{64}$ locations, far more than any current machine installs.
The data bus carries the contents to or from that location; its width sets how many bits move per transfer (a wider data bus moves a whole word at once).
The control bus carries the read/write signal that tells memory which operation to perform.
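
The $2^{n}$ rule for the address bus can be checked with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and it assumes byte addressing (one byte per location) unless told otherwise.

```python
def addressable_locations(address_lines: int) -> int:
    """Number of distinct addresses selectable with n address lines: 2**n."""
    return 2 ** address_lines


def addressable_gib(address_lines: int, bytes_per_location: int = 1) -> float:
    """Total addressable capacity in GiB.

    Assumes each location holds bytes_per_location bytes
    (byte addressing by default).
    """
    total_bytes = addressable_locations(address_lines) * bytes_per_location
    return total_bytes / 2 ** 30


for n in (16, 20, 32):
    print(f"{n}-bit address bus: {addressable_locations(n):,} locations, "
          f"{addressable_gib(n):g} GiB")
```

Adding one address line doubles the number of locations, which is why the jump from 32 to 64 address bits is so large.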

So a memory read is just the FDE pattern in miniature: the CPU places an address on the address bus (via the MAR), asserts read on the control bus, and the addressed RAM returns the contents on the data bus (into the MDR). Because any address is selected directly by its number — rather than by physically moving to it as on a disk — every location takes the same time to reach, which is the literal meaning of random access. This is also why "directly addressable by the CPU" is the dividing line between primary storage (on the buses) and secondary storage (reached through a controller).
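
The read sequence described above can be mimicked in a toy model. This is only an illustrative sketch: `SimpleMemory` and `ToyCPU` are hypothetical classes invented here, and real hardware does this with electrical signals rather than method calls.

```python
class SimpleMemory:
    """Toy RAM: a list of words selected directly by address number."""

    def __init__(self, size):
        self.cells = [0] * size

    def access(self, address, read, data=None):
        # 'read' plays the role of the control bus signal;
        # 'address' plays the role of the address bus.
        if read:
            return self.cells[address]   # contents go back on the data bus
        self.cells[address] = data       # write: data bus value is stored
        return None


class ToyCPU:
    def __init__(self, memory):
        self.memory = memory
        self.mar = 0  # Memory Address Register
        self.mdr = 0  # Memory Data Register

    def read(self, address):
        self.mar = address                                  # address into MAR
        self.mdr = self.memory.access(self.mar, read=True)  # result into MDR
        return self.mdr

    def write(self, address, value):
        self.mar = address
        self.mdr = value
        self.memory.access(self.mar, read=False, data=self.mdr)


ram = SimpleMemory(16)
cpu = ToyCPU(ram)
cpu.write(5, 42)
value = cpu.read(5)
print(value, cpu.mar, cpu.mdr)
```

Note that the list index reaches cell 5 directly by its number, with no search through cells 0 to 4, which is the sense in which access is "random".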

Why DRAM Refreshes and SRAM Does Not (Mechanism)

It is worth understanding why the two RAM types behave so differently, because the mechanism is the source of every property in the comparison table. In DRAM, a 1 is a charged capacitor. A capacitor is just two conductors separated by an insulator, and no insulator is perfect, so the stored charge leaks away within tens of milliseconds. Left alone, every 1 would decay into a 0 and the data would be lost, so a refresh circuit must read each row and write it back before that happens, typically once every 64 ms or so. Across the thousands of rows in a chip, that amounts to thousands of refresh operations every second. This refreshing is part of what makes DRAM slower (cells are periodically busy being refreshed, and sensing a tiny capacitor charge is delicate), and it consumes power even when the memory is idle. In return, the one-transistor-one-capacitor cell is tiny, which gives DRAM its density and low cost.

In SRAM, a bit is held by a flip-flop — typically six transistors cross-coupled so they latch each other into a stable 1 or 0 state. As long as power is supplied the latch actively holds its value, so there is nothing to refresh and reads are immediate. The price is that six transistors per bit make SRAM far larger and dearer per bit than DRAM — which is exactly why it is reserved for the small, speed-critical cache and registers, and DRAM is used for bulk main memory.
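
The effect of refresh on a leaking DRAM cell can be illustrated with a toy simulation. Every number here is an assumption chosen for illustration (exponential leakage with a 100 ms time constant, a read threshold of half charge); real cells vary and are not modelled this simply.

```python
import math


def simulate_dram_bit(total_ms, refresh_every_ms=None,
                      time_constant_ms=100.0, threshold=0.5):
    """Toy model of one DRAM cell that starts out storing a 1.

    The charge leaks a little every millisecond. A refresh reads the cell
    and rewrites what it read: full charge if it still reads as 1,
    otherwise 0 (the bit is already lost).
    Returns True if the cell still reads as 1 after total_ms.
    """
    charge = 1.0
    for t in range(1, total_ms + 1):
        charge *= math.exp(-1 / time_constant_ms)   # leak for 1 ms
        if refresh_every_ms and t % refresh_every_ms == 0:
            charge = 1.0 if charge > threshold else 0.0
    return charge > threshold


print("no refresh:       ", simulate_dram_bit(1000))
print("refresh every 64: ", simulate_dram_bit(1000, refresh_every_ms=64))
print("refresh every 200:", simulate_dram_bit(1000, refresh_every_ms=200))
```

Under these assumed numbers the bit survives only when the refresh arrives before the charge falls below the threshold. An SRAM latch would need no such loop at all, since it holds its state for as long as it is powered.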

ROM — Read-Only Memory

ROM is non-volatile — it keeps its contents with the power off — and holds permanent, essential code that rarely or never changes. Modern "ROM" is usually flash that can be updated occasionally (a firmware update), but it behaves as read-only in normal operation.

Variant	Description
ROM	Contents fixed at manufacture; cannot be changed
PROM	Programmable once after manufacture with a special programmer
EPROM	Erasable by UV light, then reprogrammable
EEPROM	Electrically erasable and reprogrammable, byte by byte
Flash	A block-erasable form of EEPROM; the basis of firmware storage and SSDs

The progression in that table is one of increasing rewritability: from completely fixed (ROM), to write-once (PROM), to UV-erasable (EPROM), to electrically byte-erasable (EEPROM), to fast block-erasable flash. This is why "ROM" today usually means flash that can be reprogrammed for a firmware update yet still behaves as read-only during normal use — the reason a phone or router can receive a security update without losing its non-volatile boot code.

Common Uses of ROM

BIOS / UEFI firmware. When a computer is switched on, RAM is empty and volatile, so it holds nothing useful — the CPU must fetch its very first instructions from somewhere non-volatile. The BIOS/UEFI firmware in ROM/flash provides them.
Embedded firmware. The fixed program controlling a washing machine, microwave, router or car ECU lives in ROM/flash so it survives power loss and cannot be casually altered (links to the embedded-systems lesson).

The boot sequence makes the need for ROM alongside RAM concrete. Trace what happens at power-on:

Power on: RAM is empty (it is volatile). The CPU's program counter is hard-wired to start fetching from a fixed address that maps to the ROM/flash firmware.
POST: the firmware runs a power-on self-test, checking that the CPU, memory and essential hardware are present and working.
Initialise hardware: it configures the basic devices needed to start (storage controllers, keyboard, display).
Find the bootloader: it locates the bootloader on secondary storage (the disk/SSD).
Bootstrap the OS: it loads the bootloader into RAM and hands control to it; the bootloader then loads the rest of the operating system into RAM, and from then on the machine runs from RAM.
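
The steps above can be sketched as a toy function. This is purely illustrative: the dictionaries standing in for ROM, RAM and disk, and the names `boot`, `"firmware"`, `"bootloader"` and `"os"`, are all invented for this example.

```python
def boot(rom, disk):
    """Toy boot sequence. Returns (ram, log)."""
    ram = {}   # volatile, so empty at power-on
    log = ["power on: PC starts at the firmware address in ROM"]

    firmware = rom["firmware"]   # first instructions come from ROM
    log.append(f"{firmware}: POST passed")
    log.append(f"{firmware}: hardware initialised")

    # The firmware finds the bootloader on secondary storage
    # and copies it into RAM.
    ram["bootloader"] = disk["bootloader"]
    log.append("bootloader loaded into RAM")

    # The bootloader then loads the operating system into RAM.
    ram["os"] = disk["os"]
    log.append("OS loaded into RAM")
    return ram, log


ram, log = boot(rom={"firmware": "UEFI"},
                disk={"bootloader": "boot code", "os": "kernel"})
for entry in log:
    print(entry)
```

The point of the sketch is the ordering: nothing can be placed in `ram` until code from `rom` has run.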

This is the textbook answer to "why does a computer need both ROM and RAM?": without non-volatile ROM there would be no startup code after a power cycle to get the volatile RAM populated in the first place — a chicken-and-egg problem the ROM firmware solves.

Exam Tip: The cleanest "RAM vs ROM" discriminator is volatility plus role: ROM is non-volatile and holds the unchanging startup/boot firmware; RAM is volatile and holds whatever is running right now. If asked "why both?", give the boot argument — RAM is empty at power-on, so the first instructions must come from ROM.

Registers — the Fastest Tier

At the very top of the hierarchy, inside the CPU, sit the registers: a handful of tiny, ultra-fast SRAM-based stores the ALU operates on directly, accessed in a single clock cycle. You met them in the FDE lesson — the PC, MAR, MDR, CIR and Accumulator/general-purpose registers. They are the smallest store (only tens to a few hundred words) but the fastest, which is exactly why they form the apex of the memory hierarchy below.

Registers are not a separate topic from this lesson — they are the place every memory access ends up. When the CPU reads from RAM, the address travels out through the MAR and the returned value arrives in the MDR; when an instruction is fetched it lands in the CIR; arithmetic results accumulate in the Accumulator or a general-purpose register. So the data path the whole lesson describes is really RAM → cache → registers → ALU and back: each tier exists to keep the registers — and therefore the ALU — supplied without stalling. This is why the registers belong at the top of the hierarchy: they are the fastest because the ALU operates on them directly every cycle, and everything below them (cache, then RAM, then disk) is a progressively larger, slower reservoir feeding them.

Cache Memory

Cache is a small amount of very fast SRAM inside or beside the CPU that holds copies of recently and frequently used data and instructions, so the CPU rarely has to wait for slow DRAM. It sits between the registers and main memory.

Cache Hierarchy

Level	Location	Typical size	Speed	Purpose
L1	Inside each core	32–128 KB per core	Fastest	Hottest data/instructions; usually split L1-I (instructions) + L1-D (data)
L2	Inside/beside each core	256 KB–1 MB per core	Fast	Backs up L1 for data that does not fit
L3	Shared across all cores	4–64 MB	Moderate (still much faster than RAM)	Shared pool; cuts trips to main memory

The pattern is the hierarchy in miniature: smaller and faster closer to the core, larger and slower further out. L1 and L2 are typically per-core (private); L3 is shared so cores can exchange data without going to RAM.

How Cache Works

When the CPU needs a data item or instruction:

Check L1 — if present (cache hit), use it immediately.
Else check L2 — on a hit, copy it up to L1 and use it.
Else check L3 — on a hit, copy it up through the levels.
Else go to main memory (RAM) — a cache miss; fetch from DRAM (and bring a whole block in, exploiting spatial locality). This is far slower.
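
The lookup order above can be sketched in Python. This is a simplified model under stated assumptions: each cache level is a plain dictionary, there is no capacity limit or eviction, and a miss fetches a single item rather than a whole block. The function name `lookup` is made up for illustration.

```python
def lookup(address, l1, l2, l3, ram):
    """Return (value, where_found), copying the item up towards L1."""
    if address in l1:
        return l1[address], "L1"                 # hit in L1
    if address in l2:
        l1[address] = l2[address]                # copy up to L1
        return l1[address], "L2"
    if address in l3:
        l2[address] = l1[address] = l3[address]  # copy up through the levels
        return l1[address], "L3"
    value = ram[address]                         # miss: go to main memory
    l3[address] = l2[address] = l1[address] = value
    return value, "RAM"


ram = {address: address * 10 for address in range(8)}
l1, l2, l3 = {}, {}, {}
print(lookup(5, l1, l2, l3, ram))   # first access has to go to RAM
print(lookup(5, l1, l2, l3, ram))   # repeat access is found in L1
```

The second call is served from L1 because the first call copied the item up, which is the behaviour that makes repeated accesses cheap.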

Term	Definition
Cache hit	Requested item found in cache — fast
Cache miss	Item not in cache — must fetch from slower RAM
Hit rate	Fraction of accesses that hit; higher is better
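
Why a higher hit rate matters can be shown with a simple weighted average. This is a sketch under assumptions: a single cache level, and illustrative latencies of 1 ns for a hit and 100 ns for a trip to RAM, which are not figures from this lesson.

```python
def hit_rate(hits, misses):
    """Fraction of accesses that were found in the cache."""
    return hits / (hits + misses)


def average_access_time(rate, cache_ns, ram_ns):
    """Weighted average: hits cost cache_ns, misses cost ram_ns."""
    return rate * cache_ns + (1 - rate) * ram_ns


for hits in (50, 90, 99):
    rate = hit_rate(hits, 100 - hits)
    time_ns = average_access_time(rate, cache_ns=1, ram_ns=100)
    print(f"hit rate {rate:.2f}: average access about {time_ns:.2f} ns")
```

With these assumed latencies, even a small rise in hit rate cuts the average access time noticeably, because each miss is so much more expensive than a hit.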

Why Cache Works — Locality of Reference

Cache pays off because real programs do not access memory randomly; they obey two locality principles:

Principle	Explanation	Example
Temporal locality	A recently used item is likely to be used again soon	A loop counter, a frequently called function
Spatial locality	Items near a used item are likely to be used soon	Successive elements of an array; the next instruction in sequence

Spatial locality is why a miss pulls a whole cache line (block) from RAM, not a single byte — the neighbours are probably needed next.
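
The benefit of fetching a whole line can be sketched with a toy miss counter. The assumptions are deliberate simplifications: a line holds 8 array elements, and the cache never evicts anything, so only first-touch misses are counted. The function name `count_misses` is invented for this example.

```python
def count_misses(addresses, line_size=8):
    """Count cache misses for a sequence of element addresses."""
    loaded_lines = set()
    misses = 0
    for address in addresses:
        line = address // line_size   # which line the address falls in
        if line not in loaded_lines:
            misses += 1
            loaded_lines.add(line)    # a miss brings in the whole line
    return misses


accesses = range(1000)   # visiting array elements 0..999 in order
print(count_misses(accesses, line_size=8))   # 125 misses, 875 hits
print(count_misses(accesses, line_size=1))   # 1000 misses: no neighbours fetched
```

With 8 elements per line, only the first access to each line misses and the next seven are hits, so sequential access gets a hit rate of 875 out of 1000 in this model.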

Locality in Action — a Worked Walkthrough

Consider a loop summing an array, the kind of code that runs constantly:

data = list(range(1000))      # example array; the contents are an assumption
total = 0
for i in range(1000):
    total = total + data[i]   # touches data[i] and total each pass
