Every general-purpose computer you have ever used rests on a single, deceptively simple idea: that a machine can store its own program in the same way it stores its data, and then execute that program one instruction at a time. This is the stored-program concept, and the architecture built around it — the Von Neumann architecture — has dominated computing since the 1940s. Its principal rival, the Harvard architecture, keeps instructions and data physically apart and trades flexibility for raw memory throughput.
At A-Level you are expected to do far more than recite a definition. You must be able to explain why the stored-program concept was revolutionary, trace exactly where the famous Von Neumann bottleneck comes from at the level of buses and memory access, compare the two architectures with precision, and explain why the processor in your laptop is in fact a Modified Harvard hybrid rather than a textbook example of either pure form. This lesson builds that depth carefully, with block diagrams, comparison tables, a worked memory-access trace and a full specimen question.
This lesson addresses the AQA A-Level Computer Science (7517) specification under §4.7 Fundamentals of computer organisation and architecture, specifically:
It connects forwards to §4.7.3 Structure and role of the processor and its components (registers, buses, the FDE cycle) and backwards to §4.5 Fundamentals of data representation, since everything stored in either memory — instruction or datum — is ultimately a binary pattern.
Before the stored-program concept, early machines such as ENIAC were programmed by physically rewiring plugboards and setting switches. Changing the task could take days. The breakthrough, set out in John von Neumann's 1945 First Draft of a Report on the EDVAC, was to recognise that a program is just a sequence of coded instructions, and that those codes are themselves data that can be stored in memory.
This single insight has three profound consequences:
01101000 could be an instruction opcode or the number 104; the meaning depends entirely on how the processor uses it, not on where it is stored.
Exam Tip: A favourite short-answer question is "State the stored program concept." A full-mark answer names both halves: instructions and data are stored together in main memory, and instructions are fetched and executed sequentially. Quote both clauses.
Because instructions and data are stored identically as binary, the same byte can mean different things depending on context. Suppose address 60 holds the byte 00000101. Consider three scenarios:
| Scenario | How the processor uses address 60 | Interpretation of 00000101 |
|---|---|---|
| The PC points to address 60 during Fetch | Fetched into the CIR as an instruction | An opcode (say, "HALT" or whatever opcode 5 maps to) |
| An LDA 60 instruction runs | Loaded into the ACC as data | The unsigned integer 5 |
| A character-print routine reads address 60 | Treated as a character code | Whatever glyph code 5 represents in the encoding |
The bit pattern never changes — only the intent of the access does. This is the deep meaning of the stored-program concept, and it is what makes a single machine able to run any program: a program is simply data that the processor has been told to execute. It is also why a corrupted jump can cause a crash: if the PC is loaded with the address of a region that actually holds data, the processor will dutifully try to "execute" those data bytes as instructions, usually with chaotic results.
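The table above can be sketched in a few lines of Python. This is only an illustration: the opcode table `OPCODES`, the helper names and the choice of ASCII as the character encoding are assumptions made for the example, not part of any real instruction set.

```python
# Hypothetical opcode table: the lesson only says "whatever opcode 5 maps to".
OPCODES = {5: "HALT"}


def as_instruction(byte: int) -> str:
    """Interpret the byte as an opcode, as the control unit would after a fetch."""
    return OPCODES.get(byte, f"UNKNOWN({byte})")


def as_unsigned(byte: int) -> int:
    """Interpret the byte as an unsigned integer, as LDA would load it."""
    return byte


def as_character(byte: int) -> str:
    """Interpret the byte as a character code (ASCII is assumed here)."""
    return chr(byte)


memory = {60: 0b00000101}  # address 60 holds the byte from the table
cell = memory[60]

print(as_instruction(cell))        # HALT
print(as_unsigned(cell))           # 5
print(repr(as_character(cell)))    # '\x05' (a non-printing control code in ASCII)

# The earlier example byte 01101000 behaves the same way.
print(as_unsigned(0b01101000), as_character(0b01101000))  # 104 h
```

The stored value is never altered by any of the three functions; only the way it is read differs, which mirrors the point that the intent of the access decides the meaning.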
The stored-program idea is so familiar today that its significance is easy to underestimate. The progression was roughly:
For exam purposes you need the principle and its consequences, not a memorised timeline — but understanding that programming once meant rewiring makes it obvious why storing the program as changeable data was transformative.
The Von Neumann architecture implements the stored-program concept with a single main memory holding both instructions and data, connected to the processor by a single shared set of buses.
flowchart LR
subgraph CPU["Processor (CPU)"]
CU["Control Unit"]
ALU["ALU"]
REG["Registers"]
end
CPU <-->|"shared system bus<br/>(address + data + control)"| MEM["Main Memory<br/>(instructions AND data)"]
CPU <--> IO["Input / Output"]
| Component | Role |
|---|---|
| Processor (CPU) | Fetches, decodes and executes instructions; contains the control unit, ALU and registers |
| Main memory (RAM) | Stores both program instructions and data in one address space |
| System bus | A single shared set of buses (address, data, control) connecting CPU and memory |
| Input / Output | Peripheral devices connected through I/O controllers |
Because there is only one bus between the CPU and memory, the processor cannot fetch an instruction and transfer data in the same clock cycle. Consider what happens when the CPU executes LDA 200 (load the contents of address 200):
| Step | Bus activity |
|---|---|
| Fetch the instruction | Bus busy carrying the instruction from memory to CPU |
| Fetch the operand (data at address 200) | Bus busy again, now carrying the data; this transfer could not begin until the instruction transfer had finished |
Modern processors are far faster than main memory, so the CPU frequently sits idle waiting for the bus to free up. This mismatch between fast processor and comparatively slow single memory channel is the Von Neumann bottleneck.
A rough illustration makes the problem vivid. Imagine a processor that can perform a useful operation in 1 nanosecond, but every instruction and every data value must travel over a single bus that takes 4 nanoseconds per transfer. A simple LDA 200; ADD 201; STA 202 sequence needs:
3 instruction fetches (one each for LDA, ADD and STA).
3 data transfers: 1 data read (for LDA), 1 data read (for ADD), 1 data write (for STA).
That is 6 bus transfers at 4 ns each = 24 ns of bus time, during which the actual computation occupied only a couple of nanoseconds. The processor spends the overwhelming majority of its time waiting for the single bus, not computing. No matter how much faster you make the processor core, the single shared bus caps throughput; that ceiling is the Von Neumann bottleneck. (These figures are illustrative, not real device specifications.)
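The transfer count for the LDA 200; ADD 201; STA 202 sequence can be checked with a toy single-memory machine. This is a minimal sketch under stated assumptions: the data values 7 and 5, the instruction addresses 0 to 2, and the one-transfer-per-access model are all invented for illustration.

```python
BUS_NS = 4  # illustrative time per bus transfer, taken from the text

# One shared memory: addresses 0-2 hold instructions, 200-202 hold data.
# The instruction addresses and the data values 7 and 5 are hypothetical.
memory = {
    0: ("LDA", 200),
    1: ("ADD", 201),
    2: ("STA", 202),
    200: 7,
    201: 5,
    202: 0,
}


def run(memory, start=0, count=3):
    """Execute `count` instructions, counting every use of the shared bus."""
    acc = 0
    transfers = 0
    for pc in range(start, start + count):
        opcode, address = memory[pc]  # instruction fetch occupies the bus
        transfers += 1
        if opcode == "LDA":
            acc = memory[address]     # data read
        elif opcode == "ADD":
            acc += memory[address]    # data read
        elif opcode == "STA":
            memory[address] = acc     # data write
        transfers += 1                # the operand access uses the same bus
    return acc, transfers


acc, transfers = run(memory)
bus_time = transfers * BUS_NS
print(transfers, bus_time)  # 6 24
```

Every instruction in this sketch costs two sequential bus transfers, so making the arithmetic itself faster would not change the 24 ns total.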
Exam Tip: Be precise about the cause. The bottleneck arises because instructions and data share a single bus, so they cannot be transferred simultaneously — not merely because they share the same memory. State both the shared-bus cause and the idle-CPU consequence for full marks.
The Harvard architecture uses physically separate memories and separate buses for instructions and data. It takes its name from the Harvard Mark I relay computer, which read its instructions from punched paper tape and held its data in separate electromechanical counters.
flowchart LR
IMEM["Instruction Memory"] <-->|"instruction bus"| CPU["Processor (CPU)"]
CPU <-->|"data bus"| DMEM["Data Memory"]
| Component | Role |
|---|---|
| Processor (CPU) | Fetches, decodes and executes instructions |
| Instruction memory | Stores program instructions only |
| Data memory | Stores data only |
| Instruction bus | Dedicated bus connecting CPU to instruction memory |
| Data bus | Dedicated bus connecting CPU to data memory |
Most modern desktop, laptop and smartphone processors are neither purely Von Neumann nor purely Harvard — they use a Modified Harvard architecture. At the level of main memory the design is Von Neumann (one unified RAM holds both programs and data), but inside the CPU there are separate Level 1 (L1) caches — one for instructions (L1-I) and one for data (L1-D).
flowchart TB
subgraph CPU["CPU Chip"]
CORE["Core (CU + ALU + registers)"]
L1I["L1-I Cache<br/>(instructions)"]
L1D["L1-D Cache<br/>(data)"]
L2["Unified L2 Cache"]
CORE --> L1I
CORE --> L1D
L1I --> L2
L1D --> L2
end
L2 <-->|"unified system bus"| RAM["Unified Main Memory<br/>(Von Neumann-style)"]
Because the two L1 caches have independent ports, the core can fetch an instruction from L1-I in the very same cycle that it reads an operand from L1-D — giving the simultaneous-access speed of Harvard for the data the CPU is actually working on. Yet the single backing RAM preserves the flexibility of Von Neumann: programs can be loaded as ordinary data and the OS can manage one pool of memory. This hybrid is why the bottleneck is mitigated rather than eliminated on a typical PC.
Two empirical facts about real programs explain why splitting only the L1 cache is enough to recover most of Harvard's benefit:
By the time accesses reach the unified L2/L3 cache and main memory, they are infrequent enough that a single shared path no longer dominates performance. This is the engineering insight behind Modified Harvard: split where the traffic is heaviest, unify where flexibility matters most.
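One way to see why a small cache absorbs most of the traffic is to replay the instruction fetches of a loop through a toy cache and count how many still need the shared path. The sketch below is a simplification: the loop addresses, the cache size and the least-recently-used replacement policy are assumptions for illustration, and real caches work on multi-byte lines and sets rather than single addresses.

```python
from collections import OrderedDict


def shared_bus_accesses(addresses, cache_lines):
    """Count accesses that miss a small LRU cache and so must use the shared path."""
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        if addr in cache:
            cache.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1                    # miss: must go out to L2 or RAM
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used entry
    return misses


# Hypothetical loop: 8 instructions at addresses 100-107, executed 1000 times.
loop_body = list(range(100, 108))
fetches = loop_body * 1000

misses = shared_bus_accesses(fetches, cache_lines=16)
print(len(fetches), misses)  # 8000 8
```

Only the first pass through the loop reaches the shared path in this model; the remaining fetches are served by the cache, which is the sense in which a single path beyond the split caches no longer dominates.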
| Feature | Von Neumann | Harvard | Modified Harvard |
|---|---|---|---|
| Memory for instructions and data | Shared (one memory) | Separate (two memories) | Shared main memory, split L1 caches |
| Buses | Single shared bus | Separate instruction and data buses | Shared main bus, split at cache level |
| Von Neumann bottleneck | Yes | No | Reduced (cache absorbs most traffic) |
| Cost / hardware complexity | Lower | Higher | Moderate |
| Bus-width flexibility | Same width for both | Independently sized | Independent at cache, unified to RAM |
| Self-modifying code | Possible | Not directly possible | Possible (via main memory) |
| Typical use | Classic general-purpose model | DSPs, microcontrollers, embedded | Modern desktop / laptop / phone CPUs |
The performance difference between the two pure architectures can be expressed in terms of memory bandwidth — the number of memory transfers possible per unit time. Consider an idealised processor whose memory can complete one transfer per clock cycle on each available bus:
| Architecture | Buses to memory | Transfers possible per cycle | Effect on a load-heavy loop |
|---|---|---|---|
| Von Neumann | 1 (shared) | 1 (either an instruction or a datum) | Instruction fetch and data access must take separate cycles |
| Harvard | 2 (separate) | 2 (one instruction and one datum) | Instruction fetch and data access happen in the same cycle |
For a tight loop that reads one data value per instruction, the Harvard machine can in principle sustain twice the instruction rate of the Von Neumann machine, because it never has to choose between fetching the next instruction and reading the data. This doubling is the theoretical best case; real gains are smaller because not every instruction touches memory, but it explains precisely why Harvard suits streaming workloads. Note that Harvard does not make the ALU or the clock any faster — its advantage is purely in memory bandwidth, which is exactly why a fixed-function streaming device benefits but a branch-heavy general-purpose program may see little improvement.
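The idealised bandwidth argument can be written as a small cycle-count model. It assumes, as the table does, one transfer per cycle per bus and nothing else limiting performance; the function names are made up for this sketch.

```python
def cycles_von_neumann(instructions, data_accesses):
    """One shared bus: every fetch and every data access needs its own cycle."""
    return instructions + data_accesses


def cycles_harvard(instructions, data_accesses):
    """Separate buses: a data access can overlap with an instruction fetch."""
    return max(instructions, data_accesses)


def speedup(instructions, data_accesses):
    """Ratio of Von Neumann cycles to Harvard cycles in this idealised model."""
    return cycles_von_neumann(instructions, data_accesses) / cycles_harvard(
        instructions, data_accesses
    )


# Every instruction reads one data value: the theoretical best case.
print(speedup(1000, 1000))  # 2.0

# Only a quarter of the instructions touch memory: a much smaller gain.
print(speedup(1000, 250))   # 1.25
```

The second call shows why the doubling is a ceiling rather than a typical result: the fewer instructions that access data, the less there is for the second bus to overlap.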
A useful sanity check: at the level of the assembly programmer, the same program — LDA, ADD, STA instructions operating on addresses — can run on either architecture. The architecture is a property of the hardware organisation, not of the instruction set the programmer writes. What changes is how the hardware satisfies those instructions: on Von Neumann every fetch and every data access queue for one bus; on Harvard the instruction fetches use one bus while the data accesses use another. This is why you can compile broadly the same C program for a Von Neumann PC and a Harvard microcontroller — the difference is felt in performance and in capabilities like self-modifying code, not in the basic shape of the program.
When a question says "compare the Von Neumann and Harvard architectures", examiners award marks for paired, contrasting statements rather than two separate descriptions. Build each point as a single contrast:
Closing with the observation that modern PCs are a Modified Harvard hybrid (split L1 caches, unified main memory) demonstrates the synoptic understanding that lifts the answer into the top band.
Exam Tip: When asked to "state an application of the Harvard architecture", always give a concrete example — "digital signal processors in audio equipment" or "PIC microcontrollers in embedded control systems". Vague answers such as "in small devices" will not earn the mark.
When an exam scenario asks you to choose an architecture, work through these questions in order — they map cleanly onto the trade-offs above:
A general-purpose computer answers "many programs / flexibility first", so it lands on Von Neumann at the main-memory level with a Modified Harvard cache for speed. A dedicated signal processor answers "one program / bandwidth and timing first", so it lands on Harvard. Articulating this reasoning — rather than asserting "Harvard is faster" — is what earns the evaluation marks.
PC → MAR and pulls the instruction back into the MDR. The bottleneck is a property of that very bus.
One subtle but examinable consequence of the two architectures concerns whether a program can treat its own instructions as data.
On a Von Neumann machine, because code and data share one memory, a program can write to the very region holding its instructions — self-modifying code. Historically this was used as a clever optimisation (e.g. patching a frequently executed instruction at run time). However, it is also a serious security weakness: if an attacker can trick a program into writing malicious bytes into an executable region and then jump there, those bytes will be executed as instructions. This is the essence of classic code-injection attacks.
On a pure Harvard machine, instruction memory is physically separate and often read-only, so code injection of this kind is structurally impossible — a genuine security advantage of the architecture.
Modern Von Neumann-based PCs recover some of this safety through hardware and operating-system support: the NX (No-eXecute) bit, which Intel calls the XD (eXecute Disable) bit, marks pages of memory as "data only", so the processor refuses to execute instructions fetched from them. This effectively imposes a logical Harvard-style code/data separation on top of a physically Von Neumann machine. It is a neat example of architecture and security interacting, and exactly the kind of cross-topic link that lifts an answer into the top band.
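The effect of a no-execute marking can be modelled with a toy memory that refuses instruction fetches from pages not flagged as executable. Everything here is hypothetical: the `Memory` class, the 256-address page size and the string "instructions" are invented for the sketch and do not reflect how any real processor or operating system implements the check.

```python
PAGE_SIZE = 256  # hypothetical page size for this sketch


class ExecutionFault(Exception):
    """Raised when an instruction fetch targets a page not marked executable."""


class Memory:
    def __init__(self):
        self.cells = {}
        self.executable_pages = set()

    def mark_executable(self, page):
        self.executable_pages.add(page)

    def write(self, address, value):
        # Writes are allowed anywhere: code and data share one memory.
        self.cells[address] = value

    def fetch_instruction(self, address):
        # Instruction fetches are checked against the page's execute permission.
        if address // PAGE_SIZE not in self.executable_pages:
            raise ExecutionFault(f"address {address} is in a no-execute page")
        return self.cells[address]


mem = Memory()
mem.mark_executable(0)        # page 0 holds the program
mem.write(0, "LDA 600")
mem.write(600, "MALICIOUS")   # injected bytes land on a data page (page 2)

print(mem.fetch_instruction(0))  # LDA 600
try:
    mem.fetch_instruction(600)   # a corrupted jump to the injected bytes
except ExecutionFault as fault:
    print("blocked:", fault)
```

The injected value is still stored and can still be read as data; it is only the attempt to fetch it as an instruction that fails, which is the logical code/data separation described above.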
A manufacturer is designing a dedicated digital audio effects unit that must continuously process a stream of sound samples in real time. An engineer proposes a Harvard architecture rather than a Von Neumann architecture.
(a) Explain what is meant by the stored program concept. (2 marks)
(b) Explain the cause and effect of the Von Neumann bottleneck. (3 marks)
(c) Discuss whether the Harvard architecture is an appropriate choice for this audio effects unit, comparing it with the Von Neumann architecture. (6 marks)
AO breakdown:
The stored program concept means the program and data are both kept in memory. The Von Neumann bottleneck happens because the CPU and memory are connected, so the CPU sometimes has to wait. A Harvard architecture has separate memories for instructions and data, so it is faster because it can do two things at once. This is good for the audio unit because it needs to be fast and process the sound quickly, so Harvard is a good choice. The downside is that Harvard costs more money to build.
The stored program concept means that instructions and data are both stored together in main memory and the instructions are fetched and executed one after another. The Von Neumann bottleneck occurs because instructions and data share a single bus, so the processor cannot fetch an instruction and transfer data at the same time; while it waits for the bus the CPU is idle, slowing the system down.
The Harvard architecture uses separate instruction and data memories with separate buses, so the processor can fetch the next instruction and read a data sample at the same time. For the audio unit this is useful because it has to process a continuous stream of samples in real time, so the extra memory bandwidth helps it keep up. A disadvantage is that Harvard hardware is more expensive and less flexible, but the unit only runs one fixed program so flexibility does not matter much.
The stored program concept is the principle that machine-code instructions and the data they operate on are both held in main memory in the same address space, and that the processor fetches and executes those instructions sequentially. This makes programs easy to load and change, and allows programs (such as compilers) to manipulate other programs as data.
The Von Neumann bottleneck arises specifically because instructions and data travel over a single shared bus. The bus can only carry one transfer at a time, so an instruction fetch and a subsequent data fetch must be serialised. Because modern processors are much faster than main memory, the CPU is frequently left idle waiting for the bus, capping effective throughput regardless of clock speed.
For this audio effects unit the Harvard architecture is a strong fit. The workload is a continuous real-time stream of samples passing through a fixed processing program, so the dominant requirement is sustained, predictable memory bandwidth rather than the ability to load arbitrary new programs. Harvard's separate buses let the device fetch the next instruction while simultaneously reading the next sample, removing the bottleneck and supporting deterministic real-time timing. Its disadvantages — higher cost, stranded memory capacity and poor support for self-modifying or dynamically loaded code — carry little weight here because the unit is a fixed-function embedded device that runs one program for its whole life. A Von Neumann design would be cheaper and more flexible but its single bus risks dropping samples under load. On balance, for a dedicated real-time signal-processing device, Harvard (or a modified Harvard with split caches) is the better-justified choice; if the same chip also had to run a general-purpose operating system, the verdict would reverse in favour of Von Neumann.
Examiner-style commentary: The mid-band response gives only half the stored-program definition (it omits sequential execution), attributes the bottleneck vaguely to "the CPU and memory being connected" rather than to the single bus, and asserts "Harvard is faster" without a real-time-specific justification — so part (c) reads as an unbalanced one-sided claim. The stronger response earns the definition and bottleneck marks cleanly and applies Harvard to the streaming scenario, but its evaluation is thin. The top-band response is precise about the single-bus cause, ties every architectural property explicitly to the fixed-function real-time requirements of the device, weighs genuine disadvantages, and closes with a conditional judgement (the answer would change for a general-purpose machine) — exactly the discriminating evaluation AO3 rewards.
This content is aligned with the AQA A-Level Computer Science (7517) specification.