This lesson examines the structural blueprints behind every processor you will ever program: how the CPU is wired internally, where instructions and data live, and how a handful of tiny registers orchestrate the entire machine. Master this and the fetch-decode-execute cycle, processor performance, and pipelining all fall into place.
This lesson develops OCR H446 section 1.1.1 (Structure and function of the processor). It establishes the Von Neumann stored-program model, contrasts it with the Harvard and modified Harvard alternatives, sets out the purpose of the core registers (PC, ACC, MAR, MDR, CIR and the status register), and explains the address, data and control buses — including how bus width and word length influence performance. These ideas are the foundation for the fetch-decode-execute cycle (1.1.1), processor performance factors (1.1.1) and pipelining (1.1.1) developed in later lessons, and they connect outward to data representation in module 1.4.
The Von Neumann architecture is named after the mathematician John von Neumann, whose 1945 report describing the design of the EDVAC set out the principle that still dominates general-purpose computing. Its defining idea is the stored-program concept: a program is not hard-wired into the machine but is held as data in the same memory that holds the values it operates on. Instructions and operands are both fetched across a single shared system bus.
This was revolutionary. Earlier machines such as ENIAC were rewired by hand to change their task; the stored-program idea meant a computer could be reprogrammed simply by loading new instructions into memory — exactly the way you load a new application today.
| Component | Role |
|---|---|
| Central Processing Unit (CPU) | Fetches, decodes and executes instructions; contains the control unit, ALU and registers |
| Main Memory (RAM) | Stores both program instructions and data in one single, uniform address space |
| System Bus | Connects the CPU to memory and I/O — comprises the address bus, data bus and control bus |
| Input/Output (I/O) | Peripheral devices connected through I/O controllers |
Because instructions and data live in one address space, a single addressing scheme suffices and memory can be flexibly partitioned between code and data at run time. This flexibility is precisely why operating systems, compilers and self-loading programs are possible: code is just bytes that happen to be interpreted as instructions.
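The claim that code is just bytes can be made concrete with a toy memory. The sketch below assumes a made-up encoding (a three-digit decimal word whose hundreds digit is the opcode and whose last two digits are the operand address); the `OPCODES` table and the `decode` helper are illustrative names, not part of any real instruction set.

```python
# Assumed encoding (not from the lesson): a word is a 3-digit decimal number,
# hundreds digit = opcode, remaining two digits = operand address.
OPCODES = {1: "ADD", 3: "STA", 5: "LDA"}  # hypothetical opcode table


def decode(word: int) -> tuple[str, int]:
    """Interpret a memory word as an instruction: (mnemonic, operand)."""
    opcode, operand = divmod(word, 100)
    return OPCODES[opcode], operand


# One memory, one address space: cells 0-99 hold code and data alike.
memory = [0] * 100
memory[30] = 164   # if fetched as an instruction, this word means ADD 64
memory[64] = 7     # if read as data, this word is just the number 7

print(decode(memory[30]))  # the word viewed as an instruction
print(memory[30] + 1)      # the same word viewed as a plain number
```

Nothing in the memory cell itself marks it as code or data. Only the way the processor uses the word decides which interpretation applies.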
The price of this elegance is the Von Neumann bottleneck. Because instructions and data share a single bus, the CPU cannot fetch an instruction and transfer an operand in the same bus cycle — the two requests must take turns. As processors became far faster than memory, the bus increasingly became the limiting factor: the CPU sits idle, starved of work, while it waits for the next transfer to complete. This idle waiting is sometimes called the memory wall.
Exam Tip: When describing the Von Neumann bottleneck, state clearly that the limitation arises because instructions and data share a single bus — not merely because they share the same memory. Many candidates write only "they share memory" and lose the mark. The bus is the contended resource.
Caching (covered in the primary-storage lesson) is the main mitigation: a small, fast cache near the CPU services most requests, so the slow main-memory bus is used less often.
The Harvard architecture takes the opposite design decision: it uses physically separate memories and buses for instructions and data. The name comes from the Harvard Mark I relay computer, which kept its program on punched tape separate from its data store.
| Component | Role |
|---|---|
| Processor (CPU) | Fetches, decodes and executes instructions |
| Instruction Memory | Dedicated memory for storing program instructions |
| Data Memory | Dedicated memory for storing data |
| Instruction Bus | Dedicated bus between CPU and instruction memory |
| Data Bus | Dedicated bus between CPU and data memory |
Pure Harvard architecture is rare in desktops but common where its determinism pays off, most notably in digital signal processors (DSPs) and microcontrollers.
Most modern desktop, laptop and smartphone CPUs use a modified Harvard architecture — a pragmatic hybrid. At the main-memory level, instructions and data share a single RAM (Von Neumann style, for flexibility). But inside the CPU, the first level of cache is split into a separate instruction cache (L1-I) and data cache (L1-D), Harvard style, so the core can fetch an instruction and access data in the same cycle without contention.
```mermaid
flowchart TB
    subgraph CPU
        L1I["L1-I Cache<br/>(Instructions)"]
        L1D["L1-D Cache<br/>(Data)"]
        L2["L2 Cache<br/>(unified)"]
        L1I --> L2
        L1D --> L2
    end
    L2 --> RAM["Unified Main Memory<br/>(Von Neumann-style)"]
```
This gives the speed of Harvard (parallel instruction and data access at the cache level, where it matters most for throughput) together with the flexibility of Von Neumann (one main memory that programs, operating systems and compilers can manage freely). It is the best-of-both-worlds compromise that dominates general-purpose computing today.
| Feature | Von Neumann | Harvard | Modified Harvard |
|---|---|---|---|
| Memory | Shared for instructions and data | Separate memories | Shared main memory, split caches |
| Buses | Single shared bus | Separate instruction and data buses | Shared main bus, split at cache level |
| Bottleneck | Yes | No | Reduced (mitigated by split caches) |
| Cost / complexity | Lower | Higher | Moderate |
| Self-modifying code | Possible | Difficult | Possible (via unified main memory) |
| Typical use | General-purpose PCs (historically) | DSPs, microcontrollers | Modern desktop/laptop/phone CPUs |
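The "Bottleneck" row of the comparison can be illustrated with a toy cycle count. This is a deliberately crude sketch: it assumes every memory access takes exactly one bus cycle, ignores caches entirely, and describes a program only by the number of data accesses each instruction makes. The function names and the sample program are hypothetical.

```python
def shared_bus_cycles(data_accesses: list[int]) -> int:
    """Von Neumann toy model: the fetch and the data accesses take turns on one bus."""
    return sum(1 + d for d in data_accesses)


def separate_bus_cycles(data_accesses: list[int]) -> int:
    """Harvard toy model: the fetch overlaps with data traffic on the other bus.

    An instruction with no data access still costs one cycle for its fetch.
    """
    return sum(max(1, d) for d in data_accesses)


# Hypothetical program: number of data accesses made by each of five instructions.
program = [1, 1, 0, 1, 1]

print(shared_bus_cycles(program))    # 9 bus cycles when everything shares one bus
print(separate_bus_cycles(program))  # 5 bus cycles when fetch and data overlap
```

Real processors complicate this picture considerably, but the model captures the essential point: on a shared bus the two kinds of access add up, whereas on separate buses they overlap.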
A register is a tiny, extremely fast storage location built directly into the CPU. Registers are orders of magnitude faster than main memory because they sit on the processor die and are accessed in a fraction of a clock cycle. The processor contains several special-purpose registers that each play a specific role during instruction processing.
| Register | Full Name | Purpose |
|---|---|---|
| PC | Program Counter | Holds the address of the next instruction to be fetched. Incremented during each fetch; overwritten by jumps and branches |
| MAR | Memory Address Register | Holds the address of the memory location about to be read from or written to. Connected to the address bus |
| MDR | Memory Data Register | Holds the data/instruction just read from memory, or the data about to be written. Connected to the data bus. (Sometimes called the MBR — Memory Buffer Register) |
| CIR | Current Instruction Register | Holds the instruction currently being decoded and executed |
| ACC | Accumulator | Holds the working value: it supplies one operand to the ALU and receives the result of each ALU operation |
| Status Register | (Flags / PSW) | Holds individual condition flags set by the ALU — e.g. Zero (Z), Negative (N), Carry (C) and Overflow (V) — used by conditional branch instructions |
The status register (also called the flags register or program status word) deserves special attention because students often forget it. After an arithmetic or logic operation, the ALU sets individual bits according to the result: the Zero flag is set if the result was 0, the Negative flag if the result was negative, the Carry flag if an unsigned operation overflowed the word, and the Overflow flag if a signed operation overflowed. A conditional branch such as "jump if equal" works by testing a flag: a compare instruction subtracts one value from the other, and the following branch inspects the Zero flag, which is set only when the two values were equal. In this register model, the status register is what makes decision-making in machine code possible.
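The way the four flags are derived from a result can be sketched in a few lines. The example assumes an 8-bit word with two's complement for signed values and models addition only; the name `add_with_flags` and the dictionary layout are illustrative choices rather than any real processor's interface.

```python
def add_with_flags(a: int, b: int, bits: int = 8) -> tuple[int, dict[str, bool]]:
    """Add two words and report the Z, N, C and V flags an ALU would set."""
    mask = (1 << bits) - 1      # 0xFF for an 8-bit word
    sign = 1 << (bits - 1)      # 0x80: the most significant (sign) bit
    a &= mask                   # wrap inputs into the word (handles negatives)
    b &= mask
    full = a + b                # may need one bit more than the word holds
    result = full & mask
    flags = {
        "Z": result == 0,                               # result is zero
        "N": bool(result & sign),                       # sign bit of the result is set
        "C": full > mask,                               # unsigned carry out of the word
        "V": bool((a ^ result) & (b ^ result) & sign),  # signed overflow
    }
    return result, flags


print(add_with_flags(0x7F, 0x01))  # 127 + 1: N and V set (signed overflow)
print(add_with_flags(0xFF, 0x01))  # 255 + 1: Z and C set (unsigned wrap to 0)
print(add_with_flags(5, -5))       # 5 - 5 via two's complement: Z set, so "equal"
```

The last call mirrors how a comparison works: subtracting equal values leaves zero, so a following "jump if equal" only needs to test the Z flag.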
The registers do not work in isolation. The diagram below shows the datapath — how registers, the Arithmetic Logic Unit (ALU), the Control Unit (CU) and the buses are wired together. The CU decodes instructions and emits the control signals (along the control bus) that gate data through the datapath; the ALU performs arithmetic and logic and updates the status register; the registers stage addresses and data on and off the buses.
```mermaid
flowchart TB
    subgraph CPU
        CU["Control Unit<br/>(decodes, emits control signals)"]
        PC["PC<br/>(next instruction addr)"]
        CIR["CIR<br/>(current instruction)"]
        MAR["MAR"]
        MDR["MDR"]
        ACC["ACC"]
        ALU["ALU<br/>(arithmetic & logic)"]
        SR["Status Register<br/>(Z N C V flags)"]
        PC --> MAR
        MDR --> CIR
        CIR --> CU
        MDR --> ALU
        ACC --> ALU
        ALU --> ACC
        ALU --> SR
        CU -. control signals .-> ALU
        CU -. control signals .-> PC
    end
    MAR -- "Address Bus (unidirectional)" --> MEM["Main Memory / I/O"]
    MEM -- "Data Bus (bidirectional)" --> MDR
    MDR -- "Data Bus (write)" --> MEM
    CU -- "Control Bus (bidirectional)" --> MEM
```
A bus is a set of parallel conducting wires that carries data, addresses or control signals between components. The system bus is conventionally divided into three.
| Address bus property | Detail |
|---|---|
| Direction | Unidirectional — from CPU to memory/I/O. The CPU specifies where; it never receives an address back |
| Width | Determines the maximum addressable memory. An $n$-bit address bus can address $2^n$ distinct locations |
| Purpose | Carries the address of the memory location or I/O port the CPU wants to access |
| Data bus property | Detail |
|---|---|
| Direction | Bidirectional — data travels both from CPU to memory and from memory to CPU |
| Width | Determines how many bits can be transferred in a single operation. Common widths: 8, 16, 32, 64 bits. Often matches the word length of the processor |
| Purpose | Carries the actual data or instructions being transferred |
| Control bus property | Detail |
|---|---|
| Direction | Bidirectional — some signals go from CPU to devices (e.g. memory write), others from devices to CPU (e.g. interrupt request) |
| Key signals | Memory Read, Memory Write, I/O Read, I/O Write, Interrupt Request, Bus Request, Bus Grant, Clock |
| Purpose | Coordinates and synchronises the activities of every component on the bus |
Two related dials affect performance: the width of the buses and the word length of the processor.
Exam Tip: A frequent question asks you to contrast the effect of widening the address bus versus the data bus. Widening the address bus increases the amount of memory that can be addressed; widening the data bus increases the amount of data transferred per operation (throughput). State both effects precisely and do not conflate them — they do different jobs.
Question: A computer has a 24-bit address bus and a 16-bit data bus. Calculate (a) the maximum number of addressable locations and the memory size if each location stores one byte, and (b) the amount of data transferred in one bus operation.
Answer:
(a) Number of locations is given by the address bus width:
$$\text{locations} = 2^{24} = 16{,}777{,}216$$
If each location stores 1 byte, total addressable memory is
$$16{,}777{,}216 \text{ bytes} = 16 \text{ MiB}.$$
(b) Data per operation equals the data bus width:
$$16 \text{ bits} = 2 \text{ bytes per transfer}.$$
So this machine can address 16 MiB of byte-addressable memory and move 2 bytes per bus cycle. To move a 32-bit value it would need two transfers — a concrete illustration of how a narrow data bus throttles throughput.
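The arithmetic in this worked example is easy to check by machine. The following is a minimal sketch; the helper names `addressable_locations` and `transfers_needed` are made up for illustration, and one byte per location is assumed, as in the question.

```python
def addressable_locations(address_bits: int) -> int:
    """An n-bit address bus can select 2**n distinct locations."""
    return 2 ** address_bits


def transfers_needed(value_bits: int, data_bus_bits: int) -> int:
    """Bus operations needed to move a value, rounding up to whole transfers."""
    return -(-value_bits // data_bus_bits)  # ceiling division


ADDRESS_BITS = 24
DATA_BITS = 16

locations = addressable_locations(ADDRESS_BITS)
print(locations)                         # 16777216 locations
print(locations // 2**20, "MiB")         # 16 MiB at one byte per location
print(DATA_BITS // 8, "bytes/transfer")  # 2 bytes per bus operation
print(transfers_needed(32, DATA_BITS))   # 2 transfers for a 32-bit value
```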
Question: A systems designer wants a processor that can address at least 4 GiB of byte-addressable memory and transfer a 64-bit value in a single operation. What is the minimum width of (a) the address bus and (b) the data bus?
Answer. For (a), 4 GiB is $4 \times 2^{30} = 2^{32}$ bytes, so the address bus must be able to select $2^{32}$ locations. Since an $n$-bit bus addresses $2^n$ locations, we need
$$2^n \ge 2^{32} \implies n \ge 32.$$
A 32-bit address bus is therefore the minimum. (This is exactly why 32-bit operating systems were historically capped at 4 GiB of usable RAM, and why 64-bit machines were needed to go beyond it.) For (b), to move a 64-bit value in one transfer the data bus must be 64 bits wide. Notice how the two requirements are independent: the address-bus width is fixed by the amount of memory, the data-bus width by the size of each transfer. A designer can trade these off separately — a point worth making explicitly in an exam.
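The inverse calculation, going from a required memory size to a minimum bus width, can be sketched the same way. The helper name `min_address_bits` is hypothetical, and byte-addressable memory is assumed so that locations and bytes coincide.

```python
def min_address_bits(locations: int) -> int:
    """Smallest n such that 2**n >= locations."""
    if locations < 1:
        raise ValueError("need at least one location")
    # The highest address is locations - 1; its bit length is the width needed.
    return (locations - 1).bit_length()


GIB = 2 ** 30

print(min_address_bits(4 * GIB))      # 32 bits for exactly 4 GiB
print(min_address_bits(4 * GIB + 1))  # 33 bits once you need one byte more
```

The second call shows why the boundary matters: the width is set by the largest address that must be expressible, so exceeding a power of two by a single location costs a whole extra address line.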
It helps to see the registers not as an isolated list but as a relay team. The program counter is the bookmark that never loses its place: it always names where the next instruction lives, and at the end of every fetch it nudges itself forward so the machine marches sequentially through memory unless a branch redirects it. When the processor is ready to fetch, it cannot send the program counter's value to memory directly — memory listens only to the memory address register, so the address is first copied there. The memory address register is the only register wired to the address bus, which is why every memory access, whether for an instruction or for data, must funnel through it.
Once the address is on the address bus and the control unit has raised the read line on the control bus, memory responds by placing the requested word on the data bus. That word lands in the memory data register, the sole register connected to the (bidirectional) data bus. The memory data register is genuinely two-way: on a read it receives from memory, and on a write it holds the value the processor is about to store. This is also where the Von Neumann bottleneck becomes tangible: a single data register on a single shared data bus can carry only one transfer at a time, so an instruction fetch and a data access must take turns rather than overlap.
If the word just fetched is an instruction, it is copied onward into the current instruction register, where it sits, stable, for the entire decode and execute phases while the control unit picks it apart into opcode and operand. If instead the word is data destined for a calculation, it is routed to the ALU. The accumulator is the ALU's scratchpad: it both supplies one operand and catches the result, which is why a sequence of additions accumulates in it (hence the name). Finally, as the ALU finishes, it does not silently discard information about the result — it sets the status register flags, so that a later conditional branch can ask "was the last result zero?" or "did it overflow?" and steer the program counter accordingly. Seen this way, the registers form a pipeline of custody: address out, data in, instruction held, operands combined, condition recorded — and the whole loop begins again.
Prose explains the idea; register-transfer notation (RTN) pins down the mechanism step by step. RTN uses the arrow ← to mean "the contents of the right-hand side are copied into the left-hand side", with [X] meaning "the contents of X", where X is a register or a memory location. Bracketing contents and naming registers explicitly turns the relay-team story into something you can verify line by line, which is exactly what an examiner rewards.
Suppose location 30 holds the single instruction ADD 64 ("add the contents of address 64 to the accumulator"), location 64 holds the value 7, and the accumulator already holds 5. The program counter currently holds 30. The table below traces every register on every micro-step, so you can see precisely what each register holds and why at each moment.
| Step | Micro-operation (RTN) | PC | MAR | MDR | CIR | ACC | What is happening |
|---|---|---|---|---|---|---|---|
| Start | — | 30 | – | – | – | 5 | PC points at the instruction; ACC holds a partial sum |
| F1 | MAR ← [PC] | 30 | 30 | – | – | 5 | Address of next instruction copied to MAR (drives address bus) |
| F2 | PC ← [PC] + 1 | 31 | 30 | – | – | 5 | PC incremented so it already names the following instruction |
| F3 | MDR ← [memory at MAR] | 31 | 30 | ADD 64 | – | 5 | Memory places the instruction on the data bus into MDR |
| F4 | CIR ← [MDR] | 31 | 30 | ADD 64 | ADD 64 | 5 | Instruction copied to CIR, ready for decoding |
| D1 | decode CIR | 31 | 30 | ADD 64 | ADD 64 | 5 | CU splits CIR into opcode ADD and operand 64 |
| E1 | MAR ← operand (64) | 31 | 64 | ADD 64 | ADD 64 | 5 | Operand address loaded to MAR — note MAR is reused for data |
| E2 | MDR ← [memory at MAR] | 31 | 64 | 7 | ADD 64 | 5 | The data value 7 is fetched into MDR |
| E3 | ACC ← [ACC] + [MDR] | 31 | 64 | 7 | ADD 64 | 12 | ALU adds; result lands in ACC; status flags updated |
Three subtleties become visible only in the trace. First, the PC is incremented during fetch (step F2), not after execute — so even a jump instruction has already moved the PC past itself before the jump overwrites it. Second, the MAR is used twice in a single instruction: once to fetch the instruction (F1) and again to fetch the operand (E1). It is a shared funnel, not a dedicated instruction-address register. Third, the MDR likewise carries first an instruction and then a data value in the same cycle — and on this single shared data bus those two transfers cannot overlap, which is the Von Neumann bottleneck made concrete in two lines of RTN.
Exam Tip: When asked for RTN, always write MAR ← [PC] then PC ← [PC] + 1 before the memory read. Putting the increment after the read, or omitting the bracket notation for memory contents, are the two most common ways to drop marks on a register-transfer question.
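The ADD 64 trace can be replayed in code, one micro-operation per line. This is a rough sketch rather than a real simulator: it assumes instructions are stored as `(opcode, operand)` tuples instead of encoded bit patterns, and the `reg` dictionary and `snapshot` helper are illustrative names.

```python
# Memory holds the instruction as a tuple and the data as an int.
# This representation is an illustrative assumption, not a real encoding.
memory = {30: ("ADD", 64), 64: 7}
reg = {"PC": 30, "MAR": None, "MDR": None, "CIR": None, "ACC": 5}
trace = []


def snapshot(label: str) -> None:
    """Record a copy of every register after the named micro-step."""
    trace.append((label, dict(reg)))


snapshot("Start")

# Fetch
reg["MAR"] = reg["PC"]               # F1: MAR <- [PC]
snapshot("F1")
reg["PC"] = reg["PC"] + 1            # F2: PC <- [PC] + 1
snapshot("F2")
reg["MDR"] = memory[reg["MAR"]]      # F3: MDR <- [memory at MAR]
snapshot("F3")
reg["CIR"] = reg["MDR"]              # F4: CIR <- [MDR]
snapshot("F4")

# Decode
opcode, operand = reg["CIR"]         # D1: split into opcode and operand
snapshot("D1")

# Execute
if opcode == "ADD":
    reg["MAR"] = operand             # E1: MAR <- operand
    snapshot("E1")
    reg["MDR"] = memory[reg["MAR"]]  # E2: MDR <- [memory at MAR]
    snapshot("E2")
    reg["ACC"] = reg["ACC"] + reg["MDR"]  # E3: ACC <- [ACC] + [MDR]
    snapshot("E3")

for label, registers in trace:
    print(label, registers)
```

Printing the trace gives one row per micro-step, so it can be compared directly with the table. Note that the PC already reads 31 at step F2, before memory has returned anything.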
Contrast this with a STORE instruction such as STA 80 ("store the accumulator into address 80"), which reverses the direction of the data bus during execute. The fetch phase is identical (MAR ← [PC], PC ← [PC] + 1, MDR ← [Memory[MAR]], CIR ← [MDR]), but in the execute phase the operand 80 is loaded into the MAR (MAR ← operand), the value to be stored is loaded into the MDR from the accumulator (MDR ← [ACC]), and finally memory is written from the MDR (Memory[MAR] ← [MDR]) while the control unit asserts MEMORY WRITE instead of MEMORY READ. Notice the symmetry: a LOAD/ADD reads memory into the MDR, whereas a STORE writes memory from the MDR. The MDR is the same register in both cases — only the direction of travel on the data bus changes, governed by which control line the control unit raises. This is exactly why the data bus is described as bidirectional while the address bus is unidirectional: addresses only ever leave the CPU, but data must flow both ways depending on the opcode.
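The reversal of direction in a store can be sketched in the same style. As before, the tuple representation of instructions is an assumption made for readability. The `read` and `write` helpers and the `control_log` list are hypothetical devices for showing which control line the control unit would raise, and the starting values (instruction at address 30, accumulator holding 12) are chosen purely for illustration.

```python
memory = {30: ("STA", 80), 80: 0}
reg = {"PC": 30, "MAR": None, "MDR": None, "CIR": None, "ACC": 12}
control_log = []  # control line raised for each memory access, in order


def read(address: int):
    """Memory read: data travels from memory towards the MDR."""
    control_log.append("MEMORY READ")
    return memory[address]


def write(address: int, value) -> None:
    """Memory write: data travels from the MDR towards memory."""
    control_log.append("MEMORY WRITE")
    memory[address] = value


# Fetch (identical to the fetch phase of any other instruction)
reg["MAR"] = reg["PC"]            # MAR <- [PC]
reg["PC"] = reg["PC"] + 1         # PC <- [PC] + 1
reg["MDR"] = read(reg["MAR"])     # MDR <- [Memory[MAR]]
reg["CIR"] = reg["MDR"]           # CIR <- [MDR]

# Decode
opcode, operand = reg["CIR"]

# Execute
if opcode == "STA":
    reg["MAR"] = operand          # MAR <- operand
    reg["MDR"] = reg["ACC"]       # MDR <- [ACC]
    write(reg["MAR"], reg["MDR"])  # Memory[MAR] <- [MDR]

print(memory[80])    # 12: the accumulator's value now sits in memory
print(control_log)   # ['MEMORY READ', 'MEMORY WRITE']
```

The same MAR and MDR are used in both phases. The only thing that changes between the fetch and the store is which helper is called, standing in for which control line is asserted.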
A microcontroller used in a real-time motor controller uses a Harvard architecture. A general-purpose laptop uses a modified Harvard architecture.
(a) State the purpose of the Memory Address Register (MAR) and the Memory Data Register (MDR). [2]
(b) Explain one advantage of the Harvard architecture for the real-time motor controller. [2]
(c) Discuss why a general-purpose laptop uses a modified Harvard architecture rather than a pure Von Neumann or pure Harvard design. [6]
(a) Mid-band. The MAR holds the address of a memory location. The MDR holds the data going to or from memory.
(a) Stronger. The MAR holds the address of the memory location the CPU wants to access, and the MDR holds the data or instruction transferred to or from that location. The MAR feeds the address bus and the MDR connects to the data bus, so together they form the interface through which every memory access passes.
(a) Top-band. The MAR holds the address of the memory location the CPU is about to read from or write to, and is connected to the address bus. The MDR holds the data or instruction just read from memory, or the value about to be written, and is connected to the (bidirectional) data bus.
(b) Mid-band. In a Harvard machine the instruction and data buses are separate, so the controller can fetch instructions and read data at the same time, which makes it faster.
(b) Stronger. Because the instruction and data memories have separate buses, the controller can fetch its next instruction while simultaneously reading a sensor value, instead of the two accesses taking turns on one bus. This parallel access avoids the Von Neumann bottleneck, so each instruction takes a fixed, predictable time.
(b) Top-band. In a Harvard machine the instruction and data buses are physically separate, so the controller can fetch its next instruction at the same time as reading a sensor value from data memory. This parallelism removes the Von Neumann bottleneck and gives predictable, deterministic timing — essential for a real-time motor controller that must respond within a fixed deadline.
(c) Mid-band. Von Neumann uses one memory and one bus, which is cheap and flexible but suffers the bottleneck. Harvard uses separate memories and buses, which is faster but more expensive and wastes space. Modified Harvard combines them by using one main memory but splitting the cache, so a laptop gets the speed of Harvard and the flexibility of Von Neumann.
(c) Stronger. A pure Von Neumann laptop would be cheap and easy to program — one memory means the operating system can freely allocate space between code and data — but the shared bus forces instructions and data to contend, stalling the fast CPU. A pure Harvard laptop would remove that contention with separate buses, but the duplicated memory wastes capacity (spare data memory cannot be lent to code) and makes self-modifying code and just-in-time compilation, which real software relies on, difficult. Modified Harvard takes the useful part of each: a single, flexible main memory keeps the Von Neumann programming model, while splitting the L1 cache into instruction and data halves restores Harvard-style parallel access at the level where most accesses actually occur. The laptop therefore gets most of Harvard's speed without losing Von Neumann's flexibility, which is why it is the chosen design.
(c) Top-band. A pure Von Neumann design is simple and flexible — one memory, one bus, easy to manage with an operating system and compilers — but suffers the Von Neumann bottleneck: instructions and data contend for a single bus, so the fast CPU stalls waiting for memory. A pure Harvard design removes that contention with separate instruction and data memories, but it is more expensive (duplicated buses and memory), wastes capacity because spare data memory cannot be lent to instructions, and makes self-modifying code, JIT compilation and conventional operating systems awkward — all of which a general-purpose laptop relies on. The modified Harvard compromise resolves the tension: it keeps a single, flexible main memory (Von Neumann) so the OS and applications can manage code and data freely, while splitting the L1 cache into separate instruction and data caches (Harvard) so the core can fetch and load in parallel where it matters most for throughput. Because the overwhelming majority of accesses hit the fast split caches, the laptop gains most of Harvard's parallelism without sacrificing Von Neumann's flexibility — which is why modified Harvard, not either pure form, dominates general-purpose machines.
Examiner-style commentary: Part (a) is pure recall but candidates frequently swap the two registers — the strongest answers anchor each to its bus. Part (b) must apply to the scenario; a generic "it is faster" would not reach top band, whereas linking parallel access to deterministic real-time deadlines does. Part (c) is an AO3 discussion: top-band responses weigh all three designs, identify the specific weakness each suffers, and conclude with a justified reason for the chosen architecture rather than merely listing pros and cons.