Parallel Processing and Multi-core Systems

This lesson explores how modern computers achieve high performance not just by making a single processor faster, but by performing multiple operations simultaneously. You need to understand different forms of parallelism, multi-core architectures, and their limitations.

Why Parallelism?

There are physical limits to how fast a single processor core can run:

Increasing clock speed generates more heat, requiring more power and cooling.
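Signals cannot travel faster than the speed of light, so the distance a signal must cross within one clock cycle limits the clock rate. Shrinking transistors shortens those distances, but only up to a point.
The power wall: beyond a certain point, increasing clock frequency gives diminishing returns because power consumption rises disproportionately. Higher frequencies generally need a higher supply voltage, and dynamic power grows roughly with the square of the voltage multiplied by the frequency.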

The solution: instead of making one core faster, use multiple cores or processing units working in parallel.

Types of Parallel Processing

1. Instruction-Level Parallelism (ILP)

ILP exploits parallelism within a single instruction stream. The processor identifies instructions that do not depend on each other and executes them simultaneously.

Pipelining (covered in the previous lesson) is a form of ILP.
Superscalar execution: The processor has multiple execution units (multiple ALUs, multiple load/store units) and can issue more than one instruction per clock cycle.
Out-of-order execution: Instructions are dynamically reordered at runtime so that independent instructions can execute as soon as their operands are ready, rather than waiting for earlier instructions to complete.
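
The ideas above can be sketched with a toy scheduler. This is only a simplified model, not how any real processor is implemented: it assumes every instruction takes one cycle, tracks only true (read-after-write) dependencies, and uses hypothetical names (`Instr`, `find_dependencies`, `schedule`) invented for this illustration. Each cycle it issues up to `issue_width` instructions whose inputs are already available, even if an earlier instruction is still waiting.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    dest: str     # register written by the instruction
    srcs: tuple   # registers read by the instruction


def find_dependencies(instrs):
    """For each instruction, find the earlier instructions whose results it reads."""
    last_writer = {}  # register -> index of the most recent instruction writing it
    deps = []
    for idx, ins in enumerate(instrs):
        deps.append({last_writer[s] for s in ins.srcs if s in last_writer})
        last_writer[ins.dest] = idx
    return deps


def schedule(instrs, issue_width=2):
    """Group instructions into cycles, issuing at most issue_width per cycle."""
    deps = find_dependencies(instrs)
    done = set()
    remaining = list(range(len(instrs)))
    cycles = []
    while remaining:
        # An instruction is ready once everything it depends on finished earlier.
        issued = [i for i in remaining if deps[i] <= done][:issue_width]
        cycles.append([instrs[i].name for i in issued])
        done.update(issued)
        remaining = [i for i in remaining if i not in issued]
    return cycles


program = [
    Instr("i0", "a", ("x",)),  # a = load x
    Instr("i1", "b", ("a",)),  # b = a + 1   (needs i0)
    Instr("i2", "c", ("y",)),  # c = load y  (independent of i0 and i1)
    Instr("i3", "d", ("c",)),  # d = c + 1   (needs i2)
]

wide = schedule(program, issue_width=2)
narrow = schedule(program, issue_width=1)
print(wide)         # i2 is issued alongside i0, ahead of i1
print(len(narrow))  # one instruction per cycle
```

In this model the two-wide machine finishes in two cycles instead of four, because `i2` does not have to wait behind `i1`.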

2. Data-Level Parallelism (DLP)

DLP applies the same operation to multiple data items simultaneously.

SIMD (Single Instruction, Multiple Data): One instruction operates on multiple data values at once. For example, a 128-bit register holds four 32-bit numbers, so a single instruction can add four pairs of 32-bit numbers.
Used extensively in graphics processing, audio processing, and scientific computing.
Instruction set extensions like SSE, AVX (x86) and NEON (ARM) provide SIMD capabilities.
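
The four-lane example can be modelled in a few lines. Python has no direct access to SSE, AVX or NEON instructions, so this is only a sketch of the semantics, not real SIMD code: the function names are hypothetical, and the wrap-around on overflow is an assumption of this model. A 128-bit register is represented as 16 bytes holding four unsigned 32-bit lanes.

```python
import struct

LANE_MASK = 0xFFFFFFFF  # keep each lane to 32 bits


def pack_register(values):
    """Pack four unsigned 32-bit integers into one 16-byte (128-bit) value."""
    return struct.pack("<4I", *values)


def unpack_register(reg):
    """Split a 16-byte value back into its four 32-bit lanes."""
    return struct.unpack("<4I", reg)


def simd_add(reg_a, reg_b):
    """Model one packed add: each lane is added independently of its neighbours."""
    lanes_a = unpack_register(reg_a)
    lanes_b = unpack_register(reg_b)
    return pack_register([(x + y) & LANE_MASK for x, y in zip(lanes_a, lanes_b)])


a = pack_register([1, 2, 3, 4])
b = pack_register([10, 20, 30, 40])
total = unpack_register(simd_add(a, b))
print(len(a) * 8)  # register width in bits
print(total)       # four sums produced by one modelled operation
```

A scalar loop would need four separate add instructions for the same work. In the model, an overflow in one lane does not carry into the next lane.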

3. Task-Level Parallelism (TLP)

TLP runs different threads or processes on different cores simultaneously.

Each core executes an independent thread.
The operating system's scheduler allocates threads to cores.
This is the primary benefit of multi-core processors.
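
As a rough sketch of the idea, the work below is split into independent chunks that are handed to a pool of workers, and the operating system decides where each worker runs. The function names and the prime-counting task are illustrative choices only. One caveat: in standard CPython, threads do not execute CPU-bound Python code on several cores at once, so the default here shows the structure rather than a real speed-up. Passing `ProcessPoolExecutor` as `executor_cls` uses separate processes, which can run on different cores.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(lo, hi):
    """Count the primes in the half-open range [lo, hi) by trial division."""
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def parallel_count(limit, workers=4, executor_cls=ThreadPoolExecutor):
    """Count primes below limit (limit >= 1) by giving each worker one chunk."""
    step = -(-limit // workers)  # ceiling division so the chunks cover the range
    bounds = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    with executor_cls(max_workers=workers) as pool:
        # Each chunk is an independent task; the partial counts are summed at the end.
        return sum(pool.map(count_primes, *zip(*bounds)))


print(parallel_count(100))
```

The chunks share no data, which is what makes them easy to run at the same time.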

Multi-Core Processors

A multi-core processor has two or more independent processing cores on a single chip, each capable of executing its own instruction stream.

Architecture

┌──────────────────────────────────────┐
│              CPU Chip                 │
│  ┌──────┐  ┌──────┐  ┌──────┐       │
│  │Core 0│  │Core 1│  │Core 2│  ...  │
│  │L1-I  │  │L1-I  │  │L1-I  │       │
│  │L1-D  │  │L1-D  │  │L1-D  │       │
│  └──┬───┘  └──┬───┘  └──┬───┘       │
│     └────┬────┘────┬────┘            │
│       Shared L2 / L3 Cache           │
└──────────────┬───────────────────────┘
               │
         Main Memory (RAM)

Each core typically has its own L1 cache (split into instruction and data) and shares an L2 or L3 cache with the other cores.

Advantages

Higher throughput: multiple threads or processes run simultaneously.
Better multitasking: the OS can assign different applications to different cores.
Lower power per unit of performance: two cores at a moderate clock speed can use less power than one core at twice the clock speed, because the faster core usually also needs a higher supply voltage, and power rises steeply with voltage.
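
The power comparison can be checked with a back-of-the-envelope calculation. The sketch below uses the common first-order approximation that dynamic power is proportional to capacitance × voltage² × frequency. It ignores leakage, and it assumes the supply voltage has to rise in proportion to the clock frequency, which is a deliberate simplification. All values are normalised to a baseline core.

```python
def dynamic_power(capacitance, voltage, frequency):
    """First-order dynamic power model: P is proportional to C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


# Baseline core: capacitance, voltage and frequency all normalised to 1.0.
baseline = dynamic_power(1.0, 1.0, 1.0)

# One core at twice the clock, assuming its voltage must also double.
one_fast_core = dynamic_power(1.0, 2.0, 2.0)

# Two baseline cores running side by side.
two_moderate_cores = 2 * baseline

print(one_fast_core)       # 8.0 under these assumptions
print(two_moderate_cores)  # 2.0
```

Both options offer about twice the baseline throughput in the ideal case, but under these assumptions the single fast core draws four times the power of the dual-core design. The dual-core figure also assumes the workload can be split across both cores.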
