Compression: Lossy and Lossless Techniques

Compression reduces the file size of data so that it takes up less storage space and can be transmitted more quickly. The OCR J277 specification (1.2.5 Compression) requires you to understand why compression is needed and the difference between lossy and lossless compression. This page also covers two specific lossless techniques, run-length encoding (RLE) and Huffman coding.

Why Compress Data?

Benefit	Explanation
Reduced storage	Compressed files take up less space on a hard drive or SSD
Faster transfer	Smaller files can be sent over a network more quickly
Lower bandwidth	Less network capacity is needed to transmit compressed data
Cost savings	Needing less storage and bandwidth reduces costs

Lossy Compression

Lossy compression permanently removes some data from the file to achieve a smaller size. The removed data cannot be recovered — the original file cannot be perfectly reconstructed.

How It Works

Lossy compression removes data that is considered least important or least noticeable to human perception:

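In images (JPEG): removes fine colour and brightness detail that the human eye is unlikely to notice
In audio (MP3): removes sounds the human ear is unlikely to notice, such as frequencies outside the normal range of hearing and quiet sounds masked by louder ones
In video (MP4 / H.264): stores only the changes between similar frames and discards fine detail the viewer is unlikely to notice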
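As a toy illustration of data being permanently discarded, the Python sketch below uses quantisation: each 8-bit colour value is stored using only its top 4 bits. This is an assumption chosen for simplicity, not the method JPEG actually uses (JPEG transforms blocks of pixels before discarding detail), and the function names are hypothetical.

```python
# Toy lossy compression by quantisation (illustrative assumption only;
# real JPEG uses a more sophisticated transform-based method).
# Each 8-bit value (0-255) is reduced to a 4-bit level (0-15).

def compress(values: list[int]) -> list[int]:
    """Keep only the top 4 bits of each 8-bit value."""
    return [v >> 4 for v in values]


def decompress(levels: list[int]) -> list[int]:
    """Scale each 4-bit level back to 0-255, using the middle of its range."""
    return [(q << 4) + 8 for q in levels]


original = [200, 201, 203, 90, 17]
levels = compress(original)
restored = decompress(levels)
print(levels)                # [12, 12, 12, 5, 1]
print(restored)              # [200, 200, 200, 88, 24]
print(restored == original)  # False: the discarded detail cannot be recovered
```

Each value now needs half the bits, but nearby values such as 200, 201 and 203 become identical, which is exactly the kind of small, permanent loss lossy compression accepts.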

Advantages and Disadvantages

Advantage	Disadvantage
Much smaller file sizes	Some quality is lost permanently
Good for media (photos, music, video)	Cannot recover the original data
Adjustable quality/size trade-off	Not suitable for text, code, or data where accuracy is critical

Examples of Lossy Formats

Format	Type	Used for
JPEG	Image	Photographs
MP3	Audio	Music
MP4 / H.264	Video	Video streaming
AAC	Audio	iTunes, streaming

OCR Exam Tip: Lossy compression is suitable for media files where small quality reductions are acceptable. It is NOT suitable for text documents, program files, or medical/scientific data where every bit matters.

Lossless Compression

Lossless compression reduces file size without losing any data. The original file can be perfectly reconstructed from the compressed version.
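A quick way to see lossless compression in action is Python's standard zlib module, which implements DEFLATE, the method used by most ZIP files. The sketch below compresses some deliberately repetitive sample text (made up for illustration) and checks that decompression gives back exactly the same bytes.

```python
import zlib

# Deliberately repetitive sample data, so there is plenty to compress.
text = ("the cat sat on the mat " * 50).encode("ascii")

packed = zlib.compress(text)        # lossless DEFLATE compression
unpacked = zlib.decompress(packed)  # reverse the process

print(len(text), "bytes before,", len(packed), "bytes after")
print(unpacked == text)  # True: every byte is recovered exactly
```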

Advantages and Disadvantages

Advantage	Disadvantage
No data is lost: the original is perfectly reconstructed	File sizes are not reduced as much as with lossy compression
Suitable for all file types	The saving depends on the data: files with little repetition may barely shrink
Essential for text, code, and important data	Larger files need more storage and bandwidth than lossy equivalents

Examples of Lossless Formats

Format	Type	Used for
PNG	Image	Graphics, screenshots
FLAC	Audio	High-quality music
ZIP	Archive	Any files
GIF	Image	Simple animations

Comparison Table

Feature	Lossy	Lossless
Data lost?	Yes	No
Original recoverable?	No	Yes
File size reduction	Large (often 90% or more)	Moderate (varies with how much repetition the data contains)
Best for	Photos, music, video	Text, code, important data
Examples	JPEG, MP3, MP4	PNG, ZIP, FLAC

```mermaid
flowchart TD
    A[Original File] --> B{Can data be lost?}
    B -->|No - text, code, medical| C[Lossless]
    B -->|Yes - photo, music, video| D[Lossy]
    C --> E[RLE]
    C --> F[Huffman Coding]
    C --> G[PNG / ZIP / FLAC]
    D --> H[JPEG removes colour detail]
    D --> I[MP3 removes frequencies]
    D --> J[MP4 removes inter-frame data]
    E --> K[Smaller file, perfect reconstruction]
    F --> K
    G --> K
    H --> L[Much smaller file, some quality lost]
    I --> L
    J --> L
```

Lossless Technique 1: Run-Length Encoding (RLE)

RLE compresses data by replacing consecutive repeated values with a count and the value.

Example:

Original data: AAAAABBBCCDDDDDDDD

RLE compressed: 5A 3B 2C 8D (spaces shown only for readability)

Original	Compressed	Savings
18 characters	8 characters	10 characters saved

RLE works well when there are long runs of repeated data (e.g., areas of solid colour in simple images). It works poorly on data with little repetition, and can even make it larger: ABCD becomes 1A1B1C1D, twice the original length.
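The encoding and decoding steps above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the data contains no digit characters, since digits are reserved for the counts.

```python
from itertools import groupby


def rle_encode(data: str) -> str:
    """Replace each run of identical characters with count + character."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(data))


def rle_decode(encoded: str) -> str:
    """Reverse rle_encode: read the digits of a count, then one character."""
    result = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch  # counts may have more than one digit, e.g. 15
        else:
            result.append(ch * int(count))
            count = ""
    return "".join(result)


encoded = rle_encode("AAAAABBBCCDDDDDDDD")
print(encoded)                                      # 5A3B2C8D
print(len(encoded))                                 # 8
print(rle_decode(encoded) == "AAAAABBBCCDDDDDDDD")  # True
print(rle_encode("ABCD"))                           # 1A1B1C1D (longer!)
```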

Binary example:

Original: 11111 000 1111 00000000

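RLE: 5 3 4 8. Because the bits must alternate from one run to the next, only the run lengths need to be stored, provided the encoder and decoder agree which bit comes first (here, 1).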
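The agreed-starting-bit idea can be sketched in Python as follows. The function names are hypothetical, and recording a zero-length first run when the data starts with the other bit is one possible convention, assumed here rather than required by the technique.

```python
from itertools import groupby


def encode_bits(bits: str, first_bit: str = "1") -> list[int]:
    """Return the lengths of alternating runs, starting with first_bit."""
    runs = [len(list(run)) for _, run in groupby(bits)]
    if bits and bits[0] != first_bit:
        runs.insert(0, 0)  # assumed convention: a zero-length first run
    return runs


def decode_bits(runs: list[int], first_bit: str = "1") -> str:
    """Rebuild the bits: runs alternate between 1 and 0, starting with first_bit."""
    out = []
    bit = first_bit
    for length in runs:
        out.append(bit * length)
        bit = "0" if bit == "1" else "1"  # flip for the next run
    return "".join(out)


original = "11111" "000" "1111" "00000000"
runs = encode_bits(original)
print(runs)                           # [5, 3, 4, 8]
print(decode_bits(runs) == original)  # True
```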

Lossless Technique 2: Huffman Coding

Huffman coding assigns shorter binary codes to characters that appear more frequently and longer codes to characters that appear less frequently.

Example:

In ASCII, each character is normally stored using 8 bits. In the string "AAABBC":

A appears 3 times
B appears 2 times
C appears 1 time

Huffman coding might assign:

A = 0 (1 bit) — most frequent
B = 10 (2 bits) — second most frequent
C = 11 (2 bits) — least frequent

Encoding	Calculation	Total bits
ASCII (8 bits each)	6 x 8	48 bits
Huffman	(3x1) + (2x2) + (1x2)	9 bits

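Savings: 48 - 9 = 39 bits saved! In practice the Huffman tree (or a table of codes) must also be stored, because the decoder needs it to read the codes.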
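One way to build Huffman codes is to repeatedly merge the two lowest-frequency nodes using a priority queue. The Python sketch below (a hypothetical huffman_codes function built on the standard heapq module) does this for "AAABBC". Because ties and left/right choices can go either way, the exact codes it produces may differ from those above, but the code lengths, and so the 9-bit total, are the same.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman tree for text and return each character's code."""
    counts = Counter(text)
    if not counts:
        return {}
    # Heap entries are (frequency, tie-breaker, tree). A tree is either a
    # single character (a leaf) or a (left, right) pair (an internal node).
    # The unique tie-breaker stops Python from ever comparing two trees.
    heap = [(freq, i, char) for i, (char, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # only one distinct character: give it a 1-bit code
        return {heap[0][2]: "0"}
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest frequency
        f2, _, right = heapq.heappop(heap)  # second lowest
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1

    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, str):
            codes[node] = prefix         # leaf: record its code
        else:
            walk(node[0], prefix + "0")  # left branch = 0
            walk(node[1], prefix + "1")  # right branch = 1

    walk(heap[0][2], "")
    return codes


text = "AAABBC"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)          # {'A': '0', 'C': '10', 'B': '11'}
print(len(encoded))   # 9
print(len(text) * 8)  # 48
```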

OCR Exam Tip: When explaining Huffman coding, state that frequently occurring characters are given shorter codes and less frequent characters are given longer codes. You should be able to read a Huffman tree and decode a binary string using it.

Huffman Trees

A Huffman tree is a binary tree used to generate Huffman codes:

Left branches represent 0
Right branches represent 1
Leaf nodes contain the characters

To decode a Huffman-encoded message, start at the root and follow the branches (0 = left, 1 = right) until you reach a leaf node.
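The decoding procedure can be sketched in Python for the codes A = 0, B = 10, C = 11. The tree representation is an assumption made for this sketch (a leaf is a character, an internal node is a (left, right) pair), and the function name is hypothetical.

```python
# Tree for the codes A = 0, B = 10, C = 11 (assumed representation).
TREE = ("A", ("B", "C"))


def huffman_decode(bits: str, tree) -> str:
    """Follow branches from the root (0 = left, 1 = right) to each leaf."""
    result = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):  # reached a leaf: output it, restart at root
            result.append(node)
            node = tree
    return "".join(result)


print(huffman_decode("000101011", TREE))  # AAABBC
```

Because no code is the start of another code, the decoder always knows when it has reached the end of a character, so no separators are needed between codes.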

Worked Example: Compressing an Image Row with RLE, Then Comparing with Huffman

Imagine a single row of pixels in a simple flag image. Each pixel is stored as a single letter to keep the example compact: R = red, W = white, B = blue.

Original row (30 characters): RRRRRWWWWWWWWWWWWWWWBBBBBBBBBB

Step 1: Apply RLE.

Walk along the row and count consecutive runs:

5 reds (R)
15 whites (W)
10 blues (B)

RLE output: 5R 15W 10B. Written as a compact stream (count then value) with no separators, 5R15W10B, the encoded data is 8 characters instead of 30, a saving of about 73%. RLE works brilliantly here because the data has long runs of the same symbol.
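A short Python sketch can confirm these figures by rebuilding the row and applying the same count-then-value scheme.

```python
from itertools import groupby

row = "R" * 5 + "W" * 15 + "B" * 10  # the 30-pixel flag row

# Count each run of identical pixels and write it as count then value.
encoded = "".join(f"{len(list(run))}{pixel}" for pixel, run in groupby(row))

saving = 1 - len(encoded) / len(row)
print(encoded)          # 5R15W10B
print(len(encoded))     # 8
print(f"{saving:.0%}")  # 73%
```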

Step 2: Apply Huffman coding.

Count frequencies across the row: W = 15, B = 10, R = 5. Build a Huffman tree by repeatedly combining the two lowest-frequency nodes:

Combine R (5) and B (10) into a node with frequency 15.
Combine that node with W (15) to form the root (frequency 30).
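Carrying the example one step further: in the finished tree W sits directly below the root, so it gets a 1-bit code, while R and B sit two levels down and get 2-bit codes. The Python sketch below repeats the merges to compute these code lengths and the total size, assuming 8 bits per character for the uncompressed row. It tracks only code lengths, because which branch is labelled 0 or 1 is a convention.

```python
import heapq
from collections import Counter

row = "R" * 5 + "W" * 15 + "B" * 10  # the 30-pixel flag row

# Each heap entry: (frequency, tie-breaker, {symbol: code length so far}).
counts = Counter(row)
heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(counts.items())]
heapq.heapify(heap)
tie = len(heap)
while len(heap) > 1:
    f1, _, a = heapq.heappop(heap)  # the two lowest-frequency nodes
    f2, _, b = heapq.heappop(heap)
    # Merging adds one level above both subtrees: every code grows by 1 bit.
    merged = {s: d + 1 for s, d in {**a, **b}.items()}
    heapq.heappush(heap, (f1 + f2, tie, merged))
    tie += 1

lengths = heap[0][2]
huffman_bits = sum(counts[s] * lengths[s] for s in counts)
ascii_bits = len(row) * 8
print(lengths)       # {'W': 1, 'R': 2, 'B': 2}
print(huffman_bits)  # 45
print(ascii_bits)    # 240
```

That gives (15 x 1) + (10 x 2) + (5 x 2) = 45 bits for the Huffman-coded row, compared with 30 x 8 = 240 bits uncompressed.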
