Compression reduces the file size of data so that it takes up less storage space and can be transmitted more quickly. OCR J277 Section 2.6 requires you to understand why compression is needed and the difference between lossy and lossless compression, including specific techniques such as Huffman coding and run-length encoding (RLE).
| Benefit | Explanation |
|---|---|
| Reduced storage | Compressed files take up less space on a hard drive or SSD |
| Faster transfer | Smaller files can be sent over a network more quickly |
| Lower bandwidth | Less network capacity is needed to transmit compressed data |
| Cost savings | Less storage and bandwidth reduces costs |
Lossy compression permanently removes some data from the file to achieve a smaller size. The removed data cannot be recovered — the original file cannot be perfectly reconstructed.
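Lossy compression removes the data considered least important or least noticeable to human perception. For example, JPEG discards fine colour detail and MP3 discards frequencies that most listeners are unlikely to hear.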
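As a rough illustration of why lossy compression cannot be undone, the sketch below rounds each sample down to a coarser level. This is a toy model only: the `quantise` function and the sample values are hypothetical, and real formats such as JPEG and MP3 use far more sophisticated methods.

```python
def quantise(samples, step=16):
    """Toy lossy step: round each 0-255 sample down to a multiple of `step`."""
    return [(s // step) * step for s in samples]

# Hypothetical brightness values for seven pixels
original = [200, 201, 203, 198, 64, 66, 65]
reduced = quantise(original)

print(reduced)              # [192, 192, 192, 192, 64, 64, 64]
print(reduced == original)  # False: the fine differences are gone for good
```

After quantising, several different original values map to the same number, so there is no way to tell which one was there before. The repeated values are also easier to compress further.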
| Advantage | Disadvantage |
|---|---|
| Much smaller file sizes | Some quality is lost permanently |
| Good for media (photos, music, video) | Cannot recover the original data |
| Adjustable quality/size trade-off | Not suitable for text, code, or data where accuracy is critical |
| Format | Type | Used for |
|---|---|---|
| JPEG | Image | Photographs |
| MP3 | Audio | Music |
| MP4 / H.264 | Video | Video streaming |
| AAC | Audio | iTunes, streaming |
OCR Exam Tip: Lossy compression is suitable for media files where small quality reductions are acceptable. It is NOT suitable for text documents, program files, or medical/scientific data where every bit matters.
Lossless compression reduces file size without losing any data. The original file can be perfectly reconstructed from the compressed version.
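A quick way to see "perfect reconstruction" in practice is a round trip through Python's built-in `zlib` module, which implements DEFLATE (the lossless method used inside ZIP and PNG). The sample text is made up for the demonstration.

```python
import zlib

# Made-up, highly repetitive sample text
text = ("The quick brown fox jumps over the lazy dog. " * 20).encode("ascii")

packed = zlib.compress(text)        # lossless compression
restored = zlib.decompress(packed)  # decompression

print(len(text), "bytes before,", len(packed), "bytes after")
print(restored == text)  # True: every byte is recovered
```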
| Advantage | Disadvantage |
|---|---|
| No data is lost — perfect reconstruction | File sizes are not reduced as much as lossy |
| Suitable for all file types | Still larger than lossy compressed files |
| Essential for text, code, and important data | Compression and decompression may be slower |
| Format | Type | Used for |
|---|---|---|
| PNG | Image | Graphics, screenshots |
| FLAC | Audio | High-quality music |
| ZIP | Archive | Any files |
| GIF | Image | Simple animations |
| Feature | Lossy | Lossless |
|---|---|---|
| Data lost? | Yes | No |
| Original recoverable? | No | Yes |
| File size reduction | Large (can exceed 90%) | Moderate (typically 20-60%) |
| Best for | Photos, music, video | Text, code, important data |
| Examples | JPEG, MP3, MP4 | PNG, ZIP, FLAC |
flowchart TD
A[Original File] --> B{Can data be lost?}
B -->|No - text, code, medical| C[Lossless]
B -->|Yes - photo, music, video| D[Lossy]
C --> E[RLE]
C --> F[Huffman Coding]
C --> G[PNG / ZIP / FLAC]
D --> H[JPEG removes colour detail]
D --> I[MP3 removes frequencies]
D --> J[MP4 removes repeated detail between frames]
E --> K[Smaller file, perfect reconstruction]
F --> K
G --> K
H --> L[Much smaller file, some quality lost]
I --> L
J --> L
RLE compresses data by replacing each run of consecutive repeated values with a single count followed by the value.
Example:
Original data: AAAAABBBCCDDDDDDDD
RLE compressed: 5A 3B 2C 8D
| Original | Compressed | Savings |
|---|---|---|
| 18 characters | 8 characters | 10 characters saved |
RLE works well when there are long runs of repeated data (e.g., areas of solid colour in simple images). It works poorly on data with little repetition.
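The count-and-value idea can be sketched in a few lines of Python. The function names `rle_encode` and `rle_decode` are illustrative, and the sketch assumes each symbol is a single non-space character with runs separated by spaces, as in the example above.

```python
from itertools import groupby

def rle_encode(data: str) -> str:
    """Replace each run of identical characters with '<count><char>'."""
    return " ".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(data))

def rle_decode(encoded: str) -> str:
    """Rebuild the original: the last character of each token is the symbol."""
    return "".join(token[-1] * int(token[:-1]) for token in encoded.split())

print(rle_encode("AAAAABBBCCDDDDDDDD"))  # 5A 3B 2C 8D
print(rle_decode("5A 3B 2C 8D"))         # AAAAABBBCCDDDDDDDD
print(rle_encode("ABCD"))                # 1A 1B 1C 1D
```

The last line shows the weakness: with no repetition, the "compressed" version is longer than the original.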
Binary example:
Original: 11111 000 1111 00000000
RLE: 5 3 4 8 (with an agreed starting bit, e.g., starting with 1)
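For binary data only the run lengths need storing, because the bits must alternate. A minimal sketch follows, assuming the agreed starting bit is 1 and, as an extra convention not stated above, that a leading run of length 0 is recorded when the data actually begins with the other bit.

```python
from itertools import groupby

def rle_bits_encode(bits: str, start: str = "1") -> list[int]:
    """Return the run lengths, beginning with a run of the agreed `start` bit."""
    runs = [len(list(group)) for _, group in groupby(bits)]
    if bits and bits[0] != start:
        runs.insert(0, 0)  # assumed convention: zero-length first run
    return runs

def rle_bits_decode(runs: list[int], start: str = "1") -> str:
    """Rebuild the bit string by alternating between 1 and 0 for each run."""
    out, bit = [], start
    for length in runs:
        out.append(bit * length)
        bit = "0" if bit == "1" else "1"
    return "".join(out)

print(rle_bits_encode("11111000111100000000"))  # [5, 3, 4, 8]
print(rle_bits_decode([5, 3, 4, 8]))            # 11111000111100000000
```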
Huffman coding assigns shorter binary codes to characters that appear more frequently and longer codes to characters that appear less frequently.
Example:
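When text is stored as 8-bit ASCII, every character uses 8 bits. The string "AAABBC" contains A three times, B twice, and C once.
Huffman coding might assign a 1-bit code to A and 2-bit codes to B and C (for example, A = 0, B = 10, C = 11):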
| Encoding | Calculation | Total bits |
|---|---|---|
| ASCII (8 bits each) | 6 x 8 | 48 bits |
| Huffman | (3x1) + (2x2) + (1x2) | 9 bits |
Savings: 48 - 9 = 39 bits saved!
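The arithmetic can be checked with a short script. The specific codes in `codes` are one possible assignment chosen for illustration (A = 0, B = 10, C = 11); they match the code lengths used in the calculation, but the exact bit patterns are an assumption.

```python
# Assumed example codes: 1 bit for A, 2 bits each for B and C
codes = {"A": "0", "B": "10", "C": "11"}
message = "AAABBC"

encoded = "".join(codes[ch] for ch in message)
ascii_bits = len(message) * 8  # 8 bits per character

print(encoded)                    # 000101011
print(len(encoded))               # 9
print(ascii_bits - len(encoded))  # 39
```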
OCR Exam Tip: When explaining Huffman coding, state that frequently occurring characters are given shorter codes and less frequent characters are given longer codes. You should be able to read a Huffman tree and decode a binary string using it.
A Huffman tree is a binary tree used to generate Huffman codes. Each leaf holds a character, and the path from the root to that leaf gives the character's code.
To decode a Huffman-encoded message, start at the root and follow the branches (0 = left, 1 = right) until you reach a leaf node.
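The decoding walk can be sketched as follows. The tree here is a hypothetical one for the string "AAABBC", written as nested pairs `(left, right)` with a plain string at each leaf; it corresponds to the assumed codes A = 0, B = 10, C = 11.

```python
def huffman_decode(bits: str, tree) -> str:
    """Follow 0 = left, 1 = right from the root; emit a character at each leaf."""
    decoded = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):  # reached a leaf
            decoded.append(node)
            node = tree            # go back to the root for the next character
    return "".join(decoded)

# Hypothetical tree: left branch is A, right branch splits into B and C
tree = ("A", ("B", "C"))

print(huffman_decode("000101011", tree))  # AAABBC
```

No separators are needed between codes, because no code is the start of another code: the walk always knows when a character is complete.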
Imagine a single row of pixels in a simple flag image. Each pixel is stored as a single letter to keep the example compact: R = red, W = white, B = blue.
Original row (30 characters): RRRRRWWWWWWWWWWWWWWWBBBBBBBBBB
Step 1: Apply RLE.
Walk along the row and count consecutive runs: 5 R, then 15 W, then 10 B.
RLE output: 5R 15W 10B. Written as a compact stream (count then value), the encoded data is eight characters (5, R, 1, 5, W, 1, 0, B), or ten if the two separators are counted, a saving of roughly 70%. RLE works very well here because the data has long runs of the same symbol.
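The saving for this row can be measured directly. The sketch below counts only the digits and letters of the compact stream and leaves out separators; the variable names are illustrative.

```python
from itertools import groupby

row = "R" * 5 + "W" * 15 + "B" * 10  # the 30-pixel flag row

# One (count, symbol) pair per run
runs = [(len(list(group)), symbol) for symbol, group in groupby(row)]
compact = "".join(f"{count}{symbol}" for count, symbol in runs)
saving = 1 - len(compact) / len(row)

print(runs)             # [(5, 'R'), (15, 'W'), (10, 'B')]
print(compact)          # 5R15W10B
print(f"{saving:.0%}")  # 73%
```

Dropping the separators is safe in this case because the counts are digits and the symbols are letters, so the two can always be told apart.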
Step 2: Apply Huffman coding.
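Count frequencies across the row: W = 15, B = 10, R = 5. Build a Huffman tree by repeatedly combining the two lowest-frequency nodes: R (5) and B (10) combine first into a node of weight 15, which then combines with W (15) to form the root of weight 30.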
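The combining process can be sketched with a priority queue. This is one possible implementation, not a prescribed method: the function name `huffman_codes`, the tuple-based tree, and the choice of 0 for left and 1 for right are all assumptions, and a different tie-breaking rule could swap which symbols receive which 2-bit code.

```python
import heapq
from itertools import count

def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Build a Huffman tree from symbol frequencies and return each symbol's code."""
    tiebreak = count()  # unique counter so equal weights never compare the trees
    heap = [(weight, next(tiebreak), symbol) for symbol, weight in freqs.items()]
    heapq.heapify(heap)

    # Repeatedly combine the two lowest-frequency nodes into one parent node
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, str):        # leaf: record the path taken
            codes[node] = prefix or "0"  # a lone symbol still needs one bit
        else:
            walk(node[0], prefix + "0")  # left branch
            walk(node[1], prefix + "1")  # right branch

    walk(heap[0][2], "")
    return codes

freqs = {"W": 15, "B": 10, "R": 5}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)

print({s: len(c) for s, c in codes.items()})  # W gets 1 bit; R and B get 2 bits each
print(total_bits)                             # 45
```

With these frequencies the row needs 45 bits (15 x 1 + 10 x 2 + 5 x 2), compared with 30 x 8 = 240 bits when each character is stored in 8 bits.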