This lesson covers data compression — reducing the size of files to save storage space and reduce transmission time. You need to understand lossy and lossless compression, and specific algorithms: Run-Length Encoding (RLE), Huffman coding, and dictionary encoding (LZW).
Why compress? The main benefits:

| Benefit | Explanation |
|---|---|
| Reduced storage | Smaller files use less disk space |
| Faster transmission | Smaller files transfer more quickly over networks |
| Lower bandwidth usage | Less data to send = less network capacity needed |
| Cost savings | Less storage and bandwidth = lower costs |
Compression methods fall into two broad categories, lossy and lossless:

| Feature | Lossy | Lossless |
|---|---|---|
| Data loss | Some data is permanently removed | No data is lost |
| File size | Typically much smaller | Smaller than original, but larger than lossy |
| Quality | Reduced (may be imperceptible) | Identical to original |
| Reversible | No — original cannot be recovered | Yes — original can be perfectly reconstructed |
| Examples | JPEG, MP3, AAC, H.264 | PNG, FLAC, ZIP, GIF |
| Best for | Images, audio, video where small quality loss is acceptable | Text, programs, medical images, archives |
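Because lossless compression is fully reversible, a round trip through any lossless codec reproduces the original byte for byte. A quick sketch in Python using the standard-library `zlib` module (DEFLATE, which combines LZ77 dictionary coding with Huffman coding; not one of this lesson's worked algorithms, just a convenient lossless codec):

```python
import zlib

# Lossless round trip: decompress(compress(x)) == x, exactly.
original = b"AAAAAABBCCCCDDDDDDDD" * 100   # highly repetitive input
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

assert restored == original                 # byte-for-byte identical
print(len(original), "->", len(compressed)) # repetitive data shrinks a lot
```

A lossy codec such as JPEG offers no such guarantee: decoding yields an approximation of the original, which is why the table marks it as irreversible.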
RLE is a lossless compression algorithm that replaces consecutive repeated values with a count and the value.
Original: AAAAAABBCCCCDDDDDDDD
Encoded: 6A2B4C8D
Each run of identical characters is stored as (count, character).
Consider a row of pixels in a 1-bit black-and-white image:
Original: 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0
Encoded: 5,0 3,1 4,0 6,1 2,0
Original: 20 values. Encoded: 10 values (5 pairs). 50% reduction.
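Both examples above can be reproduced with a short encoder and decoder (a sketch; the function names and the list-of-(count, value)-pairs representation are illustrative, not from the lesson):

```python
# Minimal run-length encoder/decoder.
def rle_encode(data):
    runs = []
    for value in data:
        if runs and runs[-1][1] == value:
            runs[-1][0] += 1            # extend the current run
        else:
            runs.append([1, value])     # start a new run
    return [tuple(run) for run in runs]

def rle_decode(runs):
    out = []
    for count, value in runs:
        out.extend([value] * count)     # expand each run back out
    return out

text = list("AAAAAABBCCCCDDDDDDDD")
print(rle_encode(text))    # [(6, 'A'), (2, 'B'), (4, 'C'), (8, 'D')]

pixels = [0]*5 + [1]*3 + [0]*4 + [1]*6 + [0]*2
print(rle_encode(pixels))  # [(5, 0), (3, 1), (4, 0), (6, 1), (2, 0)]
assert rle_decode(rle_encode(pixels)) == pixels   # lossless round trip
```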
How well RLE performs depends heavily on the data:

| Scenario | Effectiveness |
|---|---|
| Long runs of repeated data | Very effective (e.g., simple graphics, fax documents) |
| Data with few repetitions | Ineffective — encoded data may be larger than original |
| Photographic images | Generally poor — pixel values vary frequently |
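The "may be larger than original" case is easy to demonstrate: when no adjacent values repeat, every character becomes its own run (a small illustration, assuming the same count-then-character notation as the AAAAAA... example above):

```python
# Worst case for RLE: no value repeats, so every character becomes its
# own (count, character) run and the output is twice the input length.
data = "ABCDEF"
runs = [(1, ch) for ch in data]            # what RLE produces here
encoded = "".join(f"{n}{ch}" for n, ch in runs)
print(encoded)                             # 1A1B1C1D1E1F
assert len(encoded) == 2 * len(data)       # "compressed" form is larger
```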
Huffman coding is a lossless compression algorithm that assigns shorter binary codes to more frequently occurring characters and longer codes to less frequent characters.
In standard ASCII, every character uses the same number of bits (7 or 8). Huffman coding uses variable-length codes so that common characters use fewer bits.
Message: "ABRACADABRA" (11 characters)
Step 1: Frequency count

| Character | Frequency |
|---|---|
| A | 5 |
| B | 2 |
| R | 2 |
| C | 1 |
| D | 1 |
Step 2: Build the tree
Repeatedly take the two nodes with the lowest frequencies, combine them under a new parent whose frequency is their sum, and return the parent to the pool. Repeat until a single tree remains. Each character's code is then the path from the root to its leaf, reading 0 for a left branch and 1 for a right branch, so frequent characters end up near the root with short codes.
One valid merge sequence here: C(1) + D(1) = 2; B(2) + R(2) = 4; 2 + 4 = 6; finally 6 + A(5) = 11. That gives A a 1-bit code and B, R, C, D 3-bit codes each, so the message needs 5(1) + 2(3) + 2(3) + 1(3) + 1(3) = 23 bits, versus 88 bits for 11 characters in 8-bit ASCII. Ties between equal frequencies can be broken either way; every valid Huffman tree for these frequencies totals 23 bits.
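The tree-building step can be sketched with a min-heap: pop the two lowest-frequency nodes, merge them, and push the sum back until one node remains. The sketch below (standard-library `heapq`; the function name is illustrative) tracks each character's code directly instead of building explicit tree nodes, prefixing 0 onto one merged subtree's codes and 1 onto the other's:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    freq = Counter(text)
    # Heap entries: (frequency, tiebreaker, {char: code-so-far}); the
    # tiebreaker keeps equal frequencies from comparing the dicts.
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lowest frequencies
        f2, _, right = heapq.heappop(heap)
        # Merging two subtrees adds one edge above each: 0 on one side,
        # 1 on the other.
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

codes = huffman_codes("ABRACADABRA")
total_bits = sum(len(codes[ch]) for ch in "ABRACADABRA")
print(codes)
print(total_bits, "bits vs", 11 * 8, "bits in 8-bit ASCII")
```

With the ABRACADABRA frequencies, A receives a 1-bit code and the encoded message totals 23 bits, against 88 bits for 11 characters of 8-bit ASCII.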