You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
Data sent over a network or read from storage can be corrupted by noise, so systems add redundant information that lets the receiver check the data arrived intact. This lesson works through the main schemes — parity bits, majority voting, checksums, and check digits (with ISBN-13 and the Luhn algorithm) — each with a worked example, and ends with a brief note on the step beyond detection: error correction.
This lesson addresses the H446 1.3.1 area on data integrity:
(This is a paraphrase of the specification content, not a verbatim quotation.)
When data is transmitted or stored, physical effects can flip bits (0 to 1 or 1 to 0):
| Cause | Description |
|---|---|
| Electrical interference | Electromagnetic noise from nearby equipment |
| Signal attenuation | The signal weakens over distance until bits are misread |
| Hardware faults | Faulty cables, connectors or failing storage media |
| Crosstalk | Signals leaking between adjacent wires |
| Cosmic rays | High-energy particles flipping a memory bit (rare but real) |
Because a single flipped bit can corrupt a value, an instruction or a file, systems add redundancy — extra bits computed from the data — so that corruption can be spotted, and sometimes repaired, at the receiving end. Detection tells you something is wrong; correction additionally fixes it.
The key idea throughout this lesson is that an error is simply a flipped bit (a 0 read as a 1, or vice versa), and that the defence is always redundancy: information beyond the raw data that lets the receiver tell whether the data is self-consistent. The schemes differ only in how much redundancy they add and therefore what they can do with it — from one parity bit that merely notices a single flip, through a checksum that guards a whole block, to a correcting code carrying enough structure to locate and repair an error. More redundancy buys more protection but costs more storage or bandwidth, so the engineering question is always "what is the cheapest scheme that meets the reliability this channel actually needs?"
A parity bit is one extra bit added to a group of data bits to force the total number of 1s to be either even or odd, by prior agreement.
The parity bit is chosen so the total count of 1s (data plus parity) is even.
| Data bits | Number of 1s | Parity bit | Total 1s |
|---|---|---|---|
| 1010110 | 4 (even) | 0 | 4 |
| 1010111 | 5 (odd) | 1 | 6 |
| 0000000 | 0 (even) | 0 | 0 |
| 1111111 | 7 (odd) | 1 | 8 |
The parity bit is chosen so the total count of 1s is odd.
| Data bits | Number of 1s | Parity bit | Total 1s |
|---|---|---|---|
| 1010110 | 4 (even) | 1 | 5 |
| 1010111 | 5 (odd) | 0 | 5 |
To set a parity bit, count the 1s in the data and add the bit that hits the agreed total. For the data 1011001: it contains four 1s, which is already even, so for even parity the parity bit is 0 (transmitted as 1011001 0), whereas for odd parity the bit must be 1 to make five 1s (transmitted as 1011001 1). For the data 1110001: four 1s again, so even parity ⇒ 0, odd parity ⇒ 1. The rule is mechanical — count the 1s, then choose the bit that reaches the agreed even/odd total — and getting it backwards (e.g. setting the bit to match the data's own parity rather than to achieve the agreed one) is a frequent slip.
Sender and receiver agree on even parity. The sender transmits the byte 1010110 with parity bit 0 (four 1s, already even). Suppose noise flips one bit so the receiver gets 1011110 0. The receiver counts the 1s: now five — an odd total. Since even parity was agreed, the receiver knows an error occurred and can request retransmission. Were the sender using odd parity instead, the original would have been sent as 1010110 1 (five 1s, odd); the same single flip would make the received total even, again betraying the error.
| Limitation | Explanation |
|---|---|
| Misses even numbers of errors | If two bits flip, the count's parity is unchanged — the error slips through undetected |
| Cannot locate the error | Parity reveals that something is wrong, not which bit |
| Detection only | Parity cannot, by itself, correct anything |
Parity is cheap (one bit) and catches any single-bit error, but a double-bit error is invisible to it — which is exactly why stronger schemes exist.
A simple but powerful extension is two-dimensional parity, which can not only detect a single-bit error but locate it — the first step from detection towards correction. The data is laid out in a grid; a parity bit is added to the end of every row and to the bottom of every column, all chosen here for even parity. The bottom-right corner is itself a parity bit over the parity bits.
Take four 4-bit data words. The right-hand column and bottom row are the computed parity bits:
| C1 | C2 | C3 | C4 | Row parity | |
|---|---|---|---|---|---|
| Row 1 | 1 | 0 | 1 | 1 | 1 |
| Row 2 | 0 | 1 | 1 | 0 | 0 |
| Row 3 | 1 | 1 | 0 | 1 | 1 |
| Row 4 | 0 | 0 | 1 | 1 | 0 |
| Col parity | 0 | 0 | 1 | 1 | 0 |
Check Row 1: its data bits 1 0 1 1 contain three 1s (odd), so the row-parity bit is 1 to make the total even. Check column C1: its data bits 1 0 1 0 contain two 1s (even), so the column-parity bit is 0. Every row and column total is now even.
Worked example — locating an error. Suppose noise flips the bit at Row 2, Column C3 from 1 to 0. The receiver recomputes every parity:
| C1 | C2 | C3 | C4 | Row parity (recomputed vs sent) | |
|---|---|---|---|---|---|
| Row 1 | 1 | 0 | 1 | 1 | 1 = 1 ✓ |
| Row 2 | 0 | 1 | 0 | 0 | 1 ≠ 0 ✗ |
| Row 3 | 1 | 1 | 0 | 1 | 1 = 1 ✓ |
| Row 4 | 0 | 0 | 1 | 1 | 0 = 0 ✓ |
| Col parity (recomp vs sent) | 0 ✓ | 0 ✓ | 0 ≠ 1 ✗ | 1 ✓ |
Exactly one row (Row 2) and one column (C3) now disagree with their sent parity. Their intersection pinpoints the flipped bit — Row 2, Column C3 — which can then simply be inverted to correct it. A single parity bit could only say "something is wrong"; the 2-D scheme says exactly where, turning detection into correction for a single-bit error. (Its limit: two errors in the same row or column can still confuse it, and the cost is one parity bit per row and per column.)
Majority voting sends each bit an odd number of times (commonly three) and the receiver takes the most common value for each position. Because it can reconstruct the intended bit, it both detects and corrects.
Each bit is sent three times; some copies are corrupted in transit:
| Bit 1 | Bit 2 | Bit 3 | Bit 4 | |
|---|---|---|---|---|
| Sent (triplicated) | 111 | 000 | 111 | 000 |
| Received (with noise) | 110 | 000 | 101 | 010 |
| Majority value | 1 | 0 | 1 | 0 |
Reading the table group by group:
110 has two 1s and one 0 ⇒ majority 1 (the single flipped bit is outvoted and corrected).000 is unanimous ⇒ 0.101 has two 1s and one 0 ⇒ majority 1 (corrected).010 has two 0s and one 1 ⇒ majority 0 (corrected).The recovered word 1010 matches the original, with three separate single-bit errors all repaired — no retransmission required. This is the defining feature: majority voting corrects, where parity and checksums merely detect.
Worked example — where it fails. Now suppose two of the three copies of a bit are corrupted, e.g. a group sent as 111 arrives as 100. The majority is now 0 — the wrong value — and worse, the receiver has no idea anything went wrong, because two-against-one looks just like a legitimate vote. Triple-redundancy guarantees correction only for one flipped copy per group; two flips in the same group break it silently. (Sending each bit five or seven times raises the threshold but multiplies the overhead further.)
| Advantage | Disadvantage |
|---|---|
| Corrects single-bit errors per group | Triples the data transmitted |
| Very simple to implement | Fails if 2+ bits in one group flip |
| No arithmetic needed | Huge bandwidth overhead (200% for triple) |
The cost is steep: triplication means sending three times the data — a 200% overhead — which is rarely affordable for general transmission. So majority voting is reserved for situations where simplicity and guaranteed single-bit correction outweigh bandwidth, such as small, critical control signals in electrically noisy environments where a CRC plus retransmission would be impractical.
A checksum is a single value computed from a whole block of data and sent with it; the receiver recomputes it and compares. Any mismatch reveals corruption somewhere in the block.
Data bytes: 45, 67, 89, 123, 210.
sum=45+67+89+123+210=534
checksum=534mod256=22
The sender transmits 45, 67, 89, 123, 210, 22. The mod256 keeps the checksum to a single byte no matter how many values are summed, which is why a one-byte checksum field can guard an arbitrarily long block. To validate, the receiver recomputes the sum of the five data bytes it received, takes it modulo 256, and compares with the transmitted checksum of 22: if they match, the block is accepted; if not, an error is flagged. If, say, 67 were corrupted to 70 in transit, the receiver's sum becomes 45+70+89+123+210=537, giving 537mod256=25=22 — the mismatch is spotted and retransmission requested.
| Limitation | Explanation |
|---|---|
| Cannot locate the error | Only flags that some byte is wrong |
| Detection only | No information to correct |
| Errors can cancel | Two compensating changes can leave the checksum unchanged |
The "errors can cancel" limitation deserves a concrete demonstration, because it is the discriminator examiners look for. Take the same data 45, 67, 89, 123, 210 with its checksum 22. Now suppose two bytes are corrupted in a way that compensates: 67 becomes 70 (+3) and 89 becomes 86 (−3). The receiver's sum is:
45+70+86+123+210=534
534mod256=22
The recomputed checksum is still 22, identical to the value sent — so the corruption passes undetected, even though two bytes are wrong. This is the fundamental weakness of a simple additive checksum: any set of changes that nets to zero (modulo 256) is invisible. Real network protocols therefore use a CRC (cyclic redundancy check), whose polynomial-based arithmetic is far harder to fool and reliably catches burst errors that a plain sum misses.
Checksums are cheap and good at catching random corruption of a block, but a simple sum can be fooled (as just shown), which is why real protocols use stronger functions such as CRCs.
A check digit is an extra digit appended to a number, computed from the other digits by a formula, designed to catch the transcription errors humans make: mistyping a digit, or transposing (swapping) two adjacent digits. Barcodes, ISBNs and bank/card numbers all use one.
ISBN-13 weights the twelve digits alternately by 1 and 3, sums them, and chooses the check digit so the grand total is a multiple of 10.
ISBN body: 978-0-13-468599-?
9(1)+7(3)+8(1)+0(3)+1(1)+3(3)+4(1)+6(3)+8(1)+5(3)+9(1)+9(3)
=9+21+8+0+1+9+4+18+8+15+9+27=129
129mod10=9⇒check digit=10−9=1
So the full ISBN is 978-0-13-468599-1. To validate a received ISBN, include the check digit in the weighted sum; the total should be divisible by 10.
Worked example — validating the complete number. Now run the check the receiver's way, weighting all thirteen digits of 978-0-13-468599-1:
9(1)+7(3)+8(1)+0(3)+1(1)+3(3)+4(1)+6(3)+8(1)+5(3)+9(1)+9(3)+1(1)
=9+21+8+0+1+9+4+18+8+15+9+27+1=130
130mod10=0⇒valid
The total is exactly divisible by 10, so the number passes. Had any single digit been mistyped, the weighted total would (almost certainly) not be a multiple of 10, flagging the error at the point of entry.
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.