This lesson covers how computers represent text using character encoding schemes. You need to understand ASCII, Extended ASCII, and Unicode (including UTF-8, UTF-16, and UTF-32) for the OCR H446 specification.
A character encoding is a system that maps characters (letters, digits, symbols) to numerical codes that a computer can store and process. Each character is assigned a unique binary number.
Without an agreed encoding scheme, two computers exchanging text data would not be able to interpret the characters correctly.
ASCII (American Standard Code for Information Interchange) was developed in the 1960s as a standard for text communication.
| Feature | Detail |
|---|---|
| Bits per character | 7 bits |
| Number of characters | 2^7 = 128 |
| Characters included | Uppercase A-Z, lowercase a-z, digits 0-9, punctuation, control characters (e.g., newline, tab) |

A selection of ASCII codes:

| Character | Denary Code | Binary |
|---|---|---|
| Space | 32 | 0100000 |
| 0 | 48 | 0110000 |
| 9 | 57 | 0111001 |
| A | 65 | 1000001 |
| Z | 90 | 1011010 |
| a | 97 | 1100001 |
| z | 122 | 1111010 |
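
The rows in the table above can be reproduced with Python's built-in `ord()` and `chr()` functions. This is a minimal sketch; the helper name `ascii_row` is made up for illustration and is not part of the lesson.

```python
def ascii_row(ch: str) -> tuple:
    """Return (character, denary code, 7-bit binary string) for one ASCII character."""
    code = ord(ch)  # character -> numerical code
    if code >= 2 ** 7:  # 7 bits give 2^7 = 128 codes, numbered 0 to 127
        raise ValueError(f"{ch!r} is outside the 7-bit ASCII range")
    return ch, code, format(code, "07b")  # pad the binary to 7 digits


# Rebuild the table rows for Space, 0, 9, A, Z, a and z
for ch in " 09AZaz":
    print(ascii_row(ch))

# chr() is the inverse mapping: numerical code -> character
assert chr(65) == "A"

# Each lowercase letter sits exactly 32 places after its uppercase partner
assert ord("a") - ord("A") == 32
```

Because the letters and digits are stored in order, the code for any of them can be worked out from a known starting value: for example, `C` is two places after `A`, so its code is 65 + 2 = 67.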
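Extended ASCII uses 8 bits per character, doubling the number of available characters from 128 to 256. The first 128 codes are the same as standard ASCII. There is no single extended ASCII standard: different variants (often called code pages) assign the extra 128 codes to different characters.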
| Feature | Detail |
|---|---|
| Bits per character | 8 bits |
| Number of characters | 2^8 = 256 |
| Extra characters | Accented letters, additional symbols, box-drawing characters |
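
The short Python sketch below illustrates what the eighth bit adds. It decodes one byte with a value above 127 using two 8-bit character sets that ship with Python, Latin-1 (ISO 8859-1) and code page 437. These two character sets are chosen only as examples and are not named in the lesson.

```python
# One byte with the top bit set: 11101001 in binary, 233 in denary
byte = bytes([0b11101001])

as_latin1 = byte.decode("latin-1")  # ISO 8859-1, an 8-bit set with accented letters
as_cp437 = byte.decode("cp437")     # an 8-bit set that includes box-drawing characters

# The same 8-bit code maps to a different character in each character set
print(as_latin1 != as_cp437)

# 7-bit ASCII has no character for codes above 127, so decoding fails
try:
    byte.decode("ascii")
except UnicodeDecodeError:
    print("233 is not a valid 7-bit ASCII code")

# Codes 0 to 127 are unchanged in both character sets
assert bytes([65]).decode("latin-1") == bytes([65]).decode("cp437") == "A"
```

This is why text written with one 8-bit character set can display the wrong symbols when opened with another.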
Unicode is a universal character encoding standard designed to represent every character from every writing system in the world.
| Feature | Detail |
|---|---|
| Characters defined | Over 149,000 (as of 2023) |
| Scripts covered | Latin, Cyrillic, Arabic, Chinese, Japanese, Korean, Devanagari, emoji, and many more |
| Code points | U+0000 to U+10FFFF (over 1.1 million possible) |
Unicode defines code points (abstract numbers for characters). A separate encoding format, such as UTF-8, UTF-16, or UTF-32, determines how those code points are stored as bytes.
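
The difference between a code point and its stored bytes can be seen with Python's built-in `str.encode()` method. This is a minimal sketch; the helper name `describe` is made up for illustration, and the three sample characters are arbitrary choices.

```python
def describe(ch: str) -> dict:
    """Return the code point of ch and the bytes needed to store it in each encoding."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8": len(ch.encode("utf-8")),
        # The "-be" (big-endian) variants are used so that no byte order mark
        # is added to the output and only the character itself is counted.
        "utf16": len(ch.encode("utf-16-be")),
        "utf32": len(ch.encode("utf-32-be")),
    }


samples = [
    "A",           # Latin capital letter A
    "\u20ac",      # euro sign
    "\U0001F600",  # grinning face emoji
]

for ch in samples:
    print(describe(ch))

# The range U+0000 to U+10FFFF contains this many possible code points
print(0x10FFFF + 1)  # 1114112
```

Each character has exactly one code point, but the number of bytes stored varies: UTF-8 uses between 1 and 4 bytes per character, UTF-16 uses 2 or 4, and UTF-32 always uses 4.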