Computers store all data as binary numbers, including text. A character encoding system assigns a unique binary number to each character (letter, digit, symbol). OCR J277 Section 2.6 requires you to understand both ASCII and Unicode.
A character encoding is a mapping between characters and numbers. When you type the letter "A" on a keyboard, the computer stores a number (65 in ASCII). When the computer displays that number, it shows the character "A" on screen.
Every character must have a unique number, and both the sender and receiver of data must agree on which encoding system to use. Otherwise, characters may be displayed incorrectly.
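The mapping can be explored in Python, whose built-in `ord` and `chr` functions convert between a character and its code number. This is a small illustrative sketch, not part of the OCR specification; the variable names are chosen only for this example.

```python
# ord() gives the code number of a character; chr() goes the other way.
letter = "A"
code = ord(letter)       # the number the computer stores
restored = chr(code)     # the character shown when that number is displayed

print(letter, "->", code)
print(code, "->", restored)
```

For the ASCII range, the numbers returned by `ord` match the ASCII codes quoted in this lesson.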
ASCII was developed in the 1960s and uses 7 bits per character, giving 2^7 = 128 possible characters. These include:
| Category | Characters | ASCII codes |
|---|---|---|
| Uppercase letters | A-Z | 65-90 |
| Lowercase letters | a-z | 97-122 |
| Digits | 0-9 | 48-57 |
| Punctuation and symbols | ! @ # etc. | Various |
| Control characters | Enter, Tab, Backspace | 0-31 (and 127, Delete) |
| Space | (space) | 32 |
| Character | ASCII (denary) | ASCII (binary, 7-bit) |
|---|---|---|
| A | 65 | 1000001 |
| B | 66 | 1000010 |
| Z | 90 | 1011010 |
| a | 97 | 1100001 |
| 0 | 48 | 0110000 |
| Space | 32 | 0100000 |
OCR Exam Tip: You do not need to memorise the entire ASCII table, but you should know that A = 65, a = 97, and 0 = 48. Notice that lowercase letters are 32 higher than their uppercase equivalents.
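The codes in the table, and the gap of 32 between upper and lower case, can be checked with a short Python sketch. The helper name `to_7bit` is an assumption made for this example.

```python
def to_7bit(char):
    """Return the 7-bit binary pattern for a single ASCII character."""
    code = ord(char)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    return format(code, "07b")  # pad with leading zeros to 7 digits

for ch in ["A", "a", "0", " "]:
    print(repr(ch), ord(ch), to_7bit(ch))

# Lowercase letters sit 32 above their uppercase equivalents.
offset = ord("a") - ord("A")
lower_from_upper = chr(ord("Z") + offset)
print(offset, lower_from_upper)
```

Because 32 is a single bit (0100000), upper and lower case versions of a letter differ in exactly one bit of their 7-bit patterns.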
Standard ASCII uses 7 bits, but most computers use 8-bit bytes. Extended ASCII uses all 8 bits, providing 2^8 = 256 characters. The extra 128 characters include accented letters (e.g., é, ü), additional symbols, and line-drawing characters.
ASCII has significant limitations:
| Limitation | Explanation |
|---|---|
| Only 128 (or 256) characters | Not enough for non-Latin alphabets |
| English-centric | Designed for the English alphabet; cannot represent Chinese, Arabic, Hindi, etc. |
| No emoji support | ASCII predates emojis and has no way to include them |
| Multiple incompatible extensions | Different systems created different extended ASCII sets |
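The last limitation can be seen by decoding the same byte with two different 8-bit character sets. The sketch below picks ISO 8859-1 (`latin-1`) and the original IBM PC set (`cp437`) purely as examples of extended sets; any pair of differing code pages would make the same point.

```python
raw = bytes([0xE9])  # one byte with value 233, outside the 7-bit ASCII range

as_latin1 = raw.decode("latin-1")  # interpreted using ISO 8859-1
as_cp437 = raw.decode("cp437")     # interpreted using the IBM PC character set

# Print code points rather than the characters themselves, so the output
# does not depend on the terminal's own encoding.
print(f"latin-1: U+{ord(as_latin1):04X}")
print(f"cp437:   U+{ord(as_cp437):04X}")

# Plain 7-bit ASCII cannot represent an accented letter at all.
try:
    "\u00e9".encode("ascii")  # \u00e9 is the letter e with an acute accent
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("fits in ASCII:", ascii_ok)
```

The same stored number produces different characters depending on which extension the receiver assumes, which is why sender and receiver must agree on the encoding.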
flowchart TD
A[Character typed on keyboard] --> B{Encoding system?}
B -->|ASCII 7-bit| C[128 codes - English only]
B -->|Extended ASCII 8-bit| D[256 codes - + accents]
B -->|Unicode UTF-8| E[1-4 bytes - 149,000+ codes]
B -->|Unicode UTF-16| F[2 or 4 bytes - all scripts]
C --> G[A=65, a=97, 0=48]
D --> G
E --> H[Backwards compatible with ASCII]
F --> I[Used by Windows, Java]
G --> J[Stored as binary]
H --> J
I --> J
Unicode was created to solve ASCII's limitations. It aims to include every character from every writing system in the world, plus emoji, mathematical symbols, and more.
| Feature | ASCII | Unicode |
|---|---|---|
| Bits per character | 7 (or 8 for extended) | 8, 16, or 32 (depending on encoding) |
| Total characters | 128 (or 256) | Over 149,000 |
| Languages supported | English (mainly) | All languages |
| Emoji | No | Yes |
| File size | Smaller | Usually larger (the same for ASCII-only text stored as UTF-8) |
| Backwards compatible | N/A | Yes (first 128 Unicode characters = ASCII) |
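Backwards compatibility can be checked directly: text made only of ASCII characters produces identical bytes whether it is encoded as ASCII or as UTF-8. A minimal Python sketch, with variable names chosen for illustration:

```python
text = "Hello"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# The byte values are the ASCII codes, and UTF-8 stores the same values.
print(list(ascii_bytes))
same = ascii_bytes == utf8_bytes
print("identical bytes:", same)
```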
Unicode can be stored in different formats:
| Encoding | Bits per character | Notes |
|---|---|---|
| UTF-8 | 8-32 (variable) | Most common on the web; backwards compatible with ASCII |
| UTF-16 | 16-32 (variable) | Used by Windows and Java |
| UTF-32 | 32 (fixed) | Fixed width; uses more storage |
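The variable and fixed widths in the table can be measured in Python by encoding sample characters and counting the bytes. The `-le` codec names are used here so that Python does not add a byte order mark to the output; the helper `encoded_sizes` and the choice of sample characters are assumptions for this sketch.

```python
# Sample characters: A, e-acute, euro sign, and a grinning-face emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def encoded_sizes(char):
    """Return the number of bytes one character needs in each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(char.encode(enc)) for enc in encodings}

for ch in samples:
    # Show the code point rather than the character to avoid terminal issues.
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-8 grows from 1 to 4 bytes as the code point gets larger, UTF-16 uses 2 or 4 bytes, and UTF-32 always uses 4.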
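UTF-8 is the most widely used encoding on the internet because it is backwards compatible with ASCII, it needs only 1 byte for each ASCII character (so English text is no larger than in ASCII), and it can still represent every Unicode character by using up to 4 bytes.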
OCR Exam Tip: If asked to compare ASCII and Unicode, mention: number of characters, file size, language support, and backwards compatibility. A common exam answer requires you to explain why Unicode uses more storage than ASCII.
To calculate how much storage a text string requires:
Storage = number of characters x bits per character
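Example: How many bytes does the word "Hello" require in ASCII? Stored with 8 bits per character, 5 x 8 = 40 bits = 5 bytes.
Example: How many bytes does the word "Hello" require in UTF-16? At 16 bits per character, 5 x 16 = 80 bits = 10 bytes.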
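The formula can be turned into a small helper for checking answers. This is a sketch that assumes every character uses the same number of bits, which holds for ASCII and for UTF-16 text within the basic multilingual plane; the function names are invented for the example.

```python
def storage_bits(text, bits_per_char):
    """Storage = number of characters x bits per character."""
    return len(text) * bits_per_char

def bits_to_bytes(bits):
    """Convert bits to whole bytes, rounding up any part-filled byte."""
    return (bits + 7) // 8

ascii_bits = storage_bits("Hello", 8)    # ASCII stored one byte per character
utf16_bits = storage_bits("Hello", 16)   # UTF-16, 16 bits per character

print(ascii_bits, "bits =", bits_to_bytes(ascii_bits), "bytes")
print(utf16_bits, "bits =", bits_to_bytes(utf16_bits), "bytes")
```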
This worked example shows exactly how a text string is stored in memory, and why Unicode files are larger than ASCII files.
Step 1: Look up each character in an ASCII table.
| Character | ASCII (denary) | ASCII (binary, 7-bit) | ASCII (binary, 8-bit with leading 0) |
|---|---|---|---|
| H | 72 | 1001000 | 01001000 |
| e | 101 | 1100101 | 01100101 |
| l | 108 | 1101100 | 01101100 |
| l | 108 | 1101100 | 01101100 |
| o | 111 | 1101111 | 01101111 |
Step 2: Calculate the ASCII storage.
In practical storage, each ASCII character occupies 1 byte (8 bits). The word "Hello" therefore uses 5 x 8 = 40 bits = 5 bytes.
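Steps 1 and 2 can be reproduced in Python, which is a convenient way to check a hand-worked table. The variable names are assumptions for this sketch.

```python
word = "Hello"

# One row per character: character, denary code, 7-bit and 8-bit binary.
rows = [
    (ch, ord(ch), format(ord(ch), "07b"), format(ord(ch), "08b"))
    for ch in word
]
for row in rows:
    print(*row)

total_bits = len(word) * 8               # 8 bits per stored character
total_bytes = len(word.encode("ascii"))  # bytes actually produced
print(total_bits, "bits =", total_bytes, "bytes")
```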
Step 3: Calculate the UTF-16 storage.
UTF-16 uses 16 bits per character for characters in the basic multilingual plane. Each ASCII-range character is padded with an extra byte of zeros, so H (01001000 in 8-bit ASCII) becomes the 16-bit value 00000000 01001000.
Total: 5 x 16 = 80 bits = 10 bytes. Exactly double the ASCII storage.
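The zero padding is visible when the word is encoded in Python. The sketch uses the `utf-16-be` codec (big-endian, most significant byte first) so the zero byte appears in front of each ASCII code; Python's plain `utf-16` codec would also prepend a 2-byte byte order mark, which is not counted in the calculation above.

```python
word = "Hello"

utf16 = word.encode("utf-16-be")  # big-endian, no byte order mark

# Split the bytes into 2-byte units, one per character, shown in hex.
pairs = [utf16[i:i + 2].hex() for i in range(0, len(utf16), 2)]
print(pairs)

ascii_size = len(word.encode("ascii"))
utf16_size = len(utf16)
print(ascii_size, "bytes in ASCII,", utf16_size, "bytes in UTF-16")
```

Each pair starts with `00`, the padding byte, followed by the same value the character has in ASCII (for example `48` hex is 72 denary, the code for H).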
Step 4: Consider UTF-8.