Computers store everything as binary numbers, so text characters (letters, digits, symbols) must also be represented using binary codes. This lesson covers the two main character encoding systems: ASCII and Unicode.
A character set is a defined list of characters that a computer can recognise and store. Each character in the set is assigned a unique binary code (a number). The character set determines which characters are available and how many bits are needed to store each one.
ASCII (American Standard Code for Information Interchange) was one of the earliest character encoding standards, developed in the 1960s. It was designed primarily for the English language.
| Character | Denary Code | Binary (7-bit) | Hex |
|---|---|---|---|
| Space | 32 | 0100000 | 20 |
| 0 | 48 | 0110000 | 30 |
| 9 | 57 | 0111001 | 39 |
| A | 65 | 1000001 | 41 |
| B | 66 | 1000010 | 42 |
| Z | 90 | 1011010 | 5A |
| a | 97 | 1100001 | 61 |
| b | 98 | 1100010 | 62 |
| z | 122 | 1111010 | 7A |
The word "Hi" in ASCII (using 8 bits per character):

- H = 72 = 01001000
- i = 105 = 01101001

So "Hi" is stored as: 01001000 01101001
This requires 2 bytes (16 bits) of storage.
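You can check this in Python, where the built-in `ord()` returns a character's code point (ASCII codes are unchanged in Unicode) and `format()` renders it in binary. A minimal sketch; the helper name `to_ascii_binary` is our own:

```python
# ord() gives the character's code point; '08b' formats it as 8-bit binary
def to_ascii_binary(text):
    return " ".join(format(ord(ch), "08b") for ch in text)

print(ord("H"), ord("i"))     # 72 105
print(to_ascii_binary("Hi"))  # 01001000 01101001
```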
ASCII has significant limitations:

- It defines only 128 characters (256 in extended ASCII), far too few for most of the world's writing systems.
- It was designed for English: accented letters, non-Latin scripts and symbols such as emoji have no codes.
- Different "extended" ASCII variants assign codes 128–255 differently, which causes compatibility problems between systems.
```mermaid
flowchart TD
A[Character to encode] --> B{Which standard?}
B -->|ASCII| C[7 or 8 bits]
B -->|Unicode| D[8 to 32 bits]
C --> E[128-256 characters]
D --> F[Over 140,000 characters]
E --> G[English only]
F --> H[All world scripts + emoji]
C --> I[Smaller files]
D --> J[Larger files]
```
Unicode was developed to solve the limitations of ASCII. It aims to provide a unique code for every character in every writing system in the world.
| Feature | ASCII | Unicode |
|---|---|---|
| Number of characters | 128 (standard) or 256 (extended) | Over 140,000 |
| Bits per character | 7 or 8 | 8 to 32 (varies by encoding) |
| Languages supported | English only (extended ASCII adds some European characters) | Virtually all world languages |
| File size | Smaller (1 byte per character) | Larger (can use up to 4 bytes per character) |
| Backward compatibility | — | UTF-8 is backward-compatible with ASCII |
Because Unicode needs to represent many more characters, it requires more bits per character. This means text files encoded in Unicode may be larger than the same text in ASCII. However, UTF-8 is efficient because it uses only 1 byte for standard English characters and more bytes only when needed for other scripts.
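This variable-width behaviour is easy to observe in Python by encoding single characters as UTF-8 and counting the bytes. A small sketch; the sample characters are arbitrary:

```python
# UTF-8 uses 1 byte for ASCII characters and up to 4 bytes for other scripts
for ch in ["A", "é", "€", "😀"]:
    print(ch, len(ch.encode("utf-8")), "byte(s)")
# A is 1 byte, é is 2, € is 3, 😀 is 4

# Backward compatibility: plain ASCII bytes decode unchanged as UTF-8
assert b"Hi".decode("utf-8") == "Hi"
```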
The file size of a text document can be estimated using:
File size (bits) = Number of characters × Bits per character
A document contains 2000 characters encoded in ASCII (8-bit extended):

File size = 2000 × 8 = 16,000 bits = 2,000 bytes (about 2 KB)

If the same document were encoded in UTF-16 (16 bits per character):

File size = 2000 × 16 = 32,000 bits = 4,000 bytes — double the storage.
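The same estimate can be written as a few lines of Python (a sketch using the 2000-character document from this example; the function name is our own):

```python
# File size (bits) = number of characters × bits per character
def file_size_bits(num_chars, bits_per_char):
    return num_chars * bits_per_char

ascii_bits = file_size_bits(2000, 8)    # 16,000 bits
utf16_bits = file_size_bits(2000, 16)   # 32,000 bits
print(ascii_bits // 8, "bytes in 8-bit ASCII")  # 2000 bytes
print(utf16_bits // 8, "bytes in UTF-16")       # 4000 bytes
```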
Exam Tip: When asked about the difference between ASCII and Unicode, focus on: (1) the number of characters each can represent, (2) the number of bits used per character, and (3) the fact that Unicode supports many more languages. Always mention that the trade-off for Unicode is larger file sizes.