Character Encoding: ASCII and Unicode

Computers store all data as binary numbers, including text. A character encoding system assigns a unique binary number to each character (letter, digit, symbol). OCR J277 Section 2.6 requires you to understand both ASCII and Unicode.

What Is Character Encoding?

A character encoding is a mapping between characters and numbers. When you type the letter "A" on a keyboard, the computer stores a number (65 in ASCII). When the computer displays that number, it shows the character "A" on screen.

Every character must have a unique number, and both the sender and receiver of data must agree on which encoding system to use. Otherwise, characters may be displayed incorrectly.
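The mapping can be tried out directly. The short Python sketch below is only an illustration (the message "Hi!" and the variable names are assumptions chosen for the example); it uses the built-in ord() and chr() functions to convert between characters and their code numbers.

```python
# ord() returns the code number for a character; chr() reverses the mapping.
print(ord("A"))  # 65
print(chr(65))   # A

# Convert a short message to its code numbers and back again.
message = "Hi!"
codes = [ord(ch) for ch in message]
restored = "".join(chr(n) for n in codes)

print(codes)     # [72, 105, 33]
print(restored)  # Hi!
```

Because the same table is used in both directions, the round trip returns the original text. If the receiver used a different table, the same numbers could produce different characters.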

ASCII (American Standard Code for Information Interchange)

ASCII was developed in the 1960s and uses 7 bits per character, giving 2^7 = 128 possible characters. These include:

Category	Characters	ASCII codes
Uppercase letters	A-Z	65-90
Lowercase letters	a-z	97-122
Digits	0-9	48-57
Punctuation and symbols	! @ # etc.	Various
Control characters	Enter, Tab, Backspace	0-31 (and 127, Delete)
Space	(space)	32

Key ASCII Values to Know

Character	ASCII (denary)	ASCII (binary, 7-bit)
A	65	1000001
B	66	1000010
Z	90	1011010
a	97	1100001
0	48	0110000
Space	32	0100000

OCR Exam Tip: You do not need to memorise the entire ASCII table, but you should know that A = 65, a = 97, and 0 = 48. Notice that lowercase letters are 32 higher than their uppercase equivalents.
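The values in the table, and the gap of 32 between upper and lower case, can be checked with a few lines of Python. This is a minimal sketch; to_7bit is a hypothetical helper name invented for this example, not a standard function.

```python
def to_7bit(ch):
    """Return the 7-bit binary string for an ASCII character (hypothetical helper)."""
    return format(ord(ch), "07b")

# Print the denary and 7-bit binary codes for the key characters.
for ch in ["A", "a", "0", " "]:
    print(repr(ch), ord(ch), to_7bit(ch))

# Lowercase letters sit 32 places above their uppercase equivalents.
offset = ord("a") - ord("A")
lower_b = chr(ord("B") + offset)
print(offset, lower_b)  # 32 b
```

Knowing the offset means that if you remember A = 65, you can work out any other letter by counting on from it.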

Extended ASCII

Standard ASCII uses 7 bits, but most computers use 8-bit bytes. Extended ASCII uses all 8 bits, providing 2^8 = 256 characters. The extra 128 characters include accented letters (e.g., é, ü), additional symbols, and line-drawing characters.

Limitations of ASCII

ASCII has significant limitations:

Limitation	Explanation
Only 128 (or 256) characters	Not enough for non-Latin alphabets
English-centric	Designed for the English alphabet; cannot represent Chinese, Arabic, Hindi, etc.
No emoji support	ASCII predates emojis and has no way to include them
Multiple incompatible extensions	Different systems created different extended ASCII sets
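The "incompatible extensions" problem can be demonstrated in Python. The sketch below assumes two well-known 8-bit character sets as examples, Latin-1 (ISO 8859-1) and code page 437 (the original IBM PC set); these particular choices are illustrative and are not named in the specification.

```python
# One byte with the value 233, which is outside the 7-bit ASCII range.
raw = bytes([0xE9])

# The same byte means different things in different extended sets.
as_latin1 = raw.decode("latin-1")  # Western European set
as_cp437 = raw.decode("cp437")     # original IBM PC set
print(as_latin1, as_cp437)

# Plain ASCII cannot represent non-Latin text at all.
try:
    "你好".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False
```

The two decoded characters differ even though the stored byte is identical, which is why sender and receiver must agree on the encoding.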

flowchart TD
    A[Character typed on keyboard] --> B{Encoding system?}
    B -->|ASCII 7-bit| C[128 codes - English only]
    B -->|Extended ASCII 8-bit| D[256 codes - + accents]
    B -->|Unicode UTF-8| E[1-4 bytes - 149,000+ codes]
    B -->|Unicode UTF-16| F[2 or 4 bytes - all scripts]
    C --> G[A=65, a=97, 0=48]
    D --> G
    E --> H[Backwards compatible with ASCII]
    F --> I[Used by Windows, Java]
    G --> J[Stored as binary]
    H --> J
    I --> J

Unicode

Unicode was created to solve ASCII's limitations. It aims to include every character from every writing system in the world, plus emoji, mathematical symbols, and more.

Feature	ASCII	Unicode
Bits per character	7 (or 8 for extended)	8, 16, or 32 (depending on encoding)
Total characters	128 (or 256)	Over 149,000
Languages supported	English (mainly)	All languages
Emoji	No	Yes
File size	Smaller	Larger in general (UTF-8 matches ASCII for characters 0-127)
Backwards compatible	N/A	Yes (first 128 Unicode characters = ASCII)

Unicode Encodings

Unicode can be stored in different formats:

Encoding	Bits per character	Notes
UTF-8	8-32 (variable)	Most common on the web; backwards compatible with ASCII
UTF-16	16-32 (variable)	Used by Windows and Java
UTF-32	32 (fixed)	Fixed width; uses more storage

UTF-8 is the most widely used encoding on the internet because:

Characters 0-127 use just 1 byte (identical to ASCII)
Less common characters use 2-4 bytes
It saves storage compared to UTF-16 or UTF-32 for English text
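These points can be seen by encoding a few characters in Python. The sample characters below are assumptions picked to show one example of each UTF-8 length; the encodings "utf-16-le" and "utf-32-le" are used so that no byte order mark is added to the count.

```python
# Number of bytes UTF-8 needs for characters from different ranges.
samples = ["A", "é", "€", "😀"]
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
print(utf8_lengths)  # {'A': 1, 'é': 2, '€': 3, '😀': 4}

# Size of the same English text in each Unicode encoding.
text = "Hello"
sizes = {
    "utf-8": len(text.encode("utf-8")),
    "utf-16": len(text.encode("utf-16-le")),
    "utf-32": len(text.encode("utf-32-le")),
}
print(sizes)  # {'utf-8': 5, 'utf-16': 10, 'utf-32': 20}
```

For English text, UTF-8 is half the size of UTF-16 and a quarter of the size of UTF-32.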

OCR Exam Tip: If asked to compare ASCII and Unicode, mention: number of characters, file size, language support, and backwards compatibility. A common exam question asks you to explain why Unicode uses more storage than ASCII.

Calculating Storage for Text

To calculate how much storage a text string requires:

Storage = number of characters x bits per character

Example: How many bytes does the word "Hello" require in ASCII?

5 characters x 7 bits = 35 bits (4.375 bytes). In practice each character is stored in a whole 8-bit byte, so the word occupies 5 x 8 = 40 bits = 5 bytes.

Example: How many bytes does the word "Hello" require in UTF-16?

5 characters x 16 bits = 80 bits = 10 bytes
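The formula can be written as a small Python function. This is a sketch that assumes a fixed number of bits per character, as in the examples above; storage_bits and bits_to_bytes are hypothetical names, and the approach would not work for variable-width encodings such as UTF-8 with non-ASCII text.

```python
import math

def storage_bits(text, bits_per_char):
    """Storage = number of characters x bits per character."""
    return len(text) * bits_per_char

def bits_to_bytes(bits):
    """Round up to whole bytes, since storage is allocated in 8-bit bytes."""
    return math.ceil(bits / 8)

ascii_bits = storage_bits("Hello", 7)
utf16_bits = storage_bits("Hello", 16)

print(ascii_bits, bits_to_bytes(ascii_bits))  # 35 5
print(utf16_bits, bits_to_bytes(utf16_bits))  # 80 10
```

Remember that len() counts spaces and punctuation as characters too, so they must be included when you count by hand.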

Worked Example: Encoding the Word "Hello" in ASCII, and Comparing with UTF-16

This worked example shows exactly how a text string is stored in memory, and why Unicode files are larger than ASCII files.

Step 1: Look up each character in an ASCII table.

Character	ASCII (denary)	ASCII (binary, 7-bit)	ASCII (binary, 8-bit with leading 0)
H	72	1001000	01001000
e	101	1100101	01100101
l	108	1101100	01101100
l	108	1101100	01101100
o	111	1101111	01101111

Step 2: Calculate the ASCII storage.

In practical storage, each ASCII character occupies 1 byte (8 bits). The word "Hello" therefore uses 5 x 8 = 40 bits = 5 bytes.

Step 3: Calculate the UTF-16 storage.

UTF-16 uses 16 bits per character for characters in the Basic Multilingual Plane (the first 65,536 Unicode code points). Each ASCII-range character is padded with an extra byte of zeros:

H = 00000000 01001000
e = 00000000 01100101
l = 00000000 01101100
l = 00000000 01101100
o = 00000000 01101111

Total: 5 x 16 = 80 bits = 10 bytes. Exactly double the ASCII storage.
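The padding can be checked in Python. The sketch below assumes the big-endian form "utf-16-be" so that the zero byte comes first, matching the layout shown in Step 3; Python's plain "utf-16" codec also adds a 2-byte byte order mark, which would make the total 12 bytes.

```python
word = "Hello"

ascii_bytes = word.encode("ascii")
utf16_bytes = word.encode("utf-16-be")  # big-endian, no byte order mark

# Show the two bytes stored for each character in UTF-16.
for ch in word:
    high, low = ch.encode("utf-16-be")
    print(ch, format(high, "08b"), format(low, "08b"))

print(len(ascii_bytes), len(utf16_bytes))  # 5 10
```

Every high byte is 00000000, so half of the UTF-16 storage for this word carries no information.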

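Step 4: Consider UTF-8.

All five characters in "Hello" have codes in the range 0-127, so UTF-8 stores each of them in 1 byte, using the same bit patterns as 8-bit ASCII. The word therefore uses 5 x 8 = 40 bits = 5 bytes in UTF-8, the same as the ASCII storage and half the UTF-16 storage.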
