Computers store everything as binary numbers, so text characters (letters, digits, symbols) must also be represented using binary codes. This lesson covers the two main character encoding systems: ASCII and Unicode.
A character set is a defined list of characters that a computer can recognise and store. Each character in the set is assigned a unique binary code (a number). The character set determines which characters are available and how many bits are needed to store each one.
ASCII (American Standard Code for Information Interchange) was one of the earliest character encoding standards, developed in the 1960s. It was designed primarily for the English language.
| Character | Denary Code | Binary (7-bit) | Hex |
|---|---|---|---|
| Space | 32 | 0100000 | 20 |
| 0 | 48 | 0110000 | 30 |
| 9 | 57 | 0111001 | 39 |
| A | 65 | 1000001 | 41 |
| B | 66 | 1000010 | 42 |
| Z | 90 | 1011010 | 5A |
| a | 97 | 1100001 | 61 |
| b | 98 | 1100010 | 62 |
| z | 122 | 1111010 | 7A |
The word "Hi" in ASCII (using 8 bits per character):

- H = 72 = 01001000
- i = 105 = 01101001

So "Hi" is stored as: 01001000 01101001
This requires 2 bytes (16 bits) of storage.
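You can check this in Python, where the built-in `ord()` returns a character's code point (ASCII codes are unchanged in Unicode) and `format()` renders it in binary. A minimal sketch; the helper name `to_ascii_binary` is our own:

```python
# ord() gives the character's code point; '08b' formats it as 8-bit binary
def to_ascii_binary(text):
    return " ".join(format(ord(ch), "08b") for ch in text)

print(ord("H"), ord("i"))     # 72 105
print(to_ascii_binary("Hi"))  # 01001000 01101001
```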
ASCII has significant limitations:

- It defines only 128 characters (256 in extended ASCII), far too few for most of the world's writing systems.
- It was designed for English: accented letters, non-Latin scripts and symbols such as emoji have no codes.
- Different "extended" ASCII variants assign codes 128–255 differently, which causes compatibility problems between systems.
```mermaid
flowchart TD
A[Character to encode] --> B{Which standard?}
B -->|ASCII| C[7 or 8 bits]
B -->|Unicode| D[8 to 32 bits]
C --> E[128-256 characters]
D --> F[Over 140,000 characters]
E --> G[English only]
F --> H[All world scripts + emoji]
C --> I[Smaller files]
D --> J[Larger files]
```
Unicode was developed to solve the limitations of ASCII. It aims to provide a unique code for every character in every writing system in the world.
| Feature | ASCII | Unicode |
|---|---|---|
| Number of characters | 128 (standard) or 256 (extended) | Over 140,000 |
| Bits per character | 7 or 8 | 8 to 32 (varies by encoding) |
| Languages supported | English only (extended ASCII adds some European characters) | Virtually all world languages |
| File size | Smaller (1 byte per character) | Larger (can use up to 4 bytes per character) |
| Backward compatibility | — | UTF-8 is backward-compatible with ASCII |
Because Unicode needs to represent many more characters, it requires more bits per character. This means text files encoded in Unicode may be larger than the same text in ASCII. However, UTF-8 is efficient because it uses only 1 byte for standard English characters and more bytes only when needed for other scripts.
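This variable-width behaviour is easy to observe in Python by encoding single characters as UTF-8 and counting the bytes. A small sketch; the sample characters are arbitrary:

```python
# UTF-8 uses 1 byte for ASCII characters and up to 4 bytes for other scripts
for ch in ["A", "é", "€", "😀"]:
    print(ch, len(ch.encode("utf-8")), "byte(s)")
# A is 1 byte, é is 2, € is 3, 😀 is 4

# Backward compatibility: plain ASCII bytes decode unchanged as UTF-8
assert b"Hi".decode("utf-8") == "Hi"
```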
The file size of a text document can be estimated using:
File size (bits) = Number of characters × Bits per character
A document contains 2000 characters encoded in ASCII (8-bit extended):

File size = 2000 × 8 = 16,000 bits = 2,000 bytes (about 2 KB)

If the same document were encoded in UTF-16 (16 bits per character):

File size = 2000 × 16 = 32,000 bits = 4,000 bytes — double the storage.
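The same estimate can be written as a few lines of Python (a sketch using the 2000-character document from this example; the function name is our own):

```python
# File size (bits) = number of characters × bits per character
def file_size_bits(num_chars, bits_per_char):
    return num_chars * bits_per_char

ascii_bits = file_size_bits(2000, 8)    # 16,000 bits
utf16_bits = file_size_bits(2000, 16)   # 32,000 bits
print(ascii_bits // 8, "bytes in 8-bit ASCII")  # 2000 bytes
print(utf16_bits // 8, "bytes in UTF-16")       # 4000 bytes
```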
Exam Tip: When asked about the difference between ASCII and Unicode, focus on: (1) the number of characters each can represent, (2) the number of bits used per character, and (3) the fact that Unicode supports many more languages. Always mention that the trade-off for Unicode is larger file sizes.