Classification of Languages

Formal languages are classified into a hierarchy based on their complexity and the type of automaton required to recognise them. This classification, known as the Chomsky hierarchy, is a fundamental part of OCR H446 section 1.4.

What Is a Formal Language?

A formal language is a set of strings formed from a given alphabet (set of symbols) according to specific rules (grammar).

Concept	Definition	Example
Alphabet	A finite set of symbols	{0, 1} or {a, b, c, ..., z}
String	A finite sequence of symbols from the alphabet	"0110", "hello"
Language	A set of strings (possibly infinite)	All binary strings of even length
Grammar	A set of rules for generating valid strings	BNF production rules

The Chomsky Hierarchy

Noam Chomsky classified formal languages into four types, from most restricted to most general:

Type	Name	Grammar type	Recogniser	Example
Type 3	Regular	Regular grammar	Finite State Machine (FSM)	Identifiers, email patterns
Type 2	Context-free	Context-free grammar (CFG)	Pushdown Automaton (PDA)	Programming language syntax
Type 1	Context-sensitive	Context-sensitive grammar	Linear Bounded Automaton (LBA)	Natural language (some aspects)
Type 0	Recursively enumerable	Unrestricted grammar	Turing Machine	Any computable language

Each type is a strict subset of the next:

Type 3 (Regular) ⊂ Type 2 (Context-free) ⊂ Type 1 (Context-sensitive) ⊂ Type 0 (Recursively enumerable)

Every regular language is also context-free. Every context-free language is also context-sensitive. And so on.

Type 3: Regular Languages

Recognised by: Finite State Machines (FSMs) Described by: Regular expressions, regular grammars

Properties:

No memory beyond the current state
Cannot count or match nested structures
The simplest class of languages

Examples of regular languages:

Language	Description	Regex
Binary strings ending in 0	{0, 00, 10, 100, 110, ...}	[01]*0
Strings over {a,b} with even length	{aa, ab, ba, bb, aaaa, ...}	([ab][ab])*
Identifiers	Start with letter, followed by letters/digits	[a-zA-Z][a-zA-Z0-9]*

NOT regular (examples):

Language	Why not regular
Balanced brackets: (), (()), ((()))	Need to count depth — FSMs cannot count
a^n b^n (equal numbers of a's then b's)	Need to remember how many a's were seen
Palindromes: aba, abcba	Need to remember the first half

Type 2: Context-Free Languages

Recognised by: Pushdown Automata (PDAs) — FSMs with a stack Described by: Context-free grammars (CFGs), BNF notation

Properties:

Have a stack for memory (can count and match)
Can handle nested structures
The basis for programming language syntax

Examples of context-free languages:

Language	Description
Balanced brackets	(), (()), (()()) — the grammar can match opening and closing brackets
a^n b^n	n a's followed by n b's (aaabbb, aabb, ab)
Arithmetic expressions	3 + (5 * 2) — nested brackets and operator precedence
Programming language syntax	if-else statements, nested loops, function definitions

BNF grammar for balanced brackets:

<S> ::= "" | "(" <S> ")" | <S> <S>

This generates: "", "()", "(())", "()()", "((()))", "(()())", etc.

Why the stack matters: The pushdown automaton uses the stack to "remember" opening brackets. When it sees "(", it pushes to the stack. When it sees ")", it pops from the stack. If the stack is empty at the end, the brackets are balanced.

Type 1: Context-Sensitive Languages

Recognised by: Linear Bounded Automata (LBAs) — Turing machines with tape limited to the length of the input Described by: Context-sensitive grammars

Properties:

More powerful than context-free
Production rules can depend on the context (surrounding symbols)
The tape is bounded (unlike a full Turing machine)

Classification of Languages

Classification of Languages

What Is a Formal Language?

The Chomsky Hierarchy

Type 3: Regular Languages

Type 2: Context-Free Languages

Type 1: Context-Sensitive Languages

More in Computer Science