You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
Formal languages are classified into a hierarchy based on their complexity and the type of automaton required to recognise them. This classification, known as the Chomsky hierarchy, is a fundamental part of OCR H446 section 1.4.
A formal language is a set of strings formed from a given alphabet (set of symbols) according to specific rules (grammar).
| Concept | Definition | Example |
|---|---|---|
| Alphabet | A finite set of symbols | {0, 1} or {a, b, c, ..., z} |
| String | A finite sequence of symbols from the alphabet | "0110", "hello" |
| Language | A set of strings (possibly infinite) | All binary strings of even length |
| Grammar | A set of rules for generating valid strings | BNF production rules |
Noam Chomsky classified formal languages into four types, from most restricted to most general:
| Type | Name | Grammar type | Recogniser | Example |
|---|---|---|---|---|
| Type 3 | Regular | Regular grammar | Finite State Machine (FSM) | Identifiers, email patterns |
| Type 2 | Context-free | Context-free grammar (CFG) | Pushdown Automaton (PDA) | Programming language syntax |
| Type 1 | Context-sensitive | Context-sensitive grammar | Linear Bounded Automaton (LBA) | Natural language (some aspects) |
| Type 0 | Recursively enumerable | Unrestricted grammar | Turing Machine | Any computable language |
Each type is a strict subset of the next:
Type 3 (Regular) ⊂ Type 2 (Context-free) ⊂ Type 1 (Context-sensitive) ⊂ Type 0 (Recursively enumerable)
Every regular language is also context-free. Every context-free language is also context-sensitive. And so on.
Recognised by: Finite State Machines (FSMs) Described by: Regular expressions, regular grammars
Properties:
Examples of regular languages:
| Language | Description | Regex |
|---|---|---|
| Binary strings ending in 0 | {0, 00, 10, 100, 110, ...} | [01]*0 |
| Strings over {a,b} with even length | {aa, ab, ba, bb, aaaa, ...} | ([ab][ab])* |
| Identifiers | Start with letter, followed by letters/digits | [a-zA-Z][a-zA-Z0-9]* |
NOT regular (examples):
| Language | Why not regular |
|---|---|
| Balanced brackets: (), (()), ((())) | Need to count depth — FSMs cannot count |
| a^n b^n (equal numbers of a's then b's) | Need to remember how many a's were seen |
| Palindromes: aba, abcba | Need to remember the first half |
Recognised by: Pushdown Automata (PDAs) — FSMs with a stack Described by: Context-free grammars (CFGs), BNF notation
Properties:
Examples of context-free languages:
| Language | Description |
|---|---|
| Balanced brackets | (), (()), (()()) — the grammar can match opening and closing brackets |
| a^n b^n | n a's followed by n b's (aaabbb, aabb, ab) |
| Arithmetic expressions | 3 + (5 * 2) — nested brackets and operator precedence |
| Programming language syntax | if-else statements, nested loops, function definitions |
BNF grammar for balanced brackets:
<S> ::= "" | "(" <S> ")" | <S> <S>
This generates: "", "()", "(())", "()()", "((()))", "(()())", etc.
Why the stack matters: The pushdown automaton uses the stack to "remember" opening brackets. When it sees "(", it pushes to the stack. When it sees ")", it pops from the stack. If the stack is empty at the end, the brackets are balanced.
Recognised by: Linear Bounded Automata (LBAs) — Turing machines with tape limited to the length of the input Described by: Context-sensitive grammars
Properties:
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.