You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
This lesson draws together everything in the automata strand — finite-state machines, regular expressions, BNF/context-free grammars, pushdown automata and Turing machines — into a single, elegant picture: the Chomsky hierarchy. Noam Chomsky (a linguist, in the 1950s) observed that formal languages fall into a layered classification, each layer more expressive than the one below, and — this is the beautiful part — each layer corresponds exactly to a type of machine that recognises it. The simplest languages need only a finite-state machine; adding a stack reaches the next layer; adding a bounded tape the next; and a full Turing machine reaches the most general. The hierarchy is therefore not just a way of sorting languages by difficulty — it is a unification showing that "how complex is this language?" and "how powerful a machine do I need to recognise it?" are the same question.
For an A-Level student the hierarchy is where the whole topic clicks into place. Every fact you have met — that a regex cannot match balanced brackets, that BNF can, that a Turing machine can do anything computable — is really a statement about where in the hierarchy a language sits and which machine it needs. This lesson sets out the four classes (regular, context-free, context-sensitive, recursively enumerable), the four recognising machines (FSM, pushdown automaton, linear-bounded automaton, Turing machine), and — most examinably — what each class can and cannot describe, with the boundary cases (balanced brackets, anbn, anbncn) that examiners use to test the distinctions. It is the synthesis of FSMs, regex, BNF and Turing machines into one ordered structure.
This lesson addresses the classification-of-languages content of H446 section 2.1 (theory of computation). The specification expects you to be able to:
It is the capstone of the automata sequence, drawing on finite-state machines, regular expressions, BNF, and Turing machines from earlier in this course, and it links to lexical and syntax analysis in the Translators topic.
Before the hierarchy, four terms must be precise:
| Concept | Definition | Example |
|---|---|---|
| Alphabet | a finite set of symbols | {0,1} or {a,b,…,z} |
| String | a finite sequence of symbols from the alphabet | "0110", "hello" |
| Language | a set of strings (possibly infinite) | all binary strings of even length |
| Grammar | a set of rules that generate exactly the valid strings | BNF production rules |
The pivotal idea is that a language is a set of strings, and a grammar is a finite description of that (often infinite) set. A grammar and a recognising machine are two sides of one coin: the grammar generates the strings of the language; the machine recognises (accepts) exactly those strings. The Chomsky hierarchy classifies languages by how restricted their grammar rules must be — and, equivalently, how powerful a machine is needed to recognise them.
Chomsky's four types, from most restricted / least powerful (Type 3) to most general / most powerful (Type 0):
| Type | Name | Grammar | Recognising machine | Typical example |
|---|---|---|---|---|
| Type 3 | Regular | regular grammar | Finite-state machine | identifiers, number formats |
| Type 2 | Context-free | context-free grammar (BNF) | Pushdown automaton | programming-language syntax |
| Type 1 | Context-sensitive | context-sensitive grammar | Linear-bounded automaton | anbncn; some natural-language features |
| Type 0 | Recursively enumerable | unrestricted grammar | Turing machine | any computable language |
The classes nest strictly — each is a proper subset of the one above:
Regular⊊Context-free⊊Context-sensitive⊊Recursively enumerable
So every regular language is also context-free, every context-free language is also context-sensitive, and so on — but not the reverse. The "⊊" (strict subset) is important: at each level there exist languages in the larger class that are provably absent from the smaller one — balanced brackets (context-free but not regular), anbncn (context-sensitive but not context-free), and the halting language (recursively enumerable but not decidable). Those witnesses are exactly the examples examiners use to test that you understand the boundaries are real.
flowchart TB
T0["Type 0 — Recursively enumerable (Turing machine)"]
T1["Type 1 — Context-sensitive (linear-bounded automaton)"]
T2["Type 2 — Context-free (pushdown automaton)"]
T3["Type 3 — Regular (finite-state machine)"]
T0 --> T1 --> T2 --> T3
Read the diagram as containment: the Type-3 box sits inside Type 2, inside Type 1, inside Type 0 — the higher you go, the more languages you can describe, and the more powerful the machine you need.
Recognised by: finite-state machines. Described by: regular expressions and regular grammars.
The defining limitation is finite memory — the only thing an FSM remembers is which of finitely many states it is in, so a regular language cannot count without bound and cannot match nested structure. These are the simplest languages.
| Regular language | Description | Regex |
|---|---|---|
| binary strings ending in 0 | "0", "10", "100", … | [01]*0 |
| even-length strings over {a,b} | "", "aa", "ab", "abab", … | ([ab][ab])* |
| identifiers | letter, then letters/digits | [a-zA-Z][a-zA-Z0-9]* |
Not regular — these need more than finite memory:
| Language | Why it is not regular |
|---|---|
balanced brackets: (), (()), ((())) | nesting depth is unbounded — an FSM cannot count |
| anbn | must remember how many a's appeared to match the b's |
| palindromes: "aba", "abcba" | must remember the whole first half to compare |
A bounded count is fine (parity, remainder mod k — finitely many states), but an unbounded count is not. That single test — "does this require remembering an unbounded quantity?" — decides whether a language escapes the regular class.
Recognised by: pushdown automata — an FSM plus a stack. Described by: context-free grammars (BNF).
The stack is the decisive addition. A last-in-first-out store of unbounded depth lets the machine count and match nested structures: push a marker for each opening item, pop for each closing one. This is exactly the memory regular languages lacked.
| Context-free language | Description |
|---|---|
| balanced brackets | "()", "(())", "(()())" — matched opens and closes to any depth |
| anbn | n a's then n b's: "ab", "aabb", "aaabbb" |
| arithmetic expressions | "3 + (5 * 2)" — nested brackets, precedence |
| programming-language syntax | if-else, nested loops, function definitions |
The canonical context-free grammar — balanced brackets — is the same one from the BNF lesson:
<S> ::= "" | "(" <S> ")" | <S> <S>
generating "", "()", "(())", "()()", "((()))", … . Why the stack works: the pushdown automaton pushes when it reads "(" and pops when it reads ")"; the string is balanced exactly when the stack is empty at the end (and never tried to pop when empty). The stack "remembers" how many brackets are still open — the unbounded count an FSM could not keep. This is why programming-language syntax is context-free, and why a compiler's parser (not its lexer) is needed to check it.
Recognised by: linear-bounded automata (LBAs) — a Turing machine whose tape is limited to the length of the input (it cannot use unbounded extra space). Described by: context-sensitive grammars.
Context-sensitive grammars are more powerful than context-free because a production may depend on the surrounding context. Where a context-free rule replaces a single non-terminal regardless of its neighbours, a context-sensitive rule has the form
αAβ → αγβ
— the non-terminal A may be rewritten to γ only when it appears between the context strings α and β. That contextual condition is what lets the grammar coordinate several counts at once.
The standard witness that this class is strictly larger than context-free is:
A pushdown automaton's single stack can match a's against b's or b's against c's, but not both simultaneously — once it pops the stack to check the b's, it has "forgotten" the count for the c's. So anbncn is not context-free; it is context-sensitive, recognised by an LBA whose bounded tape can hold and compare all three counts. Some features of natural language (certain agreement and cross-serial dependencies) are likewise beyond context-free, which is part of why Chomsky introduced the hierarchy.
Recognised by: Turing machines. Described by: unrestricted grammars (rules with no constraints).
This is the most general class — anything a Turing machine can recognise, and hence (by the Church–Turing thesis) anything computable at all. The crucial subtlety is how a Turing machine "recognises" a Type-0 language:
That possibility of never halting on non-members is exactly what distinguishes "recursively enumerable" from the stricter notion of "decidable", and it connects straight back to the halting problem:
| Term | Meaning | Also called |
|---|---|---|
| Decidable | a Turing machine always halts — accepts members, rejects non-members | recursive |
| Recursively enumerable | a Turing machine accepts members but may loop forever on non-members | recognisable / semi-decidable |
The decidable languages are a strict subset of the recursively enumerable ones. The halting language (descriptions of machine-plus-input pairs that halt) is the classic example of a language that is recursively enumerable but not decidable — you can confirm halting (run it and see) but cannot always rule it out, precisely the halting-problem result.
The hierarchy can be read from the machine side (which we have stressed) or equally from the grammar side, and seeing both makes the structure click. Chomsky's original definition is in terms of how restricted the production rules are — the more freedom a grammar's rules have, the higher the class it can describe. The four grammar types differ in the shape of rule they permit:
| Grammar type | Allowed rule shape | Class generated |
|---|---|---|
| Regular | a non-terminal expands to at most one terminal and at most one non-terminal, all on the same side (e.g. A→aB or A→a) | Regular (Type 3) |
| Context-free | a single non-terminal expands to any string of terminals/non-terminals (e.g. A→α) | Context-free (Type 2) |
| Context-sensitive | rules may use surrounding context, and the right side is never shorter than the left | Context-sensitive (Type 1) |
| Unrestricted | any rule whatsoever, with no constraint | Recursively enumerable (Type 0) |
The pattern is one of progressively lifting restrictions. A regular grammar is so tightly constrained that each rule can extend the string by only one symbol in a fixed direction — which is exactly why it cannot express nesting, and why it corresponds to the memoryless FSM. A context-free grammar relaxes this to "replace one non-terminal by anything", which is what permits the recursive, self-embedding rules (<S> ::= "(" <S> ")") that capture nesting — matching the stack of a PDA. A context-sensitive grammar additionally lets a replacement depend on neighbours and forbids rules that shorten the string, giving the extra coordination an LBA's bounded tape provides. An unrestricted grammar removes every constraint, matching the full Turing machine. So the same four-level structure appears whether you classify by machine memory or by grammar freedom — they are two views of one hierarchy, which is precisely why the correspondence between language class and recognising machine is so tight. This grammar/machine duality (a grammar generates the strings; the matching machine recognises them) is the deepest idea in the topic, and naming it explicitly is a hallmark of a top answer.
The whole hierarchy can be read off a single relationship: more memory ⇒ more powerful machine ⇒ larger language class.
| Automaton | Memory it adds | Language class | Can it count / nest? |
|---|---|---|---|
| Finite-state machine | none beyond finite states | Regular (Type 3) | no unbounded count |
| Pushdown automaton | a stack (LIFO) | Context-free (Type 2) | one unbounded count / nesting |
| Linear-bounded automaton | a tape bounded by input length | Context-sensitive (Type 1) | several coordinated counts |
| Turing machine | an unbounded read/write tape | Recursively enumerable (Type 0) | anything computable |
This is the single most powerful summary in the topic: the only difference between the four machines is how much (and what kind of) memory each has, and that difference is exactly what determines which languages each can recognise. The FSM has no scratch memory; the PDA gains a stack (enough for one nesting); the LBA gains a bounded tape (enough for several simultaneous counts); the Turing machine gains an unbounded tape (enough for all of computation).
The hierarchy is not merely theoretical bookkeeping; it directly shapes how real software is built, above all compilers:
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.