BNF and Syntax Diagrams

Backus-Naur Form (BNF) is a notation for defining the syntax (grammar) of formal languages, including programming languages. Syntax diagrams (also called railroad diagrams) are the visual equivalent. Both are essential for OCR H446 section 1.4.

Why Define Syntax Formally?

Reason	Explanation
Compiler design	A compiler needs precise rules to parse source code correctly
Language specification	Defines exactly what constitutes valid code in a programming language
Error detection	Helps compilers produce meaningful error messages
Unambiguity	Formal syntax leaves no room for interpretation — every string is either valid or invalid
Documentation	Provides a reference for programmers learning a language

Terminal and Non-Terminal Symbols

Type	Definition	Convention
Terminal	An actual symbol/token that appears in the language (cannot be broken down further)	Written in quotes or bold: "if", "while", "+", "0", "1"
Non-terminal	A symbol that can be replaced by a sequence of terminals and/or non-terminals (a rule)	Written in angle brackets: , ,

Example:

Terminal: "if", "(", ")", "{", "}", "=", ";", "0", "1", ..., "9"
Non-terminal: , , ,

Backus-Naur Form (BNF)

BNF consists of production rules that define how non-terminals can be expanded:

Format:

<non-terminal> ::= definition

The symbol ::= means "is defined as".

Symbol	Meaning
::=	"is defined as"

	Non-terminal symbol
"text"	Terminal symbol

Example — Defining a digit:

<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

This means a is any one of the characters 0 through 9.

Example — Defining a natural number:

<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<natural-number> ::= <digit> | <digit> <natural-number>

A is either a single digit, or a digit followed by another natural number (recursion). This allows numbers of any length: "5", "42", "1000".

Recursive Definitions in BNF

Recursion is essential in BNF for defining structures of arbitrary length:

Left-recursive:

<list> ::= <item> | <list> "," <item>

This defines a list as either a single item, or a list followed by a comma and another item. It generates: "a", "a,b", "a,b,c", "a,b,c,d", ...

Right-recursive:

<list> ::= <item> | <item> "," <list>

This is equivalent but builds from right to left.

BNF Example: Simple Arithmetic Expression

<expression> ::= <term> | <expression> "+" <term> | <expression> "-" <term>
<term> ::= <factor> | <term> "*" <factor> | <term> "/" <factor>
<factor> ::= <number> | "(" <expression> ")"
<number> ::= <digit> | <digit> <number>
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

This grammar handles:

Numbers: "42", "7", "100"
Addition and subtraction: "3 + 5", "10 - 2"
Multiplication and division: "4 * 5", "10 / 2"
Brackets: "(3 + 5) * 2"
Operator precedence: * and / bind tighter than + and - (because is lower in the hierarchy than )

BNF and Syntax Diagrams

BNF and Syntax Diagrams

Why Define Syntax Formally?

Terminal and Non-Terminal Symbols

Backus-Naur Form (BNF)

Recursive Definitions in BNF

BNF Example: Simple Arithmetic Expression

BNF Example: Simple IF Statement

More in Computer Science