Lists and List Processing

The list is the workhorse data structure of functional programming. In Haskell almost every data-processing task — summing, transforming, searching, grouping — reduces to building lists and decomposing them. Understanding lists deeply therefore unlocks the whole topic: map, filter and fold are simply recursive list traversals; list comprehensions are a concise notation for the same idea; and the big-data MapReduce model (§4.11) is list processing scaled across a cluster.

What makes the Haskell list special is its recursive structure. A list is built from exactly two constructors — the empty list [] and the cons operator (:), which glues a head element onto an existing tail. That two-case shape is precisely what your pattern-matching and recursion mirror, so the structure of the data and the structure of the code line up perfectly. This lesson works through building and decomposing lists, the cost difference between cons and append, list comprehensions with multiple generators and guards, and traced recursive processing — all with worked Haskell.

Spec Mapping

This lesson addresses AQA 7517 §4.12.3, list processing in the functional paradigm. You should be able to: state that a list is a sequence of values of the same type; use and explain the cons operator (:) to build lists and the functions head, tail, null to decompose and test them; understand that [1,2,3] is shorthand for 1 : 2 : 3 : []; write recursive functions that traverse a list by pattern-matching [] against (x:xs); construct lists with ranges and list comprehensions (including guards and multiple generators); and explain the complexity difference between prepending with cons (O(1)) and concatenating with ++ (O(n)). It builds directly on recursion and pattern matching and underpins the higher-order list functions of §4.12.3 and the big-data motivation of §4.11.

The explanations here are our own restatement of the assessable content, not a verbatim quotation of AQA's specification.

Lists Are Homogeneous Sequences

A Haskell list holds a sequence of elements that must all be of the same type, written in square brackets:

numbers = [1, 2, 3, 4, 5]          -- a list of integers
names   = ["Alice", "Bob", "Carol"] -- a list of strings
empty   = []                        -- the empty list
single  = [42]                      -- a one-element list

The type of a list is written [a], where a is the element type:

[1, 2, 3]            :: [Int]
["hello", "world"]   :: [String]
[True, False, True]  :: [Bool]

You cannot mix integers and strings in one list — [1, "two"] is a type error rejected at compile time. (When you genuinely need a fixed bundle of different types, that is the job of a tuple, covered in the Haskell lesson.) One consequence worth memorising: a String is itself a list of Char, so every list function works on strings too:

"hello" == ['h', 'e', 'l', 'l', 'o']    -- True

The Recursive Structure: Cons and the Empty List

Every Haskell list is built from just two pieces:

[] — the empty list, the base case of the structure;
(:) — the cons operator, which prepends one element (the head) onto an existing list (the tail).

So the familiar bracket notation is pure syntactic sugar:

[1, 2, 3]  ==  1 : 2 : 3 : []     -- both are the same list

Reading right-to-left: start with [], cons on 3 to get [3], cons on 2 to get [2,3], cons on 1 to get [1,2,3]. We can picture the chain of cons cells:

graph LR
    A["1 :"] --> B["2 :"]
    B --> C["3 :"]
    C --> D["[]"]

This two-constructor shape is the key to everything else in the lesson. Because a list is either empty or a head consed onto a tail, any function over a list naturally splits into two cases — and that is exactly what pattern matching gives you.

Decomposing with head, tail and null

head [1, 2, 3, 4, 5]    -- 1            (the first element)
tail [1, 2, 3, 4, 5]    -- [2, 3, 4, 5] (everything after the head)
null []                  -- True         (is the list empty?)
null [1, 2]              -- False

head and tail are the inverses of cons: head (x:xs) is x and tail (x:xs) is xs. Warning: head and tail on an empty list raise a runtime error, because there is no first element to return. This is exactly why we test with null, or pattern-match the [] case explicitly, before taking a head — a habit examiners look for.

Cons (:) versus append (++): the cost difference

The cons operator adds a single element to the front of a list; the ++ operator concatenates two whole lists:

1 : [2, 3, 4]               -- [1, 2, 3, 4]   prepend one element
'h' : "ello"                -- "hello"
[1, 2, 3] ++ [4, 5]         -- [1, 2, 3, 4, 5] join two lists
"Hello" ++ " " ++ "World"   -- "Hello World"

The performance difference is an examined fact. Prepending with (:) is O(1) — it creates a single new cell whose tail points at the existing list, sharing it without copying (this is the structural sharing from the immutability lesson). Concatenating with (++) is O(n) in the length of the left list, because xs ++ ys must walk to the end of xs, rebuilding every one of its cells so the last one can point at ys. The right list ys is shared untouched. The practical lesson: build lists by consing onto the front, not by repeatedly appending to the end — a naive result = result ++ [x] inside a loop is quadratic. To make the cost concrete: building a 1,000-element list by appending one at a time performs roughly $1 + 2 + \cdots + 1000 \approx 500{,}000$ cell copies, whereas building it with 1,000 conses performs about 1,000 — a five-hundred-fold difference that grows with the list, which is why the choice of operator is a genuine design decision rather than a stylistic preference.

Other common list functions

last [1, 2, 3, 4, 5]     -- 5            (last element)
init [1, 2, 3, 4, 5]     -- [1, 2, 3, 4] (all except the last)
length [1, 2, 3, 4, 5]   -- 5
reverse [1, 2, 3]         -- [3, 2, 1]
take 3 [1, 2, 3, 4, 5]   -- [1, 2, 3]
drop 3 [1, 2, 3, 4, 5]   -- [4, 5]
elem 3 [1, 2, 3, 4, 5]   -- True         (is 3 a member?)
sum [1, 2, 3, 4, 5]       -- 15
product [1, 2, 3, 4, 5]   -- 120

Ranges

Haskell can generate sequences with range notation, which is shorthand for an enumerated list:

[1..10]          -- [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[2, 4..20]       -- [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]  (step inferred = 2)
['a'..'z']       -- "abcdefghijklmnopqrstuvwxyz"
[10, 9..1]       -- [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]        (descending)

When two starting values are given, Haskell infers the step from their difference, so [2, 4..20] counts in twos and [10, 9..1] counts down.

List Comprehensions — Traced

A list comprehension builds a new list by drawing values from one or more existing lists (generators), optionally keeping only those satisfying a condition (guards), and applying an expression to each. The general shape is:

[ expression | generator(s), guard(s) ]

A single generator with a guard, traced

Consider "the squares of the even numbers from 1 to 10":

[ x^2 | x <- [1..10], even x ]

Read it as: draw each x from [1..10]; keep only those where even x; output x^2. Tracing the evaluation element by element:

  x = 1   even 1  = False  -> discarded
  x = 2   even 2  = True   -> 2^2  = 4
  x = 3   even 3  = False  -> discarded
  x = 4   even 4  = True   -> 4^2  = 16
  x = 5   even 5  = False  -> discarded
  x = 6   even 6  = True   -> 6^2  = 36
  x = 7   even 7  = False  -> discarded
  x = 8   even 8  = True   -> 8^2  = 64
  x = 9   even 9  = False  -> discarded
  x = 10  even 10 = True   -> 10^2 = 100
  => [4, 16, 36, 64, 100]

The guard even x plays the role of filter and the expression x^2 plays the role of map, so a comprehension neatly fuses the two into one piece of notation.

Multiple generators

With two generators, Haskell iterates the rightmost generator fastest, like a nested loop:

[ (x, y) | x <- [1..3], y <- [1..3], x + y == 4 ]
-- x=1: y=1(2) y=2(3) y=3(4 keep -> (1,3))
-- x=2: y=1(3) y=2(4 keep -> (2,2)) y=3(5)
-- x=3: y=1(4 keep -> (3,1)) y=2(5) y=3(6)
-- => [(1,3), (2,2), (3,1)]

A transformation over a string

Because a String is a list of Char, comprehensions work on text too:

import Data.Char (toUpper)
[ toUpper c | c <- "hello world" ]   -- "HELLO WORLD"

The same idea in Python, for contrast

# Python comprehensions borrow the same notation
squares_of_evens = [x**2 for x in range(1, 11) if x % 2 == 0]
# [4, 16, 36, 64, 100]

Pattern Matching and Recursive List Processing

Because a list is either [] or (x:xs), you can write functions by pattern matching these two shapes — the [] case is the recursion's base case and the (x:xs) case does work on the head and recurses on the tail.

describe :: [Int] -> String
describe []     = "empty list"
describe [x]    = "one element: " ++ show x
describe (x:xs) = "first is " ++ show x ++ " with " ++ show (length xs) ++ " more"

The pattern (x:xs) destructures a non-empty list, binding x to the head and xs to the tail in one step. Here are the three canonical traversals, each following the same two-case skeleton:

-- sum: base case 0; otherwise add the head to the sum of the tail
mySum :: [Int] -> Int
mySum []     = 0
mySum (x:xs) = x + mySum xs

-- map: apply f to the head, recurse on the tail
myMap :: (a -> b) -> [a] -> [b]
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs

-- filter: keep the head only if the predicate holds
myFilter :: (a -> Bool) -> [a] -> [a]
myFilter _ []     = []
myFilter p (x:xs)
    | p x       = x : myFilter p xs
    | otherwise = myFilter p xs

Tracing `mySum [4, 2, 7]` by substitution

Each line replaces the previous by the relevant defining equation — no mutation, no loop counter:

  mySum [4, 2, 7]
= 4 + mySum [2, 7]            -- (x:xs) case, x = 4
= 4 + (2 + mySum [7])        -- x = 2
= 4 + (2 + (7 + mySum []))   -- x = 7
= 4 + (2 + (7 + 0))          -- [] base case returns 0
= 4 + (2 + 7)
= 4 + 9
= 13

Notice the recursion winds down to the empty list, then the additions resolve back up. That down-then-up shape is the signature of structural recursion over a list, and it is exactly what myMap and myFilter do as well — only the per-element work differs.

Folding a List: `foldr` and `foldl`

The three traversals above (mySum, myMap, myFilter) all share a skeleton: combine the head with the recursively-processed tail. Folding captures that skeleton once and parameterises the combining step, which is why fold is the most general list-processing function. A right fold inserts a binary operator between every element and a starting value, associating to the right:

\texttt{foldr}\;(\oplus)\;z\;[a, b, c] \;=\; a \oplus (b \oplus (c \oplus z))

foldr (+) 0 [1, 2, 3]    -- 1 + (2 + (3 + 0)) = 6
foldr (:) [] [1, 2, 3]   -- 1 : (2 : (3 : [])) = [1, 2, 3]  (rebuilds the list)

A left fold associates the other way, accumulating from the left:

\texttt{foldl}\;(\oplus)\;z\;[a, b, c] \;=\; ((z \oplus a) \oplus b) \oplus c

foldl (+) 0 [1, 2, 3]    -- ((0 + 1) + 2) + 3 = 6

For a commutative and associative operator such as (+) the two give the same answer, but for others they differ — foldl (-) 0 [1,2,3] is ((0-1)-2)-3 = -6 whereas foldr (-) 0 [1,2,3] is 1-(2-(3-0)) = 2. Recognising that sum, product, length, reverse and even map and filter can all be written as folds is exactly the abstraction that makes fold the basis of the reduce step in big-data MapReduce (§4.11). Trace foldr (+) 0 [4, 2] to see the structure:

  foldr (+) 0 [4, 2]
= 4 + foldr (+) 0 [2]
= 4 + (2 + foldr (+) 0 [])
= 4 + (2 + 0)
= 6

Lists of Pairs as Lookup Tables

A common, examinable use of lists is the association list — a list of (key, value) pairs used as a simple lookup table. The library function lookup searches it, returning a Maybe value (Just v if found, Nothing if not) so that a missing key is handled safely rather than crashing:

prices :: [(String, Int)]
prices = [("apple", 30), ("banana", 20), ("cherry", 75)]

lookup "banana" prices    -- Just 20
lookup "mango"  prices    -- Nothing

Lists and List Processing

Lists and List Processing

Spec Mapping

Lists Are Homogeneous Sequences

The Recursive Structure: Cons and the Empty List

Decomposing with head, tail and null

Cons (:) versus append (++): the cost difference

Other common list functions

Ranges

List Comprehensions — Traced

A single generator with a guard, traced

Multiple generators

A transformation over a string

The same idea in Python, for contrast

Pattern Matching and Recursive List Processing

Tracing mySum [4, 2, 7] by substitution

Folding a List: foldr and foldl

Lists of Pairs as Lookup Tables

More in Computer Science

Tracing `mySum [4, 2, 7]` by substitution

Folding a List: `foldr` and `foldl`