File Processing and Streams

This lesson covers file processing — reading from, writing to and appending to files — together with the idea of a stream as the abstraction that carries data between a program and the outside world. Variables live in RAM and vanish the instant a program ends; files give a program persistence, so a saved game, a register of student marks or an application log can outlive the process that created it. File handling is also where program logic meets the messy real world of missing files, locked resources and partially written data, so it ties tightly into exception handling and defensive programming.

Spec Mapping

This lesson addresses file handling within the Fundamentals of programming section of the AQA A-Level Computer Science (7517) specification (subject content area 4.1.1, with strong links to the writing of robust programs in 4.1.3). It covers: the difference between volatile memory and persistent file storage; the standard operations open, read, write, append and close; the role of file modes; reading and writing text files line by line; the distinction between text files and binary files; serialisation of structured records to and from a file; and the use of streams as the underlying model for sequential input and output. You are expected to write code that reads and writes files, explain why files must be closed, and evaluate the robustness of file-handling code under error conditions.

Why files? Volatile versus persistent storage

Data held in a variable is volatile: it occupies main memory (RAM) and is lost the moment the program terminates or the power is removed. A file is persistent: it is held on secondary storage (a disk or SSD), so it survives between runs of a program and between reboots, and it can be copied, archived or moved to another machine.

Storage	Volatile?	Speed	Typical use
Variables (RAM)	Yes — lost when the program ends	Very fast	Working data during execution
Files (secondary storage)	No — persists after the program ends	Slower	Saved documents, logs, configuration, datasets
Database	No — persists and supports structured queries	Slower, but indexed	Many concurrent users, relational data (see §4.10)

Files therefore let a program save state for later, share data with other programs, process large datasets in batches and keep an audit trail. The trade-off is speed: reading from disk is orders of magnitude slower than reading from RAM, which is exactly why programs load data into variables to work on it and only write back to file when necessary.

The five core operations and file modes

Every file-handling task is built from five primitive operations.

Operation	What it does
Open	Establishes a connection (a file handle) between the program and the file and positions a file pointer.
Read	Copies data from the file into the program.
Write	Copies data from the program into the file, overwriting existing contents.
Append	Copies data to the end of the file, preserving what is already there.
Close	Flushes any buffered data to disk and releases the file handle.

The behaviour of open depends on the mode requested:

Mode	Meaning	If the file is missing	Existing contents
`"r"`	Read	Raises `FileNotFoundError`	Preserved
`"w"`	Write	Created	Erased before writing
`"a"`	Append	Created	Preserved; new data added at the end
`"r+"`	Read and write	Raises `FileNotFoundError`	Preserved
`"rb"`/`"wb"`	Binary read / write	As `r`/`w`	As `r`/`w`

The single most dangerous detail here is that "w" truncates the file to zero length as soon as it is opened — before you have written anything. Opening an important file in "w" mode "just to check" will destroy its contents.
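The truncation behaviour can be checked safely in a temporary directory. The sketch below is illustrative only: the file name is hypothetical, and the final step uses Python's "x" (exclusive creation) mode, which is not listed in the table above but is a standard mode that refuses to open a file that already exists.

```python
import os
import tempfile

# Work in a temporary directory so no real file is put at risk
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "important.txt")  # hypothetical file name

    with open(path, "w") as file:
        file.write("irreplaceable data\n")
    size_before = os.path.getsize(path)

    # Open in "w" mode and close again without writing anything
    with open(path, "w") as file:
        pass
    size_after = os.path.getsize(path)   # 0: the contents are already gone

    # "x" mode raises FileExistsError instead of truncating
    try:
        with open(path, "x") as file:
            file.write("new data\n")
        exclusive_create_refused = False
    except FileExistsError:
        exclusive_create_refused = True

print(size_before > 0, size_after, exclusive_create_refused)
```

If overwriting by accident would be costly, opening with "x" first and handling FileExistsError is one defensive option.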

The file pointer

When a file is opened, the operating system maintains a file pointer (sometimes called a cursor) that marks the current position. Each read or write advances the pointer past the data just processed. This is why a sequence of readline() calls returns successive lines rather than the same line repeatedly, and why you cannot read a line twice without explicitly moving the pointer back (in Python, with seek()).

flowchart LR
    A["Open file<br/>pointer at start"] --> B["read first line<br/>pointer advances"]
    B --> C["read second line<br/>pointer advances"]
    C --> D["...<br/>end of file reached"]
    D --> E["close<br/>pointer released"]

Understanding the pointer explains a common bug: code that reads the whole file with read() and then tries to loop over the same file object gets nothing, because the pointer is already at the end.
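A minimal sketch of that bug and its fix, assuming a small three-line file created in a temporary directory purely for the demonstration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "students.txt")
    with open(path, "w") as file:
        file.write("Alice\nBob\nCharlie\n")

    with open(path, "r") as file:
        start = file.tell()        # 0: the pointer begins at the start
        content = file.read()      # the pointer is now at end of file
        second_pass = list(file)   # nothing left to read
        file.seek(0)               # move the pointer back to the start
        third_pass = [line.strip() for line in file]

print(second_pass)   # []
print(third_pass)    # ['Alice', 'Bob', 'Charlie']
```

Calling seek(0) rewinds the pointer, so the second loop over the same file object sees every line again.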

Reading text files

A text file is read sequentially from the current pointer position. The classic pattern is a loop that continues until the end of file (EOF) is reached.

AQA-style pseudocode

# AQA-style pseudocode
file = OPEN("students.txt", "r")
WHILE NOT EOF(file)
    line = READLINE(file)
    OUTPUT line
ENDWHILE
CLOSE(file)

Python

# Idiomatic: iterate directly over the file object, line by line
with open("students.txt", "r") as file:
    for line in file:
        print(line.strip())   # strip() removes the trailing newline

# Read the entire file into one string (small files only)
with open("students.txt", "r") as file:
    content = file.read()

# Read all lines into a list
with open("students.txt", "r") as file:
    lines = file.readlines()

Iterating directly over the file object is preferred for large files because it reads one line at a time and never holds the whole file in memory — a direct application of the streaming idea developed later in this lesson.

The `with` statement and why closing matters

Forgetting to close a file is one of the most common — and most damaging — mistakes in file handling. An unclosed file may:

lose data, because writes are buffered in memory and only flushed to disk on close;
leave a lock on the file so other programs cannot open it;
leak a file handle, exhausting a limited operating-system resource if it happens in a loop.

Python's with statement uses the file object as a context manager and closes the file automatically when the block ends, even if an exception is raised inside it.

# Manual: you MUST remember to close; try/finally makes close run even after an error
file = open("data.txt", "r")
try:
    content = file.read()
finally:
    file.close()        # guaranteed by finally

# Context manager: close is guaranteed, with no boilerplate
with open("data.txt", "r") as file:
    content = file.read()
# file is closed here, whatever happened inside the block

Exam Tip: Any answer that opens a file should also close it. If you use with, say explicitly that it closes the file automatically; this earns the mark and shows you understand why closing matters (flushing buffered data and releasing the handle).
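The guarantee can be observed directly through the closed attribute of a file object. The following sketch raises a deliberate error inside a with block; the file and the simulated failure are assumptions made only for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.txt")
    with open(path, "w") as file:
        file.write("42\n")

    handle = None
    try:
        with open(path, "r") as file:
            handle = file                # keep a reference to inspect later
            raise ValueError("simulated failure mid-block")
    except ValueError:
        pass

    closed_after_error = handle.closed   # the with block closed the file

print(closed_after_error)  # True
```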

Writing and appending

# WRITE mode: erases the file first, then writes
with open("results.txt", "w") as file:
    file.write("Name,Score\n")     # \n is needed; write() does NOT add one
    file.write("Alice,85\n")
    file.write("Bob,72\n")

# APPEND mode: keeps existing contents, adds to the end
with open("log.txt", "a") as file:
    file.write("2026-06-01 login OK\n")

Two details catch students out. First, write() does not add a newline, so each record needs an explicit \n. Second, the choice between "w" and "a" is the difference between replacing a log and extending it — using "w" for a log file would silently discard every previous entry on each run.
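The difference shows up clearly when the same logging step runs several times. In this sketch, log_event is a hypothetical helper and the three loop iterations stand in for three separate runs of a program.

```python
import os
import tempfile

def log_event(path, message, mode):
    """Write one line to the file at path using the given mode."""
    with open(path, mode) as file:
        file.write(message + "\n")

with tempfile.TemporaryDirectory() as folder:
    overwritten = os.path.join(folder, "w_log.txt")
    extended = os.path.join(folder, "a_log.txt")

    for message in ["run 1", "run 2", "run 3"]:
        log_event(overwritten, message, "w")   # replaces the file each time
        log_event(extended, message, "a")      # adds to the end each time

    with open(overwritten, "r") as file:
        w_lines = file.read().splitlines()
    with open(extended, "r") as file:
        a_lines = file.read().splitlines()

print(w_lines)  # ['run 3']
print(a_lines)  # ['run 1', 'run 2', 'run 3']
```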

Records and serialisation: writing structured data

Real programs rarely store single strings; they store records — structured collections of fields, such as a student with a name, age and mark. To put a record into a flat file it must be serialised: converted into a sequence of characters or bytes. Reading it back deserialises it, reconstructing the in-memory structure.

The simplest serialisation is a delimited text line. CSV (Comma-Separated Values) is the canonical example: one record per line, fields separated by commas.

Name	Age	Score
Alice	17	85
Bob	18	72
Charlie	17	91

import csv

# Serialise records (dictionaries) to a CSV file
students = [
    {"Name": "Alice", "Age": 17, "Score": 85},
    {"Name": "Bob",   "Age": 18, "Score": 72},
]
with open("students.csv", "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["Name", "Age", "Score"])
    writer.writeheader()
    for record in students:
        writer.writerow(record)

# Deserialise the CSV file back into records, converting field types
loaded = []
with open("students.csv", "r", newline="") as file:
    reader = csv.DictReader(file)
    for row in reader:
        loaded.append({
            "Name": row["Name"],
            "Age": int(row["Age"]),     # everything read from a text file is a string
            "Score": int(row["Score"]), # so numeric fields must be converted
        })

The key conceptual point — and a frequent exam discriminator — is that everything read from a text file arrives as a string. The number 85 written to file becomes the characters "8" and "5"; to do arithmetic with it you must convert with int(). Forgetting this leads to bugs such as "72" + "85" producing the string "7285" instead of 157.
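The bug is easy to reproduce. In the sketch below, io.StringIO stands in for a text file holding two scores, which is an assumption made so the example needs no file on disk.

```python
import io

# io.StringIO behaves like an open text file containing two lines
fake_file = io.StringIO("72\n85\n")
raw = [line.strip() for line in fake_file]   # ['72', '85'] are strings

joined = raw[0] + raw[1]             # string concatenation
added = int(raw[0]) + int(raw[1])    # integer addition after conversion

print(repr(joined))  # '7285'
print(added)         # 157
```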

Text files versus binary files

Feature	Text file	Binary file
Stored as	Encoded characters (e.g. UTF-8)	Raw bytes, exactly as in memory
Human-readable?	Yes, in any text editor	No — needs a program that understands the format
Examples	.txt, .csv, .html, .py	.jpg, .png, .exe, .dat
Line endings	May be translated on reading and writing (`\n` vs `\r\n`)	No translation
Number storage	Each digit stored as a character (so 85 = 2 bytes)	Stored compactly (an integer may be 4 bytes regardless of value)
Typical use	Configuration, logs, data interchange	Images, audio, compiled programs, fast records

# Binary write: a list of byte values (here, the ASCII codes for "Hello")
with open("data.bin", "wb") as file:
    file.write(bytes([72, 101, 108, 108, 111]))

# Binary read: read() returns a bytes object; each element is an integer from 0 to 255
with open("data.bin", "rb") as file:
    data = file.read()
    print(list(data))   # [72, 101, 108, 108, 111]

Binary files are typically more compact (an unsigned four-byte integer stores 4,000,000,000 in the same space as it stores 7, whereas text needs ten characters for the former) and faster to process, but they cannot be inspected in an ordinary text editor and require code that knows the exact layout. Text files trade compactness for readability and interoperability.
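The size comparison can be checked with Python's standard struct module, which is not used elsewhere in this lesson. The format string "<I" (little-endian unsigned 32-bit integer) is one possible layout, chosen here only for illustration.

```python
import struct

values = [7, 4_000_000_000]
text_sizes = []
binary_sizes = []

for value in values:
    as_text = str(value).encode("utf-8")    # one byte per digit
    as_binary = struct.pack("<I", value)    # always four bytes
    text_sizes.append(len(as_text))
    binary_sizes.append(len(as_binary))

# Unpacking reverses the process; unpack always returns a tuple
restored = struct.unpack("<I", struct.pack("<I", 4_000_000_000))[0]

print(text_sizes)    # [1, 10]
print(binary_sizes)  # [4, 4]
print(restored)      # 4000000000
```

The output also shows that binary is not always smaller: the single digit 7 takes one byte as text but four as a fixed-width integer.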

Robust file handling with exceptions

File operations interact with the operating system and the physical disk, so they fail in ways that pure in-memory code does not: the file may be missing, locked, read-only, or the disk may be full. Robust code anticipates this.

try:
    with open("data.txt", "r") as file:
        content = file.read()
        print(content)
except FileNotFoundError:
    print("Error: the file does not exist.")
except PermissionError:
    print("Error: you do not have permission to read this file.")
except OSError as error:
    print(f"Unexpected file error: {error}")

Note the ordering: specific exceptions (FileNotFoundError, PermissionError) are caught before the more general OSError, because both of those are subclasses of OSError. A general handler placed first would intercept everything and the specific handlers would be unreachable — a direct link to the exception-hierarchy ideas from the exception-handling lesson.
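The effect of handler order can be demonstrated with two small functions. Both are hypothetical and return a label instead of printing, so the difference is easy to see; the missing file path is never created.

```python
import os
import tempfile

def badly_ordered(path):
    """General handler first: the specific handler below can never run."""
    try:
        with open(path, "r") as file:
            return file.read()
    except OSError:
        return "general handler"
    except FileNotFoundError:   # unreachable: already caught above
        return "specific handler"

def well_ordered(path):
    """Specific handler first, general handler as the fallback."""
    try:
        with open(path, "r") as file:
            return file.read()
    except FileNotFoundError:
        return "specific handler"
    except OSError:
        return "general handler"

with tempfile.TemporaryDirectory() as folder:
    missing = os.path.join(folder, "missing.txt")  # never created
    bad_result = badly_ordered(missing)
    good_result = well_ordered(missing)

print(issubclass(FileNotFoundError, OSError))  # True
print(issubclass(PermissionError, OSError))    # True
print(bad_result)    # general handler
print(good_result)   # specific handler
```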

Streams: the underlying abstraction

A stream is an abstraction representing a sequence of data elements made available over time, accessed in order. Rather than thinking of a file as a fixed block, a stream lets a program consume or produce data piece by piece, without knowing in advance how much there is.

Stream	Direction	Examples
Input stream	Data flows into the program	Reading a file, keyboard input, a network socket
Output stream	Data flows out of the program	Writing a file, printing to screen, sending to a printer

The power of the stream model is uniformity: the same read/write interface works whether the source is a file on disk, a keyboard or a remote server. In Python, a file object is a stream, which is why iterating over a file feels identical to iterating over any other iterable. Streaming also enables processing of data larger than memory: you can sum a billion numbers in a file by reading one line at a time, never holding more than a single value at once.
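A short sketch of that uniformity: the hypothetical function sum_stream below accepts any text stream that yields lines, so the same code handles an in-memory io.StringIO and a real file. The sample numbers are illustrative.

```python
import io
import os
import tempfile

def sum_stream(stream):
    """Add up one integer per line, holding only one line in memory at a time."""
    total = 0
    for line in stream:
        line = line.strip()
        if line:                 # ignore blank lines
            total += int(line)
    return total

# Source 1: an in-memory text stream
memory_total = sum_stream(io.StringIO("40\n55\n30\n"))

# Source 2: a file on disk, read through exactly the same function
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "scores.txt")
    with open(path, "w") as file:
        file.write("40\n55\n30\n")
    with open(path, "r") as file:
        file_total = sum_stream(file)

print(memory_total, file_total)  # 125 125
```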

Worked example: a register that survives restarts

The following module loads a register from a CSV file at start-up, lets the program add marks in memory, and saves the register back on exit — a complete persistence cycle.

import csv

def load_register(filename: str) -> list:
    register = []
    try:
        with open(filename, "r", newline="") as file:
            reader = csv.DictReader(file)
            for row in reader:
                register.append({
                    "name": row["Name"],
                    "score": int(row["Score"]),
                })
    except FileNotFoundError:
        # First run: no file yet, so start with an empty register
        print(f"No existing register at {filename}; starting fresh.")
    return register

def save_register(filename: str, register: list) -> None:
    with open(filename, "w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=["Name", "Score"])
        writer.writeheader()
        for record in register:
            writer.writerow({"Name": record["name"], "Score": record["score"]})

def class_average(register: list) -> float:
    if not register:
        return 0.0
    return sum(record["score"] for record in register) / len(register)

# Persistence cycle
register = load_register("class.csv")          # deserialise from disk
register.append({"name": "Dev", "score": 88})  # modify in memory
print(f"Average: {class_average(register):.1f}")
save_register("class.csv", register)           # serialise back to disk

The load_register function treats a missing file as the normal first-run case rather than an error — a good example of judging which exceptions are genuinely exceptional.
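The same judgement applies to the data inside the file. As written, a non-numeric or missing Score would stop the load with an unhandled error. One possible hardening, sketched below with a hypothetical parse_rows helper and made-up sample data, is to skip bad rows and report their line numbers. Whether skipping is the right policy depends on the application.

```python
import csv
import io

def parse_rows(stream):
    """Return (records, skipped_line_numbers) from a CSV stream
    with Name and Score columns."""
    records = []
    skipped = []
    reader = csv.DictReader(stream)
    for row in reader:
        try:
            records.append({
                "name": row["Name"],
                "score": int(row["Score"]),
            })
        except (KeyError, ValueError, TypeError):
            # KeyError: column missing from the header
            # ValueError: score is not a whole number
            # TypeError: score field absent on this row (read as None)
            skipped.append(reader.line_num)
    return records, skipped

# io.StringIO stands in for a file whose second record is malformed
sample = io.StringIO("Name,Score\nAlice,85\nBob,absent\nCharlie,91\n")
records, skipped = parse_rows(sample)

print(records)  # [{'name': 'Alice', 'score': 85}, {'name': 'Charlie', 'score': 91}]
print(skipped)  # [3]
```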

Tracing a file-processing algorithm

Exam questions frequently ask you to trace a small file-processing routine and state its output. Consider an algorithm that reads a file of numbers — one per line — and reports the count and the total.

# AQA-style pseudocode
file = OPEN("scores.txt", "r")
count = 0
total = 0
WHILE NOT EOF(file)
    line = READLINE(file)
    value = INT(line)
    count = count + 1
    total = total + value
ENDWHILE
CLOSE(file)
OUTPUT "Count = " + STR(count)
OUTPUT "Total = " + STR(total)

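Suppose scores.txt contains three lines: 40, 55 and 30. The trace table records the state after each iteration of the loop:

Iteration	line	value	count	total
(before loop)	-	-	0	0
1	"40"	40	1	40
2	"55"	55	2	95
3	"30"	30	3	125

After the third iteration EOF(file) is true, so the loop ends and the program outputs Count = 3 and Total = 125.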
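A hand trace can be checked against a direct Python translation of the pseudocode. In this sketch io.StringIO stands in for scores.txt, and the trace list is an addition used only to record the state after each iteration.

```python
import io

# Stand-in for scores.txt with the three lines 40, 55, 30
scores_file = io.StringIO("40\n55\n30\n")

count = 0
total = 0
trace = []   # one (line, value, count, total) tuple per iteration

for line in scores_file:
    value = int(line)        # int() ignores the trailing newline
    count = count + 1
    total = total + value
    trace.append((line.strip(), value, count, total))

for row in trace:
    print(row)
print("Count = " + str(count))   # Count = 3
print("Total = " + str(total))   # Total = 125
```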
