This lesson covers file processing — reading from, writing to and appending to files — together with the idea of a stream as the abstraction that carries data between a program and the outside world. Variables live in RAM and vanish the instant a program ends; files give a program persistence, so a saved game, a register of student marks or an application log can outlive the process that created it. File handling is also where program logic meets the messy real world of missing files, locked resources and partially written data, so it ties tightly into exception handling and defensive programming.
This lesson addresses file handling within the Fundamentals of programming section of the AQA A-Level Computer Science (7517) specification (subject content area 4.1.1, with strong links to the writing of robust programs in 4.1.3). It covers: the difference between volatile memory and persistent file storage; the standard operations open, read, write, append and close; the role of file modes; reading and writing text files line by line; the distinction between text files and binary files; serialisation of structured records to and from a file; and the use of streams as the underlying model for sequential input and output. You are expected to write code that reads and writes files, explain why files must be closed, and evaluate the robustness of file-handling code under error conditions.
Data held in a variable is volatile — it occupies main memory (RAM) and is lost the moment the program terminates or the power is removed. A file is persistent: it is held on secondary storage (a disk or SSD) and survives between runs of a program, between reboots, and can be copied, archived or moved to another machine.
| Storage | Volatile? | Speed | Typical use |
|---|---|---|---|
| Variables (RAM) | Yes — lost when the program ends | Very fast | Working data during execution |
| Files (secondary storage) | No — persists after the program ends | Slower | Saved documents, logs, configuration, datasets |
| Database | No — persists and supports structured queries | Slower, but indexed | Many concurrent users, relational data (see §4.10) |
Files therefore let a program save state for later, share data with other programs, process large datasets in batches and keep an audit trail. The trade-off is speed: reading from disk is orders of magnitude slower than reading from RAM, which is exactly why programs load data into variables to work on it and only write back to file when necessary.
Every file-handling task is built from five primitive operations.
| Operation | What it does |
|---|---|
| Open | Establishes a connection (a file handle) between the program and the file and positions a file pointer. |
| Read | Copies data from the file into the program. |
| Write | Copies data from the program into the file, overwriting existing contents. |
| Append | Copies data to the end of the file, preserving what is already there. |
| Close | Flushes any buffered data to disk and releases the file handle. |
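Here is a minimal sketch of how the five operations map onto Python's built-in open() function and the write(), read() and close() methods of the file object it returns. The filename notes.txt is hypothetical. Append has no method of its own: it is write() on a file opened in mode "a". The close() calls are written out by hand so that each operation is visible.

```python
# The five primitive operations, written out explicitly.
# "notes.txt" is a hypothetical filename used only for illustration.

# Open + Write + Close: mode "w" creates the file (or empties an existing one)
handle = open("notes.txt", "w")
handle.write("first line\n")
handle.close()          # flush buffered data and release the handle

# Open + Append + Close: mode "a" adds to the end, keeping what is there
handle = open("notes.txt", "a")
handle.write("second line\n")
handle.close()

# Open + Read + Close: mode "r" reads from the start of the file
handle = open("notes.txt", "r")
contents = handle.read()
handle.close()

print(contents, end="")  # the two lines, in the order they were written
```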
The behaviour of open depends on the mode requested:
| Mode | Meaning | If the file is missing | Existing contents |
|---|---|---|---|
| "r" | Read | Raises FileNotFoundError | Preserved |
| "w" | Write | Created | Erased before writing |
| "a" | Append | Created | Preserved; new data added at the end |
| "r+" | Read and write | Raises FileNotFoundError | Preserved |
| "rb" / "wb" | Binary read / write | As "r" / "w" | As "r" / "w" |
The single most dangerous detail here is that "w" truncates the file to zero length as soon as it is opened — before you have written anything. Opening an important file in "w" mode "just to check" will destroy its contents.
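The sketch below shows the truncation happening even though nothing is written. It uses a hypothetical file, important.txt: the file is filled, then opened in "w" mode and closed without any write() call, and then read back.

```python
# Demonstration that "w" empties a file on open, before anything is written.
# "important.txt" is a hypothetical filename.

with open("important.txt", "w") as file:
    file.write("Do not lose this!\n")

with open("important.txt", "r") as file:
    before = file.read()

# Open in "w" mode "just to check", then write nothing at all
with open("important.txt", "w") as file:
    pass

with open("important.txt", "r") as file:
    after = file.read()

print(repr(before))  # 'Do not lose this!\n'
print(repr(after))   # '' (the contents were erased by open() itself)
```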
When a file is opened, the operating system maintains a file pointer (sometimes called a cursor) that marks the current position in the file. Each read or write advances the pointer past the data just processed. This is why a sequence of READLINE calls (readline() in Python) returns successive lines rather than the same line repeatedly. It is also why you cannot read a line a second time without explicitly moving the pointer back.
```mermaid
flowchart LR
    A["Open file<br/>pointer at start"] --> B["Read first line<br/>pointer advances"]
    B --> C["Read second line<br/>pointer advances"]
    C --> D["...<br/>end of file reached"]
    D --> E["Close<br/>pointer released"]
```
Understanding the pointer explains a common bug: code that reads the whole file with read() and then tries to loop over the same file object gets nothing, because the pointer is already at the end.
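The pointer can be made visible with a short sketch on a hypothetical three-line file. Successive readline() calls return successive lines, and read() then moves the pointer to the end, so a loop over the same file object yields nothing. The file-object method seek(0) moves the pointer back to the start, and the loop then works again.

```python
# Sketch of the file pointer at work; "pointer_demo.txt" is a hypothetical file.
with open("pointer_demo.txt", "w") as file:
    file.write("alpha\nbeta\ngamma\n")

with open("pointer_demo.txt", "r") as file:
    first = file.readline()          # 'alpha\n'; pointer moves past it
    second = file.readline()         # 'beta\n'; the next line, not the same one
    rest = file.read()               # 'gamma\n'; pointer is now at end of file
    leftover = [line for line in file]   # [] because nothing is left to read
    file.seek(0)                     # move the pointer back to the start
    again = [line for line in file]      # all three lines again

print(leftover)  # []
print(again)     # ['alpha\n', 'beta\n', 'gamma\n']
```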
A text file is read sequentially from the current pointer position. The classic pattern is a loop that continues until the end of file (EOF) is reached.
```
# AQA-style pseudocode
file = OPEN("students.txt", "r")
WHILE NOT EOF(file)
    line = READLINE(file)
    OUTPUT line
ENDWHILE
CLOSE(file)
```
```python
# Idiomatic: iterate directly over the file object, line by line
with open("students.txt", "r") as file:
    for line in file:
        print(line.strip())  # strip() removes the trailing newline (and other surrounding whitespace)

# Read the entire file into one string (small files only)
with open("students.txt", "r") as file:
    content = file.read()

# Read all lines into a list
with open("students.txt", "r") as file:
    lines = file.readlines()
```
Iterating directly over the file object is preferred for large files because it reads one line at a time and never holds the whole file in memory — a direct application of the streaming idea developed later in this lesson.
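Python has no EOF() function. A closer translation of the WHILE NOT EOF pseudocode therefore relies on the fact that readline() returns an empty string "" only at end of file, while a blank line inside the file comes back as "\n". This is a sketch; read_all_lines and eof_demo.txt are hypothetical names.

```python
# Hypothetical helper: a line-by-line translation of the WHILE NOT EOF loop.
def read_all_lines(filename: str) -> list:
    lines = []
    with open(filename, "r") as file:
        line = file.readline()
        while line != "":              # "" means end of file: the WHILE NOT EOF test
            lines.append(line.strip())
            line = file.readline()     # advance the pointer to the next line
    return lines

# Usage with a small hypothetical file containing one blank line
with open("eof_demo.txt", "w") as file:
    file.write("Alice\nBob\n\nCharlie\n")

print(read_all_lines("eof_demo.txt"))  # ['Alice', 'Bob', '', 'Charlie']
```

The blank line does not stop the loop early because "\n" is not equal to "".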
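The with statement and why closing matters

Forgetting to close a file is one of the most common, and most damaging, mistakes in file handling. An unclosed file may:

- lose data that is still held in a buffer and has never been flushed to disk;
- keep its file handle tied up, so the resource is not released for other code or programs to use.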
Python's with statement (a context manager) closes the file automatically when the block ends — even if an exception is raised inside it.
```python
# Manual: you must remember to close; try/finally makes sure close runs even after an error
file = open("data.txt", "r")
try:
    content = file.read()
finally:
    file.close()  # runs whether or not read() raised an exception

# Context manager: close is guaranteed, with no boilerplate
with open("data.txt", "r") as file:
    content = file.read()
# file is closed here, whatever happened inside the block
```
Exam Tip: Any answer that opens a file should also close it. If you use with, say explicitly that it closes the file automatically; this earns the mark and shows you understand why closing matters (flushing buffered data and releasing the handle).
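To see the guarantee in action, the sketch below deliberately puts a non-numeric line in a hypothetical file, closing_demo.txt. The int() conversion raises a ValueError partway through the block, and the file object's closed attribute confirms that with still closed the file as the exception left the block.

```python
# Sketch: with closes the file even when an exception escapes the block.
# "closing_demo.txt" is hypothetical; the bad second line is deliberate.
with open("closing_demo.txt", "w") as file:
    file.write("12\nnot a number\n")

error_caught = False
try:
    with open("closing_demo.txt", "r") as handle:
        total = 0
        for line in handle:
            total += int(line)        # raises ValueError on the second line
except ValueError:
    error_caught = True
    print("Bad data in file")

print(handle.closed)  # True: closed automatically as the exception left the block
```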
```python
# WRITE mode: erases the file first, then writes
with open("results.txt", "w") as file:
    file.write("Name,Score\n")  # \n is needed; write() does NOT add one
    file.write("Alice,85\n")
    file.write("Bob,72\n")

# APPEND mode: keeps existing contents, adds to the end
with open("log.txt", "a") as file:
    file.write("2026-06-01 login OK\n")
```
Two details catch students out. First, write() does not add a newline, so each record needs an explicit \n. Second, the choice between "w" and "a" is the difference between replacing a log and extending it — using "w" for a log file would silently discard every previous entry on each run.
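The difference between "w" and "a" shows up most clearly over repeated runs. The sketch below simulates a program that writes one log line per run, three times, once with each mode. The filenames and the run_once helper are hypothetical, and the append file is deleted first so that the demonstration starts clean.

```python
import os

# Hypothetical helper: one "run" of a program that logs a single line.
def run_once(filename: str, mode: str, run_number: int) -> None:
    with open(filename, mode) as file:
        file.write(f"run {run_number}\n")

# Start from a clean slate so earlier demonstrations do not affect the result
if os.path.exists("log_a.txt"):
    os.remove("log_a.txt")

for run in range(1, 4):
    run_once("log_w.txt", "w", run)   # each run erases the previous one
    run_once("log_a.txt", "a", run)   # each run adds to the end

with open("log_w.txt", "r") as file:
    w_lines = file.readlines()
with open("log_a.txt", "r") as file:
    a_lines = file.readlines()

print(w_lines)  # ['run 3\n']
print(a_lines)  # ['run 1\n', 'run 2\n', 'run 3\n']
```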
Real programs rarely store single strings; they store records — structured collections of fields, such as a student with a name, age and mark. To put a record into a flat file it must be serialised: converted into a sequence of characters or bytes. Reading it back deserialises it, reconstructing the in-memory structure.
The simplest serialisation is a delimited text line. CSV (Comma-Separated Values) is the canonical example: one record per line, fields separated by commas.
| Name | Age | Score |
|---|---|---|
| Alice | 17 | 85 |
| Bob | 18 | 72 |
| Charlie | 17 | 91 |
```python
import csv

# Serialise records (dictionaries) to a CSV file
students = [
    {"Name": "Alice", "Age": 17, "Score": 85},
    {"Name": "Bob", "Age": 18, "Score": 72},
]
with open("students.csv", "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["Name", "Age", "Score"])
    writer.writeheader()
    for record in students:
        writer.writerow(record)

# Deserialise the CSV file back into records, converting field types
loaded = []
with open("students.csv", "r", newline="") as file:
    reader = csv.DictReader(file)
    for row in reader:
        loaded.append({
            "Name": row["Name"],
            "Age": int(row["Age"]),      # everything read from a text file is a string
            "Score": int(row["Score"]),  # so numeric fields must be converted
        })
```
The key conceptual point — and a frequent exam discriminator — is that everything read from a text file arrives as a string. The number 85 written to file becomes the characters "8" and "5"; to do arithmetic with it you must convert with int(). Forgetting this leads to bugs such as "72" + "85" producing the string "7285" instead of 157.
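Stripped of the csv module, delimited serialisation is a join on the way out and a split on the way back. This sketch handles a single record and assumes that no field contains a comma; quoting that case properly is what the csv module is for. The last few lines reproduce the string-concatenation bug and its fix.

```python
# Hand-rolled serialisation of one record to a comma-delimited line, and back.
# Assumes no field contains a comma.
record = {"Name": "Bob", "Age": 18, "Score": 72}

# Serialise: every field becomes text, joined with commas
line = ",".join([record["Name"], str(record["Age"]), str(record["Score"])]) + "\n"
print(repr(line))  # 'Bob,18,72\n'

# Deserialise: split the text back into fields, which are all strings
name, age_text, score_text = line.strip().split(",")
print(score_text + "85")       # 7285: string concatenation, not addition
print(int(score_text) + 85)    # 157: convert first, then do arithmetic

restored = {"Name": name, "Age": int(age_text), "Score": int(score_text)}
```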
| Feature | Text file | Binary file |
|---|---|---|
| Stored as | Encoded characters (e.g. UTF-8) | Raw bytes, exactly as in memory |
| Human-readable? | Yes, in any text editor | No — needs a program that understands the format |
| Examples | .txt, .csv, .html, .py | .jpg, .png, .exe, .dat |
| Line endings | Translated by the OS (\n vs \r\n) | No translation |
| Number storage | Each digit stored as a character (so 85 = 2 bytes) | Stored compactly (an integer may be 4 bytes regardless of value) |
| Typical use | Configuration, logs, data interchange | Images, audio, compiled programs, fast records |
```python
# Binary write: a list of byte values (here, the ASCII codes for "Hello")
with open("data.bin", "wb") as file:
    file.write(bytes([72, 101, 108, 108, 111]))

# Binary read: bytes come back as integers
with open("data.bin", "rb") as file:
    data = file.read()
print(list(data))  # [72, 101, 108, 108, 111]
```
Binary files are typically more compact (a four-byte unsigned integer stores 4,000,000,000 in the same space as it stores 7) and faster to process. However, they cannot be meaningfully read or edited in an ordinary text editor and need code that knows the exact layout. Text files trade compactness for readability and interoperability.
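The fixed-size claim can be checked with Python's standard struct module, which packs numbers into raw bytes. In this sketch the format code "<I" means a little-endian, unsigned, four-byte integer, and numbers.bin is a hypothetical filename. For a very small value such as 7 the text form is actually shorter. The advantage of binary is that every record has the same predictable size.

```python
import struct

# "<I": little-endian, unsigned, exactly 4 bytes per integer
for value in (7, 4_000_000_000):
    as_text = str(value).encode("utf-8")   # one byte per digit character
    as_binary = struct.pack("<I", value)   # always 4 bytes, whatever the value
    print(value, len(as_text), len(as_binary))
# 7 1 4
# 4000000000 10 4

# Write both numbers as fixed-size binary data, then read them back
with open("numbers.bin", "wb") as file:
    file.write(struct.pack("<II", 7, 4_000_000_000))  # 2 x 4 = 8 bytes

with open("numbers.bin", "rb") as file:
    raw = file.read()

small, large = struct.unpack("<II", raw)   # unpack returns a tuple
print(len(raw), small, large)              # 8 7 4000000000
```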
File operations interact with the operating system and the physical disk, so they fail in ways that pure in-memory code does not: the file may be missing, locked, read-only, or the disk may be full. Robust code anticipates this.
```python
try:
    with open("data.txt", "r") as file:
        content = file.read()
        print(content)
except FileNotFoundError:
    print("Error: the file does not exist.")
except PermissionError:
    print("Error: you do not have permission to read this file.")
except OSError as error:
    print(f"Unexpected file error: {error}")
```
Note the ordering: specific exceptions (FileNotFoundError, PermissionError) are caught before the more general OSError, because both of those are subclasses of OSError. A general handler placed first would intercept everything and the specific handlers would be unreachable — a direct link to the exception-hierarchy ideas from the exception-handling lesson.
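The sketch below first checks the subclass relationship with issubclass(). It then opens a path that cannot exist, inside a hypothetical missing folder, under both handler orders. Python accepts the wrong order without complaint, but the specific handler never runs. Both function names are hypothetical.

```python
# The handlers rely on this hierarchy
print(issubclass(FileNotFoundError, OSError))  # True
print(issubclass(PermissionError, OSError))    # True

MISSING = "no_such_folder/no_such_file.txt"    # hypothetical path assumed not to exist

def right_order(filename: str) -> str:
    try:
        with open(filename, "r") as file:
            file.read()
        return "ok"
    except FileNotFoundError:      # specific handlers first
        return "missing"
    except PermissionError:
        return "no permission"
    except OSError:                # general handler last
        return "other OS error"

def wrong_order(filename: str) -> str:
    try:
        with open(filename, "r") as file:
            file.read()
        return "ok"
    except OSError:                # catches FileNotFoundError as well...
        return "other OS error"
    except FileNotFoundError:      # ...so this handler can never run
        return "missing"

print(right_order(MISSING))  # missing
print(wrong_order(MISSING))  # other OS error
```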
A stream is an abstraction representing a sequence of data elements made available over time, accessed in order. Rather than thinking of a file as a fixed block, a stream lets a program consume or produce data piece by piece, without knowing in advance how much there is.
| Stream | Direction | Examples |
|---|---|---|
| Input stream | Data flows into the program | Reading a file, keyboard input, a network socket |
| Output stream | Data flows out of the program | Writing a file, printing to screen, sending to a printer |
The power of the stream model is uniformity: the same read/write interface works whether the source is a file on disk, a keyboard or a remote server. In Python, a file object is a stream, which is why iterating over a file looks just like iterating over a list. Streaming also makes it possible to process data larger than memory: you can sum a billion numbers in a file by reading one line at a time, never holding more than the current line and a running total.
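Here is a sketch of that uniformity. One hypothetical function, stream_total, sums one integer per line from whatever text stream it is given. It is used once with a file on disk and once with io.StringIO, a standard-library class that wraps a string in the same file-like interface. Skipping blank lines is an assumption added for robustness.

```python
import io

def stream_total(stream) -> int:
    """Sum one integer per line from any text input stream, one line at a time."""
    total = 0
    for line in stream:          # only the current line is held in memory
        if line.strip():         # assumption: blank lines are skipped
            total += int(line)
    return total

# Source 1: a file on disk ("numbers.txt" is hypothetical)
with open("numbers.txt", "w") as file:
    file.write("10\n20\n30\n")
with open("numbers.txt", "r") as file:
    file_total = stream_total(file)

# Source 2: an in-memory text stream, consumed through the same interface
memory_total = stream_total(io.StringIO("1\n2\n3\n"))

print(file_total, memory_total)  # 60 6
```

Nothing in stream_total knows where the data comes from, so passing sys.stdin would read from the keyboard in the same way.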
The following module loads a register from a CSV file at start-up, lets the program add marks in memory, and saves the register back on exit — a complete persistence cycle.
```python
import csv


def load_register(filename: str) -> list:
    register = []
    try:
        with open(filename, "r", newline="") as file:
            reader = csv.DictReader(file)
            for row in reader:
                register.append({
                    "name": row["Name"],
                    "score": int(row["Score"]),
                })
    except FileNotFoundError:
        # First run: no file yet, so start with an empty register
        print(f"No existing register at {filename}; starting fresh.")
    return register


def save_register(filename: str, register: list) -> None:
    with open(filename, "w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=["Name", "Score"])
        writer.writeheader()
        for record in register:
            writer.writerow({"Name": record["name"], "Score": record["score"]})


def class_average(register: list) -> float:
    if not register:
        return 0.0
    return sum(record["score"] for record in register) / len(register)


# Persistence cycle
register = load_register("class.csv")          # deserialise from disk
register.append({"name": "Dev", "score": 88})  # modify in memory
print(f"Average: {class_average(register):.1f}")
save_register("class.csv", register)           # serialise back to disk
```
The load_register function treats a missing file as the normal first-run case rather than an error — a good example of judging which exceptions are genuinely exceptional.
Exam questions frequently ask you to trace a small file-processing routine and state its output. Consider an algorithm that reads a file of numbers — one per line — and reports the count and the total.
```
# AQA-style pseudocode
file = OPEN("scores.txt", "r")
count = 0
total = 0
WHILE NOT EOF(file)
    line = READLINE(file)
    value = INT(line)
    count = count + 1
    total = total + value
ENDWHILE
CLOSE(file)
OUTPUT "Count = " + STR(count)
OUTPUT "Total = " + STR(total)
```
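A Python version of this routine can be used to check a hand trace. It is a sketch under the assumption that the file holds one whole number per line, with no blank lines. It also records a (count, total) pair after each iteration, which corresponds to one row of a trace table. The function name and the sample file, with its values 12 and 8, are hypothetical.

```python
def count_and_total(filename: str):
    """Python version of the pseudocode; records one trace row per iteration."""
    count = 0
    total = 0
    trace = []                          # (count, total) after each loop iteration
    with open(filename, "r") as file:   # OPEN ... CLOSE, handled by with
        for line in file:               # WHILE NOT EOF(file) / READLINE(file)
            value = int(line)           # INT(line): text must be converted
            count = count + 1
            total = total + value
            trace.append((count, total))
    return count, total, trace

# Hypothetical sample data: two lines, 12 and 8
with open("sample_scores.txt", "w") as file:
    file.write("12\n8\n")

count, total, trace = count_and_total("sample_scores.txt")
for row in trace:
    print(row)                  # (1, 12) then (2, 20)
print("Count = " + str(count))  # Count = 2
print("Total = " + str(total))  # Total = 20
```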
Suppose scores.txt contains the three lines 40, 55, 30. The trace table records the state after each iteration of the loop: