Almost every useful program needs to remember something after it stops running: a saved game, a list of customers, a log of what happened, a high-score table. The data a program holds in variables lives in RAM and is volatile: switch the program off and it is gone. File handling is how a program escapes that limitation, writing data to persistent storage (a disk or SSD) so it survives between runs and can be read back later. This lesson works through the whole life cycle of a file: opening it in the right mode, reading from it (whole-file, line-by-line, and into a list), writing and appending to it, and closing it cleanly so nothing is lost. It then contrasts text files with binary files, and serial/sequential access with random/direct access, ending with two full worked programs in Python.
A theme runs through everything below: a file is a stream of bytes plus a movable pointer. When you open a file the pointer sits at the start; every read or write advances it; seek jumps it somewhere new. Hold that picture in mind and almost every file operation — including the difference between appending and overwriting, and between serial and direct access — becomes obvious rather than memorised. The OCR pseudocode reference uses openRead, openWrite, readLine, writeLine, endOfFile and close; the Python below mirrors those ideas so you can move between the two notations in the exam.
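The "stream of bytes plus a movable pointer" picture can be watched directly with `tell()`. The following is a minimal sketch, not part of the original lesson: the file name `pointer_demo.bin` and the temporary directory are assumptions made so the example runs anywhere, and binary mode is used so that pointer positions are plain byte counts.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (an assumption for this demo)
path = os.path.join(tempfile.mkdtemp(), "pointer_demo.bin")

with open(path, "wb") as file:
    start = file.tell()            # pointer sits at byte 0 when the file is opened
    file.write(b"ABCDEFGHIJ")      # writing 10 bytes advances the pointer by 10
    after_write = file.tell()

with open(path, "rb") as file:
    first = file.read(3)           # reads b"ABC"; the pointer moves to byte 3
    after_read = file.tell()
    file.seek(7)                   # jump the pointer straight to byte 7
    tail = file.read()             # reads from byte 7 to the end of the file

print(start, after_write, first, after_read, tail)
# 0 10 b'ABC' 3 b'HIJ'
```

Every value printed follows from the single rule that reads and writes advance the pointer and `seek` repositions it.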
Within H446 2.2.1 this lesson covers the file-handling techniques every programmer needs. You should be able to:
seek/tell) for direct access;

These points paraphrase the specification; nothing is quoted verbatim.
| Storage | Volatile? | Speed | Typical use |
|---|---|---|---|
| Variables / arrays (RAM) | Yes — lost on exit | Very fast | Working data during a run |
| Files (disk / SSD) | No — persists | Slower | Saved data, logs, config, exports |
A program that asks the user for fifty names and then exits has thrown them all away. Writing them to a file means the next run can read them back. This is the same persistence problem solved at larger scale by databases (covered in the Databases course) — a file is the simplest possible form of secondary storage, and understanding files is the foundation for understanding why databases add structure, indexing and concurrent access on top.
When you open a file you choose a mode — a short string saying what you intend to do. The mode decides three things: may you read, may you write, and where the pointer starts.
| Mode | Meaning | If file is missing | Effect on existing data | Pointer starts at |
|---|---|---|---|---|
| `"r"` | read only | raises FileNotFoundError | unchanged | start |
| `"w"` | write (overwrite) | creates it | erased (truncated to empty) | start |
| `"a"` | append (add to end) | creates it | preserved | end |
| `"r+"` | read and write | error | preserved (overwrite at position) | start |
| `"rb"` / `"wb"` | read / write binary | as r / w | as r / w | start |
| `"x"` | exclusive create | creates it | error if it already exists | start |
The single most important — and most dangerous — distinction is "w" versus "a". Opening an existing file in "w" mode silently destroys its contents the instant it is opened, before you write a single byte. If you mean to add to a log, you want "a"; reach for "w" there and you wipe the history. The exam loves this trap.
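The trap is easy to reproduce. The sketch below is an illustration under assumptions (a hypothetical `log_demo.txt` in a temporary directory): it appends to a file, then reopens it in `"w"` mode without writing anything, and reads back what is left.

```python
import os
import tempfile

# Hypothetical log file in a temporary directory (an assumption for this demo)
path = os.path.join(tempfile.mkdtemp(), "log_demo.txt")

with open(path, "w") as file:
    file.write("first entry\n")

# Append mode keeps the existing line and adds the new one after it
with open(path, "a") as file:
    file.write("second entry\n")
with open(path, "r") as file:
    after_append = file.read()

# Write mode truncates on open: nothing is written here, yet the content is gone
with open(path, "w") as file:
    pass
with open(path, "r") as file:
    after_reopen_w = file.read()

print(repr(after_append))    # 'first entry\nsecond entry\n'
print(repr(after_reopen_w))  # ''
```

The second `open(path, "w")` never calls `write`, which shows that the data loss happens at the moment of opening.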
The `with` Statement

A file must be opened before use and closed afterwards. Closing matters for two reasons: it flushes any buffered data to disk (writes are often buffered, so un-closed data can be lost), and it releases the operating-system handle so other programs can use the file.
```python
# The verbose, error-prone way: you MUST remember to close
file = open("notes.txt", "r")
content = file.read()
file.close()  # if an error occurs before this line, the file stays open
```
The robust, idiomatic way uses a with block (a context manager). The file is closed automatically when the block ends — even if an exception is thrown inside it:
```python
# The 'with' statement guarantees the file is closed, error or not
with open("notes.txt", "r") as file:
    content = file.read()
# <- file is automatically closed here, exception or not
```
```
# OCR-style pseudocode shows open ... process ... close explicitly
myFile = openRead("notes.txt")
contents = myFile.readLine()
myFile.close()
```
Exam Tip: In Python answers, prefer `with open(...) as f:` because it shows you understand guaranteed closing. In pseudocode answers you must write `close()` explicitly; forgetting it loses an easy mark and is a genuine cause of lost data.
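The claim that a `with` block closes the file even when an exception is raised can be checked through the file object's `closed` attribute. This is a small sketch under assumptions: `notes_demo.txt` is a hypothetical file created just for the test, and the error is forced by converting non-numeric text with `int`.

```python
import os
import tempfile

# Hypothetical file in a temporary directory (an assumption for this demo)
path = os.path.join(tempfile.mkdtemp(), "notes_demo.txt")
with open(path, "w") as file:
    file.write("not a number\n")

handle = None
error_seen = False
try:
    with open(path, "r") as file:
        handle = file                  # keep a reference so it can be inspected later
        value = int(file.readline())   # raises ValueError inside the with block
except ValueError:
    error_seen = True

print(error_seen, handle.closed)  # True True
```

The exception still propagates out of the block, but the file has already been closed by the time the `except` clause runs.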
There are three common reading patterns. Choose by how much of the file you need at once.
```python
# Read the whole file into a single string
with open("students.txt", "r") as file:
    content = file.read()  # one big string, including newline characters
    print(content)
```
Iterating over the file object yields one line at a time, so only one line is ever in memory — essential for files too large to load whole. Each line still carries its trailing newline, so strip() is used to remove it.
```python
with open("students.txt", "r") as file:
    for line in file:          # reads lazily, one line per loop
        print(line.strip())    # strip() removes the trailing '\n'
```
```python
# Read every line into a list
with open("students.txt", "r") as file:
    lines = file.readlines()  # list of strings, one per line
    print(f"The file has {len(lines)} lines")
```
OCR pseudocode often loops until end of file. Here is the equivalent idea, traced. Suppose scores.txt contains the three lines 70, 85, 60:
```python
# Equivalent of: WHILE NOT endOfFile() ... readLine() ... ENDWHILE
total = 0
count = 0
with open("scores.txt", "r") as file:
    for line in file:
        mark = int(line.strip())  # convert text "70" to the integer 70
        total = total + mark
        count = count + 1
average = total / count
print(f"Average over {count} marks = {average}")  # Average over 3 marks = 71.666...
```
| Step | line read | mark | total | count |
|---|---|---|---|---|
| start | — | — | 0 | 0 |
| iteration 1 | "70\n" | 70 | 70 | 1 |
| iteration 2 | "85\n" | 85 | 155 | 2 |
| iteration 3 | "60\n" | 60 | 215 | 3 |
| end of file | — | — | 215 | 3 → average 71.67 |
The crucial detail is the type conversion: a line read from a text file is always a string, even when it looks like a number. int(line.strip()) is doing two jobs — removing the newline and converting "70" to 70 so arithmetic works.
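Because every line arrives as a string, a blank line or a non-numeric entry makes `int()` raise `ValueError`, and an empty file would make the division fail. The sketch below is one possible defensive version, not the lesson's own method: the helper name `average_of_file`, the demo file, and the decision to skip bad lines rather than stop are all assumptions.

```python
import os
import tempfile

def average_of_file(path):
    """Return the mean of the whole numbers in a file (one per line), or None if there are none."""
    total = 0
    count = 0
    with open(path, "r") as file:
        for line in file:
            text = line.strip()        # remove the newline and surrounding spaces
            if text == "":
                continue               # ignore blank lines
            try:
                mark = int(text)       # text "70" becomes the integer 70
            except ValueError:
                continue               # assumed policy: skip entries that are not integers
            total += mark
            count += 1
    if count == 0:
        return None                    # avoid dividing by zero on an empty file
    return total / count

# Hypothetical test data: three valid marks, one blank line, one invalid entry
path = os.path.join(tempfile.mkdtemp(), "scores_demo.txt")
with open(path, "w") as file:
    file.write("70\n85\n\nabsent\n60\n")

result = average_of_file(path)
print(round(result, 2))  # 71.67
```

Whether bad lines should be skipped, reported, or treated as fatal depends on the program; the point is that the conversion step is where such decisions have to be made.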
```python
# "w" mode: overwrite (the file is created if it is missing)
with open("output.txt", "w") as file:
    file.write("Hello, World!\n")   # write() does NOT add a newline for you
    file.write("This is line 2\n")  # so include '\n' yourself
```
Note that write does not append a newline automatically (unlike print). If you forget the \n, both strings end up on the same line.
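The difference is easy to see by reading the lines back. In this sketch (the file name `newline_demo.txt` is hypothetical), the first version omits the `\n` after the first string, and the second uses `print` with its `file=` argument, which adds the newline itself.

```python
import os
import tempfile

# Hypothetical file in a temporary directory (an assumption for this demo)
path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")

# write() with a forgotten '\n': both strings land on one line
with open(path, "w") as file:
    file.write("Hello, World!")
    file.write("This is line 2\n")
with open(path, "r") as file:
    joined = file.readlines()

# print(..., file=file) appends the '\n' automatically
with open(path, "w") as file:
    print("Hello, World!", file=file)
    print("This is line 2", file=file)
with open(path, "r") as file:
    separate = file.readlines()

print(joined)    # ['Hello, World!This is line 2\n']
print(separate)  # ['Hello, World!\n', 'This is line 2\n']
```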
```python
# "a" mode: append to the end of the file
with open("log.txt", "a") as file:  # 'a' keeps existing content, pointer at end
    file.write("2026-05-30 09:14 user logged in\n")
```
A common task is writing several records, one per line, building each line by joining fields:
```python
students = [("Alice", 17, "A"), ("Bob", 18, "B"), ("Charlie", 17, "C")]
with open("students.txt", "w") as file:
    for name, age, grade in students:
        # build one CSV-style record per line and terminate it
        file.write(f"{name},{age},{grade}\n")
```
```
# OCR-style pseudocode for the same idea
myFile = openWrite("students.txt")
myFile.writeLine("Alice,17,A")
myFile.writeLine("Bob,18,B")
myFile.close()
```
Exam Tip: Decide append or overwrite before you write a single line, and say which you chose and why. Logs and audit trails → append. A fresh export that should replace the old one → overwrite.
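Records written one per line, as in the `students.txt` example, can be read back by splitting each line on the comma and converting the numeric field. The following round trip is a sketch that assumes no field contains a comma; the path `students_demo.txt` is hypothetical.

```python
import os
import tempfile

students = [("Alice", 17, "A"), ("Bob", 18, "B"), ("Charlie", 17, "C")]

# Hypothetical file in a temporary directory (an assumption for this demo)
path = os.path.join(tempfile.mkdtemp(), "students_demo.txt")

# Write one comma-separated record per line
with open(path, "w") as file:
    for name, age, grade in students:
        file.write(f"{name},{age},{grade}\n")

# Read the records back: strip the newline, split on commas, convert the age
loaded = []
with open(path, "r") as file:
    for line in file:
        name, age, grade = line.strip().split(",")
        loaded.append((name, int(age), grade))

print(loaded == students)  # True
```

Splitting by hand works only while the data stays simple, which is the motivation for a dedicated CSV parser.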
A CSV (Comma-Separated Values) file is just a text file with a convention: each line is one record, fields separated by commas. Python's csv module handles the fiddly cases (commas inside quoted fields, etc.) so you do not have to split by hand.
| Name | Age | Grade |
|---|---|---|
| Alice | 17 | A |
| Bob | 18 | B |
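The "commas inside quoted fields" case is where splitting by hand goes wrong. The sketch below uses made-up data held in an `io.StringIO` object (an in-memory stand-in for a file, an assumption made to keep the example self-contained) and compares a naive `split(",")` with `csv.reader`.

```python
import csv
import io

# Made-up data: the name field contains a comma, so it is wrapped in quotes
text = 'Name,Age,Grade\n"Smith, Alice",17,A\n'

# Naive approach: split the data line on every comma
naive = text.splitlines()[1].split(",")

# csv.reader understands the quoting and keeps the name as one field
parsed = list(csv.reader(io.StringIO(text)))[1]

print(naive)   # ['"Smith', ' Alice"', '17', 'A']
print(parsed)  # ['Smith, Alice', '17', 'A']
```

The naive version produces four fields and leaves the quote characters in place; the `csv` module returns the three fields that were intended.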
```python
import csv

# Reading a CSV file row by row
with open("students.csv", "r", newline="") as file:  # newline="" lets csv handle line endings
    reader = csv.reader(file)
    header = next(reader)  # consume the header row first
    for row in reader:     # row is a list of strings, e.g. ["Alice", "17", "A"]
        name = row[0]
        age = int(row[1])  # still must convert numbers from text
        grade = row[2]
        print(f"{name} (age {age}): Grade {grade}")
```
```python
import csv

# Writing a CSV file
students = [["Alice", 17, "A"], ["Bob", 18, "B"], ["Charlie", 17, "C"]]
with open("students.csv", "w", newline="") as file:  # newline="" avoids blank lines on Windows
    writer = csv.writer(file)
    writer.writerow(["Name", "Age", "Grade"])  # header
    writer.writerows(students)                 # all data rows at once
```
`DictReader` / `DictWriter`: access fields by name

```python
import csv

with open("students.csv", "r", newline="") as file:
    reader = csv.DictReader(file)  # uses the header row as keys
    for row in reader:
        print(f"{row['Name']}: {row['Grade']}")  # access by column name, not index
```
Accessing fields by name rather than numeric index makes the code far more readable and resilient — adding a column no longer shifts every index. This is conceptually the same move that databases make with named fields, and a CSV is often the bridge format used to export data from, or import data into, a database.
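The writing counterpart, `csv.DictWriter`, takes the column names up front and then accepts one dictionary per row. This is a minimal sketch with made-up rows and a hypothetical file name; it writes the file and reads it back with `DictReader` to show that values return as strings.

```python
import csv
import os
import tempfile

rows = [
    {"Name": "Alice", "Age": 17, "Grade": "A"},
    {"Name": "Bob", "Age": 18, "Grade": "B"},
]

# Hypothetical file in a temporary directory (an assumption for this demo)
path = os.path.join(tempfile.mkdtemp(), "students_demo.csv")

with open(path, "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["Name", "Age", "Grade"])
    writer.writeheader()    # writes the header row: Name,Age,Grade
    writer.writerows(rows)  # each dictionary becomes one line

with open(path, "r", newline="") as file:
    read_back = list(csv.DictReader(file))

print(read_back[0]["Name"], read_back[0]["Age"])  # Alice 17
```

Note that `read_back[0]["Age"]` is the string `"17"`, not the integer that was written, so the usual text-to-number conversion still applies.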
A binary file stores raw bytes rather than human-readable characters. Images (.png), audio (.mp3), compiled programs and serialised objects are all binary. You open them with a b in the mode ("rb", "wb").
```python
# Writing raw bytes
data = bytes([72, 101, 108, 108, 111])  # the ASCII codes for H, e, l, l, o
with open("data.bin", "wb") as file:
    file.write(data)

# Reading raw bytes back
with open("data.bin", "rb") as file:
    content = file.read()
    print(list(content))  # [72, 101, 108, 108, 111]
```
| Feature | Text file | Binary file |
|---|---|---|
| Stores | Characters (encoded, e.g. UTF-8/ASCII) | Raw bytes, no encoding layer |
| Human-readable? | Yes, in any editor | No — needs the right software |
| Newlines | May be translated by the OS (\n ↔ \r\n) | Never translated |
| Size | Often larger (digits stored as characters) | Compact (a number stored in its raw form) |
| Typical use | Logs, config, CSV, JSON, source code | Images, audio, video, executables, fixed-length records |
A subtle but examinable point: storing the integer 1000000 in a text file takes 7 bytes (one per digit character), but in a binary file it can fit in 4 bytes (a 32-bit integer). Binary is more compact and faster to parse, at the cost of not being readable in a text editor. This connects to the Representation of Data topic — text files add an encoding layer (ASCII/Unicode) that binary files skip.
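The 7-byte versus 4-byte comparison can be verified in a few lines. The sketch below uses Python's standard `struct` module to produce the raw 32-bit form; `struct` is not mentioned in the lesson and is brought in here only as one convenient way to show the binary representation.

```python
import struct

number = 1000000

# Text form: one ASCII character (one byte) per digit
as_text = str(number).encode("ascii")

# Binary form: "<i" means a little-endian signed 32-bit integer
as_binary = struct.pack("<i", number)

print(len(as_text), len(as_binary))  # 7 4

# The binary form converts back to the same integer
restored = struct.unpack("<i", as_binary)[0]
print(restored)  # 1000000
```

The text form grows with the number of digits, while the 32-bit binary form is always 4 bytes for any value in its range.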
This is a key conceptual distinction the specification names explicitly.
| | Serial / sequential access | Random / direct access |
|---|---|---|
| How records are reached | Read in order from the start until you reach the one you want | Jump straight to a position using the pointer |
| Underlying picture | A cassette tape — fast-forward through everything | A music track number — go straight to it |
| Cost of reaching record n | Proportional to n (must pass the earlier ones) | Roughly constant (one seek) |
| Typical with | Text/CSV files read line-by-line; magnetic tape backups | Fixed-length records in a binary file; database indexes |
A note on terms: serial and sequential are used almost interchangeably at this level (strictly, serial = stored end-to-end in arrival order; sequential = stored sorted on a key — both are read in order). Random and direct access mean the same thing: reach any record without reading the ones before it.
```python
# To reach "line 50" you must read lines 1..49 first; that is sequential access
with open("large_file.txt", "r") as file:
    for i, line in enumerate(file):
        if i == 49:  # the 50th line (0-indexed)
            print(line.strip())
            break
```
seek(offset) moves the pointer to a byte position; tell() reports where it is; read(n) reads n bytes from there.
```python
with open("data.bin", "rb") as file:
    file.seek(100)         # jump straight to byte 100; bytes 0..99 are not read
    chunk = file.read(10)  # read 10 bytes from position 100
    print(file.tell())     # 110 if the file holds at least 110 bytes
```
Direct access works cleanly when every record is the same length. If each record is exactly 32 bytes, record number r begins at byte r * 32, so the program can compute the offset and seek straight there:
```python
RECORD_SIZE = 32  # every record is exactly 32 bytes

def read_record(filename: str, record_number: int) -> bytes:
    with open(filename, "rb") as file:
        file.seek(record_number * RECORD_SIZE)  # compute the offset directly
        return file.read(RECORD_SIZE)
```
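To try direct access end to end, a file of fixed-length records has to exist first. The sketch below is one possible way to build it, under stated assumptions: the record layout (a 28-byte name field plus a 4-byte integer score, 32 bytes in total), the helper `write_records`, the file name, and the use of the standard `struct` module are all illustrative choices, not taken from the lesson.

```python
import os
import struct
import tempfile

# Assumed layout: 28-byte name + 4-byte little-endian integer = 32 bytes per record
RECORD_FORMAT = "<28si"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 32

def write_records(filename, records):
    """Write (name, score) pairs as fixed-length binary records."""
    with open(filename, "wb") as file:
        for name, score in records:
            # struct pads a short name with zero bytes up to 28 bytes
            file.write(struct.pack(RECORD_FORMAT, name.encode("ascii"), score))

def read_record(filename, record_number):
    """Seek straight to one record and decode it, without reading earlier ones."""
    with open(filename, "rb") as file:
        file.seek(record_number * RECORD_SIZE)  # offset = record number * record size
        raw_name, score = struct.unpack(RECORD_FORMAT, file.read(RECORD_SIZE))
    return raw_name.rstrip(b"\x00").decode("ascii"), score

# Hypothetical data file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "records_demo.bin")
write_records(path, [("Alice", 70), ("Bob", 85), ("Charlie", 60)])

third = read_record(path, 2)  # records are numbered from 0
print(third)                  # ('Charlie', 60)
print(os.path.getsize(path))  # 96, i.e. 3 records of 32 bytes
```

This sketch does no bounds checking: asking for a record number beyond the end of the file makes `struct.unpack` raise an error, so a fuller version would compare the offset with the file size first.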