Abstraction is the single most important idea in Computer Science — so important that the discipline is sometimes defined as the science of managing complexity through abstraction. To think abstractly is to deliberately ignore detail. When you draw a flowchart you ignore which programming language you will eventually use; when you call sort(myList) you ignore which sorting algorithm runs underneath; when you type a URL you ignore the dozens of routers your packets cross. In every case you keep what matters for the problem in front of you and discard everything else.
This lesson develops the H446 view of abstraction in two complementary directions. Representational abstraction is about modelling — deciding what features of a real situation to keep in a digital representation. Procedural abstraction (closely tied to abstraction by generalisation) is about behaviour — packaging a process so it can be used without knowing how it works. Alongside these you need information hiding (the mechanism that enforces abstraction) and levels of abstraction (the way large systems stack many abstractions on top of one another). By the end you should be able to look at any system and say precisely what has been abstracted away, and why that was the right thing to discard.
This lesson addresses H446 section 2.1.1 — Thinking Abstractly. The specification expects you to:
These ideas recur throughout the whole A-Level: abstract data types in data structures, the layered TCP/IP stack in networking, and the assembly→machine-code→hardware stack in processor architecture are all applications of the principles below.
Abstraction is the process of removing detail that is not relevant to the current problem, so that what remains can be reasoned about more easily. Two warnings follow immediately from that definition.
First, abstraction is not the same as simplification by approximation. A weather model that rounds every temperature to the nearest 10 °C is simplified, but it is not a good abstraction — it has discarded detail the problem genuinely needs. A good abstraction discards only the irrelevant. Deciding what is irrelevant is the skill.
Second, relevance is relative to the problem, not absolute. The colour of the wires inside a network cable is irrelevant when you are designing a routing algorithm, but vital when you are physically crimping an RJ45 plug. The same detail is essential at one level of abstraction and noise at another. This is why we speak of levels of abstraction rather than one true model.
| Property of a good abstraction | What it means |
|---|---|
| Captures the essentials | Everything the problem depends on is preserved |
| Discards the irrelevant | Detail the problem never consults is removed |
| Has a clear boundary | It is obvious what is "inside" and "outside" the model |
| Is fit for a stated purpose | It is good for a problem, not good in the abstract |
Everyday example — the London Underground map. Harry Beck's famous 1933 diagram is an abstraction tuned to a single problem: which trains do I take, and where do I change? It keeps station order, line colours and interchanges, and deliberately throws away geographic distance, compass direction and surface streets. It is a terrible map for walking between two stations that are close above ground but far apart on the diagram — because that was never the problem it was built for. The map is a perfect illustration that an abstraction is only "correct" relative to its purpose.
Representational abstraction is the act of building a model of something real by keeping only the features the problem needs. Every data structure in your programs is the result of a representational-abstraction decision.
Consider modelling a student in a school information system:
# A representational abstraction of a real student.
# We keep only the attributes the system actually consults.
class Student:
    def __init__(self, student_id, name, year_group):
        self.student_id = student_id    # needed: unique key
        self.name = name                # needed: display, reports
        self.year_group = year_group    # needed: grouping, timetabling
        self.grades = {}                # needed: assessment
        # NOT modelled: height, eye colour, bus route, favourite band.
        # The system never reasons about these, so they are abstracted away.
The decision about which attributes to include is not arbitrary: it follows from the operations the system must support. The table below shows the same real thing (a person) modelled differently for three different problems, followed by two further examples, showing once again that representation depends on purpose.
| Real thing | Problem | Attributes kept | Attributes discarded |
|---|---|---|---|
| A person | School register | ID, name, year group, attendance | Height, hobbies, address detail |
| A person | NHS patient record | NHS number, allergies, conditions, GP | Exam grades, year group |
| A person | Online shop account | Email, delivery address, order history | Year group, blood type |
| A book | Library catalogue | ISBN, title, author, on-loan flag | Page colour, font, paper weight |
| A road network | Sat-nav routing | Junctions, road lengths, speed limits | Tarmac colour, roadside trees |
Key Term: Representational abstraction answers the question "what should my model contain?" The answer is always "exactly the features the problem consults — no more, no less."
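The first rows of the table can be sketched in code. This is only an illustration: the class names `RegisterEntry` and `PatientRecord`, their field names and the sample values are assumptions made up for this example, not taken from any real system. It shows one real person abstracted two ways, each model keeping only what its own problem consults.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterEntry:
    """School-register view of a person (hypothetical field names)."""
    student_id: str
    name: str
    year_group: int
    attendance: list = field(default_factory=list)  # one bool per session

    def attendance_rate(self):
        # The only attendance question this model has to answer.
        if not self.attendance:
            return 0.0
        return sum(self.attendance) / len(self.attendance)


@dataclass
class PatientRecord:
    """Patient-record view of the same person (hypothetical field names)."""
    nhs_number: str
    allergies: list = field(default_factory=list)
    conditions: list = field(default_factory=list)
    gp: str = ""

    def is_allergic_to(self, substance):
        return substance.lower() in (a.lower() for a in self.allergies)


# The same real person, abstracted two ways (all values are placeholders).
at_school = RegisterEntry("S1042", "Asha Patel", 12, [True, True, False, True])
at_clinic = PatientRecord("EXAMPLE-NUMBER", allergies=["Penicillin"], gp="Dr Example")

print(at_school.attendance_rate())
print(at_clinic.is_allergic_to("penicillin"))
```

Neither model is "more correct" than the other. The register has no allergies field and the patient record has no year group, because neither system ever asks those questions.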
There is a genuine trade-off hiding here. A model with too few attributes cannot answer the questions the system needs to ask — a sat-nav graph that omits one-way restrictions will route you the wrong way down a street. A model with too many attributes is bloated: it wastes storage, slows processing, and burdens every programmer who must understand it. The art of representational abstraction is to land exactly on the set of attributes the problem requires. When the requirements change, the right abstraction can change too: if the library later needs to charge fines, the Loan model suddenly does need a "date returned" field it did not need before. Abstractions are not chosen once and frozen; they evolve as the problem evolves, which is one reason information hiding is so valuable — it lets you extend a model's internals without breaking everything that uses it.
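The "too few attributes" failure can be made concrete with a toy road graph. The sketch below is an illustration under stated assumptions: the three-junction road data and the function names `build_graph` and `shortest_route` are invented for this example, and the search is a plain breadth-first search rather than anything a real sat-nav would use. It builds the same roads twice, once ignoring the one-way attribute and once respecting it.

```python
from collections import deque

# Hypothetical road data: (from_junction, to_junction, one_way)
ROADS = [
    ("A", "B", False),
    ("B", "C", False),
    ("C", "A", True),   # one-way street: traffic may only travel C -> A
]


def build_graph(roads, respect_one_way):
    """Build an adjacency list, optionally discarding the one-way attribute."""
    graph = {}
    for start, end, one_way in roads:
        graph.setdefault(start, []).append(end)
        graph.setdefault(end, [])
        # Add the reverse direction unless the road is one-way and we care.
        if not (one_way and respect_one_way):
            graph[end].append(start)
    return graph


def shortest_route(graph, start, goal):
    """Breadth-first search: the route passing through the fewest junctions."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


too_simple = shortest_route(build_graph(ROADS, respect_one_way=False), "A", "C")
adequate = shortest_route(build_graph(ROADS, respect_one_way=True), "A", "C")

print(too_simple)  # ['A', 'C']: drives the wrong way down the one-way street
print(adequate)    # ['A', 'B', 'C']: legal, because the model kept the attribute
```

The routing code is identical in both cases. Only the model changed, which is the point: the algorithm can be no better than the abstraction it runs on.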
Abstraction by generalisation spots that several specific problems share a common structure and treats them as instances of a single, more general problem. Where representational abstraction asks "what do I keep?", generalisation asks "what do these cases have in common, so I can solve them all at once?"
The pay-off is enormous: one general solution replaces many special-case ones. Sorting is the classic example. Sorting integers, sorting surnames and sorting dates look different, but they are all the same problem — arrange items by a comparison rule — so a single generic sort solves all three:
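# One generalised routine, parameterised by the comparison/key.
# The SAME algorithm sorts numbers, strings or dates.
def merge(left, right, key):
    # A minimal sketch of the helper: combine two sorted lists via key().
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

def merge_sort(items, key=lambda x: x):
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid], key)
    right = merge_sort(items[mid:], key)
    return merge(left, right, key)    # merge using key() to compare
# numbers:  merge_sort([5, 2, 9, 1])
# surnames: merge_sort(["Patel", "Adams"], key=str.lower)
# dates:    merge_sort(events, key=lambda e: e.date)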
Generalisation underpins most of the powerful machinery you will meet later:
| Generalisation in CS | What is generalised |
|---|---|
| Object-oriented inheritance | A Vehicle superclass captures what Car, Bus and Lorry share |
| Abstract data types | A Stack is defined by behaviour (push/pop), independent of array or linked-list storage |
| Generic / templated algorithms | One sort works for any comparable type |
| Polymorphism | shape.area() calls the right method for whatever shape is supplied |
| Design patterns | The Observer pattern generalises "notify dependents when something changes" |
The discipline of generalisation is to spot the deepest shared structure, not the shallowest. A novice might write sortIntegers, sortStrings and sortDates because the data looks different; an expert sees that "compare two items and arrange them" is the real common core and writes one parameterised routine. The pay-off compounds: every bug fixed, every optimisation made, and every test written for the general routine benefits all of its uses at once. This is why generalisation is not merely tidy — it is cheaper over the life of a system, because there is one thing to maintain instead of three. The risk to balance is over-generalisation: forcing genuinely different cases into one abstraction can produce a tangle of special-case flags that is harder to follow than three honest separate routines. Good designers generalise where the commonality is real and resist it where it is forced.
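The abstract-data-type row of the table is a good place to see generalisation pay off. The following is a minimal sketch, assuming the hypothetical names `ArrayStack`, `LinkedStack` and `reversed_copy`: two different storage choices expose the same push/pop behaviour, so a single routine written against that behaviour works with either.

```python
class ArrayStack:
    """Stack stored in a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items


class LinkedStack:
    """Stack stored as a chain of (value, next_node) pairs."""

    def __init__(self):
        self._head = None

    def push(self, item):
        self._head = (item, self._head)

    def pop(self):
        if self._head is None:
            raise IndexError("pop from empty stack")
        item, self._head = self._head
        return item

    def is_empty(self):
        return self._head is None


def reversed_copy(items, stack):
    """Reverse a list using ANY object that behaves like a stack."""
    for item in items:
        stack.push(item)
    result = []
    while not stack.is_empty():
        result.append(stack.pop())
    return result


print(reversed_copy([1, 2, 3], ArrayStack()))
print(reversed_copy([1, 2, 3], LinkedStack()))
```

`reversed_copy` was written once and never mentions lists or links. A bug fixed in it is fixed for every stack implementation that exists now or is added later.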
Procedural abstraction hides how a process is performed behind a name. Once a process is wrapped in a subroutine, callers think only about what it does, never how. This is the behavioural counterpart of representational abstraction's modelling.
# Callers see the name and contract; the body is abstracted away.
def average(values):
    """Return the mean of a non-empty list of numbers."""
    return sum(values) / len(values)

result = average(marks)    # we think 'mean of marks', not 'sum then divide by length'
The crucial property is that the implementation can change without callers noticing. If a faster or more accurate averaging method is written tomorrow, every caller benefits automatically and no calling code changes — because callers only ever depended on the interface (a list in, a mean out), never the body. Procedural abstraction is therefore the engine that makes large programs maintainable, and it links directly to subroutines, parameters and the call stack you meet in the programming module.
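To see the body change while the callers stay put, here is one possible "tomorrow" version of average. It is a sketch, not the only way to do it: the replacement uses the standard-library function `math.fsum`, which reduces floating-point rounding error when summing, and `report` is a hypothetical caller invented for this example.

```python
import math


def average(values):
    """Return the mean of a non-empty list of numbers."""
    # New body: math.fsum keeps track of rounding error while summing.
    # The name, the parameter and the contract are all unchanged.
    return math.fsum(values) / len(values)


def report(marks):
    # A caller written against the interface only (hypothetical).
    return f"Class mean: {average(marks):.1f}"


print(report([60, 70, 80]))
print(average([0.1] * 10))
```

`report` was not edited, re-tested or even re-read when the body of `average` changed. It depended only on "a list in, a mean out".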
Examiners frequently want you to distinguish the two flavours of abstraction, so hold the contrast clearly in mind. Representational abstraction is about state — how you store a slice of the world. Procedural abstraction is about behaviour — how you package a process. They are two answers to two different questions, and a complete design uses both.
| | Representational abstraction | Procedural abstraction |
|---|---|---|
| Question it answers | "What should my model contain?" | "How do I use a process without knowing how it works?" |
| Hides | Irrelevant attributes of a thing | The steps of a process |
| Produced by | Choosing fields/data structures | Wrapping code in a named subroutine |
| Typical artefact | A Student record; a sat-nav road graph | average(); sort(); borrow(book, member) |
| Linked A-Level idea | Data structures, ADTs, databases | Subroutines, parameters, the call stack |
A neat way to remember it: representational abstraction is a noun-level decision (which things and attributes), procedural abstraction is a verb-level decision (which actions). The noun analysis and verb analysis you will meet in Thinking Procedurally (2.1.3) are exactly the tools for finding each kind. In object-oriented design the two fuse: a class bundles representational abstraction (its fields) with procedural abstraction (its methods) inside one information-hidden unit, which is precisely why OOP is such a natural fit for managing complexity.
A file is one of the most successful abstractions ever invented, and it shows every idea in this lesson working at once. To a programmer, a file is simply a named sequence of bytes you can open, read, write and close:
with open("results.txt") as f:    # 'open the named byte-stream'
    for line in f:                # 'read it in order'
        process(line)
# we never think about: which sectors on the disk hold the data,
# whether it is an SSD or a spinning platter, the file-system's
# free-list, block size, or the wear-levelling firmware.
open, read, write and close are operations whose internals (locating blocks, moving disk heads, talking to the SSD controller) are completely hidden behind their names.
The same open() call works unchanged on a hard disk, an SSD, a USB stick or a network drive.
This single example is worth memorising: in an exam you can use "the file" to illustrate all four concepts in one coherent story, which is exactly what top-band answers do.
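The claim that code written against the file abstraction does not care what sits underneath can be tried directly. In the sketch below, `count_lines` is a hypothetical function invented for illustration. It is handed first an in-memory stream (`io.StringIO`) and then a real file in a temporary folder, and it cannot tell the difference.

```python
import io
import os
import tempfile


def count_lines(stream):
    """Count the non-blank lines in any readable text stream."""
    return sum(1 for line in stream if line.strip())


TEXT = "alpha\nbeta\n\ngamma\n"

# 1. A stream backed only by memory: no disk involved at all.
memory_count = count_lines(io.StringIO(TEXT))

# 2. A stream backed by a real file on whatever storage the OS provides.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "results.txt")
    with open(path, "w") as f:
        f.write(TEXT)
    with open(path) as f:
        file_count = count_lines(f)

print(memory_count, file_count)
```

Both calls give the same answer because `count_lines` depends only on the interface "something I can read line by line", never on where the bytes are stored.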
If abstraction is the intention (ignore irrelevant detail), information hiding is the enforcement (make the irrelevant detail inaccessible). A component exposes a small public interface and conceals its implementation; other components are prevented from depending on the hidden parts.
| Term | Meaning |
|---|---|
| Interface | The visible, agreed way to use a component (its public methods/signatures) |
| Implementation | The hidden internal data and code that actually does the work |
| Encapsulation | Bundling data with the methods that act on it, and restricting outside access |
class BankAccount:
    def __init__(self):
        self.__balance = 0    # '__' hides the field (name-mangled, not public)

    def deposit(self, amount):    # interface: callers use this...
        if amount > 0:
            self.__balance += amount

    def get_balance(self):
        return self.__balance    # ...never touching __balance directly
Because __balance is hidden, outside code does not set the balance to a negative number directly; the supported way to change it is through deposit, which can enforce rules. (Python implements this hiding by name mangling, which deters outside access rather than making it strictly impossible.) The benefit is not secrecy for its own sake: it is that the class can guarantee its own invariants. Information hiding gives four concrete wins:
| Benefit | Why it follows from hiding |
|---|---|
| Lower complexity | Users learn a small interface, not the whole implementation |
| Safe change | The hidden parts can be rewritten without breaking callers |
| Fewer bugs | Outside code cannot corrupt internal state it cannot reach |
| Independent teamwork | Teams agree on interfaces, then work behind them in parallel |
Concretely, imagine the BankAccount class above initially stores the balance as an integer number of pounds, and later the bank needs to handle pence. Because __balance is hidden, the team can change the internal representation to an integer number of pence — or a decimal type — and as long as deposit and get_balance keep the same signatures, not one line of the thousands of places that use accounts needs to change. Had the balance been a public field that other code read and wrote directly, that change would have rippled through the entire codebase. This is the deep reason information hiding matters: it keeps components loosely coupled, so a change in one place stays local instead of cascading. Loose coupling is one of the central goals of all good software design, and information hiding is the main tool for achieving it.
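Here is one way that pounds-to-pence change might look. It is a sketch under assumptions: the hidden field is renamed `__pence`, deposit still takes an amount in pounds, and get_balance still reports pounds. None of those choices is forced by the original design, but they keep both method signatures exactly as they were.

```python
class BankAccount:
    def __init__(self):
        self.__pence = 0    # internal representation changed: whole pence

    def deposit(self, amount):      # same signature as before
        # amount is still given in pounds; convert once, at the boundary
        if amount > 0:
            self.__pence += round(amount * 100)

    def get_balance(self):          # same signature as before
        return self.__pence / 100   # still reported in pounds


# Caller code written for the ORIGINAL class, completely unchanged:
account = BankAccount()
account.deposit(20)
account.deposit(-5)     # still rejected by the same rule
print(account.get_balance())

# New capability, with no change to any existing caller:
account.deposit(0.75)
print(account.get_balance())
```

The first three caller lines would have run identically against the old pounds-only class. That is what "the change stays local" means in practice.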
Large systems are built by stacking abstractions: each layer offers a clean interface to the layer above and hides the messy layer below. You can work productively at any single layer without understanding the others — which is the only reason a single human can ever build software at all.
The classic example is the computing stack. A spreadsheet user, a Python programmer and a chip designer are all "using a computer", but each sees a completely different machine:
flowchart TB
A["Application software<br/>(user sees: buttons, documents)"]
B["High-level language<br/>(programmer sees: variables, functions, objects)"]
C["Assembly language<br/>(sees: registers, mnemonics, addresses)"]
D["Machine code<br/>(sees: binary opcodes and operands)"]
E["Logic gates / hardware<br/>(sees: AND/OR/NOT, voltages, the clock)"]
F["Physics<br/>(sees: electrons, semiconductors, fields)"]
A --> B --> C --> D --> E --> F
| Level | Layer | What it shows you | What it hides |
|---|---|---|---|
| 5 | Application software | Menus, files, features | All code below it |
| 4 | High-level language | Variables, functions, objects | Registers and addresses |
| 3 | Assembly language | Mnemonics, registers | Binary encoding |
| 2 | Machine code | Binary opcodes | Gate-level circuitry |
| 1 | Logic gates | AND/OR/NOT, the clock | Transistor physics |
| 0 | Physics | Electron flow, fields | (the bottom) |
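You can peek one level down from Python itself. The sketch below uses the standard-library `dis` module to list the instruction names the interpreter runs for a one-line function. Two caveats: these are bytecode instructions for Python's own virtual machine, not real machine code, so this is an analogy for the level 4 to level 3 step rather than a literal view of it; and the exact instruction names vary between Python versions, so none are quoted here.

```python
import dis


def add(a, b):
    return a + b


# One level up: the programmer's view is a name and a result.
high_level_result = add(2, 3)

# One level down: the instruction names the interpreter actually executes.
lower_level_view = [instruction.opname for instruction in dis.Bytecode(add)]

print(high_level_result)
print(lower_level_view)
```

Even for `a + b` the lower level is noticeably longer and more mechanical, and you never had to know it existed in order to write the function.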
The same layering principle organises networks. The TCP/IP model lets an application send data without knowing anything about Ethernet frames or radio waves:
| Layer | Name | Concern | Examples |
|---|---|---|---|
| 4 | Application | What the data means | HTTP, SMTP, FTP |
| 3 | Transport | Reliable end-to-end delivery | TCP, UDP |
| 2 | Internet | Routing across networks | IP, ICMP |
| 1 | Network access | Physical transmission | Ethernet, Wi-Fi |
Each layer talks only to the layers immediately above and below it, through a defined interface. A web browser hands an HTTP request to TCP and is utterly indifferent to whether the bottom layer is fibre, copper or radio. That indifference is the abstraction working.
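The way each layer wraps the data from the layer above can be imitated with a toy model. This is a deliberately simplified sketch: real headers are binary structures carrying addresses, ports and checksums, whereas here a "header" is just a text label, and the function names `wrap`, `unwrap`, `send` and `receive` are invented for this example.

```python
def wrap(layer, payload):
    """Add this layer's (toy) header in front of the payload."""
    return f"[{layer}]{payload}"


def unwrap(layer, data):
    """Strip this layer's header, refusing anything that is not ours."""
    header = f"[{layer}]"
    if not data.startswith(header):
        raise ValueError(f"{layer} layer cannot read this data")
    return data[len(header):]


SEND_ORDER = ["TCP", "IP", "Ethernet"]    # going DOWN the stack


def send(message):
    for layer in SEND_ORDER:
        message = wrap(layer, message)
    return message


def receive(frame):
    for layer in reversed(SEND_ORDER):    # coming back UP the stack
        frame = unwrap(layer, frame)
    return frame


request = "GET /index.html"
frame = send(request)
print(frame)             # [Ethernet][IP][TCP]GET /index.html
print(receive(frame))    # GET /index.html
```

Swapping "Ethernet" for "Wi-Fi" in `SEND_ORDER` changes nothing about the request or about how the upper layers behave, which is the browser's indifference expressed in code.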
A third example you can deploy in exams is virtual memory. The operating system gives every running program the illusion of a large, private, contiguous block of memory addresses starting at zero. In reality the data is scattered across physical RAM and may even be paged out to disk. The program is abstracted away from the true physical layout: it uses logical addresses and the memory-management hardware silently translates them. This is representational abstraction (a clean private address space modelling messy shared physical memory), information hiding (a program cannot read another program's pages), and a level of abstraction (logical addresses sit above physical addresses) — all in one mechanism you also study in the software-systems module.
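The translation step can be sketched as a simple table lookup. Treat this as a teaching model only: the 4096-byte page size and the contents of `page_table` are assumptions chosen for illustration, and in a real machine the translation is done by memory-management hardware with the operating system handling page faults.

```python
PAGE_SIZE = 4096    # assumed page size in bytes

# Hypothetical page table for one program:
# logical page number -> physical frame number
page_table = {0: 7, 1: 2, 2: 9}


def translate(logical_address):
    """Turn a program's logical address into a physical address."""
    page, offset = divmod(logical_address, PAGE_SIZE)
    if page not in page_table:
        # A real system would now fetch the page from disk.
        raise LookupError(f"page fault: page {page} is not in RAM")
    return page_table[page] * PAGE_SIZE + offset


print(translate(0))       # start of logical page 0 -> start of frame 7
print(translate(4100))    # byte 4 of logical page 1 -> byte 4 of frame 2
```

The program sees neighbouring logical pages 0, 1 and 2, while the physical frames 7, 2 and 9 are scattered. The program never needs to know.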
The general principle across all three examples — the computing stack, the network stack and virtual memory — is the same: define a clean interface, hide the complexity beneath it, and the layer above can be built by someone who never needs to understand the layer below. Without this, no operating system, no internet and no large application could ever have been built, because no single team could hold the whole thing in their heads.
Watch all four ideas operate together on one design: a library loans system.
Step 1 — Representational abstraction. Identify the entities and decide which attributes each problem needs:
| Entity | Kept (relevant) | Discarded (irrelevant) |
|---|---|---|
| Book | ISBN, title, author, genre, on-loan flag | Weight, page colour, font |
| Member | member ID, name, membership type | Height, shoe size |
| Loan | book ID, member ID, date out, due date | Weather on the day borrowed |
Step 2 — Generalisation. A staff member and a student member share most attributes, so generalise them into a single Member with a type field rather than writing two near-identical entities.
Step 3 — Procedural abstraction + information hiding. Provide operations borrow(book, member) and return_book(loan) whose internals (updating the on-loan flag, recording dates, checking limits) are hidden. Callers — say the front-desk screen — use the names and never touch the underlying tables directly.
Step 4 — Levels of abstraction. The front-desk UI sits on top of a "library operations" layer, which sits on top of a database layer. The librarian using the system sees none of the SQL beneath. Each layer hides the one below.
The result is a model small enough to reason about, general enough to avoid duplication, and layered enough that each part can change independently.
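The four steps can be pulled together in a short sketch. Several details are assumptions made for illustration rather than part of the design above: the `Library` class name, the loan lengths in `LOAN_DAYS`, the optional `today` parameter (added so the example gives the same result on any day) and the trimmed-down attribute lists.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Book:                    # Step 1: only the attributes the problem needs
    isbn: str
    title: str
    on_loan: bool = False


@dataclass
class Member:                  # Step 2: one generalised entity with a type field
    member_id: str
    name: str
    membership_type: str       # "student" or "staff"


@dataclass
class Loan:
    book: Book
    member: Member
    date_out: date
    due_date: date


class Library:                 # Step 4: the "library operations" layer
    LOAN_DAYS = {"student": 14, "staff": 28}    # assumed loan lengths

    def __init__(self):
        self._loans = []       # Step 3: hidden; callers use borrow/return_book

    def borrow(self, book, member, today=None):
        today = today or date.today()
        if book.on_loan:
            raise ValueError("book is already on loan")
        book.on_loan = True
        days = self.LOAN_DAYS[member.membership_type]
        loan = Loan(book, member, today, today + timedelta(days=days))
        self._loans.append(loan)
        return loan

    def return_book(self, loan):
        loan.book.on_loan = False
        self._loans.remove(loan)


# The front-desk layer uses names only, never the stored loans directly.
library = Library()
book = Book("EXAMPLE-ISBN", "Example Title")
member = Member("M001", "Sam Example", "student")
loan = library.borrow(book, member, today=date(2024, 1, 1))
print(loan.due_date)
library.return_book(loan)
print(book.on_loan)
```

If the list in `_loans` were later replaced by a database table, the four front-desk lines at the bottom would not change.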
Question (9 marks). A company is building a ride-hailing app that matches passengers to nearby drivers.
(a) Explain what is meant by abstraction, using the ride-hailing app to illustrate your answer. [3]
(b) The app represents each driver internally. Identify three attributes that the matching problem needs and two real-world details about a driver that should be abstracted away, justifying your choices. [4]
(c) Explain how information hiding would benefit the team building this app. [2]
| Mark | AO | Awarded for |
|---|---|---|
| 1 | AO1 | States abstraction = removing detail not relevant to the problem |
| 2 | AO1 | States that the relevant/essential detail is kept |
| 3 | AO2 | Applies this to the app (e.g. keep location, ignore the car's paint colour) |
| 4 | AO2 | Names a needed attribute (e.g. current GPS location) |
| 5 | AO2 | Names a second needed attribute (e.g. availability/online status) |
| 6 | AO2 | Names a third needed attribute (e.g. rating) with brief justification |
| 7 | AO2 | Names two irrelevant details (e.g. eye colour, home address) and says why they are irrelevant to matching |
| 8 | AO1 | States information hiding = exposing an interface while concealing implementation |
| 9 | AO3 | Explains a team benefit: parallel work / safe internal change / fewer bugs |
AO split: AO1 = 3, AO2 = 5, AO3 = 1.
Mid-band response (4/9):
(a) Abstraction means removing detail you do not need so the problem is simpler. In the app you do not need everything about a driver.
(b) The app needs the driver's location, name and car. It does not need their eye colour or favourite food.
(c) Information hiding means hiding the code. It helps because other people cannot see it.
Examiner-style commentary: The next-band move is precision and justification throughout. (a) earns marks 1 and 2 for "remove detail you don't need" and the implication that needed detail stays, but mark 3 is lost — there is no concrete kept vs discarded example tied to matching. In (b), "location" earns mark 4 and "name" is borderline (it is needed for display, not matching); "car" is too vague and earns nothing, so marks 5-6 are lost; the two irrelevant details earn mark 7, but with no justification of why they are irrelevant to matching the justification element is weak. (c) misdescribes information hiding as "hiding the code" for secrecy and gives no software-engineering benefit, so marks 8-9 are lost. To climb, name attributes the matching algorithm actually consults and justify each.
Top-band response (9/9):
(a) Abstraction is the deliberate removal of detail that is not relevant to the problem being solved, while keeping the detail the solution depends on. For the ride-hailing app, the matching problem is "find the nearest available driver". So the model keeps each driver's current location and availability, and abstracts away details such as the colour the car is painted — that detail never affects which driver is nearest, so including it would only add complexity.
(b) The matching algorithm needs: (1) the driver's current GPS location, because matching is fundamentally a distance calculation; (2) the driver's availability/online status, because an offline or already-occupied driver must be excluded; and (3) the driver's average rating, because the app may prefer higher-rated drivers when several are equally near. Two details that should be abstracted away are the driver's eye colour and home address — neither is consulted by the matching logic, so both are irrelevant to this problem and would only clutter the model. (Note that home address might be relevant to a different problem, such as tax reporting — relevance is always relative to the problem.)
(c) Information hiding means each component (e.g. the matching engine, the payment engine) exposes only a defined interface and conceals its internal data and code. This benefits the team in two ways: different developers can build the matching and payment components in parallel once they agree on the interfaces, and the internals of one component can be rewritten — say, swapping the matching algorithm for a faster one — without breaking any other component, because nothing outside ever depended on the hidden detail.
Examiner-style commentary: Full marks. The discriminators are (1) tying the definition to a stated problem ("find the nearest available driver") rather than a generic phrase; (2) justifying every chosen attribute by reference to what the algorithm consults, and explicitly noting that relevance is problem-relative; and (3) giving two distinct software-engineering benefits of information hiding (parallel work and safe internal change) rather than the secrecy misconception. Top-band answers always anchor abstraction to a specific purpose.
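For revision, the driver model from the top-band answer can be sketched in code. You would not write code in this exam question, and everything here is an assumption made for illustration: the `Driver` fields, the `match` function, positions given as flat x/y coordinates in kilometres rather than real GPS coordinates, and the rule that rating only breaks ties between equally near drivers.

```python
import math
from dataclasses import dataclass


@dataclass
class Driver:
    driver_id: str
    x_km: float        # kept: matching is a distance calculation
    y_km: float
    available: bool    # kept: unavailable drivers must be excluded
    rating: float      # kept: used to break ties
    # NOT modelled: eye colour, home address (never consulted by matching)


def match(passenger_x, passenger_y, drivers):
    """Return the nearest available driver, or None if there is none."""
    candidates = [d for d in drivers if d.available]
    if not candidates:
        return None

    def preference(d):
        distance = math.hypot(d.x_km - passenger_x, d.y_km - passenger_y)
        return (distance, -d.rating)    # nearest first, then highest rated

    return min(candidates, key=preference)


drivers = [
    Driver("D1", 1.0, 0.0, available=False, rating=5.0),   # nearest, but busy
    Driver("D2", 3.0, 4.0, available=True, rating=4.5),
    Driver("D3", -3.0, -4.0, available=True, rating=4.9),  # same distance as D2
]

print(match(0.0, 0.0, drivers).driver_id)
```

Every field in `Driver` is read somewhere in `match`, and nothing `match` needs is missing. That one-to-one fit between model and problem is what parts (a) and (b) are asking you to justify.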