Testing Strategies

This lesson covers software testing: the systematic process of executing a program with the intention of finding defects and demonstrating that it meets its requirements. Testing is not an afterthought bolted on at the end; it is a discipline woven through the whole development lifecycle. It is also a dependable source of exam marks, because it combines factual knowledge (types of testing, categories of test data) with the practical skill of designing a test plan for a given scenario. A famous observation by Edsger Dijkstra frames the subject well: testing can show the presence of bugs, but never their absence. The goal is therefore to choose test data shrewdly enough to expose the bugs that matter.

Spec Mapping

This lesson addresses testing within the Systematic approach to problem solving section of the AQA A-Level Computer Science (7517) specification (subject content area 4.13), and supports the testing of robust programs in 4.1.3. It covers: the distinction between black-box and white-box testing; the testing levels of unit, integration, system and acceptance testing, including alpha and beta testing; iterative testing during development versus final testing; and the design of test data in the three categories normal, boundary and erroneous, organised into a test plan. You are expected to select appropriate test data for a given algorithm, construct a test plan, and explain and justify the choice of testing strategy.

Why test, and verification versus validation

Testing exists to find defects before users do, to demonstrate that the system meets its requirements, to give confidence that it is reliable and secure, and — crucially — to reduce cost, because a defect caught during development is far cheaper to fix than one discovered in production. Two related but distinct goals run through all testing:

Term	Question it answers	Concerned with
Verification	"Are we building the product right?"	Conformance to the specification and design
Validation	"Are we building the right product?"	Whether the system genuinely meets the user's needs

A system can pass verification (it matches the specification perfectly) yet fail validation (the specification described the wrong thing). Good testing pursues both.

Black-box and white-box testing

These two terms describe how much the tester knows about the internal code, and they are among the most frequently examined ideas in the topic.

Feature	Black-box testing	White-box testing
View of the code	Internal code is hidden	Internal code is visible and analysed
Tests derived from	Requirements / specification	Code structure, logic and paths
Focus	What the system does (input → output)	How the system works (branches, conditions)
Typical techniques	Equivalence partitioning, boundary-value analysis	Statement, branch and path coverage
Usually performed by	Independent testers or end users	The developers themselves

In black-box testing you treat the program as an opaque box: you feed in inputs and check the outputs against what the specification says should happen, without caring how the answer is computed. Equivalence partitioning divides the inputs into groups that should be treated identically (e.g. "valid ages", "too-low ages", "non-numeric input") so that one representative value tests each whole group.

In white-box (also called structural or glass-box) testing you use knowledge of the code to design tests that exercise its internal structure:

Statement coverage: every statement in the code runs at least once.
Branch (decision) coverage: every branch of every decision (the true and the false side) is taken at least once.
Path coverage: every distinct route through the code is executed (often impractical, as the number of paths explodes with loops and nested conditions).

def grade(score: int) -> str:
    if score >= 70:
        return "Distinction"
    elif score >= 50:
        return "Pass"
    else:
        return "Fail"

# White-box tests chosen to achieve full BRANCH coverage:
assert grade(75) == "Distinction"   # first condition True
assert grade(60) == "Pass"          # first False, second True
assert grade(30) == "Fail"          # both False -> else branch

Each of the three tests above forces a different branch of the logic, so together they give full branch coverage. A black-box tester can reach the same coverage only if the specification states where the grade boundaries lie; without that information, hitting every branch is a matter of luck.
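The gap between statement coverage and branch coverage is easiest to see on a decision that has no else. The sketch below uses a hypothetical function, apply_discount, which is not part of the grade example: one test is enough to run every statement, yet the False side of the decision is only taken once a second test is added.

```python
def apply_discount(price_pence: int, is_member: bool) -> int:
    """Hypothetical example: members get 10 percent off (integer pence)."""
    if is_member:
        price_pence = price_pence - price_pence // 10
    return price_pence


# This single test runs every statement (the if, the assignment and the
# return), so statement coverage is already complete.
assert apply_discount(100, True) == 90

# The False side of the decision has still not been taken.
# This second test is what completes branch coverage.
assert apply_discount(100, False) == 100
```

A test suite that stops after the first assertion would report full statement coverage while never checking what happens to non-members.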

Levels of testing

Testing happens at progressively wider scopes, conventionally in the order unit → integration → system → acceptance.

flowchart TB
    U["Unit testing<br/>single components in isolation"] --> I["Integration testing<br/>components working together"]
    I --> S["System testing<br/>the whole integrated system vs requirements"]
    S --> A["Acceptance testing<br/>end users confirm fitness for purpose"]

Unit testing

Unit testing checks a single component — a function, method or class — in isolation. Each test supplies an input and asserts the expected output.

def add(a: int, b: int) -> int:
    return a + b

assert add(2, 3) == 5
assert add(-1, 1) == 0
assert add(0, 0) == 0

Strengths	Limitations
Fast; easy to automate and re-run	Tests components in isolation only
Pinpoints the faulty unit precisely	Cannot catch faults in how units interact
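Bare assert statements stop at the first failure, so unit tests are usually collected in a test framework that runs them all and reports each result. The sketch below is one possible arrangement using Python's standard unittest module; the class name TestAdd and the helper run_add_tests are illustrative choices, not names required by any specification.

```python
import unittest


def add(a: int, b: int) -> int:
    return a + b


class TestAdd(unittest.TestCase):
    """Each method is one independent unit test for add()."""

    def test_positive_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_mixed_signs(self):
        self.assertEqual(add(-1, 1), 0)

    def test_zeros(self):
        self.assertEqual(add(0, 0), 0)


def run_add_tests() -> unittest.TestResult:
    """Load every test in TestAdd, run it, and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAdd)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_add_tests() returns a result whose wasSuccessful() method reports whether every test passed, which makes the suite easy to re-run automatically after each change.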

Integration testing

Integration testing checks that units already tested individually work correctly together.

Approach	How it works
Top-down	Start with high-level modules; replace not-yet-built lower modules with stubs.
Bottom-up	Start with low-level modules; use drivers to simulate the callers above them.
Big-bang	Combine everything at once — risky, because a failure is hard to localise.
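Stubs and drivers are simply small pieces of throwaway code. As a rough sketch, assuming a hypothetical ordering system (the names order_total, tax_rate_stub, parse_price and parse_price_driver are all invented for illustration), the two approaches might look like this:

```python
# Top-down: the high-level module exists, the tax module does not yet.
def tax_rate_stub(region: str) -> int:
    """Stub standing in for the unbuilt tax module: always 20 percent."""
    return 20


def order_total(prices_pence: list[int], region: str,
                tax_rate=tax_rate_stub) -> int:
    """High-level module: total in pence including tax."""
    subtotal = sum(prices_pence)
    return subtotal + subtotal * tax_rate(region) // 100


assert order_total([500, 250], "UK") == 900


# Bottom-up: the low-level module exists, its caller does not yet.
def parse_price(text: str) -> int:
    """Low-level module: convert a string such as '7.50' to 750 pence."""
    pounds, pence = text.split(".")
    return int(pounds) * 100 + int(pence)


def parse_price_driver() -> list[int]:
    """Driver: simulates the future caller by feeding in sample inputs."""
    return [parse_price(text) for text in ["7.50", "0.99", "12.00"]]


assert parse_price_driver() == [750, 99, 1200]
```

The stub returns a fixed, predictable answer so that order_total can be tested before the real tax module exists; the driver does the opposite job, exercising parse_price before any real caller has been written.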

System testing

System testing checks the complete, integrated system against the requirements specification, in an environment that mirrors production. It covers both functional requirements (does it do what it should?) and non-functional ones (performance, security, usability).

Acceptance testing

Acceptance testing (or user acceptance testing, UAT) is carried out from the user's perspective to confirm the system is fit for purpose and ready to deploy.

Type	Performed by	When
Alpha testing	Internal staff within the developer's organisation	Before release, in-house
Beta testing	A limited group of real external users	Just before public launch

Exam Tip: Remember both the order (unit → integration → system → acceptance) and that the scope widens at each level. A common question is to explain the difference between alpha (internal) and beta (external) testing — keep that distinction crisp.

Iterative testing versus final testing

In a methodology such as agile or XP, testing is iterative: it is carried out continuously, with each small increment tested as it is built, so defects surface immediately. In a waterfall project, the bulk of formal testing is a single final phase after implementation is complete. Iterative testing finds faults earlier and more cheaply but requires automation to be sustainable; final testing is simpler to manage but tends to discover defects late, when they are expensive to fix. This contrast links the topic directly to the methodologies lesson.

Test data: normal, boundary and erroneous

Choosing test data well is the heart of a good test plan. AQA recognises three categories, and a thorough plan includes data from each.

Category	Meaning	Example for "age must be 0–120"
Normal (valid)	Typical data inside the accepted range	25, 50, 100
Boundary (extreme)	Values exactly on the edge of the valid range	0, 120 (and just outside: −1, 121)
Erroneous (invalid)	Wrong type or clearly outside the range	"abc", −5, 999, empty input

Boundary values deserve special attention because off-by-one errors cluster at the edges of ranges — a condition written < 120 instead of <= 120 fails precisely at 120. For a valid range of 1–100, a careful tester checks the values straddling each boundary:

Test value	Category	Expected result
0	Boundary (just below minimum, invalid)	Rejected
1	Boundary (minimum)	Accepted
2	Normal (just above minimum)	Accepted
50	Normal	Accepted
99	Normal (just below maximum)	Accepted
100	Boundary (maximum)	Accepted
101	Boundary (just above maximum, invalid)	Rejected
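To see why the values around each boundary matter, the sketch below runs the seven test values from the table against two hypothetical range checks: is_valid_buggy, which deliberately contains the off-by-one mistake (< where <= was intended), and is_valid, which implements the intended rule. Both function names are invented for this illustration.

```python
def is_valid_buggy(value: int) -> bool:
    """Deliberately wrong: uses < where <= was intended."""
    return 1 <= value < 100


def is_valid(value: int) -> bool:
    """Intended rule: accept 1 to 100 inclusive."""
    return 1 <= value <= 100


# (test value, expected result) pairs taken from the table above
cases = [(0, False), (1, True), (2, True), (50, True),
         (99, True), (100, True), (101, False)]


def failing_values(check) -> list[int]:
    """Return the test values for which check gives the wrong answer."""
    return [value for value, expected in cases if check(value) != expected]


print(failing_values(is_valid_buggy))  # [100]
print(failing_values(is_valid))        # []
```

Only the test at 100 exposes the bug. A plan built from normal values alone (2, 50, 99) would have passed the faulty version without complaint.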

Exam Tip: When asked for test data, always include boundary values (both the boundary itself and the value immediately outside it) and clearly label each value's category and expected outcome. Mark schemes routinely allocate specific marks for boundary and erroneous data, so an answer giving only "normal" values cannot reach the top marks.

Constructing a test plan

A test plan is a structured table prepared before testing, recording for each test: the data used, its category, the reason for choosing it, the expected result and (after running) the actual result. The discipline of writing it down forces systematic coverage and gives evidence that testing was thorough — exactly what the NEA assessment criteria reward.

Consider a function validate_pin(pin) that should accept a string of exactly four digits and reject everything else.

Test no.	Test data	Category	Reason	Expected result
1	"1234"	Normal	A typical valid PIN	Accepted
2	"0000"	Normal	All-zero valid PIN	Accepted
3	"999"	Boundary	One digit too short (3)	Rejected
4	"12345"	Boundary	One digit too long (5)	Rejected
5	"12a4"	Erroneous	Contains a non-digit	Rejected
6	""	Erroneous	Empty input	Rejected
7	" "	Erroneous	Whitespace only	Rejected

Tests 3 and 4 sit immediately outside the valid length of four, catching the most likely off-by-one mistake in the length check. Test 5 has the correct length, so it isolates the "digits only" rule, while tests 6 and 7 confirm that empty and whitespace-only input are rejected rather than crashing the program. This single table demonstrates normal, boundary and erroneous coverage, which is the structure examiners look for.
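A test plan written as a table translates directly into data that a program can run. The sketch below is one possible implementation of validate_pin (the lesson specifies only its behaviour, so the body is an assumption), together with a hypothetical helper, run_test_plan, that fills in the "actual result" column for the seven tests above.

```python
def validate_pin(pin: str) -> bool:
    """Accept a string of exactly four digits 0-9; reject everything else."""
    return len(pin) == 4 and all(ch in "0123456789" for ch in pin)


# (test number, test data, expected result: True means accepted)
test_plan = [
    (1, "1234", True),
    (2, "0000", True),
    (3, "999", False),
    (4, "12345", False),
    (5, "12a4", False),
    (6, "", False),
    (7, " ", False),
]


def run_test_plan() -> list[tuple[int, bool, bool]]:
    """Return (test number, expected, actual) for every row of the plan."""
    return [(number, expected, validate_pin(data))
            for number, data, expected in test_plan]


for number, expected, actual in run_test_plan():
    verdict = "pass" if expected == actual else "FAIL"
    print(f"Test {number}: expected {expected}, actual {actual} -> {verdict}")
```

The explicit check against "0123456789" is a deliberate choice: Python's str.isdigit also accepts some non-ASCII digit characters, which a PIN check would normally want to reject.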

Regression testing

Regression testing means re-running an existing suite of tests after a change to confirm that the change has not broken anything that previously worked. It guards against the common situation where fixing one bug, adding a feature, or refactoring silently breaks unrelated functionality. Because the same tests are run repeatedly, regression testing is almost always automated and is a cornerstone of continuous integration, tying back to the XP practices in the methodologies lesson.
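As a small illustration, suppose a hypothetical function average has just been changed so that it no longer crashes on an empty list (returning 0.0 in that case is an assumption made for this sketch). The regression suite re-runs the checks that passed before the change alongside the check for the new behaviour.

```python
def average(values: list[float]) -> float:
    """Version 2: returns 0.0 for an empty list instead of crashing."""
    if not values:
        return 0.0
    return sum(values) / len(values)


def regression_suite() -> list[str]:
    """Re-run every existing check; return the names of any that fail."""
    checks = {
        "single value": average([4.0]) == 4.0,
        "several values": average([1.0, 2.0, 3.0]) == 2.0,
        "negative values": average([-2.0, 2.0]) == 0.0,
        "empty list (new behaviour)": average([]) == 0.0,
    }
    return [name for name, passed in checks.items() if not passed]


# An empty list of failures means the fix broke nothing that used to work.
assert regression_suite() == []
```

Because the suite is just a function, it can be run automatically after every change rather than relying on someone remembering to repeat the tests by hand.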

Test-driven development (TDD)

TDD inverts the usual order by writing the test before the code. Its cycle is red → green → refactor:

Red — write a test for the desired behaviour; it fails because the code does not exist yet.
Green — write the minimum code needed to make the test pass.
Refactor — tidy the code while keeping every test green.

# Red: a failing test describing what we want
def test_is_palindrome():
    assert is_palindrome("racecar") is True
    assert is_palindrome("hello") is False
    assert is_palindrome("") is True

# Green: the minimum implementation that passes
def is_palindrome(text: str) -> bool:
    return text == text[::-1]
# Refactor: already clean, so nothing to do

TDD builds in test coverage from the outset, because no behaviour is written without a test that demands it. It also encourages small focused units and leaves the tests behind as living documentation of intended behaviour.

Debugging when tests fail

A failing test tells you that something is wrong; debugging finds why.

Technique	What it does
Trace tables	A paper technique: hand-trace the code, recording each variable's value step by step.
Print/log statements	Output variable values at key points to see what the program actually does.
Breakpoints	Pause execution at a chosen line using a debugger.
Single-stepping	Execute one line at a time, watching the state change.
Watch variables	Monitor specific variables continuously as the program runs.
Call-stack inspection	Examine the chain of function calls that led to the current point.

The trace table is especially exam-relevant: "complete the trace table for this algorithm" questions are common, and the same skill underpins both predicting a program's behaviour and locating a logic error.
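A hand-drawn trace table can be checked by asking the program to record the same values itself. A minimal sketch, assuming a hypothetical function trace_sum_to that adds up the integers 1 to n and notes each variable after every pass through the loop:

```python
def trace_sum_to(n: int) -> list[tuple[int, int]]:
    """Sum 1..n, recording (i, total) after each iteration."""
    rows = []
    total = 0
    for i in range(1, n + 1):
        total += i
        rows.append((i, total))
    return rows


# Print the recorded values in the layout of a trace table.
print("i | total")
for i, total in trace_sum_to(4):
    print(f"{i} | {total}")
# i | total
# 1 | 1
# 2 | 3
# 3 | 6
# 4 | 10
```

Comparing this output with a trace completed on paper shows at once which row, and therefore which iteration, is where a prediction and the program's real behaviour part company.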

Equivalence partitioning in depth

Equivalence partitioning is the black-box technique that makes test selection systematic rather than arbitrary. The idea is that the set of all possible inputs divides into partitions (classes) within which every value should be treated identically by the program. If one valid value is processed correctly, every other value in the same partition almost certainly is too, so testing one representative from each partition gives broad coverage with very few tests. The partition boundaries also tell you exactly where the boundary values lie.

For a function accepting an age in the range 18–64 inclusive, the partitions are:

Partition	Range	Representative value	Boundary values
Below valid	age < 18	10	17 (just below)
Valid	18 ≤ age ≤ 64	40	18 and 64 (the edges)
Above valid	age > 64	80	65 (just above)
Invalid type	not a number	"abc"	— (separate class)
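The partitions in the table can be turned into a compact set of tests: one representative per partition, plus the values on either side of each boundary. The sketch below assumes a hypothetical function classify_age that receives the raw input as a string and reports which partition it falls into; the function name and its return strings are invented for illustration.

```python
def classify_age(raw: str) -> str:
    """Return the partition an input falls into for the range 18 to 64."""
    try:
        age = int(raw)
    except ValueError:
        return "invalid type"
    if age < 18:
        return "below valid"
    if age > 64:
        return "above valid"
    return "valid"


# One representative from each partition
assert classify_age("10") == "below valid"
assert classify_age("40") == "valid"
assert classify_age("80") == "above valid"
assert classify_age("abc") == "invalid type"

# Boundary values on either side of each edge
assert classify_age("17") == "below valid"
assert classify_age("18") == "valid"
assert classify_age("64") == "valid"
assert classify_age("65") == "above valid"
```

Eight tests cover every partition and both boundaries, which is far fewer than testing each possible age individually.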
