The ability to transcribe speech and to analyse spoken data phonologically is a core skill for AQA A-Level English Language (7702). Whether you are examining accent variation, connected-speech processes or prosody, you need to work with transcription and spoken data systematically and confidently. This is not a stand-alone topic: phonetics is a method integrated across every component — Paper 1, Paper 2 and the NEA — and transcription is the apparatus that turns an impression about speech into evidence an examiner can credit. Doing it well serves AO1 (systematic method and terminology) and, when you connect features to context, AO3 (the situational and social factors shaping production and reception). The full assessment-objective profile for the A-Level is AO1 26 · AO2 26 · AO3 23 · AO4 15 · AO5 10. This lesson sets out the levels of transcription, the conventions you will meet, and — most importantly — a reliable method for analysing spoken data under exam conditions.
A point to internalise before anything else: a transcription is a selective representation, not a neutral recording. Every transcript embodies analytical choices about what to capture and what to ignore, and even a careful transcriber is making perceptual judgements about gradient, continuous speech. Treating transcription as if it were a perfect, objective record is a conceptual error; treating it as a tool deployed in the service of a specific analytical point is the mark of a strong candidate.
There are several levels of transcription, each capturing different information and serving a different purpose. Choosing the right level for the point you are making is itself an analytical skill.
An orthographic transcription represents speech in standard spelling. It captures what was said but nothing about how:
"I was going to go to the shop but I couldn't be bothered"
It is the starting point for most data work and is adequate for analysing lexis, grammar and discourse — but it is silent about pronunciation.
A phonemic transcription uses IPA between forward slashes / / to record the contrastive sound segments — the phonemes — of an utterance:
/aɪ wɒz ˈɡəʊɪŋ tə ɡəʊ tə ðə ʃɒp bʌt aɪ ˈkʊdnt bi ˈbɒðəd/
This captures which phonemes are used (note the reduced weak forms "to" → /tə/ and "the" → /ðə/) but not fine phonetic detail. Phonemic transcription is also called broad transcription, and it is the standard expectation for most A-Level analysis.
A phonetic transcription uses IPA in square brackets [ ] to capture fine articulatory detail, including allophonic variation, often with diacritics:
[aɪ wɒz ˈɡəʊɪŋ tə ɡəʊ tə ðə ʃɒp bʌʔ aɪ ˈkʰʊdn̩ʔ bi ˈbɒðəd]
Here the narrow (phonetic) level records t-glottalling ([bʌʔ], [ˈkʰʊdn̩ʔ]), aspiration of the initial voiceless plosive ([kʰ]) and the syllabic nasal ([n̩]): detail the broad transcription deliberately omits. Switch to narrow transcription precisely when an allophonic feature is the point you want to make.
A prosodic transcription layers in suprasegmental information — stress, intonation, pausing — usually over an orthographic base:
"I was GOing to go to the SHOP (.) but I COULDn't be BOTHered" (with a falling tone at the close)
The level you choose should match your purpose: orthographic to anchor what was said, phonemic for the segmental choices, narrow for allophonic realisation, prosodic for stress and intonation. Strong answers move fluidly between levels, deploying narrow transcription only where allophonic detail earns its keep.
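One way to see what the narrow level adds is to line the broad and narrow transcriptions up word by word. The Python sketch below is illustrative only: it assumes both transcriptions are split on spaces and contain the same number of words, and the function name `narrow_differences` is hypothetical.

```python
# Broad and narrow transcriptions of the example utterance, copied from the lesson.
BROAD = "aɪ wɒz ˈɡəʊɪŋ tə ɡəʊ tə ðə ʃɒp bʌt aɪ ˈkʊdnt bi ˈbɒðəd"
NARROW = "aɪ wɒz ˈɡəʊɪŋ tə ɡəʊ tə ðə ʃɒp bʌʔ aɪ ˈkʰʊdn̩ʔ bi ˈbɒðəd"


def narrow_differences(broad: str, narrow: str) -> list[tuple[str, str]]:
    """Return (broad, narrow) word pairs where the two levels disagree."""
    broad_words = broad.split()
    narrow_words = narrow.split()
    if len(broad_words) != len(narrow_words):
        raise ValueError("transcriptions must contain the same number of words")
    return [(b, n) for b, n in zip(broad_words, narrow_words) if b != n]


if __name__ == "__main__":
    # Print only the words where narrow detail has been added.
    for b, n in narrow_differences(BROAD, NARROW):
        print(f"/{b}/ -> [{n}]")
```

Only the words "but" and "couldn't" differ between the two levels, which is where the allophonic detail (glottalling, aspiration, the syllabic nasal) sits.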
Spoken-data transcripts in the AQA exam use a recognisable set of conventions (close to the Jefferson system widely used in conversation analysis). You should be able to read these fluently and comment on what they reveal:
| Symbol | Meaning |
|---|---|
| (.) | Micropause (under roughly 0.2 seconds) |
| (1.0), (2.0)… | Timed pause in seconds |
| ↑ (upward arrow) | Rising intonation |
| ↓ (downward arrow) | Falling intonation |
| CAPITALS | Stressed / emphasised syllable or word |
| Underlining | Emphasis (in some systems) |
| Square brackets | Overlapping speech (two speakers at once) |
| = | Latching (one turn begins the instant the previous ends, no gap) |
| :: | Sound lengthening (more colons = longer) |
| - | Word cut off / false start |
| (( )) | Non-verbal sounds or contextual notes, e.g. ((laughs)), ((door slams)) |
| hhh | Audible breathing / aspiration |
Key Definition: Transcription conventions — the standardised notation used to represent features of speech such as pauses, stress, intonation, overlap, latching, lengthening and non-verbal sounds. The Jefferson system is the most widely used; different systems vary slightly, so read each transcript's key carefully.
A vital interpretive caution: these conventions describe behaviour, not motive. A pause notated "(1.0)" is an observable fact; whether it signals hesitation, planning, emphasis or turn-yielding is your interpretation, to be argued from the surrounding context — not read off the symbol. The same is true of overlap (which may be supportive or competitive) and of fast tempo (which may signal excitement or anxiety). Confusing the notation with its meaning is a common and avoidable slip.
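The distinction between notation and meaning can be made concrete with a small script that only finds the notated behaviour and leaves interpretation to the analyst. This is a minimal Python sketch, assuming a simplified subset of the conventions above; the label names, the `annotate` function and the second example line are hypothetical, and real transcripts may follow a different key.

```python
import re

# A simplified subset of the conventions in the table above. The labels are
# hypothetical names chosen for this sketch. Order matters: "((...))" notes
# must be tried before the single-bracket pause patterns.
PATTERNS = [
    ("note", r"\(\(.*?\)\)"),               # ((laughs))
    ("timed_pause", r"\(\d+(?:\.\d+)?\)"),  # (1.0)
    ("micropause", r"\(\.\)"),              # (.)
    ("latching", r"="),                     # =
    ("lengthening", r"\w+:+"),              # we::
    ("cut_off", r"\w+-(?!\w)"),             # w-
    ("stress", r"[A-Z]{2,}"),               # SHOP, or GO in GOing
]
NOTATION = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS))


def annotate(line: str) -> list[tuple[str, str]]:
    """List each notated feature in a transcript line as (label, matched text)."""
    return [(match.lastgroup, match.group()) for match in NOTATION.finditer(line)]


if __name__ == "__main__":
    print(annotate("I was GOing to go to the SHOP (.) but I COULDn't be BOTHered"))
    print(annotate("we::ll (1.0) I w- I thought ((laughs))"))
```

The output is a list of observable facts (a pause here, a stressed syllable there). Whether a given pause signals hesitation or emphasis is still an argument to be made from context.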
When you are given spoken data and asked to analyse its phonological features, a disciplined five-step routine keeps your answer evidenced and stops it degenerating into an undifferentiated feature-list.
Step 1: before any feature-spotting, fix the situational frame, because every feature gains its meaning from context.
Step 2: scan for the consonant and vowel features that locate the speaker regionally and socially:
| Category | What to look for |
|---|---|
| Vowel features | TRAP–BATH split, FOOT–STRUT split, GOAT/FACE/PRICE realisations, vowel mergers (cot–caught, cure–force) |
| Consonant features | Rhoticity, h-dropping, t-glottalling, TH-fronting, the (ng) variable ("g-dropping"), yod-dropping and yod-coalescence, L-vocalisation |
| Overall pattern | Do the features cohere into a recognisable regional accent or sociolect, or a particular bundle (e.g. a south-eastern set)? |
Step 3: distinguish accent-system features (Step 2) from the processes of fluent speech:
- Assimilation (e.g. "ten boys" → [tem]).
- Elision (e.g. "next day" → [neks deɪ]).
- Liaison: linking /r/ in non-rhotic speech.
- Weak forms (e.g. "to" → /tə/, "and" → /ən/).
- Vowel reduction to /ə/ in unstressed syllables.

Step 4: if prosodic notation is present, comment on stress (which words are foregrounded and why), intonation (rising, falling, fall–rise and their functions), pausing (where and what it might suggest), tempo and volume, always arguing the function from context rather than assuming it.
Step 5: do not merely list; link each feature to effect and meaning. This is where the marks are.
| Feature | Possible interpretation (to be argued, not assumed) |
|---|---|
| t-glottalling, elision, weak forms | Informal, casual register |
| Careful articulation, full forms | Formal, monitored register |
| Regional accent features | Geographical / social background |
| Shift in feature-rate across the data | Accommodation (convergence/divergence) or style-shifting |
| Emphatic stress on key words | Speaker foregrounding important information |
| Frequent pausing | Hesitation, planning, emphasis or turn-yielding (context decides) |
A frequent muddle in weaker answers is treating everything a speaker does to a sound as an "accent feature". In fact two quite different things are going on, and distinguishing them is a high-level analytical move. Accent-system features are properties of the speaker's underlying phonological system — which vowel they have in the BATH set, whether they are rhotic, whether they have the FOOT–STRUT split. These are stable across the speaker's speech and locate them regionally and socially. Connected-speech processes — assimilation, elision, liaison, weak forms, vowel reduction — are not properties of one accent but general features of fluent, casual English that any speaker applies more in relaxed than in careful speech. They are conditioned by speed and formality, not by region.
| Type | What it tells you | Example | Notation |
|---|---|---|---|
| Accent-system feature | The speaker's regional/social identity | Rhoticity; BATH = /æ/; unsplit FOOT–STRUT | Often broad / / (a phonemic choice) |
| Connected-speech process | The register/formality of this utterance | "next day" → [neks deɪ] (elision); "ten boys" → [tem] (assimilation) | Narrow [ ] (a contextual realisation) |
The consequence for analysis is important. When you spot the BATH vowel as /æ/, you are licensed to make an identity claim: for a speaker from England it points to a northern or Midlands accent rather than a southern one. When you spot elision in "next day", you are not licensed to infer anything about region, only about register (casual, fast speech). Treating an elision as if it were a regional marker is a category error; so is treating a stable accent feature as if it were merely casual reduction. A strong answer says, in effect: "these features (rhoticity, BATH vowel) tell me where the speaker is from; these features (elision, weak forms, glottalling) tell me how formal this particular stretch of talk is." That double reading, of identity and of register, is exactly the kind of disciplined categorisation examiners reward, and it depends on keeping the two types of feature apart.
One feature usefully complicates the neat split: t-glottalling is partly both. It is spreading as a general, register-sensitive process (everyone glottals more in casual speech), yet its overall frequency and the positions in which it is permitted still vary by accent and social group. The sophisticated move is to acknowledge this — to note that glottalling indexes both informality and a broadly south-eastern, younger profile — rather than forcing it into one box.
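The identity/register split, including the dual status of t-glottalling, can be summarised as a small lookup. The following Python sketch is only an aide-memoire under stated assumptions: the `Reading` flag, the `FEATURE_READINGS` table and the `licensed_claims` function are hypothetical names, and the feature list covers only the features named in this lesson.

```python
from enum import Flag, auto


class Reading(Flag):
    """What kind of claim a feature licenses (hypothetical labels)."""
    IDENTITY = auto()  # accent-system feature: where the speaker is from
    REGISTER = auto()  # connected-speech process: how formal the talk is


# Classification taken from the discussion above; t-glottalling is marked as both.
FEATURE_READINGS = {
    "rhoticity": Reading.IDENTITY,
    "BATH vowel": Reading.IDENTITY,
    "FOOT-STRUT split": Reading.IDENTITY,
    "assimilation": Reading.REGISTER,
    "elision": Reading.REGISTER,
    "liaison": Reading.REGISTER,
    "weak forms": Reading.REGISTER,
    "vowel reduction": Reading.REGISTER,
    "t-glottalling": Reading.IDENTITY | Reading.REGISTER,
}


def licensed_claims(feature: str) -> list[str]:
    """Return the kinds of claim a feature supports, per the table above."""
    reading = FEATURE_READINGS[feature]
    claims = []
    if Reading.IDENTITY in reading:
        claims.append("where the speaker is from")
    if Reading.REGISTER in reading:
        claims.append("how formal this stretch of talk is")
    return claims


if __name__ == "__main__":
    for name in ("BATH vowel", "elision", "t-glottalling"):
        print(name, "->", licensed_claims(name))
```

A lookup like this records only the category of claim; the claim itself still has to be argued from the data and its context.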
For accent specifically, a clean three-part structure ensures each point is fully developed.
1. Identify the feature with precise metalanguage and, where useful, the lexical-set keyword: "the speaker uses t-glottalling intervocalically"; "the speaker realises the BATH vowel as /æ/ rather than /ɑː/"; "the speaker exhibits h-dropping in the stressed word 'house'".
2. Evidence it from the transcript with IPA, quoting the token and giving the realisation: "in 'butter' the /t/ is realised as a glottal stop [ʔ] rather than [t]: [ˈbʌʔə]". Crucially, show that the feature recurs across the data — pattern, not a single token — to demonstrate you are describing the speaker's system.
3. Evaluate its significance: regionally ("t-glottalling is widespread across urban British English, especially the south-east"), socially ("h-dropping is perceived as a working-class feature, though it is a systematic and historically established one") and stylistically ("the speaker's full forms suggest a careful, formal register"). Keep the linguistic description separate from any report of social attitudes.
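Showing that a feature recurs amounts to counting how often a variant appears out of the places it could appear. The Python sketch below illustrates that arithmetic with hand-coded tokens. It is an assumption-laden example: only "butter" comes from the lesson, the other tokens are invented for illustration, and `variant_rate` is a hypothetical helper.

```python
# Hypothetical tokens of /t/, coded by hand from a transcript:
# each pair is (word, realisation of /t/). Only "butter" comes from the lesson;
# the other tokens are invented for illustration.
TOKENS = [
    ("butter", "ʔ"),
    ("water", "ʔ"),
    ("city", "t"),
    ("bottle", "ʔ"),
]


def variant_rate(tokens: list[tuple[str, str]], variant: str) -> float:
    """Proportion of tokens realised with the given variant."""
    if not tokens:
        raise ValueError("no tokens to count")
    hits = sum(1 for _, realisation in tokens if realisation == variant)
    return hits / len(tokens)


if __name__ == "__main__":
    print(f"glottal rate: {variant_rate(TOKENS, 'ʔ'):.0%}")
```

Computing the same rate separately for different parts of a transcript is one simple way to evidence a shift in feature-rate, though a handful of tokens only ever supports a tentative claim.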
When a transcript carries prosodic notation, the suprasegmental features — those that ride above individual segments — are often where the richest meaning lies, and ignoring them is a common way to leave marks on the table. Four are worth systematic attention.
Stress is the relative prominence of syllables. Within a word, lexical stress is fixed and can be contrastive ("a REcord" the noun versus "to reCORD" the verb); across an utterance, contrastive or emphatic stress foregrounds particular words for effect — "I NEVER said that" places the weight on the denial. When CAPITALS or underlining mark stress in a transcript, ask why that word: emphatic stress typically signals contrast, correction or the speaker's evaluation.
Intonation is the melody of speech — the pattern of pitch movement across an utterance. The broad functions worth knowing are that falling tones tend to signal completion, certainty and straightforward statements or commands, while rising tones tend to signal incompleteness, questioning or appeal for a response; a fall–rise often carries implication or reservation ("well, it's fine…"). A revealing case is rising intonation on a grammatical statement: "she's coming tomorrow" said with a rise can function as a request for confirmation — an indirect speech act — and may also reflect "uptalk", a much-discussed feature of some younger speakers' speech. The disciplined move is always to argue the function of a tone from the surrounding context, not to assume a fixed meaning.
Tempo and pausing structure the flow of talk. Pauses — micropauses "(.)" and timed pauses "(1.0)" — are observable facts; their meaning (hesitation, planning, emphasis, turn-yielding, dramatic effect) is an interpretation to argue from context. Changes of tempo (speeding up, slowing down) and of volume likewise carry meaning that depends on situation: a slow, quiet delivery might signal seriousness or uncertainty, a fast loud one excitement or anxiety.
Rhythm in English is stress-timed: stressed syllables fall at roughly regular intervals and unstressed syllables are compressed between them, driving the vowel reduction and weak forms discussed above. A transcript in which a speaker fails to reduce — using full, evenly-weighted syllables — marks unusually careful or citation-style delivery, itself a meaningful finding about register.
The overarching discipline, repeated because it is so often breached, is to distinguish the behaviour the notation records from the motive you infer. A downward arrow records a falling pitch; "the speaker sounds final and certain" is your interpretation. Keeping the two apart — and signposting interpretations as interpretations — is exactly the methodological self-awareness that lifts an answer into the top band.