Analysing Spoken Language

The analysis of spoken language is a distinct and important skill, and one that students often find harder than written analysis precisely because speech behaves so differently from the polished prose they are used to studying. Spoken language is typically produced in real time, without the opportunity to plan, draft, or edit; it is interactional, shaped moment by moment by the responses of other participants; and it is embedded in an immediate social context that gives meaning to its deictic references, its silences, and its overlaps. Features that would count as errors in writing — hesitations, false starts, incomplete sentences — are entirely normal in speech and carry their own interactional meaning.

For AQA Paper 1, you may meet spoken data as a transcript, and the Section A comparison can pair a spoken (or spoken-like) text with a written one, inviting comment on the differences mode makes. This lesson covers the conventions for reading transcripts, the features of spontaneous speech and what they do, the frameworks of Conversation Analysis, the role of power and context, and a systematic method for turning a transcript into a developed, evidence-based analysis. Throughout, the governing principle is the one articulated by linguists such as Michael McCarthy: spoken language is a system in its own right, not a deficient version of writing, and the analyst's job is to describe what speech achieves, not what it lacks.

Transcription Conventions

Spoken language data is usually presented in the form of a transcript — a written representation of spoken language. Transcripts use special conventions to capture features of speech that have no equivalent in standard written English.

The most widely used system is based on the conventions developed by Gail Jefferson (used in Conversation Analysis):

Symbol	Meaning	Example
(.)	Micropause (less than a second)	"I think (.) yeah (.) probably"
(2.0)	Timed pause in seconds	"well (2.0) I suppose so"
[text]	Overlapping speech	A: "I was going to [the shop]" B: "[yeah I know]"
=	Latching (no gap between speakers)	A: "I agree=" B: "=me too"
CAPS	Louder speech	"I said NO"
°text°	Quieter speech	"I'm not sure °about that°"
>text<	Faster speech	">I couldn't believe it<"
<text>	Slower speech	"<I couldn't believe it>"
te::xt	Sound stretching/elongation	"we::ll I don't know"
↑ ↓	Rising/falling intonation	"really↑" (surprise/question)
-	Self-interruption/cut-off	"I was going to the sh- the supermarket"
(( ))	Non-verbal information	"((laughs))" or "((gestures))"
bold/underline	Emphasis/stress	"I never said that"

Key Definition: Transcript — a written record of spoken language, using special conventions to represent features such as pauses, overlaps, intonation, volume, and pace that cannot be captured by standard written punctuation.

It is important to remember that a transcript is a representation of speech, not the speech itself. Important features such as body language, facial expression, physical context, and full prosodic detail are inevitably lost in transcription. The very act of transcribing involves selection — which features to capture, how finely to time the pauses, whether to note laughter or gesture — so two transcripts of the same recording can differ. This matters for analysis because the conventions a transcript uses tell you what its compiler judged significant; where pauses are timed to a tenth of a second and overlaps precisely bracketed, the data is inviting close conversation-analytic attention to the mechanics of interaction.

The conventions also let you read prosody, the music of speech, from the page. Capitalisation marks increased volume, the upward and downward arrows mark rising and falling intonation, colons mark sound elongation, and the degree symbols mark quietness. Prosodic features frequently carry the most important pragmatic meaning in a transcript: a rising intonation can turn a statement into a question or signal uncertainty; emphatic stress on a particular word can create contrastive meaning ("I never said you took it", with the stress on "you", implies that someone else did); a markedly quiet stretch can signal confidentiality, reluctance, or emotion. When a transcript notates prosody, you should treat those marks as analytically loaded evidence rather than ignoring them in favour of the words alone.
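As a way of checking your own reading of the notation, the sketch below labels some of the conventions from the table in a single transcript line. It is a minimal illustration, not a full Jefferson parser: the pattern list and the function name `annotate` are assumptions made for this example, the pace markers and intonation arrows are left out, and the capitals pattern will also catch ordinary acronyms.

```python
import re

# Hypothetical patterns for a subset of the conventions in the table.
# Each entry pairs a plain-English label with an illustrative regex.
PATTERNS = [
    ("timed pause", re.compile(r"\(\d+(?:\.\d+)?\)")),
    ("micropause", re.compile(r"\(\.\)")),
    ("non-verbal", re.compile(r"\(\(.+?\)\)")),
    ("overlap", re.compile(r"\[.+?\]")),
    ("quieter", re.compile(r"°.+?°")),
    # Colons inside or at the end of a word; a lone "A:" speaker label
    # is not matched because a single colon must be followed by letters.
    ("elongation", re.compile(r"\w+:+\w+|\w+::+")),
    ("louder", re.compile(r"\b[A-Z]{2,}\b")),
]


def annotate(utterance):
    """Return (label, matched text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(utterance):
            found.append((match.start(), label, match.group(0)))
    found.sort()
    return [(label, text) for _, label, text in found]


if __name__ == "__main__":
    for label, text in annotate("we::ll (.) I said NO ((laughs))"):
        print(f"{text!r}: {label}")
```

A list like this only tells you which features are present. The analytical work of explaining what each one does in the interaction still has to be done by the reader.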

Features of Spontaneous Speech

Spontaneous speech is characterised by a range of features that reflect its real-time production:

Feature	Description	Function or example
Fillers	Non-lexical vocalisations (um, er, uh) and lexical fillers (like, you know, I mean)	Holding the floor while planning speech; signalling thinking time
False starts	Beginning an utterance and then restarting	"I was going — I went to the — yeah, I went to town"
Self-repairs	Correcting an error mid-utterance	"She lives in Manchester — sorry, Birmingham"
Repetition	Repeating words or phrases	"It was really, really, really good"
Hesitation	Pausing within or between utterances	"I think (.) maybe (.) we should go"
Incomplete utterances	Trailing off without completing a thought	"Well, if you think about it..."
Vague language	Imprecise expressions	"sort of," "kind of," "thing," "stuff," "whatever," "and everything"
Contractions	Shortened word forms	"don't," "can't," "we're," "it's," "gonna," "wanna"
Elision	Dropping sounds in connected speech	"probably" → "probly"; "because" → "cos"
Non-standard grammar	Grammatical forms that differ from Standard English	"We was going," "I done it," "She ain't coming"
Discourse markers	Words that structure the discourse	"right," "so," "well," "anyway," "basically"

These features are not errors — they are natural characteristics of real-time language production. The linguist Michael McCarthy (1998) emphasised that spoken grammar is a system in its own right, not a deficient version of written grammar.

Key Definition: Non-fluency features — characteristics of spontaneous speech that reflect real-time language production, including fillers, false starts, self-repairs, hesitations, and repetitions. These are natural features of spoken language, not errors.

The single most important analytical principle here is to read non-fluency features functionally, not as failings. A filler such as "um" or "you know" is not evidence that a speaker is inarticulate; it typically holds the floor, signalling "I have not finished, I am still planning" and thereby fending off a potential turn-grab at a transition relevance place. A pause may mark genuine hesitation, but it may equally signal that a speaker is choosing words carefully on a sensitive topic, or be a deliberate rhetorical beat in more planned speech. A self-repair ("she lives in Manchester — sorry, Birmingham") demonstrates real-time self-monitoring — the speaker noticing and correcting their own output as they go, which is itself a sophisticated cognitive achievement. Vague language ("thing", "stuff", "sort of") is not always laziness; it can build informality and solidarity, signalling that speaker and listener share enough common ground that precision is unnecessary. Whenever you meet a non-fluency feature in a transcript, the question to ask is never "what is wrong here?" but "what is this doing in the interaction?" — and answering that question is what distinguishes higher-band spoken analysis from the lower-band habit of listing non-fluency features as though tallying mistakes.

It is also worth distinguishing normal non-fluency, the expected baseline level of hesitation that all spontaneous speech contains, from a marked departure from that baseline. A transcript showing a sudden increase in hesitation around a particular topic is more analytically interesting than one showing a steady background level, because the spike may indicate that the topic is difficult, embarrassing, or emotionally loaded for the speaker. Reading the distribution of non-fluency, not just its presence, is a subtle and rewarding move.
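The idea of reading distribution rather than mere presence can be made concrete with a rough count. The sketch below is one possible way to do it, under stated assumptions: the filler list is a small illustrative set, the names `nonfluency_rate` and `spikes` are invented for this example, and the threshold of twice the average rate is arbitrary. Lexical fillers such as "like" or "you know" are deliberately excluded, because a word list cannot tell filler uses from ordinary ones.

```python
import re

# Illustrative marker set (an assumption, not an exhaustive inventory).
FILLERS = {"um", "er", "uh", "erm"}
# Matches a micropause "(.)" or a timed pause such as "(2.0)".
PAUSE = re.compile(r"\(\.\)|\(\d+(?:\.\d+)?\)")


def nonfluency_rate(utterance):
    """Pauses plus non-lexical fillers, divided by the number of words."""
    pauses = len(PAUSE.findall(utterance))
    words = re.findall(r"[a-z']+", PAUSE.sub(" ", utterance.lower()))
    if not words:
        return 0.0
    fillers = sum(1 for word in words if word in FILLERS)
    return (pauses + fillers) / len(words)


def spikes(utterances, factor=2.0):
    """Indices of utterances whose rate is at least `factor` times the mean."""
    rates = [nonfluency_rate(u) for u in utterances]
    baseline = sum(rates) / len(rates) if rates else 0.0
    if baseline == 0:
        return []
    return [i for i, rate in enumerate(rates) if rate >= factor * baseline]


if __name__ == "__main__":
    talk = [
        "so we went to town and had lunch",
        "it was a nice day really",
        "um (.) the thing is er (1.5) I um lost it",
    ]
    print(spikes(talk))
```

A flagged utterance is only a prompt to look more closely. Whether the hesitation signals difficulty, embarrassment, or careful word choice is a judgement made from the context.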

Planned vs. Unplanned Speech

Spoken language exists on a continuum from highly planned to completely spontaneous:

More planned ←	→ More spontaneous
Scripted speech (newsreader, actor)	Casual conversation
Prepared speech (lecture, sermon)	Argument or heated discussion
Semi-prepared (job interview, oral exam)	Phone call with friend
Rehearsed (presentation, wedding speech)	Storytelling among friends

The degree of planning affects the language features a spoken text is likely to display:

Feature	Planned Speech	Spontaneous Speech
Fluency	Generally fluent; few hesitations or repairs	Frequent hesitations, fillers, false starts, repairs
Vocabulary	Formal, precise, may be pre-selected	Informal, vague, colloquial
Grammar	Complex, complete sentences; closer to written norms	Incomplete sentences, non-standard forms, loose coordination
Structure	Clear, logical organisation	Topics shift, digress, and overlap
Prosody	Controlled intonation, pace, and stress	Variable; influenced by emotion and real-time interaction

Context and Power in Spoken Interaction

Spoken interactions are profoundly shaped by the context in which they occur and the power relations between participants.

Power in Conversation

Power can be signalled and enacted through language in several ways:

Power Strategy	Description	Example
Interruption	Taking the floor before the current speaker has finished	A more powerful speaker may interrupt more frequently
Topic control	Initiating, changing, or closing topics	A chair of a meeting controls the agenda and topics
Question-answer patterns	The person asking the questions usually controls the direction of the talk (e.g., interviewer, teacher, police officer)	"Where were you on the night of the 15th?"
Directives	Using commands and instructions	"Sit down." "Open your books to page 42."
Terms of address	Using first names, titles, or other forms of address	A teacher may use first names for students while being addressed as "Miss" or "Sir"
Amount of talk	Dominant speakers tend to speak more and hold the floor longer	In institutional settings, the person with authority often speaks most

The linguist Norman Fairclough (1989, Language and Power) argued that power relations are both reflected in and constructed through language use. Analysing the linguistic features of spoken interactions can reveal hidden power dynamics.

Key Definition: Asymmetrical power — a relationship between participants in which one person holds more social, institutional, or interactional power than the other. This is often reflected in and reinforced by language choices.

Fairclough draws a useful distinction between instrumental power and influential power. Instrumental power is the authority held by institutions and individuals who can impose obligations and sanctions — a police officer, a teacher, an employer; it is enforced openly. Influential power, by contrast, works through persuasion and the shaping of attitudes — the power of advertising, media, and political rhetoric, which seeks consent rather than compliance. In a transcript, instrumental power often shows up in directives, constraints on what others may say, and the right to evaluate others' contributions; influential power shows up in the subtler machinery of persuasion. Identifying which kind of power is in play, and the linguistic means by which it is exercised, gives a transcript analysis real analytical bite.

A further refinement concerns the difference between power asymmetry that is institutionally given and power that is interactionally achieved. A job interviewer holds power partly because of the institutional role (given), but they also enact it turn by turn — by asking all the questions, controlling the topics, deciding when an answer is sufficient (achieved). Sometimes a participant with less institutional power can nonetheless seize interactional power for a moment — interrupting, refusing to answer, controlling a topic — and noticing these local shifts is more sophisticated than assuming power is fixed and one-directional. The most penetrating spoken analyses track how power is negotiated across the interaction, not merely asserted once.
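Some of the indicators in the table above, such as amount of talk and who asks the questions, can be tallied before the closer reading begins. The following is a minimal sketch under stated assumptions: the function name `floor_profile` and the input format (a list of speaker and utterance pairs) are invented for this example, and a question is counted crudely as any utterance ending in a question mark, which only works where the transcript has been punctuated.

```python
from collections import defaultdict


def floor_profile(turns):
    """Tally turns, words and questions for each speaker.

    `turns` is a list of (speaker, utterance) pairs. A question is
    counted as any utterance ending in "?", which is a simplification.
    """
    profile = defaultdict(lambda: {"turns": 0, "words": 0, "questions": 0})
    for speaker, utterance in turns:
        entry = profile[speaker]
        entry["turns"] += 1
        entry["words"] += len(utterance.split())
        if utterance.rstrip().endswith("?"):
            entry["questions"] += 1
    return dict(profile)


if __name__ == "__main__":
    interview = [
        ("Interviewer", "Why did you apply for this role?"),
        ("Candidate", "Well I have always liked working with people"),
        ("Interviewer", "And where do you see yourself in five years?"),
    ]
    for speaker, counts in floor_profile(interview).items():
        print(speaker, counts)
```

Counts like these describe the overall shape of the interaction. They cannot show the local shifts in power discussed above, which only emerge from reading the turns in sequence.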

Interactional Language

Much spoken language serves an interactional (social) function rather than a transactional (information-exchanging) function.

Phatic Communion

The anthropologist Bronislaw Malinowski (1923) coined the term phatic communion to describe language used to establish and maintain social bonds rather than to convey information.

Key Definition: Phatic communion (Malinowski, 1923) — the use of language for social bonding rather than information exchange. Small talk, greetings, and conversational rituals that maintain social relationships are examples of phatic communion.

Examples of phatic language:

"How are you?" "Fine, thanks, and you?" (neither party expects a detailed medical report)
"Lovely weather, isn't it?" (the purpose is not meteorological information but social contact)
"See you later" (may not be literally intended)

The analytical value of recognising phatic communion is that it stops you from misreading socially motivated talk as if it were information-bearing. The ritual exchange of greetings, weather remarks, and pleasantries at the start of an interaction is doing relational work: establishing contact, signalling goodwill, and easing the participants into the encounter before any transactional business begins. A transcript that opens with extended phatic talk before the speakers reach their actual purpose reveals something about the relationship and the social conventions in play; a transcript that dispenses with phatic preliminaries and goes straight to business may signal urgency, a power asymmetry, or a purely transactional relationship. Tracking the balance between interactional (relationship-building) and transactional (task-focused) language across a conversation, and noting where the speakers move between the two, is a strong whole-transcript observation that links the mechanics of talk to its social purpose.
