Investigation Methodology

The methodology section of your language investigation explains how you collected your data and, crucially, why you collected it that way. It is short — typically only 300–400 words of your ~2,000-word report — but it does heavy lifting. A strong methodology reassures the moderator that your data collection was systematic, ethical and fit for your research question, rather than haphazard. It also sets up the evaluation you will return to in your conclusion, because every methodological choice carries limitations you will later weigh honestly. This lesson covers the main data-collection methods, the distinction between quantitative and qualitative approaches, sampling, the central importance of research ethics, the observer's paradox, and how to write the section up.

Quantitative, Qualitative and Mixed Methods

Before choosing a method, decide on your overall approach, because it shapes everything else.

Key Definition: Quantitative data — numerical data produced by counting and measuring linguistic features (e.g. frequency of hedges per 100 words). Qualitative data — interpretive data about how language functions in context (e.g. how a particular hedge softens a face-threatening act). Most strong investigations are mixed-methods: they count patterns and interpret what those patterns mean.

Quantitative analysis answers "how much / how often" and lets you present clear comparisons in tables; qualitative analysis answers "how and why," and is where the deeper AO3 marks live. Counting that "female speakers used 7.8 hedges per 100 words to males' 4.2" is quantitative; explaining that those hedges largely served a relational rather than an uncertainty function is qualitative. A mixed approach — count, then interpret — almost always outperforms either alone, because the numbers locate the pattern and the interpretation explains its significance.

Data Collection Methods

Several main methods are available. Your choice should follow from your research question, not the other way round.

1. Recording and transcription

Recording spoken language and transcribing it suits topics involving:

Spoken interaction (conversation, classroom discourse, interviews)
Child language acquisition
Accent and dialect features
Language and power in spoken settings

Key Definition: Transcription — converting spoken language into written form using conventions that represent features such as pauses, overlaps, stress and intonation. The level of detail should match your focus: a lexical study needs only a light transcription; a study of turn-taking or prosody needs a far more detailed one.

A common set of transcription conventions

Symbol	Meaning
(.)	Micropause (under one second)
(2.0)	Timed pause, in seconds
// or [	Onset of overlapping speech
=	Latching (no audible gap between turns)
CAPITALS	Increased volume or emphasis
::	Elongated sound (e.g. "so::")
(( ))	Non-verbal or contextual notes
(xxx)	Unclear or uncertain transcription
↑ ↓	Marked rising or falling intonation

Select only the conventions your investigation needs, and — importantly — apply them consistently, because inconsistent transcription quietly undermines the reliability of every count you later make.
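
Consistent conventions also make later counts mechanical. The following is a minimal sketch, assuming your transcript is saved as plain text and uses the pause symbols from the table above; the sample extract and the function name count_pauses are invented for illustration, not part of any standard tool.

```python
import re

# Hypothetical extract using the conventions in the table above.
transcript = """A: well (.) I thought it was fine (2.0) really
B: =yeah (.) so:: did I (1.5)
A: ((laughs)) okay"""

MICROPAUSE = re.compile(r"\(\.\)")                 # matches (.)
TIMED_PAUSE = re.compile(r"\((\d+(?:\.\d+)?)\)")   # matches (2.0), (1.5)

def count_pauses(text):
    """Return the number of micropauses and a list of timed pause lengths."""
    micro = len(MICROPAUSE.findall(text))
    timed = [float(seconds) for seconds in TIMED_PAUSE.findall(text)]
    return micro, timed

micro, timed = count_pauses(transcript)
print(micro, timed)
```

A count like this is only as trustworthy as the transcription behind it: if a pause was not marked, no script can find it.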

Practical recording tips

A phone is usually adequate, but test it first and record a short trial.
Place the device centrally so every speaker is audible.
Record more than you expect to need; you can select the richest extract later.
Transcribe promptly, while the interaction is fresh, and listen repeatedly for accuracy.

2. Questionnaires and surveys

Questionnaires collect attitudinal data — opinions, beliefs and perceptions about language. They suit accent and dialect attitudes, attitudes to language change, and matched-guise studies (listeners rate the same speaker recorded in two varieties).

Design principles:

Mix closed items (e.g. Likert scales, "strongly agree"–"strongly disagree") with open items that invite explanation.
Avoid leading questions that steer the respondent toward an answer.
Keep it short; concentration fades fast.
Pilot it on a few people to catch ambiguities.
Include demographic items (age, region) only where they are relevant to your analysis.

Question type	Example	Best for
Likert scale	"Rate this speaker 1–5 for competence"	Quantitative attitudinal data
Multiple choice	"Which accent do you associate with authority?"	Categorical data
Open response	"What do you think of regional accents in newsreading?"	Qualitative insight
Semantic differential	"Friendly ◁——▷ Unfriendly"	Attitudes on a continuum

A standard caution to record in your evaluation: respondents may give socially desirable answers — what they think they ought to say about accents rather than what they truly feel — which is exactly why a matched-guise design, eliciting reactions indirectly, can reveal covert attitudes a direct question would miss.
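
Closed items such as Likert ratings can be summarised with very little arithmetic. The sketch below assumes invented competence ratings (1 to 5) from six hypothetical listeners for two guises; the labels guise_A and guise_B and the function summarise are illustrative names only. A mean of ratings is a rough summary, so report the spread of answers alongside it where you can.

```python
# Invented ratings (1-5 for "competence") from six hypothetical listeners.
ratings = {
    "guise_A": [4, 5, 3, 4, 4, 4],
    "guise_B": [3, 3, 2, 4, 3, 3],
}

def summarise(scores):
    """Return the mean rating for each guise, rounded to two decimal places."""
    return {guise: round(sum(vals) / len(vals), 2) for guise, vals in scores.items()}

summary = summarise(ratings)
print(summary)
```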

3. Corpus analysis

A corpus is a large, structured collection of texts; corpus methods reveal patterns in frequency and distribution. The approach suits language change over time and comparison across genres.

Key Definition: Corpus — a large, systematically compiled, searchable collection of written or spoken texts. Established examples include the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA).

You may lack access to major research corpora, but you can build a small one yourself — say, 30 newspaper headlines from each of three decades, or 40 product descriptions from one retailer. Free tools such as AntConc generate frequency lists, concordances and collocations; Google Books Ngram Viewer tracks word frequency in published books over time. Always state how you built and sampled the corpus.
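
For a very small self-built corpus, a frequency list can also be produced with a few lines of code. This is a minimal sketch, assuming the texts are short strings held in a list; the three headlines are invented, the tokenisation (lower-casing and keeping only letters and apostrophes) is a simplifying assumption, and frequency_list is a hypothetical helper rather than a feature of AntConc.

```python
import re
from collections import Counter

# Invented mini-corpus of headlines, for illustration only.
headlines = [
    "Storm batters coast as floods rise",
    "Floods close roads across the coast",
    "Coast braced for second storm",
]

def frequency_list(texts):
    """Count how often each lower-cased word occurs across all texts."""
    tokens = []
    for text in texts:
        tokens.extend(re.findall(r"[a-z']+", text.lower()))
    return Counter(tokens)

freq = frequency_list(headlines)
print(freq.most_common(3))
```

Whatever tool you use, record the tokenisation decisions you made (for example, whether "Coast" and "coast" count as one word), because they affect every frequency you report.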

4. Existing texts and documents

Some investigations analyse pre-existing, publicly available texts: newspaper articles, advertisements, political speeches, published social-media posts, historical documents, song lyrics or scripts. This sidesteps the consent issues raised by recording people, but limits you to written or scripted language. Even with public texts, anonymise private individuals and respect the context in which the material was originally posted.

Sampling

Whatever the method, you must think about sampling — how you select the specific data you will analyse — and justify it explicitly.

Concept	What it means for you
Representativeness	Your data should be typical of the context you claim to investigate
Sample size	Large enough to support a claim, small enough to analyse in depth within ~2,000 words
Systematic selection	State clearly how and why you chose these data (e.g. "every fifth reply," "first 40 posts in the thread")
Controlling variables	When comparing two sets, hold other variables constant where you can, so differences are attributable to the variable of interest
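
The two selection rules quoted in the table ("every fifth reply", "first 40 posts in the thread") can be expressed precisely, which is a useful check that your rule is genuinely systematic. A minimal sketch, assuming a hypothetical thread of 100 replies represented by numbered labels; every_nth and first_k are illustrative names.

```python
# Hypothetical thread of 100 replies, represented here by numbered labels.
replies = [f"reply_{n}" for n in range(1, 101)]

def every_nth(items, n):
    """Take every nth item, starting with the nth (5th, 10th, ...)."""
    return items[n - 1::n]

def first_k(items, k):
    """Take the first k items in their original order."""
    return items[:k]

sample_a = every_nth(replies, 5)   # "every fifth reply"
sample_b = first_k(replies, 40)    # "first 40 posts in the thread"
print(len(sample_a), len(sample_b))
```

Either rule is defensible; what matters is that you state which one you used and why it suits your research question.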

Coursework Tip: Far more students drown in data than starve. A small, carefully selected dataset analysed thoroughly beats a large one you can only skim. Depth of analysis, not volume of data, is what the band descriptors reward.

Research Ethics

Ethics are not an optional courtesy; they are integral to your methodology, and your teacher must approve your data-collection plan before you begin. Build the following principles into your design and report them explicitly.

Informed consent. Participants must know they are being recorded or studied and must agree in advance. Provide a brief consent form stating the purpose of the study, how the data will be used and stored, and the right to withdraw. For a child, obtain consent from a parent or guardian.
Anonymisation. No participant should be identifiable in your write-up. Use pseudonyms and remove identifying details (names, places, schools, employers).
Right to withdraw. Participants may withdraw at any point, and their data must then be removed.
No harm. The study must not cause distress, embarrassment or harm.
Secure storage. Keep recordings and transcripts secure, and delete them once the investigation is complete unless participants have agreed otherwise.
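
Pseudonyms are easier to apply consistently if the substitution is done in one pass. The sketch below is one possible approach, assuming a transcript held as plain text; the names, the mapping and the function anonymise are all invented for illustration. It replaces only the names you list, so places, schools and employers still need a manual check.

```python
import re

# Hypothetical mapping from real names to pseudonyms.
# Store this key separately from the transcript and keep it secure.
pseudonyms = {"Aisha": "Speaker A", "Tom": "Speaker B"}

def anonymise(text, mapping):
    """Replace each listed name with its pseudonym, matching whole words only."""
    for real, fake in mapping.items():
        text = re.sub(rf"\b{re.escape(real)}\b", fake, text)
    return text

line = "Aisha: I told Tom about it"
anonymised = anonymise(line, pseudonyms)
print(anonymised)
```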

Key Definition: Informed consent — the principle that participants must be fully informed about the nature, purpose and use of the research, and must voluntarily agree to take part, before any data is collected. Covert recording of private interaction breaches this principle and is not acceptable in an A-level investigation.

The observer's paradox

When you record naturally occurring speech you confront the observer's paradox, identified by William Labov: the aim is to observe how people speak when they are not being observed, yet the very act of observing alters their behaviour. People often shift toward more careful, self-conscious speech when they know a recorder is running.

You cannot eliminate the paradox, but you can mitigate it — and, importantly, you should acknowledge it in your evaluation rather than pretend your data are perfectly natural:

Let participants acclimatise to the device before your target stretch begins.
Record in familiar, comfortable settings.
Use unobtrusive equipment.
Record longer sessions and analyse the later, more relaxed portions.

Reliability and Validity

Two further concepts belong in a strong methodology.

Concept	Definition	How to strengthen it
Reliability	Consistency — would another researcher using your method get similar results?	Use systematic, documented procedures; apply transcription conventions and coding categories consistently; describe the method clearly enough to be replicated
Validity	Whether your method actually measures what it claims to	Ensure the method genuinely fits the research question; use triangulation (combining methods or data sources) where feasible; acknowledge limitations honestly

If you code your data — for example, sorting questions into "open," "closed" and "leading" — define each category precisely and apply it uniformly, because vague or shifting categories destroy reliability and make your counts meaningless.
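
One optional way to test whether your categories are precise enough is to ask a second person to code the same items and see how often the two of you agree. The sketch below assumes invented codings of ten questions by two hypothetical coders; percent_agreement is an illustrative helper, and simple percentage agreement is only a rough indicator, not a formal statistical test.

```python
# Invented codings of the same ten questions by two hypothetical coders.
coder_1 = ["open", "closed", "closed", "leading", "open",
           "closed", "open", "leading", "closed", "open"]
coder_2 = ["open", "closed", "open", "leading", "open",
           "closed", "open", "closed", "closed", "open"]

def percent_agreement(a, b):
    """Return the percentage of items that both coders put in the same category."""
    if len(a) != len(b):
        raise ValueError("Both coders must code the same items")
    matches = sum(x == y for x, y in zip(a, b))
    return 100 * matches / len(a)

agreement = percent_agreement(coder_1, coder_2)
print(agreement)
```

The disagreements are often more useful than the score: each one points to a category definition that needs tightening.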

Matching Method to Topic

Topic type	Recommended method(s)	Rationale
Spoken interaction	Recording + transcription	Captures the detail of talk in real time
Child language acquisition	Recording + transcription (+ field notes)	Needs an accurate record of the child's utterances
Language attitudes	Questionnaire / matched-guise	Elicits perceptions and (covert) attitudes
Language change	Corpus / existing texts	Needs comparable data from different periods
Written genre analysis	Existing texts	The data already exist in written form
Language and technology	Existing texts (captured posts/threads)	Digital language is already written
Language and power	Recording, or existing texts	Depends on whether the context is spoken or written

Working With Transcription in Practice

Because so many investigations rest on transcribed speech, it repays a closer look at how to transcribe well — transcription is not a neutral, clerical step but an analytical one, and the choices you make there shape every claim that follows. Three principles matter most.

First, match the grain of your transcription to your research question. Transcription exists on a spectrum from light to fine. A light (orthographic-plus) transcription captures the words, speaker turns and a few salient features (major pauses, overlaps, emphasis) and is ample for a study of lexis. A fine transcription adds detailed timings, micropauses, latching, intonation contours and, in extreme cases, phonetic detail in the IPA; you need it only if turn-taking, prosody or pronunciation is your actual object of study. The error to avoid is mismatch in either direction: a lexical study buried under needless phonetic notation wastes effort and obscures the analysis, while a prosody study transcribed only orthographically simply lacks the data to make its case.

Second, transcribe consistently or your counts collapse. If you mark a half-second gap as a micropause in one place and ignore it in another, any later claim about "frequency of pauses before disagreement" is built on sand. Fix your conventions before you start, keep a key, and apply them mechanically. This is the practical face of reliability: another researcher should be able to take your recording and your key and produce a closely similar transcript.

Third, acknowledge that transcription involves interpretation. Deciding whether a pause is "(.)" or "(0.5)", whether a louder syllable signals emphasis or merely volume, and where one turn ends and another begins all involve judgement. The strongest investigations concede this openly in evaluation, noting, for instance, that borderline cases were resolved by a stated rule and that a second transcriber might have decided some of them differently. That candour is not weakness; it is exactly the evaluative maturity the descriptors reward.

Coursework Tip: Transcribe a short trial extract early, then show it to your teacher. It is far better to discover that your convention set is too heavy, too light, or internally inconsistent on a two-minute sample than to find it out after transcribing fifteen minutes of speech.

Quantifying Carefully: Counts, Rates and Honest Comparison

Most mixed-methods investigations involve some counting, and counting done carelessly produces conclusions that a moderator will distrust. A few disciplines keep your quantitative work honest.

Normalise before you compare. Raw counts mislead whenever the things being compared are of different sizes. If one speaker talks for 600 words and another for 300, a raw tally of "12 hedges versus 8" tells you little; expressed as a rate of hedges per 100 words, the picture reverses (2.0 versus 2.67), because the speaker with fewer hedges overall actually hedges more often. Whenever your two data sets differ in length, convert counts to rates per fixed unit (per 100 words, per minute, per turn) and say so.
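
The arithmetic behind the hedge example (12 hedges in 600 words against 8 in 300) can be sketched as follows. The function name rate_per_100 is an illustrative choice; the same calculation works equally well on a calculator or in a spreadsheet.

```python
def rate_per_100(count, total_words):
    """Convert a raw count into a rate per 100 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100 * count / total_words

speaker_a = rate_per_100(12, 600)   # 12 hedges in 600 words
speaker_b = rate_per_100(8, 300)    # 8 hedges in 300 words
print(round(speaker_a, 2), round(speaker_b, 2))  # 2.0 2.67
```

State the unit you chose in your methodology, and use the same unit in every table so that the figures remain comparable.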
