Research Methods: Non-Experimental Methods

Not all research in psychology uses the experimental method. Many important questions cannot be answered through experiments — either because the variables cannot ethically or practically be manipulated, because the behaviour of interest occurs in natural settings, or because the researcher wants depth and meaning rather than a single measured outcome. Non-experimental approaches include observations, self-report methods (questionnaires and interviews), correlations, content and thematic analysis, and case studies. Each has distinctive strengths and limitations, and each must be evaluated for reliability and validity and conducted within the BPS ethical framework. This lesson also addresses sampling and what it means for psychology to be a science.

Key Definition: Non-experimental methods are research approaches that do not involve the direct manipulation of an independent variable. They describe, measure, and explore relationships between variables without, in themselves, establishing causation.

Spec Mapping

This lesson addresses the following points in AQA A-Level Psychology (7182), Section 4.2 (Research methods):

Observational techniques: naturalistic/controlled, covert/overt, participant/non-participant; behavioural categories; event/time sampling.
Self-report techniques: questionnaires; interviews (structured, unstructured, semi-structured); designing self-report items.
Correlations: analysis of the relationship between co-variables; positive, negative and zero correlations.
Content analysis and thematic analysis (analysis of qualitative data; coding).
Case studies.
Sampling: random, systematic, stratified, opportunity and volunteer; bias and generalisation.
Reliability (test-retest, inter-observer) and validity (internal, external — including ecological and temporal); ways of assessing and improving them.
Ethics (the BPS code) and ways of dealing with ethical issues; peer review; features of a science.

Assessment objectives engaged: AO1 (definitions and features), AO2 (applying methods, sampling and ethics to novel scenarios) and AO3 (evaluating reliability, validity and ethics). Exam questions on this topic are typically AO2/AO3-heavy.

Observations

Observational research involves watching and recording behaviour as it occurs. The categories below are not mutually exclusive — a single study could be, for example, a covert, non-participant, naturalistic, structured observation.

```mermaid
graph TD
    A[Observational techniques] --> B[Setting]
    A --> C[Awareness]
    A --> D[Researcher role]
    A --> E[Structure]
    B --> B1[Naturalistic]
    B --> B2[Controlled]
    C --> C1[Covert]
    C --> C2[Overt]
    D --> D1[Participant]
    D --> D2[Non-participant]
    E --> E1[Structured: behavioural categories]
    E --> E2[Unstructured]
```

Type	Description	Strength	Limitation
Naturalistic	Behaviour observed in its natural setting, no intervention	High ecological validity; genuine behaviour	Low control; extraneous variables; hard to replicate
Controlled	Behaviour observed in a structured, set-up environment	Greater control; replicable	Lower ecological validity; behaviour may be artificial
Covert	Participants unaware they are observed	Reduces demand characteristics	No informed consent; privacy concerns
Overt	Participants know they are observed	Ethically preferable; consent possible	Behaviour may change (the Hawthorne effect)
Participant	Researcher joins the group	Rich, insider data	Loss of objectivity; "going native"; ethical issues
Non-participant	Researcher observes from outside	More objective; less influence on the group	May miss subtle, context-dependent behaviour
Structured	Uses pre-set behavioural categories and sampling	Systematic, quantitative; better inter-observer reliability	May miss unanticipated behaviours
Unstructured	Records all relevant behaviour, no framework	Captures richness and complexity	Hard to analyse; observer bias; lower reliability

Behavioural Categories and Sampling

To make observation systematic, researchers operationalise behaviour into behavioural categories: a checklist of clearly defined, observable, mutually exclusive actions (e.g. for "aggression": hits, kicks, pushes, shouts). Good categories are objective (they require no inference: "hits another child" rather than "is being mean"), mutually exclusive (a single act falls into only one category), and exhaustive enough to capture the behaviour of interest. Researchers then decide how to sample behaviour over time:

Event sampling — counting every occurrence of a target behaviour throughout the observation. Best when the behaviour is relatively infrequent, but it can overwhelm the observer if many things happen at once.
Time sampling — recording behaviour only at fixed intervals (e.g. what is happening every 30 seconds). This keeps the workload manageable for frequent behaviour, but may miss behaviours that occur between the sampled moments, reducing validity.

For example, an observer studying playground aggression might use event sampling during a 20-minute break, tallying each instance of hitting, kicking, pushing or shouting against the agreed categories — a concrete procedure that also makes inter-observer reliability checkable.
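To make the trade-off between the two sampling methods concrete, here is a minimal sketch using invented data: one child's coded acts over three minutes, each tagged with a start time, an end time and one of the aggression categories above. The act list, the `event_sample` and `time_sample` helpers and the 180-second window are all hypothetical. Time sampling is modelled here as recording whatever target behaviour is in progress at each 30-second mark.

```python
from collections import Counter

# Hypothetical coded observation of one child over 3 minutes (180 s).
# Each act is (start_s, end_s, category), using the operationalised
# "aggression" checklist from the text.
CATEGORIES = {"hit", "kick", "push", "shout"}
acts = [
    (5, 7, "hit"),
    (31, 33, "push"),
    (58, 62, "shout"),
    (88, 90, "kick"),
    (119, 121, "hit"),
    (150, 151, "push"),
]

def event_sample(acts):
    """Event sampling: tally every occurrence of each target behaviour."""
    return Counter(cat for _, _, cat in acts if cat in CATEGORIES)

def time_sample(acts, interval=30, duration=180):
    """Time sampling: at each interval mark, record whichever target
    behaviour is in progress at that instant (if any)."""
    tally = Counter()
    for t in range(interval, duration + 1, interval):
        for start, end, cat in acts:
            if start <= t <= end and cat in CATEGORIES:
                tally[cat] += 1
    return tally

events = event_sample(acts)
points = time_sample(acts)
print("Event sampling:", dict(events), "total", sum(events.values()))
print("Time sampling: ", dict(points), "total", sum(points.values()))
```

Event sampling records all six acts. Time sampling records only four, because the brief hit at 5 s and the push at 31 s fall between the sampled moments. This is the validity cost described above.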

The categories of observation interact in practice. A study might, for instance, be covert and participant and naturalistic — as in a researcher who secretly joins a religious group to observe it in its everyday setting. Each choice carries its own trade-off: covert observation gains natural behaviour but loses informed consent; participant observation gains insider depth but risks the researcher "going native" and losing objectivity; naturalistic observation gains ecological validity but loses control. Strong evaluation discusses these combined implications rather than treating each label in isolation.

Inter-Observer Reliability

When two or more observers record the same behaviour, they should agree. Inter-observer reliability is assessed by comparing their records. A simple percentage-agreement measure is

$\text{Agreement \%} = \frac{\text{number of agreements}}{\text{total number of observations}} \times 100$

A figure of 80% or above is generally considered acceptable. Alternatively, agreement can be assessed by correlating the two observers' tallies for each behavioural category, with a coefficient of $r \geq +0.80$ taken as good reliability. If agreement is low, the behavioural categories should be redefined and the observers retrained.
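The two checks can be sketched as follows. The data are invented. The interval records feed the percentage-agreement formula above. A separate set of per-category tallies is correlated, assuming Pearson's coefficient, which the text does not specify. The 80% and $+0.80$ thresholds are the ones given in the text.

```python
import math

# Hypothetical interval records: what each observer coded in ten
# successive intervals of the same video ("none" = no target behaviour).
observer_a = ["hit", "push", "none", "shout", "hit", "none", "kick", "push", "none", "shout"]
observer_b = ["hit", "push", "none", "shout", "push", "none", "kick", "push", "none", "hit"]

def percentage_agreement(a, b):
    """Number of agreements / total number of observations x 100."""
    if len(a) != len(b) or not a:
        raise ValueError("records must be non-empty and the same length")
    agreements = sum(x == y for x, y in zip(a, b))
    return agreements * 100 / len(a)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical total tallies per category (hit, kick, push, shout)
# from a longer observation by the same two observers.
tallies_a = [12, 5, 9, 7]
tallies_b = [11, 6, 9, 8]

agreement = percentage_agreement(observer_a, observer_b)
r = pearson_r(tallies_a, tallies_b)
print(f"Agreement: {agreement:.0f}% ({'acceptable' if agreement >= 80 else 'retrain'})")
print(f"Tally correlation: r = {r:+.2f} ({'good' if r >= 0.80 else 'poor'})")
```

The observers disagree on 2 of 10 intervals, so agreement is exactly 80%, which sits at the borderline of acceptability.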

Exam Tip: When evaluating any observation, discuss inter-observer reliability and how to improve it — operationalise categories clearly, train observers together, and use video so records can be re-checked.

Self-Report Methods

Self-report involves asking participants to report their own thoughts, feelings, attitudes or behaviour, via questionnaires or interviews.

Questionnaires

A questionnaire is a pre-set list of written items. Items may be:

Closed questions — fixed response options (yes/no, Likert scales, ranking). Produce quantitative data that is easy to analyse and compare across large samples, but which restricts respondents to the options provided.
Open questions — answered in the respondent's own words. Produce qualitative data — rich, detailed and able to capture unexpected responses, but harder to analyse and compare.

The choice of question type is therefore a trade-off between breadth and standardisation (closed) and depth and authenticity (open), and many questionnaires deliberately mix the two: closed items for the variables of central interest, plus a few open items to capture nuance.

Strengths	Limitations
Reach large samples quickly and cheaply	Social desirability bias — answers people think are acceptable rather than true
Easy to replicate (standardised format)	Acquiescence bias — tendency to agree regardless of content
Closed-question quantitative data allow statistical analysis	Respondents may rush or misinterpret items with no one to clarify
Anonymity may improve honesty	Closed questions restrict depth

Interviews

Type	Description	Strength	Limitation
Structured	Fixed questions in a set order	Standardised; replicable; easy to compare	Inflexible; may miss important points
Unstructured	Conversation develops freely	Rich, detailed, flexible	Hard to replicate/analyse; interviewer bias
Semi-structured	Set questions plus follow-ups	Balances structure and flexibility	Some loss of comparability; skill-dependent

Key Definition: Social desirability bias occurs when participants give answers that present them in a favourable or socially acceptable light, rather than truthful ones.

Designing Good Self-Report Items

The AQA specification expects you to be able to design self-report materials, so it is worth knowing what separates a good item from a poor one. Good questionnaire and interview questions are:

Clear and unambiguous — no jargon, no double negatives, and no double-barrelled questions (which ask two things at once, e.g. "Do you find your job stressful and poorly paid?" — a participant cannot answer if only one applies).
Unbiased / non-leading — they must not steer the respondent towards an answer (e.g. "How annoying is the new policy?" presupposes annoyance).
Appropriately formatted — closed items (Likert scales, rating scales, fixed choices) for easily analysed quantitative data; open items where depth and the respondent's own words matter.
Sensitively sequenced — sensitive questions placed later, and filler questions included to disguise the aim and reduce demand characteristics.

A Likert scale (e.g. strongly agree → strongly disagree) is the most common closed format; note that the data it yields are ordinal, because the psychological distance between "agree" and "strongly agree" is not guaranteed to equal that between "neutral" and "agree".
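Because Likert data are ordinal, the numeric codes only rank the responses. A common convention is to summarise them with the median or mode rather than the mean. The sketch below assumes a 5-point scale coded 1 to 5 and uses invented responses to a single item.

```python
import statistics

# Codes are rank positions only: the gap between 4 and 5 is not
# assumed to equal the gap between 3 and 4.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
CODE = {label: i + 1 for i, label in enumerate(SCALE)}

# Hypothetical responses from seven participants to one item.
responses = ["agree", "strongly agree", "neutral", "agree",
             "disagree", "agree", "strongly agree"]

codes = [CODE[r] for r in responses]
# median_low always returns an actual scale point, even when n is even.
median_code = statistics.median_low(codes)
mode_code = statistics.mode(codes)
print("Median response:", SCALE[median_code - 1])
print("Modal response:", SCALE[mode_code - 1])
```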

Exam Tip: Good self-report items are clear (no jargon, no double-barrelled questions), unbiased (no leading questions), and use filler questions to disguise the aim. When evaluating, weigh social desirability bias, the data type (quantitative vs qualitative), depth, and whether a researcher is present.

Correlations

A correlation measures the strength and direction of the relationship between two co-variables. It is a technique of analysis, not strictly a method — the data can be gathered by questionnaire, observation or archival record.

Key Definition: A correlation is a measure of the relationship between two co-variables. A positive correlation means both rise together; a negative correlation means one rises as the other falls; a zero correlation means there is no systematic relationship.

Type	Description	Scattergram pattern
Positive	Both co-variables increase together (e.g. study hours and exam marks)	Points slope up, left to right
Negative	One rises as the other falls (e.g. stress and immune function)	Points slope down, left to right
Zero	No systematic relationship	Points scattered randomly

The Correlation Coefficient

Strength is expressed as a correlation coefficient, a number on the scale

$-1.0 \leq r \leq +1.0$

where $+1.0$ is a perfect positive relationship, $0$ is no (linear) relationship, and $-1.0$ is a perfect negative relationship. The closer $r$ is to $\pm 1$, the stronger the association: roughly, $|r| > 0.7$ is strong, $0.3 \leq |r| \leq 0.7$ is moderate, and $|r| < 0.3$ is weak. Crucially, a negative coefficient is not a weak one: $r = -0.85$ is a strong (negative) relationship.
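The sketch below computes a coefficient and labels it using the rough bands above. It assumes Pearson's product-moment coefficient, since the text does not say which coefficient is meant. The study-hours and stress figures are invented to match the examples in the table.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def describe(r):
    """Label r with the rough strength bands from the text.
    Strength depends on |r|; the sign gives only the direction."""
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size > 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

# Hypothetical data for the two examples in the table above.
hours = [2, 4, 6, 8, 10]
marks = [40, 50, 55, 70, 80]
stress = [1, 2, 3, 4, 5]
immune = [9, 8, 8, 6, 4]

for name, x, y in [("hours vs marks", hours, marks),
                   ("stress vs immune", stress, immune)]:
    r = pearson_r(x, y)
    print(f"{name}: r = {r:+.2f}, {describe(r)}")
```

The second pair gives a coefficient of about $-0.95$. The function labels it strong because it judges strength from $|r|$, which is exactly the point made above about negative coefficients.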

Strengths	Limitations
Identify patterns and the strength/direction of a relationship	Cannot establish causation — it does not show one variable changes the other
Useful for generating hypotheses to test experimentally	Third-variable problem — an unseen variable may drive both co-variables
Can analyse variables that cannot be ethically manipulated	Often misinterpreted as proving causation
Quick using secondary data	Detects only linear trends — may miss curvilinear relationships

Correlations vs Experiments, and the Direction-of-Causality Problem

It is essential to be clear about why correlations cannot establish cause and effect, because exams test it directly. There are two distinct reasons. First, the third-variable problem: a third, unmeasured variable may be driving both co-variables (the classic illustration is that ice-cream sales correlate with drowning, but temperature raises both). Second, even where the two co-variables are genuinely related, a correlation cannot tell us the direction of causality — if stress and poor sleep correlate, does stress disrupt sleep, or does poor sleep cause stress, or both? Only an experiment, which manipulates one variable and controls the rest, can resolve these. This is exactly why correlational findings are so often used to generate hypotheses that are then tested experimentally: the correlation flags a relationship worth investigating, and the experiment establishes whether it is causal.
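The third-variable problem can be illustrated with a small simulation. In this invented model, temperature alone generates both ice-cream sales and a drowning-rate index. Neither co-variable appears in the other's formula, yet the two still correlate strongly. All numbers, noise levels and the seed are arbitrary assumptions. Note that the sketch does not address the separate direction-of-causality problem.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical model for 200 days: temperature (the third variable)
# drives BOTH co-variables; there is no link between them directly.
temps = [rng.uniform(10, 35) for _ in range(200)]
ice_cream = [20 + 3 * t + rng.gauss(0, 8) for t in temps]
drowning = [0.2 * t + rng.gauss(0, 0.8) for t in temps]

print(f"ice cream vs drowning: r = {pearson_r(ice_cream, drowning):+.2f}")
```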

It is also worth noting that a correlation coefficient near zero indicates only the absence of a linear relationship; a strong curvilinear relationship (such as the inverted-U linking arousal and performance in the Yerkes–Dodson law) could still exist and would be missed by a simple correlation, which is a further limitation of the technique.
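The sketch below shows an inverted-U in miniature, using invented arousal and performance scores that are symmetric about a peak. Across the whole range, the linear coefficient (Pearson's, assumed here) is zero, even though performance depends perfectly on arousal. Each half of the curve, taken on its own, shows a strong correlation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical inverted-U: performance peaks at moderate arousal (5).
arousal = list(range(1, 10))
performance = [16 - (a - 5) ** 2 for a in arousal]

print(f"whole range:  r = {pearson_r(arousal, performance):+.2f}")
print(f"low arousal:  r = {pearson_r(arousal[:5], performance[:5]):+.2f}")
print(f"high arousal: r = {pearson_r(arousal[4:], performance[4:]):+.2f}")
```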

Exam Tip: "Correlation does not equal causation" must appear in every correlation answer, illustrated by the third-variable problem — e.g. ice-cream sales correlate with drowning, but the third variable is temperature (hot weather raises both). For the top band, add the direction-of-causality problem and the point that correlations detect only linear relationships.

Content Analysis and Thematic Analysis

These techniques turn qualitative material (interview transcripts, diaries, media) into analysable data.

Content analysis is a form of indirect observation of communication. The researcher devises coding units (e.g. counting how often a word or image type appears) and tallies their frequency, producing quantitative data from qualitative sources. Coding can be checked for inter-rater reliability.
Thematic analysis keeps the data qualitative: the researcher identifies recurring themes (ideas or patterns of meaning) across the material and uses quotations to illustrate them. The process is iterative — the researcher reads and re-reads the material, codes segments, and groups codes into broader themes — and the credibility of the analysis rests on staying faithful to the participants' own words.

A worked illustration of content analysis: to study gender stereotyping in advertising, a researcher might code a sample of magazine adverts for whether the central figure is shown in a "domestic" or "professional" role, then count the frequencies for male and female figures. The qualitative source (the adverts) is thereby converted into quantitative data that can be compared and even tested statistically — and, because the coding scheme is explicit, a second coder can check inter-rater reliability.
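The advert example can be sketched as a tally of coded units. The frequencies for 50 adverts below are invented. To show what "tested statistically" might look like, the sketch computes an uncorrected chi-squared statistic on the resulting 2×2 table. That is one test that could be applied to frequency data like these, not necessarily the one a given study would use. The comparison value 3.84 is the critical value for df = 1 at p = 0.05.

```python
from collections import Counter

# Hypothetical coding of 50 adverts: (gender of central figure, role).
coded_adverts = (
    [("male", "professional")] * 18 + [("male", "domestic")] * 7
    + [("female", "professional")] * 9 + [("female", "domestic")] * 16
)

counts = Counter(coded_adverts)  # frequency of each coding-unit combination
genders = ["male", "female"]
roles = ["professional", "domestic"]

def chi_squared(counts, rows, cols):
    """Uncorrected chi-squared statistic for a contingency table of counts."""
    total = sum(counts[(r, c)] for r in rows for c in cols)
    row_tot = {r: sum(counts[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(counts[(r, c)] for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / total
            stat += (counts[(r, c)] - expected) ** 2 / expected
    return stat

for g in genders:
    print(g, {role: counts[(g, role)] for role in roles})

chi2 = chi_squared(counts, genders, roles)
# 3.84 is the critical value for df = 1 at p = 0.05.
print(f"chi-squared = {chi2:.2f}, significant at p < 0.05: {chi2 > 3.84}")
```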

Strengths	Limitations
High ecological validity — uses real communication	Researcher bias — coding/themes reflect the analyst's interpretation
Easily replicable when coding units are explicit (content analysis)	Material is decontextualised, so meaning may be lost
Can study sensitive topics without direct contact with participants	Thematic analysis is hard to replicate and is subjective

Case Studies

A case study is a detailed, in-depth investigation of a single individual, group, institution or event, usually drawing on several methods (interviews, observation, tests, records) and producing rich, mainly qualitative data over time.

Key Definition: A case study is an intensive, detailed investigation of a single case, typically using multiple methods over an extended period and often longitudinal.

Examples in psychology: Phineas Gage (frontal-lobe damage and personality change); HM (Henry Molaison), who developed severe anterograde amnesia after bilateral surgical removal of the medial temporal lobes, including much of the hippocampus on both sides, providing key evidence for the role of the hippocampus in memory (Scoville & Milner, 1957); Little Hans (Freud, 1909), used to support psychoanalytic theory.

Strengths	Limitations
Rich, in-depth data capturing real complexity	Cannot generalise from a single case
Can study rare/unique cases not recreatable experimentally	Researcher bias — subjective, selective interpretation
Often longitudinal — tracks change over time	Retrospective data may be inaccurate
Can challenge established theory with a single counter-example	Hard to replicate; confidentiality is difficult when the case is identifiable

The case of HM illustrates both the strengths and the limitations vividly. Because HM's amnesia was unique and could never have been produced experimentally on ethical grounds, his case yielded irreplaceable insight into the role of the hippocampus in forming new long-term memories, and it directly challenged the prevailing view that memory was a single, unitary store — a single counter-example forcing theoretical change. Yet his findings rest on one individual with a very specific lesion, so generalisation is uncertain, and much of the data depended on the interpretation of the researchers who worked with him over decades. This is the central tension of the case-study method: unrivalled depth purchased at the cost of generalisability and objectivity, which is why case studies are most powerful when they complement larger, controlled studies rather than standing alone.

Sampling

A sample is the group of participants actually studied, drawn from a target population. The aim is a representative sample so findings can be generalised.
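Three of the sampling techniques listed in the specification can be sketched in a few lines. The sketch assumes a hypothetical sampling frame of 100 students (60 in Year 12, 40 in Year 13) and a sample size of 10. Systematic sampling here starts from a random point in the first interval. Opportunity and volunteer sampling are omitted because they depend on who happens to be available or who chooses to respond, not on a selection rule.

```python
import random

rng = random.Random(3)  # fixed seed for a repeatable illustration

# Hypothetical sampling frame: 100 students, 60 in Year 12, 40 in Year 13.
population = [("Y12", i) for i in range(60)] + [("Y13", i) for i in range(40)]
n = 10

# Random: every member of the frame has an equal chance of selection.
random_sample = rng.sample(population, n)

# Systematic: every k-th member of the list, here from a random start.
k = len(population) // n
start = rng.randrange(k)
systematic_sample = population[start::k][:n]

def stratified(population, n, rng):
    """Stratified: sample each subgroup in proportion to its share of
    the population, choosing members randomly within each stratum."""
    strata = {}
    for member in population:
        strata.setdefault(member[0], []).append(member)
    sample = []
    for group in strata.values():
        # Rounded quotas may need adjusting if they do not sum to n.
        quota = round(n * len(group) / len(population))
        sample.extend(rng.sample(group, quota))
    return sample

stratified_sample = stratified(population, n, rng)
print("Random:    ", sorted(random_sample))
print("Systematic:", systematic_sample)
print("Stratified:", sorted(stratified_sample))
```

Only the stratified sample is guaranteed to mirror the 60:40 split in the population. A random sample of 10 may over- or under-represent either year group by chance.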
