Research Methods & Mathematical Skills

Research methods is the most heavily examined single strand of AQA A-Level Psychology, yet it is the one students most often under-prepare for. The Research Methods section on Paper 2 alone is worth 48 marks (half the paper), and methods questions can also appear in any section of Papers 1 and 3. On top of this, AQA mandates that at least 10% of all marks assess mathematical skills, and a large share of those marks falls here. The good news is that research-methods and maths marks are the most predictable on the whole qualification: the questions recur in fixed forms ("identify the design", "justify a statistical test", "calculate the percentage", "carry out a sign test"), and the technique for each is learnable. This lesson teaches that technique with worked numerical examples and formulae, plus a decision tree for choosing statistical tests.

Spec Mapping

Skill cluster	AO	Where assessed
Experimental methods, designs, sampling	AO1 / AO2 / AO3	Paper 2 Section C; embedded in Papers 1 & 3
Descriptive statistics & data handling	AO2 (maths)	Paper 2 Section C; data items anywhere
Choosing & justifying inferential tests	AO2	Paper 2 Section C
Probability, significance, Type I/II errors	AO1 / AO2	Paper 2 Section C
The sign test (the one you must calculate)	AO2 (maths)	Paper 2 Section C
Percentages, fractions, ratios, SD, sig figs	AO2 (maths)	Minimum 10% of marks across all papers
Ethics, peer review, scientific process	AO1 / AO3	All papers

Key Point: Treat research methods as a cross-cutting skill, not a Paper 2 silo. A "Design a study to investigate conformity" question can appear in the Social Influence section of Paper 1; a "justify your statistical test" question can appear inside a Paper 3 option.

Experimental Methods

Type	Description	Strengths	Limitations
Laboratory experiment	Controlled environment; researcher manipulates the IV, measures the DV	High control; easy to replicate; can establish cause and effect	Low ecological validity; demand characteristics; artificial
Field experiment	Natural setting; researcher still manipulates the IV	Higher ecological validity; more natural behaviour	Less control; harder to replicate; ethical issues (participants often cannot give informed consent)
Natural experiment	The IV occurs naturally (not manipulated); the DV is measured	Allows study of variables unethical/impractical to manipulate (e.g. institutionalisation)	Cannot firmly establish cause and effect; confounding variables; no random allocation
Quasi-experiment	IV is an existing participant characteristic (age, gender, diagnosis)	Allows comparison of pre-existing groups	No random allocation; differences may be confounds

Experimental Designs

The design determines how participants are allocated to conditions.

Design	Each participant...	Strengths	Limitations	Fix
Independent groups	takes part in ONE condition	No order effects; less likely to guess aim	Individual differences confound; needs more participants	Random allocation
Repeated measures	takes part in ALL conditions	Controls individual differences; fewer participants	Order effects; demand characteristics	Counterbalancing (ABBA)
Matched pairs	is paired then split across conditions	Reduces individual differences; no order effects	Time-consuming; cannot match on everything	Match on the most relevant variable

Worked technique for "Identify the design": read the procedure and ask whether the same people took part in both conditions. If yes, it is repeated measures. If different people took part in each condition, check whether they were first matched in pairs on a relevant variable (such as age or IQ), with one member of each pair in each condition: if so, it is matched pairs; if not, it is independent groups. State the design and give one justification drawn from the stem.
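As a minimal sketch, the check can be written as a small Python decision function. The name `identify_design` and its two yes/no arguments are hypothetical. The sketch assumes you have already answered both questions by reading the stem.

```python
def identify_design(same_people_in_all_conditions: bool,
                    matched_in_pairs: bool = False) -> str:
    """Name the experimental design implied by a procedure (hypothetical helper).

    same_people_in_all_conditions: did every participant take part in every condition?
    matched_in_pairs: if not, were participants first paired on a relevant
        variable, with one member of each pair placed in each condition?
    """
    if same_people_in_all_conditions:
        return "repeated measures"
    if matched_in_pairs:
        return "matched pairs"
    return "independent groups"


# Example stem: different students in each condition, paired on IQ beforehand
print(identify_design(same_people_in_all_conditions=False, matched_in_pairs=True))
```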

Sampling Methods

The sample is drawn from the target population; the method determines representativeness.

Method	How it works	Strengths	Limitations
Random	Every member has an equal chance (names from a hat, random-number generator)	Free from researcher bias; likely representative	Time-consuming; selected people may decline
Systematic	Every nth person on a list	Objective; easy	A periodic pattern in the list can bias it
Stratified	Population split into strata; randomly sampled in proportion	Highly representative of key characteristics	Very time-consuming; needs detailed population data
Opportunity	Whoever is available and willing	Quick, easy, cheap	Highly biased; over-represents one group
Volunteer	Participants self-select (advert)	Willing participants; good for sensitive topics	Volunteer bias (more motivated/extravert)
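Random, systematic and stratified sampling are mechanical procedures, so they can be sketched with Python's standard `random` module. Opportunity and volunteer sampling depend on who happens to be available or who responds, so they are not simulated here. The function names and the 100-student population are hypothetical. The systematic version simply starts at the first name; some versions choose a random starting point instead.

```python
import random


def random_sample(population, n, rng=random):
    """Random sampling: every member has an equal chance of selection."""
    return rng.sample(population, n)


def systematic_sample(population, n):
    """Systematic sampling: every kth name on the list, k = population size // n.

    This sketch starts at the first name; some versions pick a random start.
    """
    step = len(population) // n
    return population[::step][:n]


def stratified_sample(strata, n, rng=random):
    """Stratified sampling: random selection from each stratum in proportion
    to its share of the population (rounding may make the total differ
    slightly from n when the shares are not whole numbers)."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(n * len(members) / total)
        sample.extend(rng.sample(members, share))
    return sample


# Hypothetical target population: 100 students, 60 in Year 12 and 40 in Year 13
population = [f"student{i}" for i in range(100)]
strata = {"Year 12": population[:60], "Year 13": population[60:]}
rng = random.Random(1)  # seeded so the example is repeatable

print(random_sample(population, 10, rng))
print(systematic_sample(population, 10))   # student0, student10, ..., student90
print(stratified_sample(strata, 10, rng))  # 6 from Year 12, 4 from Year 13
```

The stratified version mirrors the table's strength: the sample keeps the 60:40 split of the population. It also mirrors the limitation, because it cannot run without knowing each stratum's size in advance.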

Exam Tip: When evaluating a sampling method, always address (a) representativeness of the target population and (b) any systematic bias the method introduces.

Types of Data

Distinction	Type A	Type B
Quantitative vs Qualitative	Quantitative: numerical; analysed statistically; objective but may lose depth	Qualitative: words/themes; rich and detailed but subjective and hard to replicate
Primary vs Secondary	Primary: collected first-hand for this study; tailored but costly	Secondary: pre-existing (stats, records); cheap but may not fit the question

Non-Experimental Methods

Not every study is an experiment. AQA examines several non-experimental techniques, and a common question asks you to evaluate or design one.

Observational Techniques

Type	Meaning	Note
Naturalistic vs controlled	In a natural setting vs a structured environment	Naturalistic = high ecological validity, low control
Covert vs overt	Participants unaware vs aware they are observed	Covert reduces demand characteristics but raises ethics (consent)
Participant vs non-participant	Observer joins the group vs stays apart	Participant gives insight but risks losing objectivity

Observations are made systematic using behavioural categories (clearly operationalised, observable actions) and sampling methods such as event sampling (count each occurrence of a behaviour) or time sampling (record what is happening at fixed intervals). Inter-observer reliability — agreement between two observers — is checked by correlating their records.
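Here is a minimal sketch of the inter-observer check. It assumes two observers independently tallied the same behavioural category over the same five observation periods. The tallies are invented for illustration, and `pearson_r` is a hand-written helper rather than a library call. The +0.80 cut-off is a commonly used rule of thumb, not a fixed standard.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical tallies of one behavioural category (e.g. "pushes"),
# recorded independently by two observers over the same five periods
observer_a = [3, 5, 2, 6, 4]
observer_b = [3, 4, 2, 6, 5]

r = pearson_r(observer_a, observer_b)
print(round(r, 2))  # 0.9
# Assumed rule of thumb: r of +0.80 or above indicates acceptable agreement
print("acceptable" if r >= 0.80 else "check categories and retrain observers")
```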

Self-Report Techniques

Questionnaires and interviews (structured, semi-structured, unstructured) gather data directly from participants. Strengths include access to large samples (questionnaires) and rich detail (unstructured interviews); limitations include social desirability bias and response sets. Good design avoids leading questions, double-barrelled questions and jargon.

Correlations

A correlation examines the relationship between two co-variables; it does not manipulate an IV, so it cannot establish cause and effect. The strength and direction are summarised by a correlation coefficient between $-1$ and $+1$: a value near $+1$ is a strong positive correlation, near $-1$ a strong negative correlation, and near $0$ no correlation. The classic evaluation point is the third-variable problem: an apparent relationship between two co-variables may be driven by an unmeasured third factor.
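The interpretation rule can be sketched as a helper that reads direction from the sign and strength from the size of the coefficient. The cut-offs of 0.1, 0.4 and 0.7 are illustrative assumptions only. Whatever label the helper returns, it says nothing about cause or about a possible third variable.

```python
def describe_correlation(r: float) -> str:
    """Describe the direction and rough strength of a correlation coefficient.

    The cut-offs below are illustrative assumptions, not fixed rules.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)
    if size < 0.1:
        return "no (or negligible) correlation"
    direction = "positive" if r > 0 else "negative"
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"


for coefficient in (0.85, -0.9, 0.05, -0.3):
    print(coefficient, "->", describe_correlation(coefficient))
```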

Reliability and Validity

These two concepts are examinable in their own right and provide ready-made AO3 across the whole course.

Concept	Question it answers	Types / checks
Reliability	Is the measure consistent?	Test-retest (same test, two occasions); inter-observer (two observers agree)
Validity	Does it measure what it claims?	Internal (free of confounds); external (ecological, population, temporal); face and concurrent validity

Ways to improve reliability include standardising procedures, operationalising behavioural categories and training observers. Ways to improve validity include controlling extraneous variables, using a control group, and checking a new measure against an established one (concurrent validity).

Descriptive Statistics (with Worked Maths)

Measures of Central Tendency

Measure	Method	Strength	Limitation
Mean	sum ÷ number of values	Uses all data; most sensitive	Distorted by outliers
Median	middle value when ordered	Unaffected by outliers	Ignores most data
Mode	most frequent value	Works for categorical data	May not exist / may be multiple

The mean of a data set is defined as:

$\bar{x} = \frac{\sum x}{n}$

where $\sum x$ is the sum of all scores and $n$ is the number of scores. For the data set 4, 7, 7, 9, 13:

$\bar{x} = \frac{4 + 7 + 7 + 9 + 13}{5} = \frac{40}{5} = 8$

The median is the middle value (7) and the mode is the most frequent value (7).
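Python's standard `statistics` module can check these hand calculations. `multimode` returns every most-frequent value, which reflects the table's point that a data set may have more than one mode.

```python
from statistics import mean, median, multimode

scores = [4, 7, 7, 9, 13]

print(mean(scores))       # 8
print(median(scores))     # 7
print(multimode(scores))  # [7]

# multimode also exposes the "may be multiple" limitation of the mode
print(multimode([2, 2, 5, 5, 9]))  # [2, 5]
```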

Measures of Dispersion

Measure	Method	Strength	Limitation
Range	highest − lowest (sometimes +1)	Quick	Uses only two values; sensitive to outliers
Standard deviation	average distance of scores from the mean	Uses all data; precise	Harder to compute; affected by outliers

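The standard deviation measures the average spread of scores around the mean. The version below divides by $n$; some textbooks and calculators divide by $n - 1$ for a sample, which gives a slightly larger value: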

$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

Interpreting it is the examinable skill: a large standard deviation means scores are widely spread out from the mean (more variability); a small standard deviation means scores cluster tightly around the mean (more consistency). For the data above ($\bar{x} = 8$), the deviations $x - \bar{x}$ are $-4, -1, -1, 1, 5$, so the squared deviations are $16, 1, 1, 1, 25$, which sum to $44$, giving:

$\sigma = \sqrt{\frac{44}{5}} = \sqrt{8.8} \approx 2.97$
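This sketch reproduces the calculation step by step and then checks it against the standard `statistics` module. `pstdev` divides by $n$, as in the formula above. `stdev` divides by $n - 1$, which is why it gives a larger value. Neither function is needed in the exam, where interpretation is what counts.

```python
from math import sqrt
from statistics import pstdev, stdev

scores = [4, 7, 7, 9, 13]
n = len(scores)

x_bar = sum(scores) / n                            # mean: 8.0
squared_devs = [(x - x_bar) ** 2 for x in scores]  # 16, 1, 1, 1, 25
sigma = sqrt(sum(squared_devs) / n)                # divide by n, as in the formula

print(round(sigma, 2))            # 2.97
print(round(pstdev(scores), 2))   # 2.97 (also divides by n)
print(round(stdev(scores), 2))    # 3.32 (sample version, divides by n - 1)
print(max(scores) - min(scores))  # range: 9
```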

Exam Tip: You must be able to calculate the mean, median, mode and range. For standard deviation, AQA expects you to understand and interpret it (high SD = spread out; low SD = clustered) and to recognise the formula, rather than to compute it unaided under exam conditions.

Statistical Tests: Choosing the Right One

Choosing a test depends on three questions, asked in order:

Hypothesis — difference or correlation?
Design — related (repeated measures / matched pairs) or unrelated (independent groups)?
Level of measurement — nominal, ordinal or interval?

Decision Tree

graph TD
    A[What is the hypothesis?] --> B[Test of Difference]
    A --> C[Test of Correlation]
    B --> D{Related or Unrelated?}
    D --> E[Related Design]
    D --> F[Unrelated Design]
    E --> G{Level of Data?}
    F --> H{Level of Data?}
    G --> I[Nominal: Sign Test]
    G --> J[Ordinal: Wilcoxon]
    G --> K[Interval: Related t-test]
    H --> L[Nominal: Chi-squared]
    H --> M[Ordinal: Mann-Whitney U]
    H --> N[Interval: Unrelated t-test]
    C --> O{Level of Data?}
    O --> P[Ordinal: Spearman's rho]
    O --> Q[Interval: Pearson's r]

Summary Table

Test	Difference or Correlation	Design	Level of Data
Sign test	Difference	Related	Nominal
Wilcoxon signed-rank	Difference	Related	At least ordinal
Related t-test	Difference	Related	Interval
Chi-squared	Difference / association	Unrelated	Nominal
Mann-Whitney U	Difference	Unrelated	At least ordinal
Unrelated t-test	Difference	Unrelated	Interval
Spearman's rho	Correlation	N/A	At least ordinal
Pearson's r	Correlation	N/A	Interval
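The three questions and the summary table amount to a lookup, which this sketch encodes as a Python dictionary. `choose_test` and the design labels are hypothetical names. The function deliberately raises an error for combinations the table does not cover, such as a nominal correlation, where chi-squared would be used as a test of association.

```python
# Design names mapped to the related/unrelated distinction
DESIGN_TYPE = {
    "repeated measures": "related",
    "matched pairs": "related",
    "independent groups": "unrelated",
}

# (hypothesis, design type, level of data) -> test, following the summary table.
# Correlations use None for design because the table lists it as N/A.
TESTS = {
    ("difference", "related", "nominal"): "Sign test",
    ("difference", "related", "ordinal"): "Wilcoxon signed-rank",
    ("difference", "related", "interval"): "Related t-test",
    ("difference", "unrelated", "nominal"): "Chi-squared",
    ("difference", "unrelated", "ordinal"): "Mann-Whitney U",
    ("difference", "unrelated", "interval"): "Unrelated t-test",
    ("correlation", None, "ordinal"): "Spearman's rho",
    ("correlation", None, "interval"): "Pearson's r",
}


def choose_test(hypothesis, design, level):
    """Answer the three questions in order and look up the test (hypothetical helper)."""
    design_type = None if hypothesis == "correlation" else DESIGN_TYPE[design]
    key = (hypothesis, design_type, level)
    if key not in TESTS:
        raise ValueError(f"the summary table has no test for {key}")
    return TESTS[key]


print(choose_test("difference", "matched pairs", "nominal"))       # Sign test
print(choose_test("difference", "independent groups", "ordinal"))  # Mann-Whitney U
print(choose_test("correlation", None, "interval"))                # Pearson's r
```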

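A useful mnemonic for the whole table is "Carrots Should Come Mashed With Swede Under Roast Potatoes". Read it row by row for nominal, ordinal and interval data, taking the columns in the order unrelated difference, related difference, correlation: Chi-squared, Sign test, Chi-squared; Mann-Whitney, Wilcoxon, Spearman; Unrelated t-test, Related t-test, Pearson. The decision tree is still more reliable than any rhyme.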

Levels of Measurement

Level	Description	Example
Nominal	Named categories; frequency counts	How many chose A vs B
Ordinal	Can be ranked; unequal intervals	Race positions; Likert ratings
Interval	Equal intervals on a scale	Time in seconds; standardised test scores

Exam Tip: The classic question — "Identify a suitable statistical test and justify your choice" — wants the test plus three justifications: the hypothesis (difference/correlation), the design (related/unrelated), and the level of data (nominal/ordinal/interval). Drop any one and you cap the marks.

Probability and Significance

The Significance Level

The conventional significance level in psychology is $p \leq 0.05$ (5%). This means:

If the null hypothesis were true, there would be a 5% or smaller probability of obtaining results like these by chance.
If $p \leq 0.05$, the result is statistically significant and the null hypothesis is rejected.
If $p > 0.05$, the result is not significant and the null hypothesis is retained.

A more stringent level such as $p \leq 0.01$ is used where a Type I error would be especially costly (for example, in drug research).

Type I and Type II Errors

Error	What happens	More likely when	Nickname
Type I	Reject a true null hypothesis (false positive)	Significance level too lenient (e.g. $p \leq 0.10$)	Optimistic error
Type II	Retain a false null hypothesis (false negative)	Significance level too stringent (e.g. $p \leq 0.01$)	Pessimistic error

Key Point: $p \leq 0.05$ is a deliberate compromise. Relaxing to $p \leq 0.10$ raises the Type I risk; tightening to $p \leq 0.01$ raises the Type II risk. There is no level that minimises both at once.
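This sketch shows why the significance level trades one error against the other. It derives sign-test critical values from the binomial distribution, which is how such tables are built. It assumes a one-tailed hypothesis and a null hypothesis under which each sign (+ or -) is equally likely. It is an illustration only and does not replace the table supplied in the exam.

```python
from math import comb


def sign_test_critical_value(n, alpha):
    """Largest S with P(S or fewer of the sign against the prediction) <= alpha.

    Assumes a one-tailed test and a true null hypothesis, under which each
    of the n non-zero differences is + or - with probability 0.5.
    Returns None if even S = 0 is not rare enough.
    """
    critical = None
    cumulative = 0.0
    for s in range(n + 1):
        cumulative += comb(n, s) / 2 ** n  # P(exactly s of that sign)
        if cumulative <= alpha:
            critical = s
        else:
            break
    return critical


for alpha in (0.10, 0.05, 0.01):
    print(alpha, sign_test_critical_value(10, alpha))
```

For $N = 10$ this gives critical values of 2, 1 and 0 for $p \leq 0.10$, $0.05$ and $0.01$. A more lenient level produces a larger critical value, so more results count as significant and the Type I risk rises. A stricter level produces a smaller critical value, so fewer results count as significant and the Type II risk rises.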

Comparing the Calculated and Critical Value

For most tests you compare a calculated value with a critical value from a table. To find the critical value you need $N$ (or the degrees of freedom, for chi-squared), the significance level, and whether the hypothesis is one- or two-tailed. The rule differs by test family, so learn it explicitly:

For the sign test, Wilcoxon and Mann-Whitney, the result is significant when the calculated value is less than or equal to the critical value.
For chi-squared, Spearman's rho and Pearson's r, the result is significant when the calculated value is greater than or equal to the critical value.
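The two comparison rules, plus the counting step of the sign test, can be sketched in Python. The test labels, function names and the eleven pairs of scores are hypothetical. The critical value of 1 is what the binomial calculation gives for $N = 10$ at $p \leq 0.05$ one-tailed, but in the exam you should always read it from the table provided.

```python
# Test families from the rules above
LOWER_IS_SIGNIFICANT = {"sign", "wilcoxon", "mann-whitney"}
HIGHER_IS_SIGNIFICANT = {"chi-squared", "spearman", "pearson"}


def is_significant(test, calculated, critical):
    """Apply the calculated-vs-critical rule for the given test family."""
    if test in LOWER_IS_SIGNIFICANT:
        return calculated <= critical
    if test in HIGHER_IS_SIGNIFICANT:
        # The sign of rho or r is ignored here; check separately that the
        # direction matches the hypothesis if it is one-tailed.
        return abs(calculated) >= critical
    raise ValueError(f"unknown test: {test}")


def sign_test_s(before, after):
    """Return (S, N) for a sign test.

    S is the number of the less frequent sign; N counts only non-zero
    differences, because participants whose score did not change are dropped.
    """
    diffs = [a - b for b, a in zip(before, after)]
    plus = sum(1 for d in diffs if d > 0)
    minus = sum(1 for d in diffs if d < 0)
    return min(plus, minus), plus + minus


# Hypothetical scores for 11 participants before and after an intervention
before = [5, 6, 4, 7, 5, 6, 3, 5, 6, 4, 5]
after = [7, 8, 4, 9, 6, 5, 6, 7, 8, 6, 7]

S, N = sign_test_s(before, after)
print(S, N)                          # 1 10
print(is_significant("sign", S, 1))  # True: 1 <= critical value of 1
```

One participant's score did not change, so $N$ is 10 rather than 11. Nine differences are positive and one is negative, giving $S = 1$.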
