Data Handling and Analysis

Once data have been collected, they must be organised, summarised, and analysed in order to draw meaningful conclusions. At A-Level, you need to understand the distinction between different types of data, measures of central tendency and dispersion, methods of data presentation, distributions, and levels of measurement. These concepts are fundamental to evaluating research findings and conducting your own investigations.

Key Definition: Data analysis is the process of organising, summarising, and interpreting collected data to identify patterns, draw conclusions, and evaluate hypotheses.

Types of Data

Quantitative vs Qualitative Data

Type	Description	Example	Strengths	Limitations
Quantitative	Numerical data that can be measured and analysed statistically	Reaction times (ms), test scores, number of errors	Objective; easy to analyse; allows statistical testing; comparisons between groups	May lack depth; may oversimplify complex behaviours
Qualitative	Non-numerical data expressed in words, descriptions, or themes	Interview transcripts, diary entries, open-ended survey responses	Rich and detailed; captures the meaning and complexity of behaviour	Subjective; difficult to analyse and compare; researcher interpretation bias

Exam Tip: Many studies collect both types of data. For example, a study on stress might measure cortisol levels (quantitative) and also interview participants about their experience (qualitative). Combining both is often described as a mixed-methods approach. When the two kinds of data are used to cross-check each other, this is a form of methodological triangulation, which can increase the validity of the research.

Primary vs Secondary Data

Type	Description	Strengths	Limitations
Primary data	Data collected directly by the researcher for the specific purpose of the study	Directly relevant to the research question; researcher has control over collection methods	Time-consuming; expensive; may have small sample sizes
Secondary data	Data that have already been collected by someone else for a different purpose (e.g., government statistics, published studies, medical records)	Large datasets readily available; cost-effective; can be used for longitudinal analysis	May not be directly relevant; researcher has no control over how data were collected; may be outdated

Measures of Central Tendency

Measures of central tendency summarise a dataset by identifying the most "typical" or "central" value:

Measure	Calculation	Strengths	Limitations
Mean	Sum of all values ÷ number of values	Uses all data points; most sensitive and powerful measure; suitable for interval/ratio data	Distorted by extreme values (outliers); may produce a value that does not exist in the dataset (e.g., 2.4 children)
Median	The middle value when data are arranged in order (for even-numbered datasets, the mean of the two middle values)	Largely unaffected by outliers; good for skewed distributions; easy to calculate	Does not use all data points; less sensitive than the mean; can be less representative of the full dataset
Mode	The most frequently occurring value	Can be used with nominal (categorical) data; represents an actual data value; easy to identify	May not be representative; a dataset may have multiple modes (bimodal/multimodal) or no mode at all; not useful for further statistical analysis

Key Definition: The mean is the arithmetic average, calculated by dividing the sum of all values by the number of values.

Worked Example:

Data: 3, 5, 7, 8, 8, 10, 12, 14, 47

Mean = (3 + 5 + 7 + 8 + 8 + 10 + 12 + 14 + 47) ÷ 9 = 114 ÷ 9 = 12.67
Median = the 5th value in order = 8
Mode = 8 (appears twice)

Notice that the mean (12.67) is pulled upward by the extreme value of 47, making it unrepresentative of the majority of the data. In this case, the median (8) provides a better summary.
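
The effect of the outlier can be checked with a short Python sketch. It uses the standard-library statistics module and the nine scores from the worked example. Removing the value 47 to form a second list is an illustrative step added here, not part of the original example.

```python
import statistics

# Scores from the worked example, including the extreme value 47.
scores = [3, 5, 7, 8, 8, 10, 12, 14, 47]

mean_all = statistics.mean(scores)      # sum of values / number of values
median_all = statistics.median(scores)  # middle value of the ordered data
mode_all = statistics.mode(scores)      # most frequently occurring value

# Illustrative step: drop the outlier to see how far each measure moves.
trimmed = [x for x in scores if x != 47]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

print(f"With outlier:    mean={mean_all:.2f}, median={median_all}, mode={mode_all}")
print(f"Without outlier: mean={mean_trimmed:.3f}, median={median_trimmed}")
```

With the outlier removed, the mean falls from about 12.67 to 8.375, while the median stays at 8. This is why the median is the safer summary when a dataset contains extreme values.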

Exam Tip: If a dataset contains outliers or is skewed, recommend the median as the most appropriate measure of central tendency. If the data are normally distributed and measured on an interval or ratio scale, the mean is preferred because it uses all data points.

Measures of Dispersion

Measures of dispersion describe how spread out the data are:

Measure	Calculation	Strengths	Limitations
Range	Highest value − lowest value (sometimes +1 is added)	Quick and easy to calculate	Only uses two data points; heavily influenced by outliers; ignores the distribution of data between extremes
Standard deviation (SD)	A measure of the average amount by which each data point differs from the mean	Uses all data points; more precise and informative than the range; essential for further statistical analysis	More difficult to calculate; affected by outliers (though less so than the range); requires interval/ratio data

Key Definition: The standard deviation is a measure of dispersion that indicates, roughly, the average distance of each data point from the mean (formally, it is the square root of the variance). A small SD indicates data points are clustered close to the mean; a large SD indicates they are widely spread out.

Interpreting standard deviation:

A small SD means the data are tightly clustered around the mean — participants responded similarly, and the mean is a good representation of the dataset.
A large SD means the data are widely spread — there is high variability between participants, and the mean may be less representative.
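
To see the difference between a small and a large SD, here is a minimal Python sketch comparing two groups that share the same mean. The scores are hypothetical and chosen purely for illustration. The sketch assumes the sample formula (dividing by n − 1), which is what statistics.stdev uses.

```python
import statistics

# Hypothetical scores (illustrative only): both groups have a mean of 50.
group_a = [48, 49, 50, 51, 52]  # tightly clustered around the mean
group_b = [30, 40, 50, 60, 70]  # widely spread around the mean

# Sample standard deviation (divides by n - 1).
sd_a = statistics.stdev(group_a)
sd_b = statistics.stdev(group_b)

print(f"Group A: mean={statistics.mean(group_a)}, SD={sd_a:.2f}")
print(f"Group B: mean={statistics.mean(group_b)}, SD={sd_b:.2f}")
```

Both groups have the same mean, but Group B's SD is ten times larger than Group A's, so the mean describes a typical Group A participant far better than a typical Group B participant.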

Worked Example:

Data: 4, 6, 7, 8, 10
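
One way to work through the range and standard deviation for this dataset is sketched below in Python. Which divisor to use is an assumption: dividing by n − 1 gives the sample SD and dividing by n gives the population SD, so both are shown. Check which version your exam board or textbook expects.

```python
import math

data = [4, 6, 7, 8, 10]
n = len(data)

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Step 1: calculate the mean.
mean = sum(data) / n

# Step 2: square each value's deviation from the mean.
squared_devs = [(x - mean) ** 2 for x in data]

# Step 3: add up the squared deviations.
sum_sq = sum(squared_devs)

# Step 4: divide by n - 1 (sample) or n (population), then take the square root.
sd_sample = math.sqrt(sum_sq / (n - 1))
sd_population = math.sqrt(sum_sq / n)

print(f"Range = {data_range}, mean = {mean}")
print(f"Sum of squared deviations = {sum_sq}")
print(f"Sample SD = {sd_sample:.2f}, population SD = {sd_population:.2f}")
```

The mean is 7 and the squared deviations are 9, 1, 0, 1, and 9, which sum to 20. The sample SD is therefore the square root of 5 (about 2.24), and the population SD is the square root of 4, which is exactly 2.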
