Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
This lesson addresses how to evaluate fitness tests, a key skill required by the OCR GCSE PE specification (J587). It is not enough to know the tests — you must also understand whether a test is fit for purpose. OCR examiners regularly ask candidates to discuss the strengths and weaknesses of specific tests using the concepts of validity, reliability, and practicality. You also need to understand how to use normative data to interpret results.
Validity asks: does the test actually measure what it claims to measure?
A test is valid if it accurately assesses the intended component of fitness. A test that measures something different, or only partially measures the target component, has lower validity.
| Test | Validity Consideration |
|---|---|
| MSFT (bleep test) | High validity for cardiovascular endurance — the progressive nature closely mirrors the demands on the cardiorespiratory system during sustained exercise. |
| Grip dynamometer | Limited validity for overall strength — it only measures grip strength, not the strength of the legs, core, or upper body. A strong grip does not guarantee strong legs. |
| Cooper 12-min run | Reasonably valid for cardiovascular endurance, but motivation and pacing strategy also affect the result, so it is not a pure measure of cardiovascular endurance. |
| Sit and reach test | Valid for hamstring and lower-back flexibility, but does not measure flexibility at other joints (e.g. shoulders, ankles). |
Exam Tip: When asked to evaluate a test, always state what the test measures and then discuss whether it fully captures that component. For example: "The grip dynamometer has limited validity as a measure of overall strength because it only tests one muscle group in the hand and forearm."
Reliability asks: would the test produce the same results if it were repeated under the same conditions?
A test is reliable if it gives consistent results when repeated. Factors that reduce reliability include:
| Factor | How It Reduces Reliability |
|---|---|
| Different equipment | Using a different grip dynamometer may give a different reading |
| Different conditions | Running the Cooper test on a windy day versus a calm day |
| Different time of day | A performer may be more fatigued if tested in the evening |
| Inconsistent procedures | One tester allowing a longer warm-up than another |
| Motivation levels | A performer who tries harder on one occasion than another |
| Human error | Timing inaccuracies when using a stopwatch instead of electronic gates |
Reliability can be improved by:
```mermaid
graph TD
R["Reliability"] --> S["Standardise<br>procedures"]
R --> E["Use electronic<br>measurement"]
R --> M["Multiple<br>attempts"]
R --> C["Control<br>environment"]
style R fill:#8e44ad,color:#fff
style S fill:#2980b9,color:#fff
style E fill:#2980b9,color:#fff
style M fill:#2980b9,color:#fff
style C fill:#2980b9,color:#fff
```
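To make the "multiple attempts" idea concrete, here is a minimal Python sketch that summarises several attempts at the same test by one performer. The function name `summarise_attempts`, the choice to report the best score, the average and the spread, and the example scores are all hypothetical illustrations, not part of the OCR specification. A small spread between attempts suggests the result is consistent, which is what reliability is about.

```python
from statistics import mean


def summarise_attempts(attempts, higher_is_better=True):
    """Summarise repeated attempts at one fitness test (hypothetical helper).

    attempts: scores from the same performer under the same conditions.
    higher_is_better: True for tests such as sit and reach (cm),
                      False for timed tests such as the 30 m sprint (s).
    Returns (best, average, spread), where spread = highest - lowest score.
    """
    if not attempts:
        raise ValueError("at least one attempt is needed")
    best = max(attempts) if higher_is_better else min(attempts)
    average = mean(attempts)
    # A large spread means the attempts disagreed, i.e. the result is less consistent.
    spread = max(attempts) - min(attempts)
    return best, average, spread


# Hypothetical sit and reach scores (cm) from three attempts.
best, average, spread = summarise_attempts([18.0, 19.5, 19.0])
print(best, round(average, 1), spread)  # prints: 19.5 18.8 1.5
```

Whether to record the best score or the average is a procedural choice; what matters for reliability is that the same rule is applied every time the test is repeated.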
Practicality asks: how easy is the test to carry out in a real-world setting?
A practical test is one that is simple, inexpensive, and quick to administer. Factors affecting practicality include:
| Factor | Explanation |
|---|---|
| Cost | Does the test require expensive equipment? The ruler drop test is very cheap; a VO2 max lab test is very expensive. |
| Time | How long does it take? The 30 m sprint takes seconds; the Cooper 12-min run takes much longer. |
| Equipment | Is specialised equipment needed? A sit and reach box is simple; a body composition DEXA scan requires laboratory equipment. |
| Space | How much room is needed? The MSFT needs 20 m of flat space; the Illinois agility test needs 10 m × 5 m. |
| Expertise | Does the tester need specialist training? Most GCSE-level tests require only basic instruction. |
| Number of performers | Can large groups be tested simultaneously? The MSFT can test many people at once; the 1RM test is one performer at a time. |
Normative data are sets of scores collected from large populations and categorised by age and gender. They allow you to compare an individual's result with the typical scores of people of the same age and gender.
Example: illustrative normative data for the MSFT (bleep test)
| Rating | Level Achieved |
|---|---|
| Excellent | 12.0+ |
| Above Average | 10.0–11.9 |
| Average | 8.0–9.9 |
| Below Average | 6.0–7.9 |
| Poor | Below 6.0 |
Note: These are illustrative values. The exact normative data tables vary between sources.
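Interpreting a result against normative data is a lookup: find the band the score falls into. The minimal Python sketch below encodes the illustrative table above, assuming the result is written as a single number in the same format as the table (e.g. 9.5) and that the table applies to the performer's age and gender group. The names `ILLUSTRATIVE_NORMS` and `rate_level` are hypothetical; real normative tables have separate bands for each age and gender.

```python
# Illustrative bands copied from the table above (not an official OCR table).
# Each entry is (minimum level, rating), checked from the highest band down.
ILLUSTRATIVE_NORMS = [
    (12.0, "Excellent"),
    (10.0, "Above Average"),
    (8.0, "Average"),
    (6.0, "Below Average"),
]


def rate_level(level):
    """Return the rating for a level achieved, using the illustrative bands."""
    for minimum, rating in ILLUSTRATIVE_NORMS:
        if level >= minimum:
            return rating
    # Anything below the lowest band counts as "Poor".
    return "Poor"


# Hypothetical results for five performers in the same age and gender group.
for score in [12.4, 10.0, 9.5, 6.0, 5.5]:
    print(score, rate_level(score))
```

Checking the bands from the highest down means each score gets exactly one rating, and boundary values such as 10.0 fall into the higher band, matching the ranges in the table.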
Evaluating the MSFT (bleep test):
| Criterion | Evaluation |
|---|---|
| Validity | High — progressive aerobic test closely mirrors the demands on the cardiovascular system. However, motivation and running technique can influence results. |
| Reliability | Reasonably high — the audio recording standardises the pace. However, surface type, temperature, and footwear can vary. |
| Practicality | Very practical — requires minimal equipment (cones, audio file, 20 m space), can test large groups simultaneously. |
Evaluating the ruler drop test:
| Criterion | Evaluation |
|---|---|
| Validity | Moderate — it measures a visual reaction to a single stimulus, but in sport, reactions involve multiple stimuli (auditory, peripheral vision) and whole-body movements, not just finger movements. |
| Reliability | Moderate — the partner dropping the ruler may give unintentional cues (e.g. finger movement before release), and the performer's alertness may vary between trials. |
| Practicality | Very practical — requires only a ruler, can be done anywhere, takes minimal time. |
Evaluating the one-rep max (1RM) test:
| Criterion | Evaluation |
|---|---|
| Validity | High — directly measures the maximum force a muscle group can produce in one contraction, which is the definition of maximal strength. |
| Reliability | Can vary — depends on warm-up, fatigue, time of day, and the specific equipment used. Standardising conditions improves reliability. |
| Practicality | Less practical — requires access to weight-training equipment, can only test one person at a time, and carries a risk of injury if the performer attempts too heavy a weight. |