This lesson covers how to evaluate the quality and usefulness of fitness tests, as required by the Edexcel GCSE PE specification (1PE0). You must understand the concepts of validity, reliability, practicality and the use of normative data when assessing whether a fitness test is fit for purpose.
Not all fitness tests are equally useful. Before relying on the results of a test, a coach or performer should consider whether the test actually measures what it claims to, whether the results can be trusted, and whether the test is practical to carry out.
Definition (validity): The degree to which a test measures what it claims to measure.
A valid test accurately reflects the component of fitness being assessed.
| Example | Validity Assessment |
|---|---|
| The Cooper 12-min run measures cardiovascular endurance | High validity — running for 12 minutes directly tests the heart and lungs' ability to supply oxygen |
| Using a grip dynamometer to measure overall body strength | Low validity — grip strength only measures hand/forearm strength, not whole-body strength |
| The sit and reach test measures hamstring and lower-back flexibility | Moderate validity — it only measures flexibility at one joint area, not overall flexibility |
Exam Tip: If a test only measures one aspect of a broad component, its validity for that overall component is reduced. For example, the sit and reach test is valid for hamstring flexibility but not valid for shoulder flexibility.
Definition (reliability): The degree to which a test produces consistent, repeatable results under the same conditions.
A reliable test gives similar results when repeated by the same person under the same conditions.
Factors that affect reliability:
| Factor | How It Affects Reliability |
|---|---|
| Standardised procedures | If the test is carried out the same way every time (same equipment, same instructions), reliability is higher |
| Environmental conditions | Temperature, wind, surface and time of day should be consistent |
| Calibrated equipment | Equipment must be checked and standardised (e.g. dynamometer calibrated to zero) |
| Human error | If a partner operates the stopwatch, reaction time in starting/stopping can vary |
| Performer's state | Fatigue, motivation, illness, time since last meal — all affect results |
Example: If a performer completes the bleep test on Monday and scores Level 9.4, then repeats it on Tuesday under the same conditions and scores Level 9.3, the results are consistent, which suggests the test has high reliability. If the second score had instead dropped to Level 6.2, the result would be unreliable, and the tester should check whether the procedure, the conditions or the performer's state had changed between attempts.
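The idea of comparing a test with a retest can be sketched in a few lines of Python. This is only an illustration: the function names, the grip dynamometer readings and the 5% tolerance are all assumptions made for the example, not part of the specification or any official threshold.

```python
def percent_difference(first: float, second: float) -> float:
    """Absolute difference between two trials as a percentage of their mean.

    Assumes both scores are positive numbers.
    """
    mean = (first + second) / 2
    return abs(first - second) / mean * 100


def is_consistent(first: float, second: float, tolerance_pct: float = 5.0) -> bool:
    """True if the two trials differ by no more than tolerance_pct percent.

    The 5% default is an illustrative assumption, not an official cut-off.
    """
    return percent_difference(first, second) <= tolerance_pct


# Hypothetical grip dynamometer readings in kg, taken under the same conditions.
monday, tuesday = 40.0, 39.0
print(round(percent_difference(monday, tuesday), 1))  # 2.5
print(is_consistent(monday, tuesday))                 # True
print(is_consistent(40.0, 30.0))                      # False
```

A small difference between trials suggests the test is being run consistently. A large one does not say why the scores differ, so the factors in the table above still need to be checked.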
Definition (practicality): How easy, affordable and feasible a test is to carry out.
| Factor | Questions to Ask |
|---|---|
| Cost | Is the equipment expensive? Can a school afford it? |
| Equipment | Is specialist equipment needed? Is it readily available? |
| Time | How long does the test take? Can it be completed in a single lesson? |
| Space | Is a large area needed? Is it available? |
| Expertise | Does the tester need specialist training to administer the test? |
| Number of participants | Can only one person be tested at a time, or can groups be tested? |
Practicality of common tests:
| Test | Practicality |
|---|---|
| Ruler drop test | Very high — only needs a ruler, can be done anywhere, takes seconds |
| Skinfold callipers | Low — requires trained tester, specialist callipers, and privacy |
| Bleep test | Moderate — needs 20 m space, audio equipment, but many performers can be tested at once |
| VO2 max lab test | Very low — requires expensive lab equipment, trained technicians |
Definition (normative data): A set of average or expected results for a specific population (e.g. by age and gender) against which an individual's test result can be compared.
Normative data allows you to rate a result as excellent, above average, average, below average or poor.
| Why Normative Data Is Useful | Explanation |
|---|---|
| Benchmarking | Tells the performer where they stand compared to others of the same age and gender |
| Goal setting | Helps set realistic targets (e.g. "move from average to above average") |
| Monitoring progress | Re-testing and comparing to norms shows improvement over time |
| Identifying strengths and weaknesses | A performer may score "excellent" for speed but "below average" for flexibility |
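Rating a result against normative data amounts to looking up which band the score falls into. A minimal sketch follows, assuming a test where a higher score is better. The band boundaries in `EXAMPLE_BANDS` are invented placeholders, not real normative data, and `rate_score` is a hypothetical helper name.

```python
# Placeholder bands for illustration only: (minimum score, rating), highest first.
# These numbers are invented and are NOT real normative data.
EXAMPLE_BANDS = [
    (50, "excellent"),
    (40, "above average"),
    (30, "average"),
    (20, "below average"),
]


def rate_score(score: float, bands=EXAMPLE_BANDS) -> str:
    """Return the rating of the first band whose minimum the score meets.

    Assumes higher scores are better; anything below the lowest band is "poor".
    """
    for minimum, rating in bands:
        if score >= minimum:
            return rating
    return "poor"


print(rate_score(43))  # above average
print(rate_score(12))  # poor
```

In practice the table of bands would be chosen to match the performer's age and gender. For a timed test such as a sprint, where a lower score is better, the comparison would need to be reversed.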
Limitations of normative data:
Exam Tip: Normative data is only useful if it is relevant to the person being tested. Comparing a 15-year-old's results to data from 25-year-old university students would be misleading.
| Test | Validity | Reliability | Practicality |
|---|---|---|---|
| Bleep test | High — directly tests cardiovascular endurance | High — standardised audio and protocol | Moderate — needs 20 m space and audio |
| 35 m sprint | High — directly tests speed | Moderate — wind and surface affect results | High — needs only tape measure and stopwatch |
| BMI | Low — does not distinguish fat from muscle | High — calculation is consistent | Very high — only needs scales and tape measure |
| Skinfold callipers | Moderate-high — estimates body fat from the thickness of skinfolds | Moderate — tester skill affects accuracy | Low — requires trained tester and privacy |
| Ruler drop test | Moderate — measures reaction time but involves motor skill (catching) | Moderate — partner's release and performer's attention vary | Very high — needs only a ruler |
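The BMI row can be illustrated with the standard formula: body mass in kilograms divided by height in metres squared. The sketch below uses two hypothetical performers to show why the calculation is highly repeatable yet has low validity as a measure of body composition. The function name and the example figures are assumptions made for illustration.

```python
def bmi(mass_kg: float, height_m: float) -> float:
    """Body mass index: mass in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return mass_kg / height_m ** 2


# Two hypothetical performers with identical mass and height.
# One could be very muscular and the other could carry more body fat:
# the formula only sees mass and height, so it cannot tell them apart.
muscular_performer = bmi(mass_kg=81.0, height_m=1.80)
less_trained_performer = bmi(mass_kg=81.0, height_m=1.80)

print(round(muscular_performer, 1))                  # 25.0
print(muscular_performer == less_trained_performer)  # True
```

The same inputs always give the same output, which is why reliability is high. Validity is low because nothing in the inputs separates muscle from fat.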