AQA A-Level Maths: Working with the Large Data Set
6 exam-style questions with full mark schemes and model answers.
A weather station records, on 10 days during one summer, the daily number of hours of sunshine x and the daily maximum air temperature y (in °C). The figures below are representative, illustrative data written for this exercise in the style of the daily weather-station records used in the AQA large data set; they are not the actual large data set.
| Day | Sunshine, x (hours) | Max temperature, y (°C) |
|---|---|---|
| 1 | 3 | 14 |
| 2 | 4 | 16 |
| 3 | 5 | 18 |
| 4 | 5 | 18 |
| 5 | 6 | 19 |
| 6 | 7 | 19 |
| 7 | 7 | 20 |
| 8 | 8 | 22 |
| 9 | 9 | 24 |
| 10 | 6 | 20 |
You may assume the following summary statistics, where Sxx = Σx² − (Σx)²/n, etc.:
Σx = 60, Σy = 190, Σx² = 390, Σy² = 3682, Σxy = 1185.
(a) Show that Sxx=30 and Sxy=45, and hence calculate the product moment correlation coefficient r for these data. Give your answer to 3 significant figures and interpret it in this context. (5 marks)
(b) Find the equation of the least-squares regression line of y on x in the form y=a+bx, and interpret the value of the gradient b in this context. (4 marks)
(c) The meteorologist wishes to test, at the 5% significance level, whether there is positive correlation between the hours of sunshine and the daily maximum temperature. The critical value for a one-tailed test at the 5% level with n=10 is 0.5494. Carry out the test, and separately comment on the reliability of using your regression line to predict the maximum temperature on a day with 13 hours of sunshine. (3 marks)
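One way to check the working for parts (a) to (c) is the short Python sketch below. It is not a mark scheme: it recomputes the summary statistics from the table and applies the formulas Sxx = Σx² − (Σx)²/n, r = Sxy/√(Sxx·Syy), b = Sxy/Sxx and a = ȳ − b·x̄. The variable names are illustrative choices, and the critical value is the one quoted in part (c).

```python
import math

# Data from the table: hours of sunshine (x) and maximum temperature (y).
x = [3, 4, 5, 5, 6, 7, 7, 8, 9, 6]
y = [14, 16, 18, 18, 19, 19, 20, 22, 24, 20]
n = len(x)

sum_x, sum_y = sum(x), sum(y)

# Part (a): corrected sums of squares and products, then the PMCC.
s_xx = sum(v * v for v in x) - sum_x ** 2 / n
s_yy = sum(v * v for v in y) - sum_y ** 2 / n
s_xy = sum(u * v for u, v in zip(x, y)) - sum_x * sum_y / n
r = s_xy / math.sqrt(s_xx * s_yy)

# Part (b): least-squares regression line of y on x, y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Part (c): one-tailed test for positive correlation at the 5% level.
critical_value = 0.5494  # quoted in the question for n = 10
reject_h0 = r > critical_value

# Predicting at x = 13 is extrapolation if 13 lies outside the observed x values.
extrapolating = not (min(x) <= 13 <= max(x))

print(f"Sxx = {s_xx:g}, Sxy = {s_xy:g}, r = {r:.3g}")
print(f"y = {a:g} + {b:g}x")
print(f"reject H0: {reject_h0}, x = 13 is extrapolation: {extrapolating}")
```

The script only confirms the arithmetic. The interpretation in context (what r says about sunshine and temperature, what the gradient means per extra hour of sunshine, and why extrapolation is unreliable) still has to be written in words.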
The daily rainfall, r mm, was recorded at weather station Aldermere on each of 50 days during one autumn. The grouped frequency table below shows the results. (These are representative, illustrative data written for this exercise in the style of the AQA large data set; they are not the real data set.)
| Rainfall, r (mm) | Frequency, f |
|---|---|
| 0≤r<2 | 7 |
| 2≤r<4 | 19 |
| 4≤r<6 | 19 |
| 6≤r<8 | 3 |
| 8≤r<12 | 2 |
(a) Use the midpoints of the classes to estimate the mean and the standard deviation of the daily rainfall at Aldermere. (4 marks)
(b) Use linear interpolation to estimate the median and the lower and upper quartiles of the daily rainfall. Give each answer to 3 significant figures. (4 marks)
(c) An outlier is defined as any value more than 1.5×IQR above the upper quartile or below the lower quartile. At a second station, Brackenfell, over the same 50 days the median daily rainfall was 5.8 mm with an interquartile range of 3.1 mm. Using your answers, state whether the top class at Aldermere could contain an outlier, and compare the rainfall at the two stations in context. (2 marks)
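The grouped-data estimates in parts (a) to (c) can be checked with a sketch like the one below. It makes two assumptions that you should match to your own method: the standard deviation uses divisor n, that is √(Σfx²/n − x̄²), and the quartiles are taken at cumulative positions n/4, n/2 and 3n/4. The helper name `interpolate` is hypothetical.

```python
import math

# (lower boundary, upper boundary, frequency) for each class at Aldermere.
classes = [(0, 2, 7), (2, 4, 19), (4, 6, 19), (6, 8, 3), (8, 12, 2)]
n = sum(f for _, _, f in classes)

# Part (a): estimates of the mean and standard deviation from class midpoints.
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
mean = sum_fx / n
sd = math.sqrt(sum_fx2 / n - mean ** 2)  # divisor n (assumed convention)


def interpolate(position):
    """Estimate the value at a cumulative-frequency position by linear interpolation."""
    cumulative = 0
    for lo, hi, f in classes:
        if position <= cumulative + f:
            return lo + (position - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("position exceeds the total frequency")


# Part (b): quartiles at positions n/4, n/2 and 3n/4 (assumed convention).
q1, median, q3 = (interpolate(n * p) for p in (0.25, 0.5, 0.75))

# Part (c): upper outlier boundary using the 1.5 x IQR rule from the question.
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

print(f"mean = {mean:g}, sd = {sd:g}")
print(f"Q1 = {q1:.3g}, median = {median:.3g}, Q3 = {q3:.3g}")
print(f"upper outlier boundary = {upper_fence:.3g}")
```

Comparing `upper_fence` with the boundaries of the top class (8 to 12 mm) shows whether that class could contain an outlier. The comparison with Brackenfell still needs a written statement about both average rainfall and spread.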
Over many years, the daily number of hours of sunshine in June at a coastal weather station is known to be modelled by a normal distribution with standard deviation σ=2.4 hours. A local tourism officer claims that, because of recent climate trends, the mean daily number of hours of sunshine in June at this station now exceeds 6.0 hours.
To investigate the claim, a random sample of 36 June days is taken from recent records, and the mean daily sunshine for the sample is found to be x̄ = 6.8 hours. It is assumed that the standard deviation is unchanged at 2.4 hours. (The values are illustrative figures written for this exercise, in the style of the large data set; they are not the real data set.)
Test the tourism officer's claim at the 5% significance level. The critical value for a one-tailed test at the 5% level is z=1.6449. State your hypotheses clearly, and give your conclusion in context.
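As a check on the calculation, the sketch below computes the test statistic z = (x̄ − μ₀)/(σ/√n) for H₀: μ = 6.0 against H₁: μ > 6.0 and compares it with the quoted critical value. The p-value line is an optional extra, not something the question asks for, and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
import math
from statistics import NormalDist

mu_0 = 6.0    # mean under the null hypothesis (hours)
sigma = 2.4   # known population standard deviation (hours)
n = 36        # sample size
x_bar = 6.8   # sample mean (hours)

# Test statistic for a sample mean with known standard deviation.
standard_error = sigma / math.sqrt(n)
z = (x_bar - mu_0) / standard_error

critical_z = 1.6449  # one-tailed 5% critical value quoted in the question
reject_h0 = z > critical_z

# Optional: one-tailed p-value, P(Z > z) under the standard normal distribution.
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.4g}, reject H0: {reject_h0}, p-value = {p_value:.4f}")
```

The final conclusion should still be phrased in context, for example in terms of whether there is sufficient evidence at the 5% level that the mean daily sunshine in June exceeds 6.0 hours.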
At an inland weather station, records suggest that, in a typical month, the long-run proportion of days on which measurable rain falls (a "rain day") is 0.2. A month with 30 days is selected, and the number of rain days, X, in that month is to be modelled by a binomial distribution.
(a) Write down the distribution of X, and find P(X=6), giving your answer to 4 decimal places. (3 marks)
(b) Find the probability that the month contains at least 9 rain days. (2 marks)
(c) State one reason why, for real weather data, the binomial model in part (a) may not be fully appropriate. (1 mark)
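The probabilities in parts (a) and (b) can be checked directly from the binomial formula P(X = k) = C(n, k)·pᵏ·(1 − p)ⁿ⁻ᵏ with X ~ B(30, 0.2). The sketch below uses `math.comb` (Python 3.8 or later); the function name `binomial_pmf` is an illustrative choice rather than a library call.

```python
from math import comb

n, p = 30, 0.2  # X ~ B(30, 0.2)


def binomial_pmf(k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Part (a): probability of exactly 6 rain days.
p_exactly_6 = binomial_pmf(6)

# Part (b): P(X >= 9) = 1 - P(X <= 8).
p_at_least_9 = 1 - sum(binomial_pmf(k) for k in range(9))

print(f"P(X = 6)  = {p_exactly_6:.4f}")
print(f"P(X >= 9) = {p_at_least_9:.4f}")
```

Note that `range(9)` runs over k = 0 to 8, which is the complement of "at least 9". In the exam the same values would normally come from a calculator's binomial distribution functions.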
A student downloads the daily mean wind speed, w knots, recorded at a weather station on 9 consecutive days, intending to find the mean daily wind speed for the period. The raw values copied into a spreadsheet are shown below. (These are illustrative figures written for this exercise, in the style of the large data set.)
| Day | Mon | Tue | Wed | Thu | Fri | Sat | Sun | Mon | Tue |
|---|---|---|---|---|---|---|---|---|---|
| Wind speed, w (knots) | 18 | 21 | 15 | n/a | 22 | 24 | 20 | 205 | 20 |
(a) Identify the two entries that should not be used as recorded, giving a reason for each. (2 marks)
(b) Using only the valid readings, calculate an appropriate estimate of the mean daily wind speed for these days. (2 marks)
(c) State one limitation of reporting this mean as a summary of the wind speed for the period. (1 mark)
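The cleaning step in parts (a) and (b) can be sketched in Python as below. The plausibility threshold `MAX_PLAUSIBLE_KNOTS` is an assumption made for this illustration, not a value from the question or the large data set; in practice the judgement that 205 knots is an error rests on context (it is far above every other reading and is most likely a mistyped 20.5 or 20).

```python
# Raw spreadsheet entries, kept as text exactly as copied.
raw = ["18", "21", "15", "n/a", "22", "24", "20", "205", "20"]

MAX_PLAUSIBLE_KNOTS = 100  # assumed cut-off for a daily mean wind speed

valid = []
rejected = []
for entry in raw:
    try:
        value = float(entry)
    except ValueError:
        # Non-numeric entries such as "n/a" mean the reading is missing.
        rejected.append((entry, "missing value, not a number"))
        continue
    if value > MAX_PLAUSIBLE_KNOTS:
        rejected.append((entry, "implausibly large, likely a recording error"))
    else:
        valid.append(value)

# Part (b): mean of the valid readings only.
mean_wind = sum(valid) / len(valid)

print("rejected:", rejected)
print(f"mean of {len(valid)} valid readings = {mean_wind:g} knots")
```

The missing reading is left out rather than treated as zero, so the divisor is the number of valid readings, not 9.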
The daily maximum temperatures, c °C, recorded at a weather station over one month have mean c̄ = 19 °C and standard deviation 3 °C.
For an overseas report, each temperature is converted to degrees Fahrenheit, f, using the coding
f=1.8c+32.
Find the mean and the standard deviation of the daily maximum temperatures in degrees Fahrenheit.
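For a linear coding f = ac + b, the mean transforms as f̄ = a·c̄ + b, while the standard deviation is only scaled, σ_f = |a|·σ_c, because adding a constant does not change the spread. The sketch below applies these rules and then confirms them on a small data set. That data set is hypothetical, chosen only because it has mean 19 and standard deviation 3; it is not from the question.

```python
import statistics

mean_c, sd_c = 19, 3    # given summary statistics in degrees Celsius
scale, shift = 1.8, 32  # coding f = 1.8c + 32

# Coding rules: the mean is scaled and shifted, the standard deviation is only scaled.
mean_f = scale * mean_c + shift
sd_f = abs(scale) * sd_c

# Sanity check on hypothetical data with mean 19 and standard deviation 3.
sample_c = [16, 22, 16, 22]
sample_f = [scale * c + shift for c in sample_c]
check_mean = statistics.mean(sample_f)
check_sd = statistics.pstdev(sample_f)  # population standard deviation (divisor n)

print(f"mean = {mean_f:.1f} F, standard deviation = {sd_f:.1f} F")
print(f"check on sample: mean = {check_mean:.1f}, sd = {check_sd:.1f}")
```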