AQA A-Level Maths: Working with the Large Data Set

6 exam-style questions with full mark schemes and model answers. Write your own answer and the AI examiner marks it against the mark scheme.

Question 1 · 12 marks · Calculate and interpret

A weather station records, on $10$ days during one summer, the daily number of hours of sunshine $x$ and the daily maximum air temperature $y$ (in $^{\circ}\text{C}$). The figures below are representative, illustrative data written for this exercise in the style of the daily weather-station records used in the AQA large data set; they are not the actual large data set.

| Day | Sunshine, $x$ (hours) | Max temperature, $y$ ($^{\circ}\text{C}$) |
| --- | --- | --- |
| 1 | 3 | 14 |
| 2 | 4 | 16 |
| 3 | 5 | 18 |
| 4 | 5 | 18 |
| 5 | 6 | 19 |
| 6 | 7 | 19 |
| 7 | 7 | 20 |
| 8 | 8 | 22 |
| 9 | 9 | 24 |
| 10 | 6 | 20 |

You may assume the following summary statistics, where $S_{xx} = \Sigma x^2 - \dfrac{(\Sigma x)^2}{n}$ etc.:

$$\Sigma x = 60, \quad \Sigma y = 190, \quad \Sigma x^2 = 390, \quad \Sigma y^2 = 3682, \quad \Sigma xy = 1185.$$

(a) Show that $S_{xx} = 30$ and $S_{xy} = 45$, and hence calculate the product moment correlation coefficient $r$ for these data. Give your answer to $3$ significant figures and interpret it in this context. (5 marks)

(b) Find the equation of the least-squares regression line of $y$ on $x$ in the form $y = a + bx$, and interpret the value of the gradient $b$ in this context. (4 marks)

(c) The meteorologist wishes to test, at the $5\%$ significance level, whether there is positive correlation between the hours of sunshine and the daily maximum temperature. The critical value for a one-tailed test at the $5\%$ level with $n = 10$ is $0.5494$. Carry out the test, and separately comment on the reliability of using your regression line to predict the maximum temperature on a day with $13$ hours of sunshine. (3 marks)
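
The arithmetic in Question 1 can be checked with a short script. The following is a minimal Python sketch, not part of the mark scheme: it assumes the table values are entered as two lists, and the helper name `s` and all variable names are illustrative choices. It computes $S_{xx}$, $S_{yy}$, $S_{xy}$, the coefficient $r$ and the regression coefficients, then compares $r$ with the quoted critical value.

```python
from math import sqrt

# Data copied from the table in Question 1.
x = [3, 4, 5, 5, 6, 7, 7, 8, 9, 6]            # sunshine (hours)
y = [14, 16, 18, 18, 19, 19, 20, 22, 24, 20]  # max temperature (deg C)
n = len(x)


def s(u, v):
    """Return S_uv = sum(u*v) - sum(u)*sum(v)/n (illustrative helper)."""
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / len(u)


s_xx, s_yy, s_xy = s(x, x), s(y, y), s(x, y)

r = s_xy / sqrt(s_xx * s_yy)     # product moment correlation coefficient
b = s_xy / s_xx                  # gradient of the regression line of y on x
a = sum(y) / n - b * sum(x) / n  # intercept: y-bar minus b times x-bar

print(f"Sxx = {s_xx}, Syy = {s_yy}, Sxy = {s_xy}")
print(f"r = {r:.3f}")
print(f"y = {a:.1f} + {b:.1f}x")

# One-tailed test for positive correlation, critical value quoted in part (c).
print("reject H0" if r > 0.5494 else "do not reject H0")

# Range of x in the data, useful when judging a prediction at x = 13.
print(f"x ranges from {min(x)} to {max(x)}")
```

The last line prints the observed range of $x$, which is what the reliability comment in part (c) turns on: a prediction is an extrapolation whenever the chosen $x$ lies outside that range.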

Question 2 · 10 marks · Estimate

The daily rainfall, $r$ mm, was recorded at weather station Aldermere on each of $50$ days during one autumn. The grouped frequency table below shows the results. (These are representative, illustrative data written for this exercise in the style of the AQA large data set; they are not the real data set.)

| Rainfall, $r$ (mm) | Frequency, $f$ |
| --- | --- |
| $0 \leq r < 2$ | 7 |
| $2 \leq r < 4$ | 19 |
| $4 \leq r < 6$ | 19 |
| $6 \leq r < 8$ | 3 |
| $8 \leq r < 12$ | 2 |

(a) Use the midpoints of the classes to estimate the mean and the standard deviation of the daily rainfall at Aldermere. (4 marks)

(b) Use linear interpolation to estimate the median and the lower and upper quartiles of the daily rainfall. Give each answer to $3$ significant figures. (4 marks)

(c) An outlier is defined as any value more than $1.5 \times \text{IQR}$ above the upper quartile or below the lower quartile. At a second station, Brackenfell, over the same $50$ days the median daily rainfall was $5.8$ mm with an interquartile range of $3.1$ mm. Using your answers, state whether the top class at Aldermere could contain an outlier, and compare the rainfall at the two stations in context. (2 marks)
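
The grouped-data estimates in Question 2 can be sketched in Python as follows. This is an illustrative check rather than a model answer. It assumes the common grouped-data convention of locating the quartiles and median at cumulative frequencies $n/4$, $n/2$ and $3n/4$; the function name `interpolate` and the variable names are hypothetical.

```python
from math import sqrt

# Classes as (lower bound, upper bound, frequency), copied from the table in Question 2.
classes = [(0, 2, 7), (2, 4, 19), (4, 6, 19), (6, 8, 3), (8, 12, 2)]
n = sum(f for _, _, f in classes)

# Midpoint estimates of the mean and the standard deviation.
mids = [((lo + hi) / 2, f) for lo, hi, f in classes]
mean = sum(m * f for m, f in mids) / n
sd = sqrt(sum(m * m * f for m, f in mids) / n - mean ** 2)


def interpolate(target):
    """Estimate the value at cumulative frequency `target` by linear interpolation."""
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= target:
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("target exceeds the total frequency")


# Assumed convention: positions n/4, n/2 and 3n/4.
q1, median, q3 = (interpolate(n * p) for p in (0.25, 0.5, 0.75))
upper_fence = q3 + 1.5 * (q3 - q1)  # boundary for high outliers in part (c)

print(f"mean = {mean:.3g}, sd = {sd:.3g}")
print(f"Q1 = {q1:.3g}, median = {median:.3g}, Q3 = {q3:.3g}")
print(f"upper fence = {upper_fence:.3g}")
```

Comparing `upper_fence` with the bounds of the top class ($8 \leq r < 12$) shows whether that class could contain an outlier.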

Question 3 · 8 marks · Test at the 5% level

Over many years, the daily number of hours of sunshine in June at a coastal weather station is known to be modelled by a normal distribution with standard deviation $\sigma = 2.4$ hours. A local tourism officer claims that, because of recent climate trends, the mean daily number of hours of sunshine in June at this station now exceeds $6.0$ hours.

To investigate the claim, a random sample of $36$ June days is taken from recent records, and the mean daily sunshine for the sample is found to be $\bar{x} = 6.8$ hours. It is assumed that the standard deviation is unchanged at $2.4$ hours. (The values are illustrative figures written for this exercise, in the style of the large data set; they are not the real data set.)

Test the tourism officer's claim at the $5\%$ significance level. The critical value for a one-tailed test at the $5\%$ level is $z = 1.6449$. State your hypotheses clearly, and give your conclusion in context.
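
A minimal Python sketch of the test statistic for Question 3 is given below, assuming the hypotheses $H_0: \mu = 6.0$ and $H_1: \mu > 6.0$ and the sample mean distribution $\bar{X} \sim N(\mu, \sigma^2/n)$. The variable names are illustrative; the $p$-value is included only as a cross-check on the critical-value comparison.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 6.0     # mean under H0
sigma = 2.4   # known population standard deviation
n = 36        # sample size
xbar = 6.8    # observed sample mean

# Standardise the sample mean using the standard error sigma / sqrt(n).
z = (xbar - mu0) / (sigma / sqrt(n))

# Upper-tail probability under H0 (one-tailed test).
p_value = 1 - NormalDist().cdf(z)

critical = 1.6449  # critical value quoted in the question
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if z > critical else "do not reject H0")
```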

Question 4 · 6 marks · Find

At an inland weather station, records suggest that, in a typical month, the long-run proportion of days on which measurable rain falls (a "rain day") is $0.2$. A month with $30$ days is selected, and the number of rain days, $X$, in that month is to be modelled by a binomial distribution.

(a) Write down the distribution of $X$, and find $P(X = 6)$, giving your answer to $4$ decimal places. (3 marks)

(b) Find the probability that the month contains at least $9$ rain days. (2 marks)

(c) State one reason why, for real weather data, the binomial model in part (a) may not be fully appropriate. (1 mark)
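
The binomial probabilities in Question 4 can be checked without statistical tables. The sketch below assumes the model $X \sim B(30, 0.2)$ and uses only the Python standard library; the function name `pmf` is an illustrative choice.

```python
from math import comb

n, p = 30, 0.2  # assumed model: X ~ B(30, 0.2)


def pmf(k):
    """Return P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p_six = pmf(6)

# "At least 9" is the complement of "at most 8".
p_at_least_nine = 1 - sum(pmf(k) for k in range(9))

print(f"P(X = 6)  = {p_six:.4f}")
print(f"P(X >= 9) = {p_at_least_nine:.4f}")
```

Writing part (b) as $1 - P(X \leq 8)$ mirrors how the calculation is usually done with cumulative binomial tables or a calculator.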

Question 5 · 5 marks · Calculate

A student downloads the daily mean wind speed, $w$ knots, recorded at a weather station on $9$ consecutive days, intending to find the mean daily wind speed for the period. The raw values copied into a spreadsheet are shown below. (These are illustrative figures written for this exercise, in the style of the large data set.)

| Day | Mon | Tue | Wed | Thu | Fri | Sat | Sun | Mon | Tue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wind speed, $w$ (knots) | 18 | 21 | 15 | n/a | 22 | 24 | 20 | 205 | 20 |

(a) Identify the two entries that should not be used as recorded, giving a reason for each. (2 marks)

(b) Using only the valid readings, calculate an appropriate estimate of the mean daily wind speed for these days. (2 marks)

(c) State one limitation of reporting this mean as a summary of the wind speed for the period. (1 mark)
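
The cleaning step in Question 5 can be sketched in Python as below. This is only one possible approach: the plausibility threshold `MAX_PLAUSIBLE_KNOTS = 100` is an assumption made for this illustration, not a value taken from the question or the large data set documentation, and the helper name `parse` is hypothetical.

```python
# Raw spreadsheet entries copied from the table in Question 5, kept as text.
raw = ["18", "21", "15", "n/a", "22", "24", "20", "205", "20"]

# Assumed threshold for a plausible daily mean wind speed (illustration only).
MAX_PLAUSIBLE_KNOTS = 100


def parse(entry):
    """Return the entry as a float, or None if it is not numeric."""
    try:
        return float(entry)
    except ValueError:
        return None


values = [parse(entry) for entry in raw]

# Keep numeric readings that fall below the assumed plausibility threshold.
valid = [v for v in values if v is not None and v <= MAX_PLAUSIBLE_KNOTS]
rejected = [e for e, v in zip(raw, values) if v is None or v > MAX_PLAUSIBLE_KNOTS]

mean = sum(valid) / len(valid)
print(f"rejected entries: {rejected}")
print(f"mean of {len(valid)} valid readings = {mean:.1f} knots")
```

Discarding the suspect reading, rather than guessing a corrected value for it, keeps the estimate based only on readings that can be trusted as recorded.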

Question 6 · 4 marks · Find

The daily maximum temperatures, $c\,^{\circ}\text{C}$, recorded at a weather station over one month have mean $\bar{c} = 19\,^{\circ}\text{C}$ and standard deviation $3\,^{\circ}\text{C}$.

For an overseas report, each temperature is converted to degrees Fahrenheit, $f$, using the coding

$$f = 1.8c + 32.$$

Find the mean and the standard deviation of the daily maximum temperatures in degrees Fahrenheit.
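
For a linear coding $f = ac + b$, the mean transforms as $\bar{f} = a\bar{c} + b$ and the standard deviation as $|a|$ times the original, since adding a constant does not change the spread. The Python sketch below applies these rules to Question 6 and then confirms them numerically on a small sample; `sample_c` is made-up data used only for that check and is not from the question.

```python
from statistics import mean, pstdev

a, b = 1.8, 32       # coding f = 1.8c + 32
mean_c, sd_c = 19, 3  # summary statistics given in the question

mean_f = a * mean_c + b  # the mean is scaled and shifted
sd_f = abs(a) * sd_c     # the standard deviation is scaled only

print(f"mean = {mean_f:.1f} F, standard deviation = {sd_f:.1f} F")

# Numerical check of the coding rules on hypothetical temperatures.
sample_c = [15, 17, 19, 21, 23]
sample_f = [a * c + b for c in sample_c]
assert abs(mean(sample_f) - (a * mean(sample_c) + b)) < 1e-9
assert abs(pstdev(sample_f) - abs(a) * pstdev(sample_c)) < 1e-9
```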
