For each of the following four scatter graphs, identify the type and strength of any linear correlation shown.
Sketch a scatter graph to show a perfect negative linear correlation between two variables.
A teacher is interested in the relationship between the number of hours her students spend on a phone per day and the number of hours they spend on a computer. She takes a sample of nine students and records the results in the table below.
Hours spent on a phone per day | 7.6 | 7 | 8.9 | 3 | 3 | 7.5 | 2.1 | 1.3 | 5.8 |
Hours spent on a computer per day | 1.7 | 1.1 | 0.7 | 5.8 | 5.2 | 1.7 | 6.9 | 7.1 | 3.3 |
The table below shows data for a sample of 8 people comparing the maximum number of pull-ups they are able to complete, x, with the maximum number of press-ups, y.
Number of pull-ups (x) | 5 | 10 | 8 | 3 | 6 | 8 | 1 | 4 |
Number of press-ups (y) | 24 | 34 | 36 | 18 | 30 | 35 | 11 | 19 |
The equation of the regression line of y on x is y = 3x + 9.
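As a rough check on the quoted line, the least-squares coefficients can be recomputed from the table. The sketch below assumes the standard formulas, gradient = Sxy / Sxx and intercept = mean(y) - gradient × mean(x); the function name `regression_line` is ours and does not come from the question.

```python
def regression_line(xs, ys):
    """Return (intercept, gradient) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    gradient = s_xy / s_xx
    intercept = y_bar - gradient * x_bar
    return intercept, gradient


pull_ups = [5, 10, 8, 3, 6, 8, 1, 4]            # x, from the table
press_ups = [24, 34, 36, 18, 30, 35, 11, 19]    # y, from the table

a, b = regression_line(pull_ups, press_ups)
print(f"y = {b:.2f}x + {a:.2f}")
```

The gradient and intercept come out close to 3 and 9, which is consistent with the rounded equation given in the question.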
A class is asked to collect a sample of bivariate data. They collect data on the shoe size, S, and the arm span, A cm, of 20 randomly selected boys from the class.
Explain what is meant by the term ‘bivariate data’.
The class plot the data in a scatter diagram and find the equation of the regression line of A on S to be A = 4.5S + 133. These are both plotted in the diagram below.
The following table shows data comparing the length of time a cake was baked for, t minutes, with the mass of the cake once it has cooled, m grams. Each cake in the sample weighed the same before being baked.
t | 37 | 35 | 36 | 31 | 30 | 28 | 36 |
m | 825 | 868 | 812 | 943 | 947 | 997 | 837 |
State which variable is the explanatory (independent) variable and which is the response (dependent) variable.
The equation for the regression line of m on t is m = 1531 - 19t.
Isla is investigating whether the number of deep-fried chocolate bars a person eats has an impact on his or her level of fitness. She takes a sample of 10 people and records how many deep-fried chocolate bars they eat during a month, c, and then times how long it takes them to complete a 100-metre sprint, t seconds, at the end of the month.
She plotted the data in a scatter diagram and found the equation of the regression line of t on c to be t = 5c+12.
Find an estimate for the 100-metre sprint time for a person if they eat:
Describe the type of linear correlation you would expect to see on Isla’s scatter diagram and state which value in the regression equation tells you this.
Terrence has collected data comparing how many adverts, A, he sees whilst watching TV for different lengths of time, t hours. With this data, Terrence plotted the scatter diagram shown below.
State, with a reason, whether each of the following equations would be appropriate for the equation of the regression line of A on t:
Two liquids are mixed and heated to a particular temperature. The time, in seconds, it takes the two liquids to react is recorded. The scatter diagram below shows the results.
A teacher collected the maths and physics test scores of a number of students and drew a scatter diagram to represent this data.
Describe the correlation shown by the scatter diagram, and interpret the correlation in context.
An alternative therapist collected data on his clients’ reported levels of anxiety as well as the number of trees they had hugged in the course of therapy. He drew a scatter diagram to represent this data.
Describe the correlation shown by the scatter diagram, and interpret the correlation in context.
The table below shows data from the United States regarding annual per capita cheese consumption (in pounds) and the divorce rate (number of divorces per 1000 people) for ten years between 2000 and 2018:
Year | 2000 | 2002 | 2004 | 2006 | 2008 | 2010 | 2012 | 2014 | 2016 | 2018 |
Cheese consumption (pounds) | 32.1 | 32.8 | 33.6 | 34.8 | 34.5 | 35 | 35.5 | 36.2 | 38.5 | 40 |
Divorce rate (number per 1000 people) | 4 | 3.9 | 3.7 | 3.7 | 3.5 | 3.6 | 3.4 | 3.2 | 3.0 | 2.9 |
Draw a scatter diagram to represent this data, with per capita cheese consumption on the horizontal axis and divorce rate on the vertical axis.
Myfanwy has been applying different voltages (v, measured in volts) to an electrical circuit in her lab and recording the resulting currents (measured in amps). The smallest voltage she applied was 0.5 volts, and the largest voltage she applied was 120 volts.
She found the equation of the regression line of current on v to be current = 0.056 + 0.332v.
Explain why it would not be sensible to use the regression equation to work out:
Myfanwy’s lab partner suggests that the value 0.056 in the regression equation represents the current in the circuit when the voltage applied is zero. Explain why he might suggest this, but also suggest a reason why his interpretation is most likely incorrect.
The following table shows the height, h cm, and weight, w kg, for each of eleven students at a sixth form college.
h | 167 | 182 | 176 | 173 | 17 | 174 | 177 | 178 | 172 | 170 | 169 |
w | 51 | 62 | 69 | 65 | 65 | 56 | 64 | 62 | 51 | 55 | 58 |
The following statistics were calculated for the data on height:
mean = 159.5 cm, standard deviation = 45.3 cm
An outlier is an observation which lies more than 2 standard deviations away from the mean, in either direction.
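The 2-standard-deviation rule can be applied to the height data directly. The sketch below assumes the quoted standard deviation is the population version (dividing by n), which is what `statistics.pstdev` computes; the variable names are ours.

```python
from statistics import mean, pstdev

heights = [167, 182, 176, 173, 17, 174, 177, 178, 172, 170, 169]  # h in cm

centre = mean(heights)
spread = pstdev(heights)  # population standard deviation (divide by n)

# Boundaries two standard deviations either side of the mean
lower = centre - 2 * spread
upper = centre + 2 * spread

outliers = [h for h in heights if h < lower or h > upper]
print(round(centre, 1), round(spread, 1), outliers)
```

Under that assumption the computed mean and standard deviation agree with the quoted 159.5 cm and 45.3 cm, and only the value 17 falls outside the boundaries.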
With the outlier data excluded, the equation of the regression line of w on h is w = -87.6 + 0.845h.
Based on your diagram, along with the regression equation, to what extent would you say that a person’s height may be used as an accurate predictor of his or her weight?
The table below shows the mass (kg) and the CO2 emissions (g/km) for a sample of 12 Ford cars registered in 2016, from the large data set.
Mass (kg) | 1833 | 1144 | 1327 | 1399 | 989 | 1555 | 1806 | 1497 | 1730 | 2030 | 1211 | 1088 |
CO2 emissions (g/km) | 134 | 138 | 119 | 98 | 115 | 119 | 129 | 159 | 225 | 152 | 138 | 122 |
The equation of the regression line of on is .
Give an interpretation of the value of the gradient of the regression line.
Use your knowledge of the large data set to explain whether there is likely to be a causal relationship between the mass of a car and its CO2 emissions.
Explain why it would not be reliable to use this regression equation to predict:
The median and quartiles for the emissions data are:
An outlier is defined as a value which lies more than 1.5 × the interquartile range above the upper quartile or more than 1.5 × the interquartile range below the lower quartile.
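The outlier rule can be sketched as follows for the second row of the table (the values from 98 to 225, taken here to be the emissions). The quartile values from the question are not reproduced in this text, so the sketch computes its own, assuming quartiles are the medians of the lower and upper halves of the sorted data. Other quartile conventions can give slightly different boundaries.

```python
from statistics import median

emissions = [134, 138, 119, 98, 115, 119, 129, 159, 225, 152, 138, 122]  # g/km

data = sorted(emissions)
half = len(data) // 2

# Assumed convention: quartiles are the medians of the lower and upper halves
q1 = median(data[:half])
q3 = median(data[-half:])
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [e for e in data if e < lower_fence or e > upper_fence]
print(q1, q3, outliers)
```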
Using your knowledge of the large data set, suggest two other factors about cars that should also be considered if creating a model to predict a car's CO2 emissions.
Ella measures how the extension, x mm, of a thin piece of metal wire varies with the force applied to it, F kN. She records her results in the table below.
F | 15 | 32 | 49 | 76 | 99 | 106 | 112 | 124 | 132 |
x | 0.2 | 0.4 | 0.6 | 0.9 | 1.4 | 1.5 | 1.6 | 1.8 | 1.8 |
Ella calculates the regression line of F on x to be F = 0.004 - 69.3x.
Explain why this equation must be wrong.
The correct equation for the regression line of F on x is F = 6.16 + 67.6x.
Interpret the value of 67.6 in this context.
Using the correct regression line, Ella estimates that if she applies a force of 1000 kN then the wire will show an extension of 14.7 mm.
Give two reasons why Ella’s estimate may not be accurate.
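Ella's figure can be reproduced by rearranging the correct regression line to make x the subject. The sketch below does that arithmetic and also checks whether 1000 kN lies inside the range of forces in the table; the helper name `extension_from_force` is ours.

```python
forces = [15, 32, 49, 76, 99, 106, 112, 124, 132]  # F in kN, from the table


def extension_from_force(force):
    """Rearrange F = 6.16 + 67.6x to give x in mm."""
    return (force - 6.16) / 67.6


estimate = extension_from_force(1000)

# Is the force being used inside the range of the observed data?
within_data_range = min(forces) <= 1000 <= max(forces)

print(round(estimate, 1), within_data_range)
```

This confirms where the 14.7 mm comes from, and shows that it relies on a line of F on x being used to predict x, at a force well beyond the largest one recorded.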
The table below shows a comparison of the average house price, H (£100 000), and the average yearly income, I (£10 000), for different areas around the UK in 2021.
Area | H | I |
Conwy | 155.1 | 26.4 |
Perth and Kinross | 181.3 | 27.9 |
Richmondshire | 190.3 | 25.1 |
Monmouthshire | 232.6 | 31.4 |
Trafford | 260.2 | 32.0 |
Gwynedd | 148.5 | 23.6 |
Basingstoke and Deane | 297.7 | 33.7 |
Daventry | 259.2 | 29.5 |
The equation of the regression line of on is calculated to be .
A particularly unscrupulous politician uses this to claim that if you want a salary of , all you need to do is buy a house that costs .
Comment on the validity of the politician's claim.
Two researchers, Alwyn and Beth, are working on a project collecting data about the self-reported happiness of students on a scale from 0 to 10, , and the number of exams sat by those students, . After collecting data from 1000 students, they construct a scatter diagram and find the equation of the regression line of on n to be .
Explain what correlation the data is likely to show in the scatter diagram.
What information about the original data set would need to be checked before using the regression line equation to estimate the self-reported happiness of a student sitting 8 exams?
After calculating the equation of the line of regression, Alwyn accidentally deletes all the data collected about the self-reported happiness scores. Alwyn says it's not a problem since he can use the regression line and the number of exams sat to recalculate all the values. Beth says that Alwyn is wrong and the original data is lost forever.
Explain which researcher is correct.
A consultant is trying to improve the efficiency of how a factory making chewing gum operates. To help them do this, they collect many types of data about the factory workers. One such type of data is the number of chewing gum packets made per shift. The list below shows the number of chewing gum packets made by a particular worker (Worker 1) during the last 10 shifts worked.
392 414 536 474 212 396 427 545 459 234
Calculate the mean number of chewing gum packets made per shift by Worker 1 to the nearest whole number of packets.
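The calculation is a sum divided by a count, then rounded. A minimal sketch using the ten values listed above:

```python
packets = [392, 414, 536, 474, 212, 396, 427, 545, 459, 234]  # last 10 shifts

mean_packets = sum(packets) / len(packets)
print(round(mean_packets))  # nearest whole number of packets
```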
The table below shows the mean number of chewing gum packets made by various workers, along with how many hours of training they have received.
Worker | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Mean number of packets | | 512 | 499 | 359 | 393 | 432 | 456 | 520 | 475 |
Hours of training | 18 | 24 | 22.5 | 15 | 16 | 20 | 21 | 22 | 21 |
The consultant then goes on to collect even more data on other factory workers and records some of it in the table below.
Worker | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
Mean number of packets | 600 | 598 | 584 | 602 | 593 | 585 | 591 | 601 | 605 |
Hours of training | 29 | 28.5 | 32 | 29 | 34.5 | 30.5 | 37 | 31 | 30 |
Without adding this new data to your scatter diagram, what advice could the consultant give to the factory to improve the efficiency of their workers?
The table below shows data from the large data set on the engine size, S cm³, and the mass of the vehicle, M kg, for a random sample of 10 cars that were first registered in 2016.
M | 925 | 1225 | 1141 | 1350 | 1425 | 1280 | 1613 | 1505 | 1820 | 1816 |
S | 998 | 1248 | 1398 | 1399 | 1499 | 1598 | 1956 | 1984 | 1995 | 1997 |
The equation for the regression line of S on M is S
The table below shows S and M for a random sample of 3 other cars first registered in 2016.
M | 1095 | 1485 | 1232 |
S | 1242 | 1798 | 999 |
Considering this second sample, use the regression line equation and the values of M to predict values of S and find the average percentage difference of these estimated values of S from the true values of S. Hence, comment on how accurately this regression line equation can predict values.
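The comparison can be sketched in code. The regression equation is cut short in this text, so the sketch below refits a least-squares line of S on M from the first table; its coefficients may differ slightly from those printed in the original question. "Percentage difference" is assumed to mean |predicted - true| / true × 100, and the function name `regression_line` is ours.

```python
def regression_line(xs, ys):
    """Return (intercept, gradient) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    gradient = s_xy / s_xx
    return y_bar - gradient * x_bar, gradient


# First sample (10 cars), from the table
mass = [925, 1225, 1141, 1350, 1425, 1280, 1613, 1505, 1820, 1816]
engine = [998, 1248, 1398, 1399, 1499, 1598, 1956, 1984, 1995, 1997]
a, b = regression_line(mass, engine)

# Second sample (3 cars), from the table
new_mass = [1095, 1485, 1232]
new_engine = [1242, 1798, 999]

predicted = [a + b * m for m in new_mass]
# Assumed definition: absolute difference as a percentage of the true value
pct_diffs = [abs(p - s) / s * 100 for p, s in zip(predicted, new_engine)]
average_pct_diff = sum(pct_diffs) / len(pct_diffs)

print(f"S = {a:.1f} + {b:.3f}M")
print([round(d, 1) for d in pct_diffs], round(average_pct_diff, 1))
```

Printing the individual percentage differences alongside the average shows whether a single car dominates the result, which matters when commenting on accuracy from only three cars.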
Using your knowledge of the large data set, explain whether there is likely to be a causal relationship between S and M.
A researcher claims that this correlation between S and M is a coincidence. How could you use data from the large data set to check this claim?
Four statisticians are arguing over which line best highlights the trend of the set of data shown in the scatter diagram below.
The first statistician draws, by eye, a line of best fit and claims its equation is . The second draws, again by eye, a different line of best fit and claims its equation is . The third calculates the equation of the regression line of on claims it is . The fourth statistician claims that all three of the other statisticians are definitely wrong and that there is no line of best fit.
By adding each of these lines to the scatter diagram, comment on the claims of each of the statisticians.
Paige takes a sample of 9 cities throughout the UK to compare the percentage of people living in a city who identify as vegan, %, and the percentage of restaurants offering vegan options in that same city, %.
The regression line of on is calculated, and it is used to predict values of for and , the values returned are and respectively.
Find the equation of the regression line of on .
In one of the cities, 1.16% of people were vegan and 55.9% of restaurants offered vegan options.
Use the equation of the regression line of on to estimate the percentage of restaurants offering vegan options in a city in which 1.16% of people are vegan. Give your estimated value of to 3 significant figures. Compare this to the information above.
Paige discovers that in one city every restaurant offers vegan options. Paige suggests that the equation of the regression line of on can be used to find the percentage of people in this city who identify as vegan. Explain why Paige is likely wrong.
A ride sharing app collected data on the time, t minutes, taken to complete a journey of distance, d miles. Data from a random sample of 8 journeys is detailed in the table below.
d | 3.9 | 6.6 | 8.5 | 1.3 | 1.7 | 3.7 | 7.4 | 6.1 |
t | 25 | 36 | 39 | 6 | 8 | 19 | 38 | 32 |
By plotting a scatter diagram of t on d for this data, explain whether or not it is appropriate to use a linear regression model on this data.
Using a new random sample of thousands of journeys, the ride sharing app calculated the regression line of time on distance to be t = -1.8 + 5.9d.
The app uses this regression equation to predict that a journey of distance 7 km would take 39.5 minutes. Explain why this is incorrect.
The regression equation predicts that for journeys less than 0.3 miles the time taken will be less than zero minutes. What is the most likely reason that the regression equation gives this false prediction?
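Both statements can be checked numerically. The sketch below assumes the intercept of the regression line is -1.8, which is the value implied by the quoted 39.5 minutes and by the zero crossing near 0.3 miles. The conversion factor and the function name `predicted_time` are ours.

```python
MILES_PER_KM = 0.621371  # standard conversion factor


def predicted_time(distance_miles, intercept=-1.8, gradient=5.9):
    """Predicted journey time in minutes for a distance given in miles."""
    return intercept + gradient * distance_miles


# Treating 7 as miles reproduces the app's figure
time_if_7_miles = predicted_time(7)

# Converting 7 km to miles first gives a different prediction
time_for_7_km = predicted_time(7 * MILES_PER_KM)

# Distance at which the predicted time is exactly zero
zero_time_distance = 1.8 / 5.9

print(round(time_if_7_miles, 1), round(time_for_7_km, 1), round(zero_time_distance, 2))
```

The first value matches the app's 39.5 minutes, which suggests the distance was substituted without converting from kilometres to miles.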
A maths teacher randomly selects 10 students from a class of 30 to answer a survey. The survey asks students how many practice questions they completed when revising for a recent test, Q, and their percentage score in that test, S %. Summary statistics for Q are shown below:
Mean of Q = 21, range of Q = 20
The equation of the regression line of S on Q is S = 34 + 2Q.
Explain which variable is the response variable.
Use the regression equation to find an estimate for the mean value and range of S. State any assumptions that are needed.
Comment on the reliability of using the regression equation to:
An owner of a beach resort is comparing parasol sales, £p, and sun cream sales, £s, at the resort over a period of eleven days. The data is standardised by coding the variables using x = and y = . The values for the first ten days are plotted on the scatter diagram below.
An environmentalist is using the large data set to see if there is a correlation between the CO emissions (g/km) and the NOx emissions (g/km) of cars first registered in 2016. To do this, the environmentalist takes 5 different samples, each containing 6 cars, and calculates the mean CO and NOx emissions for each sample. The data is shown in the table below.
Mean CO emissions (g/km) | 0.20 | 0.28 | 0.14 | 0.52 | 0.23 |
Mean NOx emissions (g/km) | 0.03 | 0.06 | 0.01 | 0.02 | 0.05 |
The environmentalist now wishes to find an estimate for the total NOx emissions produced by a group of 6 cars that have a mean value of CO emissions of 0.17 g/km after each car has driven 20 km.