Lines of Best Fit & Regression Lines (Edexcel GCSE Statistics)
Revision Note
Written by: Roger B
Reviewed by: Dan Finlay
Line of Best Fit Basics
What is a line of best fit?
If a scatter graph suggests that there is a positive or negative correlation
a line of best fit can be drawn on the scatter graph
This can then be used to make predictions
How do I draw a line of best fit?
A line of best fit can often be drawn by eye
It is a straight line (use a ruler!)
It must extend across the full data set
There should be roughly as many points on either side of the line (along its whole length)
The spaces between the points and the line should roughly be the same on either side
If there is one extreme value (outlier) that does not fit the general pattern
then ignore this point when drawing a line of best fit
What is the double mean point?
A question may talk about the double mean point
This is the point $(\bar{x}, \bar{y})$
$\bar{x}$ is the mean of the data values that are plotted along the x-axis
$\bar{y}$ is the mean of the data values that are plotted along the y-axis
The question may give you the values of $\bar{x}$ and $\bar{y}$
Or you may need to calculate the means from the data
If a question mentions the double mean point, then the line of best fit must go through the double mean point
It should still follow all the other rules for drawing a line of best fit (roughly same number of points on each side, etc.)
If a question doesn't mention the double mean point
then you don't need to calculate it or worry about drawing the line through it
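As a quick illustration of finding the double mean point from data, here is a minimal Python sketch. It uses the price and time values from the computer Worked Example in this note; the function name `double_mean_point` is just an illustrative choice.

```python
def double_mean_point(x_values, y_values):
    """Return (mean of x, mean of y) for paired data."""
    if len(x_values) != len(y_values) or not x_values:
        raise ValueError("need two non-empty lists of equal length")
    x_bar = sum(x_values) / len(x_values)
    y_bar = sum(y_values) / len(y_values)
    return x_bar, y_bar


# Data from the computer Worked Example (price in pounds, time in seconds)
prices = [320, 300, 400, 650, 220, 380, 900, 700]
times = [3.2, 5.3, 4.1, 2.9, 5.1, 4.3, 2.6, 3.8]

x_bar, y_bar = double_mean_point(prices, times)
print(f"Double mean point: ({x_bar:.2f}, {y_bar:.2f})")  # Double mean point: (483.75, 3.91)
```

If a question asked for the line of best fit to pass through the double mean point for this data, the line would need to go through roughly (484, 3.9).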
How do I use a line of best fit?
The line of best fit can be used to predict the value of one variable from the other variable
See the Worked Example
Predictions should only be made for values that are within the range of the given data
Making a prediction within the range of the given data is called interpolation
This will normally give a reliable result
Making a prediction outside of the range of the given data is called extrapolation
This is much less reliable
What about the gradient and y-intercept of a line of best fit?
You need to be able to interpret the meaning of the gradient and y-intercept of a line of best fit
The gradient is the slope or 'steepness' of the line
A question may tell you the gradient of the line of best fit
If you need to find it you can calculate it using 'rise over run'
Pick two points on the line with coordinates $(x_1, y_1)$ and $(x_2, y_2)$
The gradient is $\dfrac{y_2 - y_1}{x_2 - x_1}$
Be careful – the plotted data points will usually not be points on the line!
The gradient of the line of best fit tells you the rate of change of the y-axis variable with respect to the x-axis variable
This needs to be interpreted in context
For example if the x-axis variable is distance travelled in a taxi (in miles) and the y-axis variable is the cost of the taxi ride (in pounds £)
then the gradient of the line of best fit (in £ per mile) is the extra cost in pounds for each additional mile travelled
The y-intercept is the value of the y-coordinate at the point where the line crosses the y-axis
This can be read off the graph
The y-intercept of the line of best fit tells you the value of the y-axis variable when the x-axis variable is equal to zero
This needs to be interpreted in context
For example if the x-axis variable is distance travelled in a taxi (in miles) and the y-axis variable is the cost of the taxi ride (in pounds £)
then the y-intercept of the line of best fit tells you the 'flat fee' that is added onto every taxi ride
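The 'rise over run' calculation can be sketched in Python as below. The two points are made-up values for the taxi example (they are not from any real data set), and they are assumed to be points read off the line itself rather than plotted data points.

```python
def gradient_and_intercept(p1, p2):
    """Find the gradient and y-intercept of the straight line through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have different x-coordinates")
    gradient = (y2 - y1) / (x2 - x1)   # rise over run
    intercept = y1 - gradient * x1     # value of y when x = 0
    return gradient, intercept


# Hypothetical points on a taxi line of best fit: (miles, cost in pounds)
gradient, intercept = gradient_and_intercept((2, 7.0), (10, 19.0))
print(gradient)   # 1.5
print(intercept)  # 4.0
```

In context, these made-up values would mean each extra mile costs about £1.50 and the flat fee is about £4. In an exam the y-intercept can usually just be read off the graph, so the calculation here is only a check.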
Examiner Tips and Tricks
Sliding a ruler around a scatter graph can help to find the right position for the line of best fit!
Remember to draw the line through the double mean point if the question mentions it
Worked Example
Sophie wants to know if the price of a computer is related to the speed of the computer.
She tests 8 computers by running the same program on each, measuring how many seconds it takes to finish.
Sophie's results are shown in the table below.
Price (£) | 320 | 300 | 400 | 650 | 220 | 380 | 900 | 700
Time (seconds) | 3.2 | 5.3 | 4.1 | 2.9 | 5.1 | 4.3 | 2.6 | 3.8
(a) Draw a scatter diagram showing these results.
Plot each point carefully using crosses
(b) Write down the type of correlation shown and interpret this in the context of the question.
The shape formed by the points goes from top left to bottom right (negative gradient), so there is negative correlation
As one quantity increases (price), the other decreases (time)
Note that time decreasing means that the computer is running faster
The graph shows a negative correlation
This means that the more a computer costs, the quicker it is at running the program
(c) Use a line of best fit to estimate the price of a computer that completes the task in 3.4 seconds.
First draw a line of best fit, by eye
Then draw a horizontal line from 3.4 seconds to the line of best fit
Draw a vertical line down to read off the price
A computer that takes 3.4 seconds to run the program should cost around £620
A range of different answers would be accepted, depending on the line of best fit
(d) Explain why this should not be used to estimate the time taken to complete the task by a computer costing £1500.
£1500 is outside the range of the data, so estimating that from the scatter diagram would be extrapolation
Using the diagram for a computer costing £1500 would be extrapolation, and results from extrapolation are usually unreliable
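Parts (c) and (d) both come down to whether the value being used lies inside the range of the recorded data. A minimal Python sketch of that check, using the table values above (the function name `prediction_type` is an illustrative choice):

```python
def prediction_type(value, data):
    """Return 'interpolation' if value lies within the data range, else 'extrapolation'."""
    if min(data) <= value <= max(data):
        return "interpolation"
    return "extrapolation"


prices = [320, 300, 400, 650, 220, 380, 900, 700]
times = [3.2, 5.3, 4.1, 2.9, 5.1, 4.3, 2.6, 3.8]

part_c = prediction_type(3.4, times)     # times range from 2.6 to 5.3 seconds
part_d = prediction_type(1500, prices)   # prices range from £220 to £900
print(part_c)  # interpolation
print(part_d)  # extrapolation
```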
Regression Lines
What is a regression line?
Statistical software can calculate the equation for an 'ideal' line of best fit
This 'ideal' line of best fit is known as a regression line
It is more accurate than a line of best fit drawn by eye
You do not need to calculate the equation for a regression line
It will be given to you in the question
You need to be able to use and interpret it
The equation of a regression line will be given in the following form
$y = a + bx$
$a$ is the y-intercept of the regression line
$b$ is the gradient of the regression line
Both of those have the same meaning that they do for any line of best fit
You may be asked to draw a regression line onto a scatter diagram
You need to know two points on the line
Choose two $x$ values (they don't need to correspond to any data values!)
Substitute into the equation of the regression line to find the corresponding $y$ values
Plot those two points on the scatter diagram and draw a straight line through them
Use a ruler!
A regression line drawn from its equation will always go through the double mean point for the data set
You may be required to use this fact in an exam question
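To see why the regression line goes through the double mean point, here is a rough Python sketch. It assumes the software uses the standard least-squares formulas (the note does not say which method is used), and it reuses the price and time data from the computer Worked Example. You would not need to do this calculation in the exam.

```python
def least_squares_line(x_values, y_values):
    """Return (a, b) for the line y = a + b*x fitted by least squares."""
    n = len(x_values)
    if n != len(y_values) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar = sum(x_values) / n
    y_bar = sum(y_values) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_values, y_values))
    s_xx = sum((x - x_bar) ** 2 for x in x_values)
    if s_xx == 0:
        raise ValueError("the x values must not all be the same")
    b = s_xy / s_xx          # gradient
    a = y_bar - b * x_bar    # y-intercept, chosen so the line passes through the means
    return a, b


prices = [320, 300, 400, 650, 220, 380, 900, 700]
times = [3.2, 5.3, 4.1, 2.9, 5.1, 4.3, 2.6, 3.8]

a, b = least_squares_line(prices, times)
x_bar = sum(prices) / len(prices)
y_bar = sum(times) / len(times)

# Substituting the mean price into the line should give the mean time
gap = abs((a + b * x_bar) - y_bar)
print(gap < 1e-9)  # True
```

The intercept formula is what forces the line through the double mean point: substituting the mean of the x values always returns the mean of the y values.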
Examiner Tips and Tricks
Be careful with the form of the regression line
It is slightly different from the version of a straight line equation that you might be familiar with
Remember that the regression line always goes through the double mean point
Worked Example
Rebecca, a regular jogger, recorded the number of calories she was able to burn ($y$ calories) by running different distances ($x$ km). This data is shown on the scatter diagram below.
The equation of the regression line for the data in the scatter diagram is $y = 18.8 + 62.2x$
(a) Interpret the number 62.2 in the equation of the regression line in the context of the question.
62.2 is the gradient of the regression line
It tells you how much the y-variable changes when the x-variable goes up by 1
It means that for every extra kilometre she runs, she burns 62.2 more calories
(b) Draw the regression line on the scatter diagram.
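Find the coordinates of two points on the line and draw the line through these points
When the distance is 0 km, the equation gives 18.8 calories
When the distance is 10 km, the equation gives 18.8 + 62.2 × 10 = 640.8 calories
So draw the line through the points (0, 18.8) and (10, 640.8)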
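The substitution in part (b) can be checked with a short Python sketch. It assumes the regression line has the form y = a + bx, with intercept 18.8 and gradient 62.2 taken from the worked example; the function name `calories_burned` is an illustrative choice.

```python
def calories_burned(distance_km, a=18.8, b=62.2):
    """Predicted calories from the regression line y = a + b*x (assumed form)."""
    return a + b * distance_km


# Any two distances will do; 0 km and 10 km keep the arithmetic simple
points = [(x, round(calories_burned(x), 1)) for x in (0, 10)]
print(points)  # [(0, 18.8), (10, 640.8)]
```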
The mean of the data values for the distance run is 8 km.
(c) Use this information to find the mean of the data values for the calories burned.
Use the fact that the regression line always goes through the double mean point
Draw a vertical line up from 8 on the x-axis until it hits the regression line
Then draw a horizontal line from there until it hits the y-axis
Read the value off the y-axis (it's a little bit less than 520)
516 calories
Marks would be awarded for a range of answers around that value
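The graph reading in part (c) can be cross-checked by substituting the mean distance into the regression equation, since the line passes through the double mean point. This sketch assumes the same intercept (18.8) and gradient (62.2) as the worked example.

```python
a, b = 18.8, 62.2   # intercept and gradient from the worked example
x_bar = 8           # mean distance run, in km

# The regression line passes through the double mean point,
# so substituting the mean distance gives the mean calories
y_bar = a + b * x_bar
print(round(y_bar, 1))  # 516.4

graph_reading = 516
print(abs(y_bar - graph_reading) < 5)  # True
```

The calculated value agrees with the reading of about 516 calories taken from the graph.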