Linear Regression (DP IB Applications & Interpretation (AI)): Revision Note
Linear Regression
What is linear regression?
If strong linear correlation exists on a scatter diagram then the data can be modelled by a linear model
Drawing lines of best fit by eye is not the best method as it can be difficult to judge the best position for the line
The least squares regression line is the line of best fit that minimises the sum of the squares of the vertical gaps between the line and the data values
This is usually called the regression line of y on x
It can be calculated by looking at the vertical distances between the line and the data values
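As a rough illustration of what "minimises the sum of the squares" means, the sketch below fits a line to the practice-time and test-score data from the worked example in this note. The helper names and the formulas a = Sxy / Sxx and b = ȳ - a x̄ are assumptions brought in for illustration; the note itself expects a GDC to do this step.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the regression line y = ax + b of y on x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Standard least squares formulas (assumed here, not stated in the note)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    a = s_xy / s_xx
    b = y_mean - a * x_mean
    return a, b


def sum_squared_gaps(xs, ys, a, b):
    """Sum of the squared vertical gaps between each point and y = ax + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Data taken from the worked example in this note
hours = [2, 5, 6, 7, 10, 11, 12]
scores = [11, 49, 55, 75, 63, 68, 82]

a, b = least_squares_line(hours, scores)
best = sum_squared_gaps(hours, scores, a, b)

# Moving the gradient or the intercept away from the fitted values
# should make the sum of squares larger, not smaller.
nudged_gradient = sum_squared_gaps(hours, scores, a + 0.5, b)
nudged_intercept = sum_squared_gaps(hours, scores, a, b - 2)

print(f"a = {a:.3f}, b = {b:.3f}")
print(best < nudged_gradient, best < nudged_intercept)
```

Any other line through the data gives a larger total than `best`, which is the sense in which the regression line is the "best" fit.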
The regression line of y on x is written in the form y = ax + b
a is the gradient of the line
It represents the change in y for each individual unit change in x
If a is positive this means y increases by a for a unit increase in x
If a is negative this means y decreases by |a| for a unit increase in x
b is the y-intercept
It shows the value of y when x is zero
You are expected to use your GDC to find the equation of the regression line
Enter the bivariate data and choose the model “ax + b”
Remember the mean point (x̄, ȳ) will lie on the regression line
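The two facts above (the gradient is the change in y per unit change in x, and the mean point lies on the line) can be checked numerically. This is a minimal sketch using the data from the worked example in this note; the raw-sum formulas for a and b are an assumption standing in for the GDC's "ax + b" model, and the variable names are illustrative.

```python
# Data taken from the worked example in this note
hours = [2, 5, 6, 7, 10, 11, 12]
scores = [11, 49, 55, 75, 63, 68, 82]
n = len(hours)

sum_x = sum(hours)
sum_y = sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_xx = sum(x * x for x in hours)

# Least squares gradient and intercept from raw sums
# (assumed formulas; a GDC reports these values directly)
a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
b = (sum_y - a * sum_x) / n

# The mean point of the data
x_mean = sum_x / n
y_mean = sum_y / n

# Height of the regression line at the mean x value
line_at_mean = a * x_mean + b
print(f"mean point: ({x_mean:.3f}, {y_mean:.3f})")
print(f"line at mean x: {line_at_mean:.3f}")

# Change in predicted score for one extra hour of practice
change_per_hour = (a * 8 + b) - (a * 7 + b)
print(f"gradient a = {a:.3f}, change per hour = {change_per_hour:.3f}")
```

The line's height at the mean x value agrees with the mean y value, and the change per extra hour agrees with a, up to floating-point rounding.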
How do I use a regression line?
The equation of the regression line can be used to decide what type of correlation there is if there is no scatter diagram
If a is positive then the data set has positive correlation
If a is negative then the data set has negative correlation
The equation of the regression line can also be used to predict the value of a dependent variable (y) from an independent variable (x)
The equation should only be used to make predictions for y
Using a y on x line to predict x is not always reliable
Making a prediction within the range of the given data is called interpolation
This is usually reliable
The stronger the correlation the more reliable the prediction
Making a prediction outside of the range of the given data is called extrapolation
This is much less reliable
The prediction will be more reliable if the number of data values in the original sample set is bigger
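The distinction between interpolation and extrapolation can be sketched as a simple range check on x before predicting. The function name `predict_with_check` and the least squares formulas are assumptions made for illustration; the data is taken from the worked example in this note.

```python
# Data taken from the worked example in this note
hours = [2, 5, 6, 7, 10, 11, 12]
scores = [11, 49, 55, 75, 63, 68, 82]

# Fit y = ax + b by least squares (assumed formulas; a GDC does this step)
n = len(hours)
x_mean = sum(hours) / n
y_mean = sum(scores) / n
a = sum((x - x_mean) * (y - y_mean) for x, y in zip(hours, scores)) / sum(
    (x - x_mean) ** 2 for x in hours
)
b = y_mean - a * x_mean


def predict_with_check(x):
    """Predict y from x and say whether x lies inside the data range."""
    estimate = a * x + b
    if min(hours) <= x <= max(hours):
        kind = "interpolation"  # within the given data: usually reliable
    else:
        kind = "extrapolation"  # outside the given data: much less reliable
    return estimate, kind


inside_estimate, inside_kind = predict_with_check(8)
outside_estimate, outside_kind = predict_with_check(15)

print(f"x = 8:  {inside_estimate:.1f} ({inside_kind})")
print(f"x = 15: {outside_estimate:.1f} ({outside_kind})")
```

The check only flags whether x is inside the observed range. It says nothing about how strong the correlation is, which also affects how much a prediction can be trusted.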
Examiner Tips and Tricks
Once you calculate the values of a and b, store them in your GDC
This means you can use the full display values rather than the rounded values when using the linear regression equation to predict values
This avoids rounding errors
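The effect of rounding a and b before predicting can be seen in a small sketch. It assumes the data from the worked example in this note and rounds both constants to 3 significant figures; the variable names and the choice of x = 12 are illustrative only.

```python
# Data taken from the worked example in this note
hours = [2, 5, 6, 7, 10, 11, 12]
scores = [11, 49, 55, 75, 63, 68, 82]

# Full-precision least squares values (assumed formulas; a GDC stores these)
n = len(hours)
x_mean = sum(hours) / n
y_mean = sum(scores) / n
a_full = sum((x - x_mean) * (y - y_mean) for x, y in zip(hours, scores)) / sum(
    (x - x_mean) ** 2 for x in hours
)
b_full = y_mean - a_full * x_mean

# The same constants rounded to 3 significant figures
a_rounded = float(f"{a_full:.3g}")
b_rounded = float(f"{b_full:.3g}")

x = 12
prediction_full = a_full * x + b_full
prediction_rounded = a_rounded * x + b_rounded

print(f"full values:    {prediction_full:.4f}")
print(f"rounded values: {prediction_rounded:.4f}")
print(f"difference:     {abs(prediction_full - prediction_rounded):.4f}")
```

The gap is small here, but it grows with the size of x because the rounding error in a is multiplied by x.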
Worked Example
Barry is a music teacher. For 7 students, he records the time they spend practising per week (x hours) and their score in a test (y %).
Time (x hours) | 2 | 5 | 6 | 7 | 10 | 11 | 12 |
Score (y %) | 11 | 49 | 55 | 75 | 63 | 68 | 82 |
a) Write down the equation of the regression line of y on x, giving your answer in the form y = ax + b, where a and b are constants to be found.

b) Give an interpretation of the value of a.

c) Another of Barry’s students practises for 15 hours a week. Estimate their score and comment on the validity of this prediction.
