Correlation Coefficients (DP IB Applications & Interpretation (AI)): Revision Note
Linear Regression
What is linear regression?
If strong linear correlation exists on a scatter diagram then the data can be modelled by a linear model
Drawing lines of best fit by eye is not the best method as it can be difficult to judge the best position for the line
The least squares regression line is the line of best fit that minimises the sum of the squared distances between the line and each data value
It can be calculated by looking at either:
vertical distances between the line and the data values
This is the regression line of y on x
horizontal distances between the line and the data values
This is the regression line of x on y
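Written out, the two choices correspond to minimising two different sums. The following is a sketch in standard notation; the data points $(x_i, y_i)$ for $i = 1, \dots, n$ and the coefficient names $a$, $b$, $c$, $d$ are notation assumed here to match the forms used later in this note.

```latex
\text{y on x (vertical distances):}\quad
\min_{a,\,b} \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2
\qquad
\text{x on y (horizontal distances):}\quad
\min_{c,\,d} \sum_{i=1}^{n} \bigl(x_i - (c y_i + d)\bigr)^2
```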
How do I find the regression line of y on x?
The regression line of y on x is written in the form $y = ax + b$
a is the gradient of the line
It represents the change in y for each individual unit change in x
If a is positive this means y increases by a for a unit increase in x
If a is negative this means y decreases by |a| for a unit increase in x
b is the y-intercept
It shows the value of y when x is zero
You are expected to use your GDC to find the equation of the regression line
Enter the bivariate data and choose the model “ax + b”
Remember that the mean point $(\bar{x}, \bar{y})$ will lie on the regression line
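In the exam the GDC does this calculation, but it can help to see what it is doing. The sketch below assumes the standard least squares results, gradient $a = S_{xy} / S_{xx}$ and intercept $b = \bar{y} - a\bar{x}$, and uses a small made-up data set that is not taken from this note. The function name `regression_y_on_x` is hypothetical.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = ax + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx          # gradient
    b = y_bar - a * x_bar    # intercept, so the line passes through the mean point
    return a, b


# Made-up illustrative data (an assumption, not from this note)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = regression_y_on_x(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

print(f"y = {a:.3g}x + {b:.3g}")
# Substituting the mean of x into the line should give the mean of y
print("Mean point lies on the line:", abs(a * x_bar + b - y_bar) < 1e-9)
```

Because the intercept is defined from the two means, the mean point check holds for any data set, not just this one.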
How do I find the regression line of x on y?
The regression line of x on y is written in the form $x = cy + d$
c is the gradient of the line
It represents the change in x for each individual unit change in y
If c is positive this means x increases by c for a unit increase in y
If c is negative this means x decreases by |c| for a unit increase in y
d is the x-intercept
It shows the value of x when y is zero
You are expected to use your GDC to find the equation of the regression line
It is found the same way as the regression line of y on x but with the two data sets switched around
Remember that the mean point $(\bar{x}, \bar{y})$ will lie on the regression line
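The idea of switching the two data sets can be sketched in code. The example below assumes the standard least squares formulas and made-up data; the helper name `least_squares_line` is hypothetical. It also shows that the x on y line is generally not just a rearrangement of the y on x line.

```python
def least_squares_line(inputs, outputs):
    """Return (gradient, intercept) of the least squares line
    outputs = gradient * inputs + intercept."""
    n = len(inputs)
    in_bar = sum(inputs) / n
    out_bar = sum(outputs) / n
    s_io = sum((i - in_bar) * (o - out_bar) for i, o in zip(inputs, outputs))
    s_ii = sum((i - in_bar) ** 2 for i in inputs)
    gradient = s_io / s_ii
    intercept = out_bar - gradient * in_bar
    return gradient, intercept


# Made-up illustrative data (an assumption, not from this note)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)  # y on x: y = ax + b
c, d = least_squares_line(ys, xs)  # x on y: same routine, data sets swapped

print(f"y on x: y = {a:.3g}x + {b:.3g}")
print(f"x on y: x = {c:.3g}y + {d:.3g}")
# Rearranging y = ax + b for x would give a gradient of 1/a, which differs from c
print(f"1/a = {1 / a:.3g}, c = {c:.3g}")
```

The two gradients only agree when the correlation is perfect, which is why each line should be used for predicting its own variable.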
How do I use a regression line?
The regression line can be used to decide what type of correlation there is if there is no scatter diagram
If the gradient is positive then the data set has positive correlation
If the gradient is negative then the data set has negative correlation
The regression line can also be used to predict the value of a dependent variable from an independent variable
The equation for the y on x line should only be used to make predictions for y
Using a y on x line to predict x is not always reliable
The equation for the x on y line should only be used to make predictions for x
Using an x on y line to predict y is not always reliable
Making a prediction within the range of the given data is called interpolation
This is usually reliable
The stronger the correlation the more reliable the prediction
Making a prediction outside of the range of the given data is called extrapolation
This is much less reliable
The prediction will be more reliable if the number of data values in the original sample set is bigger
The y on x and x on y regression lines intersect at the mean point
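The distinction between interpolation and extrapolation can be checked mechanically before trusting a prediction. The sketch below is a minimal illustration; the function name `predict_y`, the coefficients and the data are all made up for the example.

```python
def predict_y(x_new, a, b, xs):
    """Predict y from the y on x line y = ax + b.

    Returns the prediction and a flag that is True when x_new lies
    within the range of the observed x values (interpolation).
    """
    y_pred = a * x_new + b
    is_interpolation = min(xs) <= x_new <= max(xs)
    return y_pred, is_interpolation


# Illustrative values only (assumptions, not from this note)
xs = [1, 2, 3, 4, 5]   # observed x values
a, b = 0.6, 2.2        # coefficients of a y on x line fitted to that data

for x_new in (3.5, 10):
    y_pred, inside = predict_y(x_new, a, b, xs)
    label = "interpolation" if inside else "extrapolation"
    print(f"x = {x_new}: predicted y = {y_pred:.2f} ({label})")
```

The flag only reports whether the input lies in the observed range; how reliable the prediction is still depends on the strength of the correlation and the sample size.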
Examiner Tips and Tricks
Once you calculate the values of a and b, store them in your GDC
This means you can use the full display values rather than the rounded values when using the linear regression equation to predict values
This avoids rounding errors
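The effect of rounding too early can be seen with a quick comparison. The coefficients below are hypothetical stand-ins for full GDC display values, and `round_sig` is a helper written for this sketch.

```python
def round_sig(value, sig=3):
    """Round value to the given number of significant figures."""
    return float(f"{value:.{sig}g}")


# Hypothetical coefficients standing in for full GDC display values
a_full, b_full = 1.23456, -7.65432
a_rounded, b_rounded = round_sig(a_full), round_sig(b_full)

x = 80
y_full = a_full * x + b_full           # prediction from stored full values
y_rounded = a_rounded * x + b_rounded  # prediction from values rounded to 3 s.f.

print(f"Using full values:    y = {y_full:.2f}")
print(f"Using rounded values: y = {y_rounded:.2f}")
print(f"Difference:           {abs(y_full - y_rounded):.2f}")
```

The rounding error in the gradient is multiplied by x, so the gap grows as the input value gets larger.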
Worked Example
The table below shows the scores of eight students for a maths test and an English test.
Maths (x) | 7 | 18 | 37 | 52 | 61 | 68 | 75 | 82 |
English (y) | 5 | 3 | 9 | 12 | 17 | 41 | 49 | 97 |
a) Write down the value of Pearson’s product-moment correlation coefficient, r.

b) Write down the equation of the regression line of y on x, giving your answer in the form $y = ax + b$ where a and b are constants to be found.

c) Write down the equation of the regression line of x on y, giving your answer in the form $x = cy + d$ where c and d are constants to be found.

d) Use the appropriate regression line to predict the score on the maths test of a student who got a score of 63 on the English test.
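A GDC is expected for all four parts, but the calculations can be cross-checked with a short script. The sketch below takes the maths scores as x and the English scores as y (an assumption about the table's labels) and uses the standard formulas $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, $a = S_{xy} / S_{xx}$ and $c = S_{xy} / S_{yy}$. The helper name `summary_sums` is hypothetical.

```python
from math import sqrt

# Data from the table in the question
maths = [7, 18, 37, 52, 61, 68, 75, 82]    # taken as x
english = [5, 3, 9, 12, 17, 41, 49, 97]    # taken as y


def summary_sums(xs, ys):
    """Return the two means and the sums S_xx, S_yy, S_xy."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return x_bar, y_bar, s_xx, s_yy, s_xy


x_bar, y_bar, s_xx, s_yy, s_xy = summary_sums(maths, english)

r = s_xy / sqrt(s_xx * s_yy)   # part a: PMCC
a = s_xy / s_xx                # part b: y on x gradient
b = y_bar - a * x_bar          # part b: y on x intercept
c = s_xy / s_yy                # part c: x on y gradient
d = x_bar - c * y_bar          # part c: x on y intercept

# Part d predicts a maths score (x) from an English score (y),
# so the x on y line is the appropriate one.
predicted_maths = c * 63 + d

print(f"a) r = {r:.3g}")
print(f"b) y = {a:.3g}x + {b:.3g}")
print(f"c) x = {c:.3g}y + {d:.3g}")
print(f"d) predicted maths score = {predicted_maths:.3g}")
```

An English score of 63 lies within the range of the English data (3 to 97), so the prediction in part d is an interpolation.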

PMCC
What is Pearson’s product-moment correlation coefficient?
Pearson’s product-moment correlation coefficient (PMCC) is a way of giving a numerical value to a linear relationship of bivariate data
The PMCC of a sample is denoted by the letter r
r can take any value such that $-1 \le r \le 1$
A positive value of r describes positive correlation
A negative value of r describes negative correlation
r = 0 means there is no linear correlation
r = 1 means perfect positive linear correlation
r = -1 means perfect negative linear correlation
The closer r is to 1 or -1, the stronger the correlation

How do I calculate Pearson’s product-moment correlation coefficient (PMCC)?
You will be expected to use the statistics mode on your GDC to calculate the PMCC
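The formula can be useful to deepen your understanding: $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
$S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$ is linked to the covariance
$S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$ and $S_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}$ are linked to the variances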
You do not need to learn this as using your GDC will be expected
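For anyone curious about what the GDC is computing, the formula can be sketched in a few lines. This assumes the standard form $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$ with $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$ and similar sums for $S_{xx}$ and $S_{yy}$. The function name `pmcc` and the data sets are made up for illustration.

```python
from math import sqrt


def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient of paired data."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)


# Made-up illustrative data (assumptions, not from this note)
xs = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]     # points exactly on a line with positive gradient
falling = [10, 8, 6, 4, 2]    # points exactly on a line with negative gradient
scattered = [2, 4, 5, 4, 5]   # a general upward trend with some scatter

print(f"rising:    r = {pmcc(xs, rising):.3f}")
print(f"falling:   r = {pmcc(xs, falling):.3f}")
print(f"scattered: r = {pmcc(xs, scattered):.3f}")
```

The first two data sets lie exactly on straight lines, so they illustrate the extreme values of r, while the third gives a value strictly between 0 and 1.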
When does the PMCC suggest there is a linear relationship?
Critical values of r indicate when the PMCC would suggest there is a linear relationship
In your exam you will be given critical values where appropriate
Critical values will depend on the size of the sample
If the absolute value of the PMCC is bigger than the critical value then this suggests a linear model is appropriate
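The comparison uses the absolute value of r, so a strongly negative PMCC can pass the test just as a strongly positive one can. A minimal sketch, with a hypothetical function name and made-up values for both r and the critical value (in an exam the critical value is given):

```python
def suggests_linear_model(r, critical_value):
    """Return True when |r| exceeds the critical value."""
    return abs(r) > critical_value


# Made-up values for illustration only
critical_value = 0.7

for r in (-0.82, 0.45, 0.91):
    verdict = "suggests" if suggests_linear_model(r, critical_value) else "does not suggest"
    print(f"r = {r}: {verdict} a linear model is appropriate")
```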