Linearization of Bivariate Data (College Board AP® Statistics)

Revision Note

Mark Curtis

Expertise

Maths

Linearization of bivariate data

What does transforming a variable mean?

  • Transforming a variable means performing a mathematical operation on either the x-coordinates of the data points, or the y-coordinates

    • e.g. take the x-coordinates and square them

      • $x$ becomes $x^2$

  • A common transformation is taking the natural logarithm of the y-coordinates

    • $y$ becomes $\ln y$
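The two transformations above can be sketched in a few lines of Python. The data values here are hypothetical and chosen only for illustration; they do not come from this note.

```python
import math

# Hypothetical data points (illustrative only, not from the revision note)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.7, 7.4, 20.1, 54.6]

# Transform the x-coordinates by squaring them: x becomes x^2
x_squared = [xi ** 2 for xi in x]

# Transform the y-coordinates with the natural logarithm: y becomes ln y
ln_y = [math.log(yi) for yi in y]

print(x_squared)
print([round(v, 2) for v in ln_y])
```

Only one variable is transformed at a time; the other list is left unchanged when the new scatterplot is drawn.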

What is linearization of bivariate data?

  • If a scatterplot shows that data points do not follow a linear relationship

    • then it is sometimes possible to transform one of the variables to make the data points follow a more linear relationship

      • This process is called linearization of bivariate data

Exam Tip

When transforming variables, the type of transformation will be given to you in the exam.

How do I know if the transformed data is more linear than the untransformed data?

  • There are two different methods to check if the transformed data is more linear than the untransformed data:

    • Method 1: Create residual plots before and after the transformation

      • If, after the transformation, the plots are more random (no longer following curves or patterns), then this is evidence that the transformed data is more linear than the untransformed data

    • Method 2: Calculate the coefficient of determination, $r^2$, before and after the transformation

      • If, after the transformation, $r^2$ is closer to 1, then this is evidence that the least-squares regression line is a better model for the transformed data than the regression line for the untransformed data
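Both methods can be tried out numerically. The following is a minimal sketch in plain Python, assuming a small hypothetical data set with roughly exponential growth; the helper names `least_squares`, `residuals` and `r_squared` are made up for this illustration.

```python
import math

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def residuals(xs, ys):
    """Residual = observed y minus predicted y, for each data point."""
    a, b = least_squares(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def r_squared(xs, ys):
    """Coefficient of determination for the least-squares line."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum(e ** 2 for e in residuals(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data (illustrative only, not from the revision note)
x = [1, 2, 3, 4, 5, 6]
y = [3.1, 4.8, 8.3, 13.2, 22.5, 36.0]
ln_y = [math.log(v) for v in y]

# Method 1: compare the residuals before and after the transformation
res_before = residuals(x, y)
res_after = residuals(x, ln_y)

# Method 2: compare r^2 before and after the transformation
r2_before = r_squared(x, y)
r2_after = r_squared(x, ln_y)

print([round(e, 2) for e in res_before])
print([round(e, 3) for e in res_after])
print(round(r2_before, 3), round(r2_after, 3))
```

For this made-up data, the residuals before the transformation are positive at both ends and negative in the middle (a curved pattern), while the residuals after the transformation change sign from point to point with no clear curve.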

How do I use the regression equation for the transformed data?

  • Find the least-squares regression line for the transformed data

    • This will either have the form

      • $(\text{transformed } \hat{y}) = a + bx$

    • or the form

      • $\hat{y} = a + b(\text{transformed } x)$

  • Then use this equation to predict y-values from given x-values

    • You may need to rearrange the equation to make $\hat{y}$ the subject

    • or you may need to transform the x-value before substituting it in
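The two cases above can be sketched as follows, assuming a natural logarithm transformation of y in the first case and a squaring transformation of x in the second. The function names and the coefficients used in the example calls are hypothetical.

```python
import math

def predict_from_ln_y_model(a, b, x):
    """Model: ln(y-hat) = a + b x.
    Rearranged to make y-hat the subject: y-hat = e^(a + b x)."""
    return math.exp(a + b * x)

def predict_from_x_squared_model(a, b, x):
    """Model: y-hat = a + b (x^2).
    The x-value is transformed (squared) before substituting it in."""
    transformed_x = x ** 2
    return a + b * transformed_x

# Hypothetical coefficients, chosen only to show the two calculations
y_hat_1 = predict_from_ln_y_model(1.0, 0.5, 2.0)       # e^(1 + 0.5 * 2)
y_hat_2 = predict_from_x_squared_model(1.0, 0.5, 4.0)  # 1 + 0.5 * 16

print(y_hat_1, y_hat_2)
```

In the first case the work happens after substituting (undoing the logarithm); in the second it happens before substituting (squaring the x-value).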

Worked Example

The scatterplot below shows the population of mosquitoes, y, in different parts of an island against the percentage cover of vegetation, x%. The least-squares regression line and its residual plot are also shown.

Two graphs: Left graph shows data points and a regression line; right graph shows residuals of these data points, forming a curved pattern.

A biologist claims that the natural logarithm of the population of mosquitoes will have a linear relationship with the percentage cover of vegetation. The scatterplot, least-squares regression line and residual plot for the transformed data are shown below.

Left plot with ln(y) vs. x has a regression line and data points near it; right plot shows residuals vs. x scattered around zero.

(a) State, with justification, whether or not the new plots support the biologist's claim.

It is not enough to say the scatterplot looks more linear

Instead, you need to compare the residual plots

They are more random after the transformation, suggesting that the transformed data is more linear than the untransformed data

Remember to give all comments in context (copy phrases from the question to help)

Answer:

The residual plot from the scatterplot showing the population of mosquitoes, y, in different parts of an island against the percentage cover of vegetation, x%, shows that the residuals follow a U-shaped pattern (they are not random)

The residual plot from the scatterplot showing the natural logarithm of the population of mosquitoes, $\ln y$, in different parts of an island against the percentage cover of vegetation, x%, shows that these residuals are randomly spread (not following a pattern)

This is evidence that the natural logarithm of the population of mosquitoes, $\ln y$, in different parts of an island has a more linear relationship with the percentage cover of vegetation, x%, than the population of mosquitoes, y, does

This supports the claim by the biologist

(b) Given that the second regression line has a slope of 0.102 and a vertical-axis intercept of 4.29, estimate, to the nearest thousand, the population of mosquitoes in an area on the island with a vegetation cover of 65%.

Answer:

Write out the equation of the least-squares regression line using $\ln y$ instead of $y$ (the $x$ is unchanged)

$\ln \hat{y} = 4.29 + 0.102x$

Substitute in $x = 65$ and simplify

$$\begin{aligned}
\ln \hat{y} &= 4.29 + 0.102 \times 65 \\
\ln \hat{y} &= 10.92
\end{aligned}$$

Rearrange the equation to make $\hat{y}$ the subject (find $\mathrm{e}$ to the power of the right-hand side)

$\hat{y} = \mathrm{e}^{10.92} = 55270.79\ldots$

Round this answer to the nearest 1000 and give the answer in context

The population of mosquitoes is approximately 55000 in an area on the island with a vegetation cover of 65%
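The arithmetic in part (b) can be checked with a short Python sketch. The slope, intercept and x-value are taken from the question; the variable names are just for illustration.

```python
import math

intercept = 4.29   # vertical-axis intercept given in part (b)
slope = 0.102      # slope given in part (b)
x = 65             # percentage cover of vegetation

# Substitute into ln(y-hat) = 4.29 + 0.102 x
ln_y_hat = intercept + slope * x

# Undo the natural logarithm to make y-hat the subject
y_hat = math.exp(ln_y_hat)

# Round to the nearest thousand
nearest_thousand = round(y_hat / 1000) * 1000

print(ln_y_hat, y_hat, nearest_thousand)
```

The rounding step divides by 1000, rounds to the nearest integer, then multiplies back up, which matches rounding to the nearest thousand by hand.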


Author: Mark Curtis

Mark graduated twice from the University of Oxford: once in 2009 with a First in Mathematics, then again in 2013 with a PhD (DPhil) in Mathematics. He has had nine successful years as a secondary school teacher, specialising in A-Level Further Maths and running extension classes for Oxbridge Maths applicants. Alongside his teaching, he has written five internal textbooks, introduced new spiralling school curriculums and trained other Maths teachers through outreach programmes.