Outliers, High-Leverage & Influential Points (College Board AP® Statistics)

Revision Note

Mark Curtis

Expertise: Maths

Outliers in a regression model

What is an outlier in a regression model?

  • An outlier in a regression model is a point whose y-value is extreme relative to the least-squares regression line, lying far above or below it

    • It has a very large (positive or negative) residual, where residual = actual y-value − predicted y-value

      • It does not follow the general linear trend shown by the rest of the data
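Below is a minimal Python sketch of this idea, assuming an ordinary least-squares fit. It computes each residual (observed y minus predicted ŷ) and flags any point whose residual is more than k residual standard deviations from 0. The cut-off k = 2 and the function names `fit_line`, `residuals` and `flag_outliers` are illustrative assumptions, not a rule from these notes.

```python
def fit_line(xs, ys):
    """Least-squares regression line y-hat = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residuals(xs, ys):
    """Residual = observed y - predicted y-hat, for every point."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def flag_outliers(xs, ys, k=2.0):
    """Indices of points whose residual is more than k residual standard
    deviations from 0 (k = 2 is an assumed, illustrative cut-off)."""
    res = residuals(xs, ys)
    # The residual standard deviation s of a regression line divides by n - 2
    s = (sum(r ** 2 for r in res) / (len(res) - 2)) ** 0.5
    return [i for i, r in enumerate(res) if abs(r) > k * s]


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3, 5, 7, 9, 21, 13, 15, 17]  # y = 2x + 1, except (5, 21) is 10 units too high
print(flag_outliers(xs, ys))  # → [4], the 0-based index of (5, 21)
```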

A scatter plot with data points, a dashed regression line, and one point (an outlier) circled with an arrow pointing upwards from the regression line to the circled point.
An outlier with a large positive residual

High-leverage points

What is a high-leverage point?

  • A high-leverage point is a data point that has an extreme x-value relative to the other data points

  • A high-leverage point may still follow the general linear trend shown by the data
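In simple linear regression, one common way to put a number on "extreme x-value" is the leverage h_i = 1/n + (x_i − x̄)²/S_xx, where S_xx = Σ(x_i − x̄)². Leverage depends only on the x-values. The Python sketch below flags a point when its leverage is more than twice the average leverage 2/n. That cut-off and the function names are illustrative assumptions, not part of the AP definition.

```python
def leverages(xs):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / Sxx for simple linear regression.
    Only the x-values matter, so a point can have high leverage and still
    sit exactly on the regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


def flag_high_leverage(xs, multiple=2.0):
    """Indices whose leverage exceeds `multiple` times the average leverage.
    The leverages always sum to 2, so the average is 2/n; multiple = 2 is an
    assumed rule of thumb."""
    n = len(xs)
    cutoff = multiple * 2 / n
    return [i for i, h in enumerate(leverages(xs)) if h > cutoff]


xs = [1, 2, 3, 4, 5, 6, 7, 20]
print(flag_high_leverage(xs))      # → [7]
print(round(leverages(xs)[7], 3))  # → 0.903
```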

A scatterplot showing a high-leverage point with a very large x-value that still follows the regression line.
A high-leverage point with a large x-value

Can a high-leverage point also be an outlier?

  • A high-leverage point can also be an outlier if it has both:

    • an extreme x-value relative to the other data points

    • an extreme y-value (large residual) relative to the regression line

A scatterplot showing a point that is both an outlier and a high-leverage point because it has a large x-value and a large negative residual
A point that is both a high-leverage point and an outlier

  • Note that a high-leverage point that does follow the general linear trend is not considered an outlier

    • This is because, even though it has an extreme x-value, it still lies close to the regression line

      • so it does not have a large residual
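The Python sketch below combines the two checks and labels each point. It uses assumed rules of thumb. A point counts as an "outlier" if it lies more than k = 2 residual standard deviations from the line fitted to the *other* points. It counts as "high leverage" if its leverage exceeds 4/n. The outlier gap is measured against the other points' line because a high-leverage point pulls the full least-squares line toward itself, which can make its own residual look smaller than it really is. The function names and cut-offs are illustrative only, and the sketch needs at least four points.

```python
def fit_line(xs, ys):
    """Least-squares line y-hat = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def classify_points(xs, ys, k=2.0):
    """Label each point 'outlier', 'high-leverage', both, or 'neither'
    (assumed rules of thumb, not AP definitions)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    labels = []
    for i in range(n):
        # The trend shown by the rest of the data: refit without point i
        rest_x, rest_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        a, b = fit_line(rest_x, rest_y)
        rest_sse = sum((y - (a + b * x)) ** 2 for x, y in zip(rest_x, rest_y))
        s = (rest_sse / (n - 3)) ** 0.5  # n - 1 points minus 2 fitted parameters
        is_outlier = abs(ys[i] - (a + b * xs[i])) > k * s
        is_high_leverage = 1 / n + (xs[i] - x_bar) ** 2 / sxx > 4 / n
        if is_outlier and is_high_leverage:
            labels.append("outlier and high-leverage")
        elif is_outlier:
            labels.append("outlier")
        elif is_high_leverage:
            labels.append("high-leverage")
        else:
            labels.append("neither")
    return labels


base_x = [1, 2, 3, 4, 5, 6, 7, 8]
base_y = [4, 4, 6, 10, 12, 12, 14, 18]  # y = 2x + 1 plus noise of +1 or -1

print(classify_points(base_x + [20], base_y + [41])[-1])   # → high-leverage
print(classify_points(base_x + [20], base_y + [11])[-1])   # → outlier and high-leverage
print(classify_points(base_x + [4.5], base_y + [14])[-1])  # → outlier
```

In the first call, the point (20, 41) lies exactly on the trend y = 2x + 1. So it is labelled high-leverage but not an outlier, matching the note above.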

Influential points

What is an influential point?

  • An influential point in a regression model is a point that, if removed, changes the linear relationship significantly

    • Removing it could cause a significant change in:

      • the correlation coefficient

      • the slope of the regression line

      • the y-intercept of the regression line
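One way to check influence directly is to refit the line without each point and compare the results. The minimal Python sketch below does this. The function names `summary` and `removal_changes` are hypothetical. It reports how the slope, y-intercept and correlation r change when each point is removed. Whether a change counts as "significant" is still a judgment call, so no cut-off is built in.

```python
import math


def summary(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy / math.sqrt(sxx * syy)


def removal_changes(xs, ys):
    """For each point, the change (after - before) in slope, intercept and r
    when that point is removed from the data."""
    full = summary(xs, ys)
    changes = []
    for i in range(len(xs)):
        reduced = summary(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        changes.append(tuple(after - before for before, after in zip(full, reduced)))
    return changes


xs = [1, 2, 3, 4, 5, 6, 7, 8, 20]
ys = [4, 4, 6, 10, 12, 12, 14, 18, 11]  # (20, 11) is far from the trend y = 2x + 1

slope, intercept, r = summary(xs, ys)
print(round(slope, 3), round(r, 3))      # → 0.383 0.46
d_slope, d_intercept, d_r = removal_changes(xs, ys)[-1]
print(round(d_slope, 3), round(d_r, 3))  # → 1.617 0.517
```

Removing (20, 11) changes the slope from about 0.383 to 2 and the correlation from about 0.46 to about 0.977, so in this example that point is clearly influential.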

Can outliers and high-leverage points be influential points?

  • An outlier that is also a high-leverage point is likely to be an influential point, because:

    • it has an extreme x-value relative to the other data points

    • it has an extreme y-value (large residual) relative to the regression line

      • For example, including this point could significantly change the slope of the regression line

  • A point that is only an outlier, or only a high-leverage point, may or may not be an influential point

    • Removing it will usually change the regression line or the correlation to some extent

      • but not necessarily significantly
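A short derivation covers two idealized cases. It is a sketch that assumes the extra point (x₀, y₀) lands either exactly at the mean x̄ of the other n points or exactly on their least-squares line. In each case it shows why an outlier alone, or a high-leverage point alone, need not be influential. It uses the usual formulas b = S_xy/S_xx and a = ȳ − b·x̄, with S_xy = Σ(x_i − x̄)(y_i − ȳ) = Σ(x_i − x̄)y_i. Primes denote values after the point is added, and e_i are the residuals from the original line.

```latex
\begin{aligned}
&\text{Point at } x_0 = \bar{x}: && \bar{x}' = \bar{x},\quad S_{xx}' = S_{xx},\quad S_{xy}' = S_{xy} + (x_0 - \bar{x})\,y_0 = S_{xy} \\
& && \Rightarrow\; b' = b,\qquad a' = \bar{y}' - b\,\bar{x} = a + \frac{y_0 - \bar{y}}{n+1} \\[4pt]
&\text{Point on the line } y_0 = a + b\,x_0: && \textstyle\sum e_i = 0 \text{ and } \sum x_i e_i = 0 \text{ still hold, since } e_0 = 0 \\
& && \Rightarrow\; a' = a,\quad b' = b,\quad \mathrm{SSE}' = \mathrm{SSE},\quad S_{yy}' = S_{yy} + \frac{n}{n+1}\,(y_0 - \bar{y})^2 \ge S_{yy} \\
& && \Rightarrow\; r'^2 = 1 - \frac{\mathrm{SSE}}{S_{yy}'} \;\ge\; r^2
\end{aligned}
```

So an outlier at the mean x-value leaves the slope unchanged and only shifts the intercept. A high-leverage point on the line leaves the line unchanged and can only strengthen the correlation. When a point is both far from x̄ and far from the line, neither simplification applies, which is why that combination is the one most likely to be influential.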

Three scatterplots showing the influence of outliers and/or high-leverage points on regression lines. Left and middle plots have non-influential points, while the right plot has an influential point.
The solid regression line changes to the dotted one when the circled point is included


Author: Mark Curtis

Mark graduated twice from the University of Oxford: once in 2009 with a First in Mathematics, then again in 2013 with a PhD (DPhil) in Mathematics. He has had nine successful years as a secondary school teacher, specialising in A-Level Further Maths and running extension classes for Oxbridge Maths applicants. Alongside his teaching, he has written five internal textbooks, introduced new spiralling school curriculums and trained other Maths teachers through outreach programmes.