Outliers, High-Leverage & Influential Points (College Board AP® Statistics): Study Guide
Outliers in a regression model
What is an outlier in a regression model?
An outlier in a regression model is a point that has an extreme y-value relative to the least-squares regression line
It has a very large (positive or negative) residual
It does not follow the general linear trend shown by the rest of the data
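
A short sketch may help make "very large residual" concrete. The data below are hypothetical (invented for illustration, not taken from any exam question), and the helper name `least_squares` is an assumption of this sketch. It fits the least-squares line and lists each residual, calculated as actual y minus predicted y.

```python
# Hypothetical data: six points on y = 2x, plus one point at x = 4 that sits well above the trend.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 4, 6, 15, 10, 12, 14]

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares regression line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = least_squares(xs, ys)

# Residual = actual y-value minus the y-value predicted by the regression line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, y, res in zip(xs, ys, residuals):
    print(f"x = {x}, y = {y}, residual = {res:.1f}")
```

For this data set the fitted line has slope 2 and y-intercept 1. The point (4, 15) has a residual of 6, while every other point has a residual of -1, so (4, 15) is the one that would be described as an outlier.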

High-leverage points
What is a high-leverage point?
A high-leverage point is a data point that has an extreme x-value relative to the other data points
A high-leverage point may still follow the general linear trend shown by the data

Can a high-leverage point also be an outlier?
A high-leverage point can also be an outlier if it has both:
an extreme x-value relative to the other data points
an extreme y-value (large residual) relative to the regression line

Note that a high-leverage point that does follow the general linear trend is not considered an outlier
This is because, even though it has an extreme x-value, it still lies close to the regression line, so it does not have a large residual
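
The sketch below illustrates a high-leverage point that still follows the trend. The data are hypothetical, and "distance from the mean x-value" is used only as a rough way to show that an x-value is extreme; it is an assumption of this sketch, not an official cutoff.

```python
# Hypothetical data: five points roughly following y = 2x, plus one point with a much larger x-value.
xs = [1, 2, 3, 4, 5, 20]
ys = [3, 3, 7, 7, 11, 40]

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares regression line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = least_squares(xs, ys)

# How far each x-value lies from the mean x-value (a rough indicator of leverage).
x_bar = sum(xs) / len(xs)
x_distances = [abs(x - x_bar) for x in xs]

# Residual = actual y-value minus the y-value predicted by the regression line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, dist, res in zip(xs, x_distances, residuals):
    print(f"x = {x}: distance from mean x = {dist:.2f}, residual = {res:.2f}")
```

The point with x = 20 is by far the furthest from the mean x-value, yet its residual is the smallest in the data set. It is a high-leverage point, but not an outlier.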
Influential points
What is an influential point?
An influential point in a regression model is a point that, if removed, changes the linear relationship significantly
Removing it could cause a significant change in:
the correlation coefficient
the slope of a regression line
the y-intercept of a regression line
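
One way to check whether a point is influential is to fit the regression line twice, once with the point and once without it, and compare the results. The sketch below does this for a hypothetical data set; the function name `fit` and the data values are assumptions made for illustration.

```python
from math import sqrt

def fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares regression line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return slope, intercept, r

# Hypothetical data: five points on y = 2x, plus one suspect point at (20, 10).
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 10]

with_point = fit(xs, ys)
without_point = fit(xs[:-1], ys[:-1])  # drop the last (suspect) point

for label, (slope, intercept, r) in [("with", with_point), ("without", without_point)]:
    print(f"{label:>7}: slope = {slope:.3f}, y-intercept = {intercept:.3f}, r = {r:.3f}")
```

In this example, removing the point (20, 10) changes the slope, the y-intercept and the correlation coefficient by a large amount, so the point would be described as influential.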
Can outliers and high-leverage points be influential points?
An outlier that is also a high-leverage point is likely to be an influential point
It has an extreme x-value relative to the other data points
It has an extreme y-value (large residual) relative to the regression line
For example, including this point could significantly change the slope of the regression line
Outliers or high-leverage points alone may or may not be influential points
They all affect the linear relationship, but not necessarily significantly
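
The comparison can be sketched numerically. The code below starts from a hypothetical base data set lying on y = 2x and adds one extra point at a time, recording how much the slope of the least-squares line changes. The three extra points and their labels are assumptions chosen for illustration, and only the slope is compared here (the y-intercept and correlation coefficient could be compared in the same way).

```python
def slope(points):
    """Slope of the least-squares regression line through a list of (x, y) points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx

# Hypothetical base data: seven points lying exactly on y = 2x.
base = [(x, 2 * x) for x in range(1, 8)]

# One extra point of each type (hypothetical values).
candidates = {
    "outlier only": (3, 12),               # typical x-value, far above the trend
    "high-leverage only": (20, 40),        # extreme x-value, on the trend
    "outlier and high-leverage": (20, 10)  # extreme x-value, far below the trend
}

# Change in slope caused by including each extra point.
slope_changes = {
    label: slope(base + [point]) - slope(base)
    for label, point in candidates.items()
}

for label, change in slope_changes.items():
    print(f"{label}: change in slope = {change:.3f}")
```

With these values, the point that is only an outlier shifts the slope slightly, the point that is only high-leverage leaves it unchanged, and the point that is both changes it drastically. Different data could give a different picture, which is why an outlier or a high-leverage point alone may or may not be influential.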
