Outliers & Resistant Measures (College Board AP® Statistics)

Revision Note

Author: Naomi C

Expertise: Maths

Outliers

What are outliers?

  • Outliers are extreme data values that do not fit with the rest of the data

    • They are either a lot bigger or a lot smaller than the rest of the data

  • There are two primary methods for defining outliers in this course

    1. Outliers are values that lie more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile

      • $x$ is an outlier if $x < Q_1 - 1.5 \times \text{IQR}$ or $x > Q_3 + 1.5 \times \text{IQR}$

    2. Outliers are values that lie two or more standard deviations above or below the mean

      • $x$ is an outlier if $x \geq \mu + 2\sigma$ or $x \leq \mu - 2\sigma$

  • Outliers can have a big effect on some statistical measures, such as the mean, standard deviation, and range
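The two rules above can be written as simple checks. This is a minimal sketch: the function names `is_outlier_iqr` and `is_outlier_sd` are illustrative, and the numbers in the usage lines are hypothetical values chosen only to show how the boundaries behave. Note that the IQR rule uses strict inequalities, while the standard deviation rule includes values exactly on the boundary.

```python
def is_outlier_iqr(x, q1, q3):
    """1.5 x IQR rule: x must lie strictly beyond either fence."""
    iqr = q3 - q1
    return x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr


def is_outlier_sd(x, mu, sigma):
    """2 standard deviation rule: x lies on or beyond either boundary."""
    return x >= mu + 2 * sigma or x <= mu - 2 * sigma


# Hypothetical summary statistics, used only to illustrate the boundaries.
print(is_outlier_iqr(36, q1=10, q3=20))    # fences are -5 and 35 -> True
print(is_outlier_iqr(35, q1=10, q3=20))    # exactly on the fence -> False
print(is_outlier_sd(70, mu=50, sigma=10))  # exactly 2 sd above the mean -> True
```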

Exam Tip

These two methods may result in slightly different boundaries for determining whether or not a particular value is an outlier. As long as you show full working and reasoning, your answer will gain full marks.

Should I remove outliers?

  • The decision to remove outliers will depend on the context

  • Outliers should be removed if they are found to be errors

    • The data may have been recorded incorrectly

      • e.g. the age of a teenager, 17, may have been recorded as 71 by mistake

  • Outliers should not be removed if they are a valid part of the sample

    • The data may need to be checked to verify that it is not an error

    • e.g. the annual salaries of employees of a business might appear to have an outlier, but this could be the director’s salary

Worked Example

The ages, in years, of a number of children attending a birthday party are given below.

2,   7,   5,   4,   8,   4,   6,   5,   5,   15,   2,   5,   13

Identify any outliers within the data set.

Answer:

Method 1: IQR

$x$ is an outlier if $x < Q_1 - 1.5 \times \text{IQR}$ or $x > Q_3 + 1.5 \times \text{IQR}$

Find the first quartile and the third quartile. This can be done by hand or by entering the data into your calculator and looking at the one-variable statistics

$Q_1 = 4$
$Q_3 = 7.5$

Calculate the IQR

$$\begin{aligned} \text{IQR} &= Q_3 - Q_1 \\ &= 7.5 - 4 \\ &= 3.5 \end{aligned}$$

Find the boundaries for any possible outliers

$Q_1 - 1.5 \times \text{IQR} = 4 - 1.5 \times 3.5 = -1.25$
$Q_3 + 1.5 \times \text{IQR} = 7.5 + 1.5 \times 3.5 = 12.75$

Identify any values in the data set outside of these boundaries

The ages of 13 and 15 are outliers as they lie more than 1.5 times the interquartile range above the third quartile
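Method 1 can be checked with a short script. This is a sketch only: the helper `quartiles` is a hypothetical name, and it assumes the convention of taking the median of each half of the sorted data (leaving out the middle value when the number of values is odd), which is what many graphing calculators do. Other quartile conventions can give slightly different values.

```python
from statistics import median

ages = [2, 7, 5, 4, 8, 4, 6, 5, 5, 15, 2, 5, 13]


def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    # Skip the middle value when the number of data values is odd.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(upper)


q1, q3 = quartiles(ages)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sorted(x for x in ages if x < lower_fence or x > upper_fence)

print(q1, q3, iqr)               # 4.0 7.5 3.5
print(lower_fence, upper_fence)  # -1.25 12.75
print(outliers)                  # [13, 15]
```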

Method 2: Standard deviations

$x$ is an outlier if $x \geq \mu + 2\sigma$ or $x \leq \mu - 2\sigma$

Find the mean and the standard deviation by entering the data into your calculator and looking at the one-variable statistics

Remember that this data is the entire data set so you want to use the population standard deviation

$\mu = 6.23076...$
$\sigma = 3.70350...$

Find the boundaries for any possible outliers

$\mu + 2\sigma = 6.23076... + 2 \times 3.70350... = 13.63778...$
$\mu - 2\sigma = 6.23076... - 2 \times 3.70350... = -1.17624...$

Identify any values in the data set outside of these boundaries

The age of 15 lies more than 2 standard deviations above the mean so it is an outlier
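Method 2 can be checked in the same way. This sketch uses `statistics.pstdev`, the population standard deviation, because the data is treated as the entire data set; the variable names are illustrative.

```python
from statistics import mean, pstdev

ages = [2, 7, 5, 4, 8, 4, 6, 5, 5, 15, 2, 5, 13]

mu = mean(ages)       # approximately 6.231
sigma = pstdev(ages)  # population standard deviation, approximately 3.704

upper_bound = mu + 2 * sigma
lower_bound = mu - 2 * sigma

# This rule counts values exactly on a boundary as outliers.
outliers = sorted(x for x in ages if x >= upper_bound or x <= lower_bound)

print(round(upper_bound, 3), round(lower_bound, 3))
print(outliers)  # [15]
```

The age of 13 is flagged by the IQR rule but not by this one, which shows how the two methods can disagree about the same value.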

Resistant measures

What is a resistant measure?

  • A resistant measure is a statistical measure that is not greatly affected by an outlier

    • It is sometimes not affected by an outlier at all

    • Resistant measures are sometimes known as robust measures

  • The median and the interquartile range (IQR) are considered to be resistant measures

  • The mean, standard deviation, and range are considered to be nonresistant measures

    • The value of any of these measures could be significantly affected by an outlier
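The difference between resistant and nonresistant measures can be seen by making an outlier more extreme. The sketch below reuses the party ages from the worked example and then replaces the age 15 with 150. That replacement is a hypothetical recording error, assumed here purely for illustration. The helper `summary` is also illustrative, and it finds quartiles as the medians of the lower and upper halves of the sorted data.

```python
from statistics import mean, median, pstdev


def summary(data):
    """Return common summary statistics for a list of numbers."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    # Skip the middle value when the number of data values is odd.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return {
        "mean": mean(s),
        "sd": pstdev(s),
        "range": s[-1] - s[0],
        "median": median(s),
        "iqr": median(upper) - median(lower),
    }


ages = [2, 7, 5, 4, 8, 4, 6, 5, 5, 15, 2, 5, 13]
# Hypothetical recording error: 15 entered as 150.
altered = [150 if x == 15 else x for x in ages]

before = summary(ages)
after = summary(altered)
for name in before:
    print(f"{name:>6}: {before[name]:.2f} -> {after[name]:.2f}")
```

For this data the median and IQR stay exactly the same, while the mean, standard deviation, and range all increase sharply.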

Author: Naomi C

Naomi graduated from Durham University in 2007 with a Masters degree in Civil Engineering. She has taught Mathematics in the UK, Malaysia and Switzerland covering GCSE, IGCSE, A-Level and IB. She particularly enjoys applying Mathematics to real life and endeavours to bring creativity to the content she creates.