Outliers (Edexcel GCSE Statistics)
Revision Note
Written by: Roger B
Reviewed by: Dan Finlay
Outliers
What are outliers?
Outliers are extreme data values that do not fit with the general pattern of the data
Outliers in a data set can be due to
genuine extreme events
these are valid data, even if unusual
mistakes in the data collection
these should be identified and removed if possible
Outliers will affect some statistics that are calculated from the data
They can have a big effect on the mean,
but usually have little or no effect on the median
and usually no effect on the mode
The range can be changed completely by a single outlier
but the interquartile range will usually be affected very little, if at all
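A short Python sketch can make this comparison concrete. The data set below is hypothetical (it does not come from this note), and the function name `summary` is just an illustrative choice. Python's `statistics.quantiles` may place the quartiles slightly differently from the method taught for the exam, so treat the interquartile range figures as illustrative only.

```python
import statistics


def summary(values):
    """Return the mean, median, mode, range and interquartile range."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


# Hypothetical data: seven typical values, then the same values plus one outlier
typical = [4, 5, 5, 6, 7, 8, 9]
with_outlier = typical + [40]

for label, data in [("typical", typical), ("with outlier", with_outlier)]:
    print(label, summary(data))
```

With these values, adding the single outlier moves the mean from about 6.3 to 10.5 and the range from 5 to 36, while the median only moves from 6 to 6.5 and the mode stays at 5.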
When calculating the mean or the range it is important to decide whether any outlier(s) should be included in the calculations
An exam question will tell you whether to include outliers or not
But you may have to decide which value(s) are outliers
Look for values that are much bigger or smaller than the rest of the data set
In general outliers are
included if they are a valid piece of data
excluded if it is likely that they are erroneous
Worked Example
The following data was collected about the ages of a number of students at the time that they sat their GCSE Maths exam
3 13 15 15 15 15 16 16 16 16 16 57
(a) Suggest possible outliers in the data set.
Most students sit their GCSEs when they are 15 or 16
Some students sit them a bit younger, so the '13' is not very unusual
However the '3' and the '57' are definitely extreme data values compared to the rest of the set!
3 and 57 should probably be considered to be outliers
(b) For each outlier identified in part (a), suggest with a reason whether the data value should be kept in or excluded from the data set.
It is essentially impossible that a 3 year old would be sitting a GCSE exam, so that data value is surely a mistake
On the other hand older people do sometimes sit GCSE exams, so the '57' shouldn't be excluded from the data set without further information
The '3' should be excluded. There is no way a 3 year old would be sitting a GCSE exam, so that is almost certainly an error in the data collection.
The '57' should be kept. It is unusual for older people to sit GCSEs, but it is not impossible. So that may be a valid data value.
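As an optional check, the effect of this decision on the statistics can be sketched in Python using the ages from the worked example. This is not part of the exam answer. The helper name `describe` is an illustrative assumption, and the only value removed is the '3', following the reasoning in part (b).

```python
import statistics

# Ages from the worked example
ages = [3, 13, 15, 15, 15, 15, 16, 16, 16, 16, 16, 57]

# Exclude the '3' (almost certainly an error); keep the '57' (unusual but possible)
cleaned = [age for age in ages if age != 3]


def describe(values):
    """Return (mean, median, mode, range) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
        max(values) - min(values),
    )


print("all values: ", describe(ages))
print("3 excluded: ", describe(cleaned))
```

On this data, excluding the 3 changes the range from 54 to 44 and the mean from 17.75 to about 19.1, while the median moves only from 15.5 to 16 and the mode stays at 16.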