Measures of Variability (College Board AP® Statistics)
Study Guide
Range
What is the range of a data set?
The range of a data set is the difference between the largest and smallest values in the data set
If the data has units (seconds, cm, etc.), then the range has the same units as the values in the data set
The range is a measure of variability (i.e. a measure of spread)
Recall that an average (measure of center) for a data set tells you what a 'typical' data value is
The range tells you how spread out the data values are, from the smallest value to the largest
A small range means all the data values are close to the average
A large range means that some of the data values are far from the average
The range is affected by outliers (extreme values, i.e. extremely large or extremely small)
Outliers can cause the range of a data set to be large
The range would then give a misleading idea about how spread out most of the data really is
Worked Example
Saffy counted the number of pairs of shoes that the students in her class owned. The results are listed below:
2, 6, 3, 3, 15, 4, 6, 7, 5, 4, 5,
5, 8, 4, 6, 6, 2, 7, 8, 5, 3
What is the range of the data set?
It can sometimes help to write the data in size order
2, 2, 3, 3, 3, 4, 4, 4, 5, 5,
5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 15
The range is the largest value minus the smallest value
15 - 2 = 13
The range of the number of pairs of shoes that a student in Saffy's class owns is 13
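The same calculation can be checked with a short Python sketch. It sorts Saffy's data, subtracts the smallest value from the largest, and also shows how far the single extreme value 15 stretches the range. The helper name `data_range` is a hypothetical choice for illustration.

```python
def data_range(values):
    """Return the range: largest value minus smallest value."""
    return max(values) - min(values)


# Saffy's data: pairs of shoes owned by each student
shoes = [2, 6, 3, 3, 15, 4, 6, 7, 5, 4, 5,
         5, 8, 4, 6, 6, 2, 7, 8, 5, 3]

print(sorted(shoes))        # the data in size order
print(data_range(shoes))    # 15 - 2 = 13

# Remove the extreme value 15 to see its effect on the range
without_15 = [x for x in shoes if x != 15]
print(data_range(without_15))   # 8 - 2 = 6
```

Without the 15, the range drops from 13 to 6, which shows how one outlier can give a misleading idea of how spread out most of the data is.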
Interquartile range
What is the interquartile range (IQR) of a data set?
The interquartile range (IQR) of a data set is the difference between the third quartile (Q3) and first quartile (Q1) of the data set
The IQR has the same units as the values in the data set (seconds, cm, etc.)
The interquartile range is also a measure of variability (i.e. of spread)
About half of the data values in a data set lie between Q1 and Q3
This 'middle half' of the data set may be thought of as the 'most typical' half of the data
The IQR tells you how spread out the values in that middle half are
The largest and smallest values in a data set do not affect the interquartile range
This makes the IQR a better measure of spread for data sets with outliers (extreme values, i.e. extremely large or extremely small)
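Quartiles can be found in more than one way, so the Python sketch below assumes one common hand method: sort the data, split it into a lower half and an upper half (leaving out the median when there is an odd number of values), and take the median of each half. The function names `quartiles` and `iqr` and the small data set are hypothetical, chosen only to show that making the largest value extreme changes the range but not the IQR.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) using the 'median of each half' method.

    Assumed convention: with an odd number of values, the median itself
    is left out of both halves. Other calculators or software may use a
    different rule and give slightly different quartiles.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median
    upper = data[half + n % 2:]    # values above the median
    return median(lower), median(upper)


def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1


# Hypothetical data set (7 values)
data = [3, 5, 6, 7, 8, 9, 12]
print(quartiles(data), iqr(data), max(data) - min(data))   # (5, 9) 4 9

# Replace the largest value with an extreme one
data_with_outlier = [3, 5, 6, 7, 8, 9, 120]
print(quartiles(data_with_outlier), iqr(data_with_outlier),
      max(data_with_outlier) - min(data_with_outlier))      # (5, 9) 4 117
```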
Examiner Tips and Tricks
When you enter a set of data into your calculator, the 1-variable statistics function will usually return Q1 and Q3 among its calculated values, so you can find the IQR by subtracting Q1 from Q3.
This is a useful check but you should make sure you show your working clearly.
Worked Example
Roger planted a number of hot pepper seeds and recorded the number of days it took each seed to germinate. The results are listed below:
5 5 6 6 6 7 7 7 7 7 7 7 8
8 8 8 8 8 9 9 9 9 10 10 11 23
Roger calculates that the first quartile of his data set on hot pepper seeds is 7, and the third quartile is 9.
(a) What is the interquartile range of the data set?
Answer:
The interquartile range is the third quartile minus the first quartile
9 - 7 = 2
The interquartile range of the data set is 2 days
(b) Suggest a reason why the interquartile range might be a better measure of spread than the range for this data set.
Answer:
Note that the '23' is an outlier (extreme value)
This would affect the range, but not the IQR
The 23 in the data set is an outlier (extreme value) compared to all the other values
This would make the range very large (23 − 5 = 18 days, compared with an IQR of only 2 days) and give a misleading idea about the spread of most of the data
The interquartile range is not affected by extreme values, and so will be a better measure of spread for this data set
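Roger's figures can be checked with a short Python sketch. The `quartiles` helper is hypothetical and again assumes the 'median of each half' convention; under that convention it reproduces Roger's quartiles of 7 and 9. The sketch then compares the range and the IQR with and without the value 23.

```python
from statistics import median


def quartiles(values):
    """(Q1, Q3) by the 'median of each half' method (assumed convention)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    # With an odd count, skip the middle value (the median) in both halves
    return median(data[:half]), median(data[half + n % 2:])


# Roger's germination times, in days
days = [5, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8,
        8, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 11, 23]

q1, q3 = quartiles(days)
print(q1, q3, q3 - q1)               # 7 9 2
print(max(days) - min(days))         # range: 18

# Leave out the outlier 23 and compare
no_outlier = [d for d in days if d != 23]
q1_b, q3_b = quartiles(no_outlier)
print(q3_b - q1_b)                   # IQR is still 2 (printed as 2.0)
print(max(no_outlier) - min(no_outlier))   # range drops to 6
```

Removing the single value 23 cuts the range from 18 days to 6 days, while the IQR stays at 2 days.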
Standard deviation & variance
What is the standard deviation of a data set?
The standard deviation of a data set is a measure of variability (i.e. a measure of spread)
It measures how the data is spread out relative to the mean
If the standard deviation is small then most data values are close to the mean (there is less variability)
If the standard deviation is large then many data values will be further away from the mean (there is greater variability)
If the data has units (seconds, cm, etc.), then the standard deviation has the same units as the values in the data set
The Greek letter σ (lower case sigma) is used for the population standard deviation
The English letter s is used for the sample standard deviation
How do I calculate the standard deviation for a data set?
The standard deviation of a variable, $x$, for a population, $\sigma_x$, can be calculated using the formula:
$$\sigma_x = \sqrt{\frac{1}{n}\sum (x_i - \mu)^2}$$
This is not given to you in the exam
In this formula:
$n$ is the number of values in the population
$\mu$ is the mean of the population
$x_i$ is 'any data value' in the population
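The standard deviation of a variable, $x$, for a sample, $s_x$, can be calculated using the formula:
$$s_x = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$$
where $n$ is the number of values in the sample and $\bar{x}$ is the mean of the sample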
This is given to you in the exam
Note that the formula for the population standard deviation is very similar to the formula for the sample standard deviation
The sample formula uses the sample mean $\bar{x}$ in place of the population mean $\mu$, and divides by $n - 1$ rather than $n$
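The two formulas can be compared directly in a short Python sketch. The functions `population_sd` and `sample_sd` are illustrative names that follow the formulas term by term (divide by $n$ for the population, by $n - 1$ for the sample); the results are cross-checked against `statistics.pstdev` and `statistics.stdev` from Python's standard library. The data set is made up for illustration.

```python
import math
import statistics


def population_sd(values):
    """sigma_x: square root of the mean squared deviation (divide by n)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_sd(values):
    """s_x: same idea, but divide by n - 1 (needs at least 2 values)."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean 5

print(round(population_sd(data), 3), round(sample_sd(data), 3))   # 2.0 2.138

# Cross-check against Python's standard library
print(round(statistics.pstdev(data), 3), round(statistics.stdev(data), 3))  # 2.0 2.138
```

Dividing by $n - 1$ always gives a slightly larger value than dividing by $n$ for the same data, which is why it matters which result you read off.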
Examiner Tips and Tricks
In practice, you will only be asked to calculate the standard deviation for a sample, but you should be aware that the population standard deviation is calculated using a slightly different formula from the sample standard deviation.
If you use your calculator to check or calculate a standard deviation, make sure that you are familiar with your calculator's notation (many calculators show both the sample standard deviation, often labelled Sx, and the population standard deviation, σx) so that you are looking at the correct result.
What are the benefits and limitations of the standard deviation as a measure of variability?
The standard deviation does not tell you where the mean is, but it does give you a measure of how far (on average) the data values are from their mean
The standard deviation will not necessarily become larger if more data values are added to the data set
Adding more values that are about as far from the mean as the existing values will have little effect on the standard deviation
However, adding values that are much further from the mean than the existing values will increase the standard deviation
Like the mean, the standard deviation is affected by extreme values
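A small experiment with a made-up data set illustrates these points, using Python's `statistics.stdev` (the sample standard deviation) and `statistics.mean`.

```python
import statistics

base = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data with mean 5

# Add two values about as far from the mean as the existing ones (5 - 2 and 5 + 2)
similar = base + [3, 7]

# Add one value far from the mean
extreme = base + [30]

for label, values in [("base", base), ("similar", similar), ("extreme", extreme)]:
    # Print the mean and sample standard deviation, rounded to 2 d.p.
    print(label, round(statistics.mean(values), 2), round(statistics.stdev(values), 2))
```

The first two standard deviations are almost the same (about 2.1), while the single extreme value pushes the standard deviation above 8 and also drags the mean upwards.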
What is the variance of a data set?
The variance of a data set is the square of the standard deviation
σ² is used to denote the population variance
s² is used to denote the sample variance
The population variance is the average of the squared differences between each data value and the mean; the sample variance is found in the same way but dividing by $n - 1$ instead of $n$
If the data has units (seconds, cm, etc.), then the variance has the square of those units (seconds², cm², etc.)
The standard deviation is often the measure used rather than the variance as it has the same units as the data set
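Python's standard library provides `statistics.pvariance` (population variance) and `statistics.variance` (sample variance). This sketch, using hypothetical times in seconds, checks that each variance equals the square of the matching standard deviation.

```python
import math
import statistics

# Hypothetical times, in seconds
times = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(times)   # sigma^2, units: seconds^2
samp_var = statistics.variance(times)   # s^2, units: seconds^2

# The variance is the square of the standard deviation
print(pop_var, statistics.pstdev(times) ** 2)
print(samp_var, statistics.stdev(times) ** 2)

# Taking the square root returns to the original units (seconds)
print(math.sqrt(samp_var), statistics.stdev(times))
```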
Worked Example
A sample of 5 data items has been taken from a population. The values are listed below.
6 9 2 11 5
(a) Calculate the mean of the sample.
Answer:
Add up the values and divide by the number of values, 5
$$\bar{x} = \frac{6 + 9 + 2 + 11 + 5}{5} = \frac{33}{5} = 6.6$$
The mean of the sample is 6.6
(b) Calculate the standard deviation of the sample.
Answer:
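It is easiest to set up a table to work out the different values
x_i | x_i − x̄ | (x_i − x̄)²
6 | −0.6 | 0.36
9 | 2.4 | 5.76
2 | −4.6 | 21.16
11 | 4.4 | 19.36
5 | −1.6 | 2.56
Total | 0 | 49.2
Substitute the values into the formula
$$s_x = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2} = \sqrt{\frac{49.2}{5 - 1}} = \sqrt{12.3} = 3.5071\ldots$$
The standard deviation is 3.51 (to 3 significant figures)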
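The table and substitution above can be reproduced in a few lines of Python as a check on the arithmetic. This is a sketch for checking working, not a required exam method.

```python
import math

sample = [6, 9, 2, 11, 5]
n = len(sample)
x_bar = sum(sample) / n            # (6 + 9 + 2 + 11 + 5) / 5

# Build the table rows: x, x - x_bar, (x - x_bar)^2
rows = [(x, x - x_bar, (x - x_bar) ** 2) for x in sample]
for x, dev, sq in rows:
    print(f"{x:>3} {dev:>6.1f} {sq:>7.2f}")

total_sq = sum(sq for _, _, sq in rows)
s = math.sqrt(total_sq / (n - 1))  # sample standard deviation formula

print(f"mean = {x_bar:.1f}")                 # mean = 6.6
print(f"sum of squares = {total_sq:.2f}")    # sum of squares = 49.20
print(f"s = {s:.2f}")                        # s = 3.51
```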