Measures of Variability (College Board AP® Statistics)

Revision Note

Author: Naomi C

Expertise: Maths

Range

What is the range of a data set?

  • The range of a data set is the difference between the largest and smallest values in the data set

    • $\text{range} = \text{largest value} - \text{smallest value}$

    • If the data has units (seconds, cm, etc.), then the range has the same units as the values in the data set

  • The range is a measure of variability (i.e. a measure of spread)

    • Recall that an average (measure of center) for a data set tells you what a 'typical' data value is

    • The range tells you how spread out the data is around that average

      • A small range means all the data values are close to the average

      • A large range means that some of the data values are far from the average

  • The range is affected by outliers (extreme values, i.e. extremely large or extremely small)

    • Outliers can cause the range of a data set to be large

    • The range would then give a misleading idea about how spread out most of the data really is

Worked Example

Saffy counted the number of pairs of shoes that the students in her class owned. The results are listed below:

2, 6, 3, 3, 15, 4, 6, 7, 5, 4, 5,

5, 8, 4, 6, 6, 2, 7, 8, 5, 3

What is the range of the data set?

It can sometimes help to write the data in size order

2, 2, 3, 3, 3, 4, 4, 4, 5, 5,

5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 15

The range is the largest value minus the smallest value

15 - 2 = 13

The range of the number of pairs of shoes that a student in Saffy's class owns is 13
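The calculation above can be checked with a short Python sketch. The helper name `data_range` is hypothetical, and removing the 15 is only an illustration of how a single extreme value inflates the range; it is not part of the original question.

```python
# Number of pairs of shoes owned by each student (from the worked example)
shoes = [2, 6, 3, 3, 15, 4, 6, 7, 5, 4, 5,
         5, 8, 4, 6, 6, 2, 7, 8, 5, 3]


def data_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


full_range = data_range(shoes)

# Illustration only: drop the single extreme value (15) to see its effect
without_extreme = [v for v in shoes if v != 15]
reduced_range = data_range(without_extreme)

print(full_range)     # 13
print(reduced_range)  # 6
```

With the 15 removed, the range falls from 13 to 6, which is closer to how spread out most of the class really is.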

Interquartile range

What is the interquartile range (IQR) of a data set?

  • The interquartile range (IQR) of a data set is the difference between the third quartile (Q3) and first quartile (Q1) of the data set

    • $\text{IQR} = Q_3 - Q_1$

    • The IQR has the same units as the values in the data set (seconds, cm, etc.)

  • The interquartile range is also a measure of variability (i.e. of spread)

    • Half of the data values in a data set are between Q1 and Q3

      • This 'middle half' of the data set may be thought of as the 'most typical' half of the data

      • The IQR tells you how spread out the values in that middle half are

  • The largest and smallest values in a data set do not affect the interquartile range

    • This makes the IQR a better measure of spread for data sets with outliers (extreme values, i.e. extremely large or extremely small)

Exam Tip

When you enter a set of data into your calculator, the 1-variable statistics function will typically return the first quartile (Q1) and third quartile (Q3) among its calculated values; some calculators also return the IQR directly.

This is a useful check but you should make sure you show your working clearly.

Worked Example

Roger planted a number of hot pepper seeds and recorded the number of days it took each seed to germinate. The results are listed below:

5      5      6      6      6      7      7      7      7      7      7      7      8

8      8      8      8      8      9      9      9      9      10      10      11      23

Roger calculates that the first quartile of his data set on hot pepper seeds is 7, and the third quartile is 9.

(a) What is the interquartile range of the data set?

Answer:

The interquartile range is the third quartile minus the first quartile

9 - 7 = 2

The interquartile range of the data set is 2 days


(b) Suggest a reason why the interquartile range might be a better measure of spread than the range for this data set.

Answer:

Note that the '23' is an outlier (extreme value)

This would affect the range, but not the IQR

The 23 in the data set is an outlier (extreme value) compared to all the other values

This would make the range very large and give a misleading idea about the spread of the data

The interquartile range is not affected by extreme values, and so will be a better measure of spread for this data set
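As a rough check of both parts, the sketch below recomputes the quartiles, the IQR and the range in Python. The notes do not say how Roger found his quartiles, so the `quartiles` helper is an assumption: it takes Q1 and Q3 as the medians of the lower and upper halves of the ordered data, which is one common convention and happens to reproduce his values of 7 and 9.

```python
from statistics import median

# Days to germinate for each seed (from the worked example)
days = [5, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8,
        8, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 11, 23]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd number of values the overall median is left
    out of both halves. Other conventions exist and can give slightly
    different quartiles. Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # smallest half of the data
    upper = ordered[-half:]   # largest half of the data
    return median(lower), median(upper)


q1, q3 = quartiles(days)
iqr = q3 - q1
data_range = max(days) - min(days)

print(q1, q3)      # 7 9
print(iqr)         # 2
print(data_range)  # 18
```

The range of 18 days is driven almost entirely by the single value of 23, whereas the IQR of 2 days describes the middle half of the seeds.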

Standard deviation & variance

What is the standard deviation of a data set?

  • The standard deviation of a data set is a measure of variability (i.e. a measure of spread)

    • It measures how the data is spread out relative to the mean

      • If the standard deviation is small then most data values are close to the mean (there is less variability)

      • If the standard deviation is large then many data values will be further away from the mean (there is greater variability)

    • If the data has units (seconds, cm, etc.), then the standard deviation has the same units as the values in the data set

  • The Greek letter $\sigma$ (lower case sigma) is used for the population standard deviation

  • The English letter s is used for the sample standard deviation

How do I calculate the standard deviation for a data set?

  • The standard deviation of a variable, $x$, for a population, $\sigma_x$, can be calculated using the formula:

    • $\sigma_x = \sqrt{\frac{1}{n} \sum (x_i - \bar{x})^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$

    • This is not given to you in the exam

    • In this formula:

      • n is the number of values in the sample

      • $\bar{x}$ is the mean of the sample

      • $x_i$ is 'any data value' in the sample

  • The standard deviation of a variable, $x$, for a sample, $s_x$, can be calculated using the formula:

    • $s_x = \sqrt{\frac{1}{n - 1} \sum (x_i - \bar{x})^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

    • This is given to you in the exam

  • Note that the formula for the population standard deviation is very similar to the formula for the sample standard deviation

    • You are just dividing by $n$ rather than $n - 1$
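The two formulas can be sketched side by side in Python. The function names `population_sd` and `sample_sd` and the data values are hypothetical, chosen only so that the arithmetic is easy to follow; the only difference between the two functions is the divisor.

```python
from math import sqrt
from statistics import pstdev, stdev


def population_sd(values):
    """Square root of the sum of squared deviations divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)


def sample_sd(values):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, not from the notes

print(population_sd(data))        # 2.0
print(round(sample_sd(data), 3))  # 2.138

# The standard library gives the same results
print(pstdev(data))               # 2.0
print(round(stdev(data), 3))      # 2.138
```

Dividing by the smaller number, n - 1, always makes the sample standard deviation slightly larger than the population version for the same values.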

Exam Tip

In practice, you will only be asked to calculate the standard deviation for a sample, but you should be aware that the population standard deviation is a different formula to the sample standard deviation.

If you use your calculator to check or calculate a standard deviation, make sure that you are familiar with your calculator's use of notation so that you are looking at the correct result.

What are the benefits and limitations of the standard deviation as a measure of variability?

  • The standard deviation does not tell you where the mean is, but it does give you a measure of how far (on average) the data values are from their mean

  • The standard deviation will not necessarily become larger if more data values are added to the data set

    • Adding more data values that are a similar distance from the mean as the existing values will have little effect on the standard deviation

    • However, adding terms that are a greater distance from the mean will increase the standard deviation

  • Like the mean, the standard deviation is affected by extreme values
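A small experiment illustrates these points. The data values below are hypothetical and the sample standard deviation is used throughout; this is a sketch of the idea rather than a general proof.

```python
from statistics import stdev

base = [4, 6, 4, 6]      # hypothetical values, each 1 away from the mean of 5
similar = base + [4, 6]  # two more values at a similar distance from the mean
extreme = base + [20]    # one extra value far from the rest

sd_base = stdev(base)
sd_similar = stdev(similar)
sd_extreme = stdev(extreme)

print(f"{sd_base:.2f}")     # 1.15
print(f"{sd_similar:.2f}")  # 1.10
print(f"{sd_extreme:.2f}")  # 6.78
```

Adding two more 'typical' values barely changes the standard deviation, while a single extreme value makes it several times larger.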

What is the variance of a data set?

  • The variance of a data set is the square of the standard deviation

    • $\sigma^2$ is used to denote the population variance

    • $s^2$ is used to denote the sample variance

  • The variance is the average of the square of the differences between each data item and the mean

    • If the data has units (seconds, cm, etc.), then the variance has the same units squared as the values in the data set (seconds², cm², etc.)

  • The standard deviation is often the measure used rather than the variance as it has the same units as the data set
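Because the variance is the square of the standard deviation, squaring the two standard deviation formulas gives the corresponding variance formulas. These are a restatement of the definitions in this note (using the same symbols $n$, $x_i$ and $\bar{x}$), not an extra result to memorize:

```latex
\sigma_x^2 = \frac{\sum (x_i - \bar{x})^2}{n}
\qquad
s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}
```

Taking the square root of either expression returns the standard deviation, in the original units of the data.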

Worked Example

A sample of 5 data items has been taken from a population. The values are listed below.

6       9       2       11       5

(a) Calculate the mean of the sample.

Answer:

Add up the values and divide by the number of values, 5

$\frac{6 + 9 + 2 + 11 + 5}{5} = 6.6$

The mean of the sample is 6.6


(b) Calculate the standard deviation of the sample.

Answer:

It is easiest to set up a table to work out the different values

| $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$ |
| --- | --- | --- |
| 6 | $6 - 6.6 = -0.6$ | $(-0.6)^2 = 0.36$ |
| 9 | $9 - 6.6 = 2.4$ | $(2.4)^2 = 5.76$ |
| 2 | $2 - 6.6 = -4.6$ | $(-4.6)^2 = 21.16$ |
| 11 | $11 - 6.6 = 4.4$ | $(4.4)^2 = 19.36$ |
| 5 | $5 - 6.6 = -1.6$ | $(-1.6)^2 = 2.56$ |
| Total | | $0.36 + 5.76 + 21.16 + 19.36 + 2.56 = 49.2$ |


Substitute the values into the formula $\sqrt{\frac{1}{n - 1} \sum (x_i - \bar{x})^2}$

$\sqrt{\frac{1}{5 - 1} \times 49.2} = \sqrt{12.3} = 3.50713\ldots$

The standard deviation is 3.51
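The working for both parts can be checked with a few lines of Python. This is only a sketch that mirrors the table method; the variable names are hypothetical, and `statistics.stdev` is included as an independent check of the sample standard deviation.

```python
from math import sqrt
from statistics import mean, stdev

sample = [6, 9, 2, 11, 5]  # values from the worked example

x_bar = mean(sample)                                # part (a)
squared_devs = [(x - x_bar) ** 2 for x in sample]   # last column of the table
total = sum(squared_devs)
s_x = sqrt(total / (len(sample) - 1))               # part (b), dividing by n - 1

print(x_bar)                    # 6.6
print(round(total, 2))          # 49.2
print(round(s_x, 2))            # 3.51
print(round(stdev(sample), 2))  # 3.51
```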


Author: Naomi C

Naomi graduated from Durham University in 2007 with a Masters degree in Civil Engineering. She has taught Mathematics in the UK, Malaysia and Switzerland covering GCSE, IGCSE, A-Level and IB. She particularly enjoys applying Mathematics to real life and endeavours to bring creativity to the content she creates.