Association & Correlation Coefficients (College Board AP® Statistics)
Study Guide
Association
What is an association?
An association between two variables means that the variables are related to each other in some particular way
A change in one variable corresponds to a change in the other
It is possible for two variables to have no association
A change in one variable does not correspond to a change in the other
What is the direction of an association?
If an association exists, it can have different directions:
A positive association is when, as one variable increases, the other tends to increase
For example, as temperature increases, sales of cold drinks tend to increase
A negative association is when, as one variable increases, the other tends to decrease
For example, increasing the age of a car tends to decrease its value
What is the form of an association?
Having a positive or negative association does not mean the association is linear (a straight line)
There are many different forms an association could take, for example:
Linear forms follow straight lines
Non-linear forms follow curved lines, including:
quadratics and cubics
reciprocals, e.g. $y = \frac{1}{x}$
exponentials, e.g. $y = 2^x$
What is the strength of an association?
The strength of an association is how well the data points on a scatterplot follow the form of the association
Strengths are described as either strong, moderate or weak
The stronger the association, the more closely the data points follow the form
e.g. data points may show a 'weak quadratic' association
What are unusual features of a scatterplot?
Unusual features of a scatterplot include
clusters
where data points appear to be in groups (clouds)
outliers
data points that do not appear to fit the general pattern shown
Examiner Tips and Tricks
In the exam, if asked to describe the relationship shown on a scatterplot, you should comment in context on the direction (positive, negative, none), form (linear, non-linear) and strength (strong, moderate, weak) of an association, as well as any unusual features (clusters, outliers).
Worked Example
Describe the relationship shown between the hours spent on a phone per day and the hours spent on a computer per day for nine students in a class, shown on the scatterplot below.
Answer:
You must comment on the strength, direction and form of the association seen
You must also comment on unusual features, in particular outliers and clusters
Remember to give your answer in context
The scatterplot reveals a strong, negative, roughly linear association between the hours spent on a phone per day and the hours spent on a computer per day for the nine students in the class
There are no significant outliers, though there is a slight clustering of points into two clusters (top left, between 1 and 3 hours on a phone per day, and bottom right, between 5 and 9 hours on a phone per day)
Correlation coefficients
What is correlation?
Correlation is a numerical measure of the direction and strength of a linear association between two variables
What is the correlation coefficient?
The correlation coefficient, $r$, is a value between -1 and 1 where
$r = 1$ means a perfect positive linear association
All points lie along the same straight line with a positive slope
$r = 0$ means no linear association
$r = -1$ means a perfect negative linear association
All points lie along the same straight line with a negative slope
Values in between can be described as weak, moderate or strong
e.g. a value of $r$ close to 1 is a 'strong positive linear' association
Points appear to roughly follow a straight line with a positive slope
What is the formula for the correlation coefficient?
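For $n$ data points with coordinates $(x_i, y_i)$, the formula for the correlation coefficient is
$$r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$
where $s_x$ is the sample standard deviation of the $x$-values
recall that $s_x = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$ and $\bar{x} = \frac{1}{n} \sum x_i$
and where $s_y$ is the sample standard deviation of the $y$-values
$s_y = \sqrt{\frac{1}{n-1} \sum (y_i - \bar{y})^2}$ and $\bar{y} = \frac{1}{n} \sum y_i$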
However, in practice, the correlation coefficient is found using technology
e.g. using a calculator
Examiner Tips and Tricks
The formulas for $r$, $s_x$ and $\bar{x}$ are given in the exam, but the formulas for $s_y$ and $\bar{y}$ are not (though they can easily be formed by looking at $s_x$ and $\bar{x}$).
What else do I need to know about correlation coefficients?
You need to know that correlation coefficients, $r$,
are always in the range $-1 \le r \le 1$
only measure the strength of linear relationships
so data with $r = 0$ have no linear association, but may have a non-linear (curved) association
are independent of units
changing the units of the $x$ and $y$ variables does not affect $r$
are affected by outliers
are not affected by swapping the axes
i.e. plotting $x$ values on the $y$ axis and vice versa
What does the phrase "correlation does not imply causation" mean?
If two variables appear to correlate, it does not mean that one variable causes changes in the other variable
For example, each day you record the height of a sunflower and the weight of a puppy
As the height of the sunflower increases, the weight of the puppy increases
This shows a positive correlation
But you cannot claim that:
'increasing the heights of sunflowers causes puppies to weigh more'
or 'heavier puppies lead to taller sunflowers'!
Both variables are actually increasing separately due to a third variable
In this case, time