Tests for Independence (College Board AP® Statistics)
Study Guide
Test for independence
What is a test for independence?
A chi-square test for independence is used to determine whether there is a significant relationship between two categorical variables
i.e. if two variables are independent of each other or if they are related (dependent)
For example, you may have collected data on the grade level and school subject preference from a group of students
A test for independence could indicate whether the grade level of a student has an impact on their preferred school subject
A chi-square test for independence is a specific goodness of fit test
The observed values are compared with values that you would expect if the two variables are independent
Observed values can be shown in a two-way table
This is also known as a contingency table
E.g. a contingency table for the grade level and school subject preference of a group of students is shown below
| Preferred subject | 9th grade | 10th grade | 11th grade | 12th grade | Total |
|---|---|---|---|---|---|
| Math/Science | 11 | 7 | 15 | 9 | 42 |
| Humanities | 8 | 8 | 6 | 5 | 27 |
| Languages | 6 | 4 | 9 | 12 | 31 |
| Total | 25 | 19 | 30 | 26 | 100 |
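As a rough illustration, the two-way table above can be stored as a list of rows, with the marginal totals recomputed rather than typed in. The variable names are illustrative only and are not part of the AP course.

```python
# Observed counts from the table above.
# Rows are preferred subjects; columns are grade levels (9th, 10th, 11th, 12th).
observed = [
    [11, 7, 15, 9],   # Math/Science
    [8, 8, 6, 5],     # Humanities
    [6, 4, 9, 12],    # Languages
]

row_totals = [sum(row) for row in observed]        # one total per subject
col_totals = [sum(col) for col in zip(*observed)]  # one total per grade level
grand_total = sum(row_totals)

print(row_totals)   # [42, 27, 31]
print(col_totals)   # [25, 19, 30, 26]
print(grand_total)  # 100
```

The totals row and column are deliberately left out of `observed`, since they are derived from the counts rather than being data themselves.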
What are the null and alternative hypotheses for an independence test?
The null hypothesis, $H_0$, is the assumption that the two categorical variables are independent
e.g. The grade level of a student is independent of their school subject preference (there is no association)
It is assumed to be correct unless there is sufficient evidence against it
The alternative hypothesis, $H_a$, is the assumption that the two categorical variables are not independent
e.g. The grade level of a student is not independent of their school subject preference (there is an association)
Examiner Tips and Tricks
In an exam, a test for independence may also be referred to as a test for an association between two variables, but be careful with the wording: if two variables are independent, then there is no association between them.
What are the conditions for an independence test?
When performing a chi-square independence test:
Observed values must come from a random sample
Observed values must be independent
They are sampled with replacement
or the sample size is less than 10% of the population size
Expected values must meet the large counts condition
Each expected value must be greater than or equal to 5
or at least 80% of the expected values are greater than 5 and all are greater than or equal to 1
Examiner Tips and Tricks
In the exam, either version of the large counts condition is accepted.
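The two versions of the large counts condition can be sketched as a small check. This is a minimal sketch: the function name, the `strict` flag and the table of expected values are all hypothetical and made up for illustration.

```python
def large_counts_met(expected, strict=True):
    """Check the large counts condition on a table of expected values.

    strict=True  : every expected value is at least 5.
    strict=False : at least 80% of expected values exceed 5
                   and every expected value is at least 1.
    """
    values = [value for row in expected for value in row]
    if strict:
        return all(value >= 5 for value in values)
    share_above_5 = sum(value > 5 for value in values) / len(values)
    return share_above_5 >= 0.8 and all(value >= 1 for value in values)


# Hypothetical expected values, invented purely to show the two versions differing.
hypothetical_expected = [
    [12.0, 6.5],
    [4.2, 9.3],
    [7.1, 8.0],
]

print(large_counts_met(hypothetical_expected, strict=True))   # False (4.2 is below 5)
print(large_counts_met(hypothetical_expected, strict=False))  # True (5 of 6 values exceed 5)
```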
How do I calculate a chi-square value?
The chi-square value for the test of independence, $\chi^2$, can be calculated from the formula given to you in the exam: $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
The larger $\chi^2$ is, the more different the observed values are from the expected values
To be able to calculate the chi-square value, you therefore need to find the expected values first
To calculate the expected value for a particular cell, multiply together:
the probability of being in that particular row
by the probability of being in that particular column
by the total number in the sample
This is equivalent to simply multiplying the row total by the column total and dividing by the grand total
E.g. the expected value for the number of 10th graders who prefer languages in the example above is $\frac{31}{100} \times \frac{19}{100} \times 100 = 5.89$
or just $\frac{31 \times 19}{100} = 5.89$
The table below shows all the expected values for the example above
| Preferred subject | 9th grade | 10th grade | 11th grade | 12th grade | Total |
|---|---|---|---|---|---|
| Math/Science | 10.5 | 7.98 | 12.6 | 10.92 | 42 |
| Humanities | 6.75 | 5.13 | 8.1 | 7.02 | 27 |
| Languages | 7.75 | 5.89 | 9.3 | 8.06 | 31 |
| Total | 25 | 19 | 30 | 26 | 100 |
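The expected values can be reproduced with a few lines of code, using the rule that each expected count is the row total multiplied by the column total, divided by the grand total. This is only a sketch with illustrative variable names; the observed counts are those of the grade level example.

```python
# Observed counts: rows are subjects, columns are grade levels (9th to 12th).
observed = [
    [11, 7, 15, 9],   # Math/Science
    [8, 8, 6, 5],     # Humanities
    [6, 4, 9, 12],    # Languages
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total.
expected = [
    [row_total * col_total / grand_total for col_total in col_totals]
    for row_total in row_totals
]

for row in expected:
    print([round(value, 2) for value in row])
# [10.5, 7.98, 12.6, 10.92]
# [6.75, 5.13, 8.1, 7.02]
# [7.75, 5.89, 9.3, 8.06]
```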
Examiner Tips and Tricks
Expected values do not need to be integer values, so leave them unrounded to avoid calculation errors!
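To see why unrounded expected values matter, the whole calculation can be sketched in code: the statistic adds up (observed − expected)² / expected over every cell of the table. The function name `chi_square_statistic` is hypothetical, and the data is the grade level example.

```python
def chi_square_statistic(observed):
    """Sum (observed - expected)^2 / expected over every cell of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, observed_count in enumerate(row):
            # Expected count is kept unrounded to avoid calculation errors.
            expected_count = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected_count) ** 2 / expected_count
    return statistic


grade_table = [
    [11, 7, 15, 9],   # Math/Science
    [8, 8, 6, 5],     # Humanities
    [6, 4, 9, 12],    # Languages
]

chi2 = chi_square_statistic(grade_table)
print(round(chi2, 2))  # about 6.84
```

If the observed counts matched the expected counts exactly, every term would be zero and so would the statistic.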
What are degrees of freedom?
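The number of degrees of freedom, 'dof', is equal to
$(\text{number of rows} - 1) \times (\text{number of columns} - 1)$
the totals row and totals column are not counted
e.g. dof for the contingency table above is $(3 - 1) \times (4 - 1) = 6$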
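For a test of independence, the number of degrees of freedom is (number of rows − 1) × (number of columns − 1), counting only the cells of observed data. A minimal sketch, with a hypothetical function name:

```python
def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a two-way table given without its totals."""
    n_rows = len(table)
    n_cols = len(table[0])
    return (n_rows - 1) * (n_cols - 1)


# Grade level example: 3 subjects by 4 grade levels, totals excluded.
grade_table = [
    [11, 7, 15, 9],
    [8, 8, 6, 5],
    [6, 4, 9, 12],
]

print(degrees_of_freedom(grade_table))  # 6
```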
How do I use the chi-square distribution table?
You can use the chi-square tables given to you in the exam to find the critical value
This is the threshold value that determines whether you reject the null hypothesis or not
To find the critical value from the tables, you need the significance level, $\alpha$, and the dof
The critical value is located in the cell where the relevant row and column intersect
How do I conclude a hypothesis test?
Conclusions to a hypothesis test need to show two things:
a decision about the null hypothesis
an interpretation of this decision in the context of the question
To make the decision, compare the calculated chi-square value, $\chi^2$, to the critical value from the table
If $\chi^2 >$ critical value then the null hypothesis should be rejected
There is sufficient evidence that the two categorical variables are not independent
If $\chi^2 <$ critical value then the null hypothesis should not be rejected
There is not enough evidence to say that the two categorical variables are not independent
How can I perform an independence test on the calculator?
To complete an independence test on your calculator:
Create a matrix of the observed values
Perform a chi-square test
This is often called a two-way test on a calculator
Compare your calculated $\chi^2$ value with the critical value from the chi-square tables
Alternatively, you can compare the given significance level, $\alpha$, with the calculator's $p$-value
If using the $p$-value, remember
If $p < \alpha$, reject the null hypothesis
If $p > \alpha$, do not reject the null hypothesis
Examiner Tips and Tricks
Even if you perform the independence test on your calculator, it is still important to show all of your working to demonstrate full understanding. Depending on the question, you may need to show how the chi-square statistic is calculated in full or just how an expected value and the degrees of freedom are calculated.
If you compare the $p$-value with $\alpha$, don't forget that the inequalities are the opposite way round from when you compare the $\chi^2$ value to the critical value to decide whether or not to reject the null hypothesis!
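The two decision rules point in opposite directions, which is easy to see when they are written side by side. This is a minimal sketch: the function names are hypothetical and the numbers are invented purely to show the direction of each inequality.

```python
def reject_using_critical_value(chi2, critical_value):
    """Reject the null hypothesis when the statistic exceeds the critical value."""
    return chi2 > critical_value


def reject_using_p_value(p_value, alpha):
    """Reject the null hypothesis when the p-value is below the significance level."""
    return p_value < alpha


# Hypothetical values, not taken from any example in this guide.
print(reject_using_critical_value(chi2=8.1, critical_value=5.99))  # True
print(reject_using_p_value(p_value=0.017, alpha=0.05))             # True
print(reject_using_critical_value(chi2=3.2, critical_value=5.99))  # False
print(reject_using_p_value(p_value=0.202, alpha=0.05))             # False
```

A large statistic goes with a small $p$-value, so "greater than" in the first rule corresponds to "less than" in the second.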
Worked Example
A coffee company wanted to understand more about who their customers were. They took a random sample of 200 individuals to see if there was an association between an individual's relationship status and whether they were a coffee drinker or not.
The outcomes of their research are shown in the table below.
| | Single | Married / Cohabiting | Other | Total |
|---|---|---|---|---|
| Coffee drinker | 48 | 25 | 37 | 110 |
| Non-coffee drinker | 23 | 32 | 35 | 90 |
| Total | 71 | 57 | 72 | 200 |
Determine, at the 5% significance level, if there is an association between an individual's relationship status and whether they are a coffee drinker or not.
Write the null and alternative hypotheses
$H_0$: There is no association between an individual's relationship status and whether they drink coffee or not; they are independent
$H_a$: There is an association between an individual's relationship status and whether they drink coffee or not; they are not independent
State the type of test being used
The correct inference procedure is a chi-square test of independence at $\alpha = 0.05$
Calculate the expected values (by multiplying the row total by the column total and dividing by the grand total)
| | Single | Married / Cohabiting | Other | Total |
|---|---|---|---|---|
| Coffee drinker | 39.05 | 31.35 | 39.6 | 110 |
| Non-coffee drinker | 31.95 | 25.65 | 32.4 | 90 |
| Total | 71 | 57 | 72 | 200 |
Verify the conditions for the test
All conditions for inference have been met:
The observed values are independent as the sample of individuals is randomly selected
All expected values are greater than 5
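Calculate the chi-square value, $\chi^2$
$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(48 - 39.05)^2}{39.05} + \frac{(25 - 31.35)^2}{31.35} + \frac{(37 - 39.6)^2}{39.6} + \frac{(23 - 31.95)^2}{31.95} + \frac{(32 - 25.65)^2}{25.65} + \frac{(35 - 32.4)^2}{32.4} \approx 7.796$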
State the number of degrees of freedom
degrees of freedom = $(2 - 1) \times (3 - 1) = 2$
Find the critical value from the chi-square tables
Find the row corresponding to 2 degrees of freedom and the column corresponding to $\alpha = 0.05$
critical value = 5.99
Compare the calculated value to the critical value and state the conclusion of the test
$\chi^2 \approx 7.796 > 5.99$, so $H_0$ is rejected
Interpret the result in the context of the question
There is sufficient evidence to suggest that there is an association between an individual's relationship status and whether they are a coffee drinker or not, i.e. sufficient evidence that they are not independent
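The worked example can be checked end to end with a short script. This is a sketch under stated assumptions: the variable names are illustrative, the critical value 5.99 is read from a chi-square table rather than computed, and the $p$-value line relies on a special case that is not part of this guide (with exactly 2 degrees of freedom, the chi-square $p$-value equals $e^{-\chi^2/2}$).

```python
import math

# Observed counts: columns are Single, Married / Cohabiting, Other.
observed = [
    [48, 25, 37],  # Coffee drinker
    [23, 32, 35],  # Non-coffee drinker
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = row total * column total / grand total, left unrounded.
expected = [
    [row_total * col_total / grand_total for col_total in col_totals]
    for row_total in row_totals
]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi2 = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)

dof = (len(observed) - 1) * (len(observed[0]) - 1)

alpha = 0.05
critical_value = 5.99  # from a chi-square table: 2 degrees of freedom, alpha = 0.05

# Assumption beyond this guide: this closed form holds only when dof == 2.
p_value = math.exp(-chi2 / 2)

print(round(chi2, 3))         # 7.796
print(dof)                    # 2
print(chi2 > critical_value)  # True, so the null hypothesis is rejected
print(p_value < alpha)        # True, the same decision via the p-value
```

For tables with other degrees of freedom, the $p$-value would come from a calculator or statistical software instead of this shortcut.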