Contingency Tables
What is a chi-squared test using contingency tables?
- A chi-squared ($\chi^2$) test using contingency tables is a hypothesis test used to test whether two variables are independent of each other
- For example, whether favourite music genre is independent of the age group of the listener
- This is sometimes called a two-way test
- This is an example of a goodness of fit test
- We are testing whether the data fits the modelling assumption that the variables are independent
- The chi-squared ($\chi^2$) distribution is used for this test
- You will use a contingency table
- This is a two-way table that shows the observed frequencies for the different combinations of the two variables
- An $m \times n$ contingency table has $m$ rows and $n$ columns
- This does not include any rows or columns used to record the row, column and grand totals
Why might I have to combine rows or columns?
- The observed values are used to calculate expected values
- These are the expected frequencies for each combination assuming that the variables are independent
- Your calculator may be able to calculate these for you after you input the observed frequencies
- Or else you can calculate them using the formula $\text{Expected frequency} = \dfrac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$
- None of the expected values used to run the test can be less than 5
- If one of the expected values is less than 5 then you will have to combine the corresponding row or column in the table of observed values with the adjacent row or column
- The decision between row or column will be based on which seems the most appropriate
- For example: if the two variables are age and favourite music genre then it is more appropriate to combine age groups than types of genre
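The check described above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function names `expected_frequencies` and `combine_rows` and the small table of counts are made up for the example (rows stand for age groups, columns for music genres).

```python
def expected_frequencies(observed):
    """Expected frequency of each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def combine_rows(observed, i, j):
    """Merge adjacent rows i and j (with j == i + 1) by adding their counts."""
    merged = [a + b for a, b in zip(observed[i], observed[j])]
    return observed[:i] + [merged] + observed[j + 1:]


# Hypothetical observed frequencies: 3 age groups (rows) by 3 genres (columns).
observed = [
    [20, 30, 25],
    [18, 22, 30],
    [2, 3, 4],  # a small oldest age group
]

expected = expected_frequencies(observed)
too_small = any(e < 5 for row in expected for e in row)

if too_small:
    # Combine the two oldest age groups, then recalculate the expected values.
    observed = combine_rows(observed, 1, 2)
    expected = expected_frequencies(observed)

for row in expected:
    print([round(e, 2) for e in row])
```

Here the last age group produces expected values below 5, so it is merged with the adjacent age group and the expected frequencies are recalculated for the smaller table.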
What are the degrees of freedom?
- There will be a minimum number of expected values you would need to know in order to be able to calculate all the expected values
- This minimum number is called the degrees of freedom and is often denoted by $\nu$
- For a test for independence with an $m \times n$ contingency table, $\nu = (m - 1) \times (n - 1)$
- For example: if there are 5 rows and 3 columns then you only need to know 2 of the values in each of 4 of the rows, as the rest can be calculated using the totals, so $\nu = 4 \times 2 = 8$
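The idea that only some of the cells are "free" can be illustrated with a short Python sketch. The helper names `degrees_of_freedom` and `complete_table` are hypothetical, and the numbers are borrowed from the worked example at the end of this page: given the totals, the top-left block of cells is enough to recover every other cell.

```python
def degrees_of_freedom(m, n):
    """Degrees of freedom for an m x n contingency table (totals excluded)."""
    return (m - 1) * (n - 1)


def complete_table(free_cells, row_totals, col_totals):
    """Rebuild an m x n table from its top-left (m-1) x (n-1) block and the totals."""
    # Last entry of each known row = row total minus the known entries.
    rows = [list(r) + [rt - sum(r)] for r, rt in zip(free_cells, row_totals)]
    # Final row = each column total minus the entries already in that column.
    last_row = [ct - sum(col) for ct, col in zip(col_totals, zip(*rows))]
    return rows + [last_row]


print(degrees_of_freedom(5, 3))  # the 5-row, 3-column case described above

# A 3 x 4 table: only (3 - 1) * (4 - 1) = 6 cells need to be known.
free_cells = [[51, 52, 37], [59, 63, 41]]
table = complete_table(free_cells, [195, 196, 109], [145, 146, 106, 103])
for row in table:
    print(row)
```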
What are the steps for a chi-squared test using contingency tables?
- STEP 1: Write the hypotheses
- $H_0$: Variable X and Variable Y are independent
- $H_1$: Variable X and Variable Y are not independent
- The hypotheses should always be stated in the context of the question
- Make sure you clearly write what the variables are and don’t just call them 'Variable X' and 'Variable Y'
- STEP 2: Calculate the expected frequencies
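- Use the formula $\text{Expected frequency} = \dfrac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$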
- You will need to combine rows or columns if any of the expected frequencies are less than 5
- This process is described above
- After combining, calculate the new expected frequencies for the modified table
- You may also be able to enter the observed frequencies as a matrix in your calculator
- Use the option for a $\chi^2$ 2-way test
- Your calculator will calculate the matrix of expected frequencies
- STEP 3: Calculate the degrees of freedom for the test
- For an $m \times n$ contingency table (after combining)
- Degrees of freedom is $\nu = (m - 1) \times (n - 1)$
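- STEP 4: Calculate $\chi^2_{calc}$ using the formula
- $\chi^2_{calc} = \sum \dfrac{(f_o - f_e)^2}{f_e}$, where $f_o$ is an observed frequency, $f_e$ is the corresponding expected frequency, and the sum is taken over every cell of the table
- If you calculate the statistic this way then you will also need to determine the appropriate critical value
- Use the 'Percentage Points of the $\chi^2$ Distribution' table in the exam formula booklet
- If you entered the observed frequencies as a matrix in your calculator then your calculator's 2-way test option will give you the test statistic and the associated p-value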
- STEP 5: Decide whether there is evidence to reject the null hypothesis
- Compare the calculated statistic $\chi^2_{calc}$ with the critical value you have determined (or compare the p-value with the significance level)
- If $\chi^2_{calc}$ > critical value (or p-value < significance level) then there is sufficient evidence to reject $H_0$
- If $\chi^2_{calc}$ < critical value (or p-value > significance level) then there is insufficient evidence to reject $H_0$
- STEP 6: Write your conclusion
- If you reject $H_0$
- There is sufficient evidence to suggest that variable X and variable Y are not independent
- If you do not reject $H_0$
- There is insufficient evidence to suggest that variable X and variable Y are not independent
- Be sure to state your conclusion in the context of the question
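Steps 4 and 5 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the 2 x 2 table are invented for the example, and the critical value 3.841 is the tabulated $\chi^2$ value for 1 degree of freedom at the 5% level.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def decide(statistic, critical_value):
    """Reject H0 only when the statistic exceeds the critical value."""
    return "reject H0" if statistic > critical_value else "do not reject H0"


# Hypothetical 2 x 2 table: every row and column total is 50, grand total 100,
# so every expected frequency is 50 * 50 / 100 = 25.
observed = [[30, 20], [20, 30]]
expected = [[25.0, 25.0], [25.0, 25.0]]

statistic = chi_squared_statistic(observed, expected)
critical_value = 3.841  # 5% level, 1 degree of freedom (from a chi-squared table)

print(statistic, decide(statistic, critical_value))
```

Each of the four cells contributes $5^2 / 25 = 1$, so the statistic is 4, which is above the critical value; in this made-up case the null hypothesis of independence would be rejected at the 5% level.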
Worked example
At a school in Paris, it is believed that favourite film genre is related to favourite subject. 500 students were asked to indicate their favourite film genre and favourite subject from a selection and the results are indicated in the table below.
| | Comedy | Action | Romance | Thriller | Total |
|---|---|---|---|---|---|
| Maths | 51 | 52 | 37 | 55 | 195 |
| Sports | 59 | 63 | 41 | 33 | 196 |
| Geography | 35 | 31 | 28 | 15 | 109 |
| Total | 145 | 146 | 106 | 103 | 500 |
Using the $\chi^2$ statistic and a significance test at the 1% level, test these results to see if there is an association between favourite film genre and favourite subject. State your conclusions.
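One way to check a by-hand or calculator answer to this question is the Python sketch below, which follows the six steps using the table above. It is a sketch under stated assumptions rather than a model solution: the variable and function names are illustrative, the critical value 16.812 is the tabulated $\chi^2$ value for 6 degrees of freedom at the 1% level, and the p-value helper relies on a closed form that only holds when the degrees of freedom are even.

```python
import math

# H0: favourite film genre and favourite subject are independent.
# H1: favourite film genre and favourite subject are not independent.

# Observed frequencies: rows are Maths, Sports, Geography;
# columns are Comedy, Action, Romance, Thriller.
observed = [
    [51, 52, 37, 55],
    [59, 63, 41, 33],
    [35, 31, 28, 15],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# STEP 2: expected frequency = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
assert all(e >= 5 for row in expected for e in row)  # no combining needed

# STEP 3: degrees of freedom for a 3 x 4 table.
nu = (len(observed) - 1) * (len(observed[0]) - 1)

# STEP 4: test statistic, summed over all twelve cells.
statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)


def p_value_even_df(x, df):
    """Upper-tail chi-squared probability; this closed form needs an even df."""
    half = x / 2
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


critical_value = 16.812  # 1% level, 6 degrees of freedom (from a chi-squared table)
p_value = p_value_even_df(statistic, nu)

# STEP 5: both comparisons should lead to the same decision.
reject = statistic > critical_value

print(f"nu = {nu}, statistic = {statistic:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```

With this table the statistic comes out at roughly 12.8, below the critical value, and the p-value is above 0.01. On that basis there is insufficient evidence at the 1% level to suggest that favourite film genre and favourite subject are not independent.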