Chi-squared Test for Independence (DP IB Maths: AI HL)
Revision Note
Chi-Squared Test for Independence
What is a chi-squared test for independence?
- A chi-squared (χ²) test for independence is a hypothesis test used to test whether two variables are independent of each other
- This is sometimes called a two-way test
- This is an example of a goodness of fit test
- We are testing whether the data fits the model that the variables are independent
- The chi-squared (χ²) distribution is used for this test
- You will use a contingency table
- This is a two-way table that shows the observed frequencies for the different combinations of the two variables
- For example: if the two variables are hair colour and eye colour then the contingency table will show the frequencies of the different combinations
Why might I have to combine rows or columns?
- The observed values are used to calculate expected values
- These are the expected frequencies for each combination assuming that the variables are independent
- Your GDC can calculate these for you after you input the observed frequencies
- The expected values must all be bigger than 5
- If any of the expected values is 5 or less then you will have to combine the corresponding row or column in the matrix of observed values with an adjacent row or column
- The decision between row or column will be based on which seems the most appropriate
- For example: if the two variables are age and favourite TV genre then it is more appropriate to combine age groups than types of genre
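The check described above can be sketched in Python. This is only an illustration of the arithmetic, not what your GDC does internally: the function names `expected_frequencies` and `combine_rows` and the observed frequencies are all made up for this example.

```python
def expected_frequencies(observed):
    """Expected frequencies assuming independence:
    row total x column total / overall total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def combine_rows(observed, i, j):
    """Merge row j into row i by adding the frequencies (hypothetical helper)."""
    merged = [a + b for a, b in zip(observed[i], observed[j])]
    return [merged if k == i else row for k, row in enumerate(observed) if k != j]


# Hypothetical observed frequencies: rows are three adjacent age groups,
# columns are two favourite TV genres.
observed = [
    [20, 30],
    [25, 15],
    [3, 7],
]

expected = expected_frequencies(observed)
too_small = any(e <= 5 for row in expected for e in row)

if too_small:
    # Combine the last two (adjacent) age groups and recalculate.
    observed = combine_rows(observed, 1, 2)
    expected = expected_frequencies(observed)

print(observed)
print(expected)
```

Here the rows are combined rather than the columns because merging adjacent age groups still gives a meaningful category.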
What are the degrees of freedom?
- There will be a minimum number of expected values you would need to know in order to be able to calculate all the expected values
- This minimum number is called the degrees of freedom and is often denoted by ν
- For a test for independence with an m × n contingency table, ν = (m − 1) × (n − 1)
- For example: If there are 5 rows and 3 columns then you only need to know 2 of the values in 4 of the rows as the rest can be calculated using the totals
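As a quick check of the example above, the count can be written out, using ν for the degrees of freedom of a table with m = 5 rows and n = 3 columns:

```latex
\nu = (m - 1)(n - 1) = (5 - 1)(3 - 1) = 4 \times 2 = 8
```

This matches knowing 2 values in each of 4 rows.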
What are the steps for a chi-squared test for independence?
- STEP 1: Write the hypotheses
- H0 : Variable X is independent of variable Y
- H1 : Variable X is not independent of variable Y
- Make sure you clearly write what the variables are and don’t just call them X and Y
- STEP 2: Calculate the degrees of freedom for the test
- For an m × n contingency table
- Degrees of freedom is ν = (m − 1) × (n − 1)
- STEP 3: Enter your observed frequencies into your GDC using the option for a 2-way test
- Enter these as a matrix
- Your GDC will give you a matrix of the expected values (assuming the variables are independent)
- If any values are 5 or less then combine rows/columns and repeat step 2
- Your GDC will also give you the χ² statistic and its p-value
- The χ² statistic is denoted as χ²calc (χ² with the subscript calc)
- STEP 4: Decide whether there is evidence to reject the null hypothesis
- EITHER compare the χ² statistic with the given critical value
- If χ² statistic > critical value then reject H0
- If χ² statistic < critical value then accept H0
- OR compare the p-value with the given significance level
- If p-value < significance level then reject H0
- If p-value > significance level then accept H0
- STEP 5: Write your conclusion
- If you reject H0
- There is sufficient evidence to suggest that variable X is not independent of variable Y
- Therefore this suggests they are associated
- If you accept H0
- There is insufficient evidence to suggest that variable X is not independent of variable Y
- Therefore this suggests they are independent
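Steps 4 and 5 can be sketched as a small Python function. The function names and the numbers passed in are invented for illustration, and the sketch assumes H0 is rejected only when the comparison is strict, as in the inequalities above.

```python
def conclude_from_critical_value(chi2_statistic, critical_value):
    """Decision using the critical value.

    Assumption: H0 is rejected only when the statistic is strictly
    greater than the critical value.
    """
    if chi2_statistic > critical_value:
        return "reject H0: sufficient evidence to suggest the variables are not independent"
    return "accept H0: insufficient evidence to suggest the variables are not independent"


def conclude_from_p_value(p_value, significance_level):
    """Decision using the p-value.

    Assumption: H0 is rejected only when the p-value is strictly
    less than the significance level.
    """
    if p_value < significance_level:
        return "reject H0: sufficient evidence to suggest the variables are not independent"
    return "accept H0: insufficient evidence to suggest the variables are not independent"


# Hypothetical values purely for illustration.
print(conclude_from_critical_value(chi2_statistic=9.2, critical_value=7.815))
print(conclude_from_p_value(p_value=0.12, significance_level=0.05))
```

In a written answer you would replace "the variables" with the actual variables from the question.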
How do I calculate the chi-squared statistic?
- You are expected to be able to use your GDC to calculate the χ² statistic by inputting the matrix of the observed frequencies
- Seeing how it is done by hand might deepen your understanding but you are not expected to use this method
- STEP 1: For each observed frequency Oᵢ calculate its expected frequency Eᵢ
- Assuming the variables are independent
- Eᵢ = P(X = x) × P(Y = y) × Total
- Which simplifies to Eᵢ = (Row total × Column total) / Overall total
- STEP 2: Calculate the χ² statistic using the formula χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ, summing over every cell in the table
- You do not need to learn this formula as your GDC calculates it for you
- To calculate the p-value you would find the probability of a value being bigger than your χ² statistic using a χ² distribution with ν degrees of freedom
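Your GDC reports the p-value directly. To see what it represents, here is a minimal Python sketch using a closed-form expression for the upper-tail probability that is valid only when ν is even. The function name `chi2_upper_tail_even` is invented for this example, and it is not a general-purpose routine.

```python
import math


def chi2_upper_tail_even(x, nu):
    """P(X > x) for a chi-squared distribution with an even number nu
    of degrees of freedom.

    Uses the identity
        P(X > x) = exp(-x/2) * sum_{j=0}^{nu/2 - 1} (x/2)^j / j!
    which holds only for even nu.
    """
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("this sketch only handles positive even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(nu // 2))


# 16.812 is the 1% critical value for 6 degrees of freedom, so a statistic
# equal to it should give a p-value of about 0.01.
p_value = chi2_upper_tail_even(16.812, 6)
print(p_value)
```

A larger χ² statistic gives a smaller upper-tail probability, which is why a small p-value counts as evidence against H0.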
Examiner Tip
Note for Internal Assessments (IA)
- If you use a χ² test in your IA then beware that the outcome may not be accurate if there is only 1 degree of freedom
- This means it is a 2 × 2 contingency table
Worked example
At a school in Paris, it is believed that favourite film genre is related to favourite subject. 500 students were asked to indicate their favourite film genre and favourite subject from a selection and the results are indicated in the table below.
| | Comedy | Action | Romance | Thriller |
|---|---|---|---|---|
| Maths | 51 | 52 | 37 | 55 |
| Sports | 59 | 63 | 41 | 33 |
| Geography | 35 | 31 | 28 | 15 |
It is decided to test this hypothesis by using a χ² test for independence at the 1% significance level.
The critical value is 16.812.
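As a cross-check on a GDC, the arithmetic for this table can be sketched in Python. The variable names are chosen for this illustration only. The sketch works out the expected frequencies and the χ² statistic by hand-style arithmetic, then compares the statistic with the given critical value.

```python
# Observed frequencies from the table: rows are Maths, Sports, Geography;
# columns are Comedy, Action, Romance, Thriller.
observed = [
    [51, 52, 37, 55],
    [59, 63, 41, 33],
    [35, 31, 28, 15],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected frequency for each cell, assuming independence.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (O - E)^2 / E over every cell.
chi2_statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 16.812  # given in the question

print("smallest expected frequency:", round(min(min(row) for row in expected), 3))
print("degrees of freedom:", degrees_of_freedom)
print("chi-squared statistic:", round(chi2_statistic, 3))
print("reject H0" if chi2_statistic > critical_value else "accept H0")
```

With these figures all expected frequencies are bigger than 5, there are 6 degrees of freedom, and the statistic comes out at roughly 12.8. This is below 16.812, so at the 1% level there is insufficient evidence to suggest that favourite film genre is not independent of favourite subject.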