Chi-squared Test for Independence (DP IB Applications & Interpretation (AI)): Revision Note
Chi-Squared Test for Independence
What is a chi-squared test for independence?
A chi-squared (χ²) test for independence is a hypothesis test used to test whether two variables are independent of each other
This is sometimes called a two-way test
This is an example of a goodness of fit test
We are testing whether the data fits the model that the variables are independent
The chi-squared (χ²) distribution is used for this test
You will use a contingency table
This is a two-way table that shows the observed frequencies for the different combinations of the two variables
For example: if the two variables are hair colour and eye colour then the contingency table will show the frequencies of the different combinations
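As a rough illustration of how a contingency table is built, the sketch below tallies raw paired observations into a two-way table of observed frequencies. The six (hair colour, eye colour) pairs are made-up data for illustration only.

```python
from collections import Counter

# Hypothetical raw data: one (hair colour, eye colour) pair per person.
people = [
    ("brown", "brown"), ("brown", "blue"), ("blonde", "blue"),
    ("brown", "brown"), ("blonde", "blue"), ("blonde", "brown"),
]

counts = Counter(people)
hair = sorted({h for h, _ in people})  # row labels: ['blonde', 'brown']
eyes = sorted({e for _, e in people})  # column labels: ['blue', 'brown']

# Observed frequencies: one row per hair colour, one column per eye colour.
table = [[counts[(h, e)] for e in eyes] for h in hair]

for label, row in zip(hair, table):
    print(label, row)
```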
Why might I have to combine rows or columns?
The observed values are used to calculate expected values
These are the expected frequencies for each combination assuming that the variables are independent
Your GDC can calculate these for you after you input the observed frequencies
The expected values must all be bigger than 5
If any of the expected values is 5 or less then you will have to combine the corresponding row or column in the matrix of observed values with an adjacent row or column
The decision between row or column will be based on which seems the most appropriate
For example: if the two variables are age and favourite TV genre then it is more appropriate to combine age groups than types of genre
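The check on expected values can be sketched in code. This is a minimal illustration, not how a GDC works internally: the function names and the age group by TV genre frequencies are all hypothetical. It computes the expected frequencies under independence, finds that the smallest age group gives values of 5 or less, and merges that row with the adjacent age group.

```python
def expected_frequencies(observed):
    """Expected frequency of each cell, assuming the variables are independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def all_above_five(expected):
    """True only if every expected frequency is bigger than 5."""
    return all(value > 5 for row in expected for value in row)


def combine_adjacent_rows(observed, i):
    """Merge row i with row i + 1 by adding the frequencies cell by cell."""
    merged = [a + b for a, b in zip(observed[i], observed[i + 1])]
    return observed[:i] + [merged] + observed[i + 2:]


# Hypothetical observed frequencies.
# Rows: under 18, 18-40, 41-64, 65+.  Columns: drama, comedy, news.
observed = [
    [20, 30, 10],
    [25, 35, 20],
    [15, 20, 25],
    [2, 3, 5],
]

ok_before = all_above_five(expected_frequencies(observed))

# Combine the 41-64 and 65+ age groups, then check again.
combined = combine_adjacent_rows(observed, 2)
ok_after = all_above_five(expected_frequencies(combined))

print(ok_before, ok_after)
```

Here the rows are merged rather than the columns because adjacent age groups combine into a meaningful wider age group, whereas merging two genres would not.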
What are the degrees of freedom?
There will be a minimum number of expected values you would need to know in order to be able to calculate all the expected values
This minimum number is called the degrees of freedom and is often denoted by ν
For a test for independence with an m × n contingency table, ν = (m − 1) × (n − 1)
For example: if there are 5 rows and 3 columns then you only need to know 2 of the values in 4 of the rows, as the rest can be calculated using the totals, giving (5 − 1) × (3 − 1) = 8 degrees of freedom
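The idea that the remaining values follow from the totals can be sketched as below. The helper names and the 2 × 3 figures are hypothetical. Given the row totals, the column totals and a block of (m − 1) × (n − 1) known values, every other cell is found by subtraction.

```python
def degrees_of_freedom(m, n):
    """Degrees of freedom for an m x n contingency table."""
    return (m - 1) * (n - 1)


def complete_table(known, row_totals, col_totals):
    """Fill in an m x n table from its top-left (m - 1) x (n - 1) block.

    The last value in each row comes from the row total, then the whole
    last row comes from the column totals.
    """
    m, n = len(row_totals), len(col_totals)
    table = [row[:] + [row_totals[i] - sum(row)] for i, row in enumerate(known)]
    last_row = [col_totals[j] - sum(table[i][j] for i in range(m - 1))
                for j in range(n)]
    table.append(last_row)
    return table


# Hypothetical 2 x 3 table: only (2 - 1) x (3 - 1) = 2 values are known.
row_totals = [60, 40]
col_totals = [50, 30, 20]
known = [[30, 18]]

full = complete_table(known, row_totals, col_totals)
print(degrees_of_freedom(2, 3), full)
```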
What are the steps for a chi-squared test for independence?
STEP 1: Write the hypotheses
H0 : Variable X is independent of variable Y
H1 : Variable X is not independent of variable Y
Make sure you clearly write what the variables are and don’t just call them X and Y
STEP 2: Calculate the degrees of freedom for the test
For an m × n contingency table
Degrees of freedom is ν = (m − 1) × (n − 1)
STEP 3: Enter your observed frequencies into your GDC using the option for a 2-way test
Enter these as a matrix
Your GDC will give you a matrix of the expected values (assuming the variables are independent)
If any values are 5 or less then combine rows/columns and repeat steps 2 and 3
Your GDC will also give you the χ² statistic and its p-value
The χ² statistic is denoted as χ²calc
STEP 4: Decide whether there is evidence to reject the null hypothesis
EITHER compare the χ² statistic with the given critical value
If χ² statistic > critical value then reject H0
If χ² statistic < critical value then accept H0
OR compare the p-value with the given significance level
If p-value < significance level then reject H0
If p-value > significance level then accept H0
STEP 5: Write your conclusion
If you reject H0
There is sufficient evidence to suggest that variable X is not independent of variable Y
Therefore this suggests they are associated
If you accept H0
There is insufficient evidence to suggest that variable X is not independent of variable Y
Therefore this suggests they are independent
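The two decision rules in step 4 can be written as a short sketch. The function names and the numbers passed in are hypothetical. The notes do not say what to do when the two values are exactly equal, so treating that case as "accept H0" is an assumption of this sketch.

```python
def decide_by_critical_value(statistic, critical_value):
    """Reject H0 only when the statistic is bigger than the critical value."""
    # Assumption: equality is treated as "accept H0".
    return "reject H0" if statistic > critical_value else "accept H0"


def decide_by_p_value(p_value, significance_level):
    """Reject H0 only when the p-value is smaller than the significance level."""
    # Assumption: equality is treated as "accept H0".
    return "reject H0" if p_value < significance_level else "accept H0"


# Hypothetical values for illustration.
print(decide_by_critical_value(9.1, 7.815))  # statistic above critical value
print(decide_by_p_value(0.12, 0.05))         # p-value above significance level
```

Both rules lead to the same decision for a given test, so only one of them is needed.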
How do I calculate the chi-squared statistic?
You are expected to be able to use your GDC to calculate the χ² statistic by inputting the matrix of the observed frequencies
Seeing how it is done by hand might deepen your understanding but you are not expected to use this method
STEP 1: For each observed frequency Oi calculate its expected frequency Ei
Assuming the variables are independent
Ei = P(X = x) × P(Y = y) × Total
Which simplifies to Ei = (Row total × Column total) ÷ Overall total
STEP 2: Calculate the χ² statistic using the formula χ²calc = Σ (Oi − Ei)² ÷ Ei
You do not need to learn this formula as your GDC calculates it for you
To calculate the p-value you would find the probability of a value being bigger than your χ² statistic using a χ² distribution with ν degrees of freedom
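The by-hand method can be sketched as follows, using made-up 2 × 3 data and hypothetical function names. To stay within the standard library, the p-value helper uses a closed form for the upper tail of the χ² distribution that only holds when the degrees of freedom are even. That restriction belongs to this sketch, not to the test itself.

```python
import math


def expected_frequencies(observed):
    """Ei = row total x column total / overall total, for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed):
    """Sum of (Oi - Ei)^2 / Ei over every cell of the table."""
    expected = expected_frequencies(observed)
    return sum((o - e) ** 2 / e
               for row_o, row_e in zip(observed, expected)
               for o, e in zip(row_o, row_e))


def p_value_even_df(statistic, df):
    """P(X > statistic) for a chi-squared distribution with EVEN df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only works for even, positive df")
    half = statistic / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))


# Hypothetical 2 x 3 observed frequencies, so df = (2 - 1) x (3 - 1) = 2.
observed = [
    [30, 20, 10],
    [20, 20, 20],
]

statistic = chi_squared_statistic(observed)
p_value = p_value_even_df(statistic, 2)
print(round(statistic, 3), round(p_value, 3))
```

For this made-up table every expected value is 25, 20 or 15, the statistic is 16/3 (about 5.33) and the p-value is about 0.069.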
Examiner Tips and Tricks
Note for Internal Assessments (IA)
If you use a χ² test in your IA then beware that the outcome may not be accurate if there is only 1 degree of freedom
This means it is a 2 × 2 contingency table
Worked Example
At a school in Paris, it is believed that favourite film genre is related to favourite subject. 500 students were asked to indicate their favourite film genre and favourite subject from a selection and the results are indicated in the table below.
|  | Comedy | Action | Romance | Thriller |
| Maths | 51 | 52 | 37 | 55 |
| Sports | 59 | 63 | 41 | 33 |
| Geography | 35 | 31 | 28 | 15 |
It is decided to test this hypothesis by using a χ² test for independence at the 1% significance level.
The critical value is 16.812.
a) State the null and alternative hypotheses for this test.

b) Write down the number of degrees of freedom for this table.

c) Calculate the test statistic for this data.

d) Write down the conclusion to the test. Give a reason for your answer.
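One way to check the calculations for parts (b) to (d) without a GDC is sketched below. It uses the observed frequencies and the critical value given in the question. The variable names are illustrative, and the by-hand formulas stand in for the GDC's two-way test.

```python
# Observed frequencies from the question.
# Columns: Comedy, Action, Romance, Thriller.
observed = [
    [51, 52, 37, 55],  # Maths
    [59, 63, 41, 33],  # Sports
    [35, 31, 28, 15],  # Geography
]
critical_value = 16.812  # given in the question (1% significance level)

# (b) Degrees of freedom for a 3 x 4 table.
rows, cols = len(observed), len(observed[0])
df = (rows - 1) * (cols - 1)

# Expected frequencies, assuming genre and subject are independent.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Every expected value must be bigger than 5 before the test is used.
smallest_expected = min(value for row in expected for value in row)

# (c) Test statistic: sum of (O - E)^2 / E over all twelve cells.
statistic = sum((o - e) ** 2 / e
                for row_o, row_e in zip(observed, expected)
                for o, e in zip(row_o, row_e))

# (d) Reject H0 only if the statistic is bigger than the critical value.
reject_h0 = statistic > critical_value

print(df, round(smallest_expected, 2), round(statistic, 2), reject_h0)
```

Run as written, the sketch gives 6 degrees of freedom and a statistic of roughly 12.8. This is below the critical value of 16.812, so H0 (favourite film genre is independent of favourite subject) is not rejected at the 1% level.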
