Chi-squared Test for Independence (DP IB Maths: AI SL)

Revision Note

Author: Dan

Chi-Squared Test for Independence

What is a chi-squared test for independence?

  • A chi-squared (χ²) test for independence is a hypothesis test used to test whether two variables are independent of each other
    • This is sometimes called a χ² two-way test
  • This is an example of a goodness of fit test
    • We are testing whether the data fits the model that the variables are independent
  • The chi-squared (χ²) distribution is used for this test
  • You will use a contingency table
    • This is a two-way table that shows the observed frequencies for the different combinations of the two variables
      • For example: if the two variables are hair colour and eye colour then the contingency table will show the frequencies of the different combinations
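As an illustration of how a contingency table is built from raw data, here is a minimal Python sketch. The survey responses and the variable names (`responses`, `table`) are made up for this example and are not part of the course material.

```python
from collections import Counter

# Hypothetical raw survey responses: one (hair colour, eye colour) pair per person.
responses = [
    ("brown", "brown"), ("blonde", "blue"), ("brown", "blue"),
    ("brown", "brown"), ("blonde", "blue"), ("blonde", "brown"),
]

# Count how often each combination of the two variables occurs.
counts = Counter(responses)

# Fix an order for the categories so rows and columns are consistent.
hair_colours = sorted({hair for hair, _ in responses})
eye_colours = sorted({eye for _, eye in responses})

# Rows are hair colours, columns are eye colours; each cell is an observed frequency.
table = [[counts[(hair, eye)] for eye in eye_colours] for hair in hair_colours]

print("columns:", eye_colours)
for hair, row in zip(hair_colours, table):
    print(hair, row)
```

Each cell of `table` is the observed frequency for one combination, which is the matrix you would enter into a GDC.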

What are the degrees of freedom?

  • Once the row and column totals are known, there is a minimum number of expected values you need to know in order to calculate all the other expected values
  • This minimum number is called the degrees of freedom and is often denoted by ν
  • For a test for independence with an m × n contingency table
    • ν = (m − 1) × (n − 1)
    • For example: if there are 5 rows and 3 columns then ν = 4 × 2 = 8, because you only need to know 2 of the values in each of 4 of the rows and the rest can be calculated using the totals
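The formula can be checked with a short Python sketch. The function name `degrees_of_freedom` is chosen for this illustration only.

```python
def degrees_of_freedom(rows: int, cols: int) -> int:
    """Degrees of freedom for a rows x cols contingency table.

    The row and column counts exclude any 'total' row or column.
    """
    if rows < 2 or cols < 2:
        raise ValueError("a contingency table needs at least 2 rows and 2 columns")
    return (rows - 1) * (cols - 1)


# The example above: 5 rows and 3 columns.
print(degrees_of_freedom(5, 3))  # 8
```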

What are the steps for a chi-squared test for independence?

  • STEP 1: Write the hypotheses
    • H0 : Variable X is independent of variable Y
    • H1 : Variable X is not independent of variable Y
      • Make sure you clearly write what the variables are and don’t just call them X and Y
  • STEP 2: Calculate the degrees of freedom for the test
    • For an m × n contingency table
    • Degrees of freedom is ν = (m − 1) × (n − 1)
  • STEP 3: Enter your observed frequencies into your GDC using the option for a 2-way test
    • Enter these as a matrix
    • Your GDC will give you a matrix of the expected values (assuming the variables are independent)
    • Your GDC will also give you the χ² statistic and its p-value
    • The χ² statistic is denoted as χ²calc
  • STEP 4: Decide whether there is evidence to reject the null hypothesis
    • EITHER compare the χ² statistic with the given critical value
      • If χ² statistic > critical value then reject H0
      • If χ² statistic < critical value then accept H0
    • OR compare the p-value with the given significance level
      • If p-value < significance level then reject H0
      • If p-value > significance level then accept H0
  • STEP 5: Write your conclusion
    • If you reject H0
      • There is sufficient evidence to suggest that variable X is not independent of variable Y
      • Therefore this suggests they are associated
    • If you accept H0
      • There is insufficient evidence to suggest that variable X is not independent of variable Y
      • Therefore this suggests they are independent
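The decision rule in STEP 4 can be sketched in Python as follows. The function names and the numbers in the usage lines are illustrative only. The steps above do not say what to do when the two values are exactly equal, so this sketch assumes that case counts as "accept H0".

```python
def decide_by_critical_value(chi2_calc: float, critical_value: float) -> str:
    """Compare the chi-squared statistic with the given critical value."""
    # Assumption: equality is treated as "accept H0".
    return "reject H0" if chi2_calc > critical_value else "accept H0"


def decide_by_p_value(p_value: float, significance_level: float) -> str:
    """Compare the p-value with the given significance level."""
    # Assumption: equality is treated as "accept H0".
    return "reject H0" if p_value < significance_level else "accept H0"


# Made-up values for illustration.
print(decide_by_critical_value(9.2, 7.815))  # reject H0
print(decide_by_p_value(0.03, 0.05))         # reject H0
print(decide_by_p_value(0.12, 0.05))         # accept H0
```

Both routes lead to the same decision when the critical value and the significance level correspond to each other, so you only need to use one of them.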

How do I calculate the chi-squared statistic?

  • You are expected to be able to use your GDC to calculate the χ² statistic by inputting the matrix of the observed frequencies
  • Seeing how it is done by hand might deepen your understanding but you are not expected to use this method
  • STEP 1: For each observed frequency Oi calculate its expected frequency Ei
    • Assuming the variables are independent
      • Ei = P(X = x) × P(Y = y) × Total
      • Which simplifies to Ei = (Row Total × Column Total) ÷ Overall Total
  • STEP 2: Calculate the χ² statistic using the formula
    • χ²calc = Σ (Oi − Ei)² ÷ Ei
    • You do not need to learn this formula as your GDC calculates it for you
  • To calculate the p-value you would find the probability of a value being bigger than your χ² statistic using a χ² distribution with ν degrees of freedom
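The by-hand method can be sketched in Python using only the standard library. The function names and the small 2 × 3 table are made up for this illustration. The p-value function uses a closed-form expression that is only valid when the number of degrees of freedom is even, which is a simplifying assumption of this sketch; a GDC or a statistics library handles the general case.

```python
import math


def expected_frequencies(observed):
    """Expected frequencies assuming independence: row total x column total / overall total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


def p_value_even_df(chi2_calc, df):
    """P(X > chi2_calc) for a chi-squared distribution with an EVEN number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only works for even, positive df")
    half = chi2_calc / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


# Hypothetical 2 x 3 table of observed frequencies.
observed = [[10, 20, 30], [20, 20, 20]]
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected_frequencies(observed))
print(chi_squared_statistic(observed))
print(p_value_even_df(chi_squared_statistic(observed), df))
```

For this made-up table every expected frequency is 15, 20 or 25, so each term of the sum can easily be checked by hand against the printed statistic.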

Examiner Tip

Note for Internal Assessments (IA)

  • If you use a χ² test in your IA then beware that the outcome may not be accurate if:
    • Any of the expected values are less than 5
    • There is only 1 degree of freedom
      • This means it is a 2 × 2 contingency table
  • Note that none of these cases will occur in the exam
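For an IA, the two conditions above can be checked before running the test. This is a minimal Python sketch; the function name `ia_accuracy_warnings` and the example tables are made up for this illustration.

```python
def ia_accuracy_warnings(observed):
    """Return a list of reasons why a chi-squared test on this table may be inaccurate."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    # Expected frequencies assuming the two variables are independent.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    df = (len(observed) - 1) * (len(observed[0]) - 1)

    warnings = []
    if any(e < 5 for row in expected for e in row):
        warnings.append("an expected value is less than 5")
    if df == 1:
        warnings.append("only 1 degree of freedom (2 x 2 table)")
    return warnings


# Hypothetical small 2 x 2 table: triggers both warnings.
print(ia_accuracy_warnings([[3, 7], [6, 4]]))

# Hypothetical larger table: no warnings.
print(ia_accuracy_warnings([[20, 30, 25], [30, 20, 25], [25, 25, 25]]))
```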

Worked example

At a school in Paris, it is believed that favourite film genre is related to favourite subject.  500 students were asked to indicate their favourite film genre and favourite subject from a selection and the results are indicated in the table below.

 

Comedy

Action

Romance

Thriller

Maths

51

52

37

55

Sports

59

63

41

33

Geography

35

31

28

15

It is decided to test this hypothesis by using a χ² test for independence at the 1% significance level.

The critical value is 16.812.

a)
State the null and alternative hypotheses for this test.

b)
Write down the number of degrees of freedom for this table.

c)
Calculate the χ² test statistic for this data.

d)
Write down the conclusion to the test. Give a reason for your answer.
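As a check on parts b) to d), the calculation can be reproduced with a short Python sketch. This is not how you would answer in the exam, where the observed frequencies are entered into the GDC; the variable names are chosen for this illustration only.

```python
# Observed frequencies from the table
# (rows: Maths, Sports, Geography; columns: Comedy, Action, Romance, Thriller).
observed = [
    [51, 52, 37, 55],
    [59, 63, 41, 33],
    [35, 31, 28, 15],
]
critical_value = 16.812  # given in the question for the 1% significance level

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# b) degrees of freedom = (rows - 1) x (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# c) chi-squared statistic: sum of (O - E)^2 / E over every cell
chi2_calc = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / total  # expected frequency
        chi2_calc += (o - e) ** 2 / e

# d) compare the statistic with the critical value
reject_h0 = chi2_calc > critical_value

print(f"degrees of freedom = {df}")
print(f"chi-squared statistic = {chi2_calc:.3f}")
print("reject H0" if reject_h0 else "accept H0")
```

Run as written, this gives 6 degrees of freedom and a χ² statistic of roughly 12.8. This is less than the critical value of 16.812, so H0 is accepted: there is insufficient evidence at the 1% significance level to suggest that favourite film genre is not independent of favourite subject.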

Author: Dan

Expertise: Maths

Dan graduated from the University of Oxford with a First class degree in mathematics. As well as teaching maths for over 8 years, Dan has marked a range of exams for Edexcel, tutored students and taught A Level Accounting. Dan has a keen interest in statistics and probability and their real-life applications.