Tests for Independence (College Board AP® Statistics)

Study Guide

Written by: Naomi C

Reviewed by: Dan Finlay

Test for independence

What is a test for independence?

  • A chi-square ($\chi^2$) test for independence is used to determine whether there is a significant relationship between two categorical variables

    • i.e. whether the two variables are independent of each other or related (dependent)

  • For example, you may have collected data on the grade level and school subject preference from a group of students

    • A test for independence could indicate whether the grade level of a student has an impact on their preferred school subject

  • A chi-square test for independence is a specific goodness of fit test

    • The observed values are compared with values that you would expect if the two variables are independent

  • Observed values can be shown in a two-way table

    • This is also known as a contingency table

  • E.g. a contingency table for the grade level and school subject preference of a group of students is shown below

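| Preferred subject | 9th grade | 10th grade | 11th grade | 12th grade | Total |
|---|---|---|---|---|---|
| Math/Science | 11 | 7 | 15 | 9 | 42 |
| Humanities | 8 | 8 | 6 | 5 | 27 |
| Languages | 6 | 4 | 9 | 12 | 31 |
| Total | 25 | 19 | 30 | 26 | 100 |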

What are the null and alternative hypotheses for an independence test?

  • The null hypothesis, $H_0$, is the assumption that the two categorical variables are independent

    • e.g. $H_0$: The grade level of a student is independent of their school subject preference (there is no association)

      • It is assumed to be correct unless there is sufficient evidence against it

  • The alternative hypothesis, $H_a$, is the assumption that the two categorical variables are not independent

    • e.g. $H_a$: The grade level of a student is not independent of their school subject preference (there is an association)

Examiner Tips and Tricks

In an exam, a test for independence may also be referred to as a test for an association between two variables, but be careful with the wording: if two variables are independent, then there is no association between them.

What are the conditions for an independence test?

  • When performing a chi-square independence test:

    • Observed values must come from a random sample

    • Observed values must be independent

      • They are sampled with replacement

      • or the sample size is less than 10% of the population size

    • Expected values must meet the large counts condition

      • Each expected value must be greater than or equal to 5

      • or at least 80% of the expected values are greater than 5 and all are greater than or equal to 1
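The large counts condition can be checked mechanically once the expected values are known. The snippet below is a minimal sketch: the function name `large_counts_ok` and the expected counts in `example` are made up for illustration and do not come from the tables in this guide.

```python
def large_counts_ok(expected):
    """Check both versions of the large counts condition.

    Returns a pair (strict, relaxed):
      strict  - every expected count is at least 5
      relaxed - at least 80% of expected counts exceed 5 and none is below 1
    """
    cells = [value for row in expected for value in row]
    strict = all(value >= 5 for value in cells)
    share_above_5 = sum(value > 5 for value in cells) / len(cells)
    relaxed = share_above_5 >= 0.8 and all(value >= 1 for value in cells)
    return strict, relaxed


# Hypothetical expected counts (not taken from the guide's tables)
example = [
    [12.0, 8.5, 4.2, 9.0, 6.4],
    [15.0, 10.6, 5.2, 7.3, 3.8],
]
strict, relaxed = large_counts_ok(example)
print("All counts at least 5:", strict)
print("80% above 5 and all at least 1:", relaxed)
```

Here two of the ten cells fall below 5, so the stricter version fails while the relaxed version is satisfied.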

Examiner Tips and Tricks

In the exam, either condition is accepted for the large counts condition.

How do I calculate a chi-square value?

  • The chi-square value for the test of independence, $\chi^2$, can be calculated from the formula given to you in the exam

    • $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

  • The larger $\chi^2$ is, the more different the observed values are from the expected values

  • To be able to calculate the chi-square value, you therefore need to find the expected values first

  • To calculate the expected value for a particular cell, multiply together:

    • the probability of being in that particular row

    • by the probability of being in that particular column

    • by the total number in the sample

    • This is equivalent to simply multiplying the row total by the column total and dividing by the grand total

  • E.g. the expected value for the number of 10th graders who prefer languages in the example above is

    • $\frac{31}{100} \times \frac{19}{100} \times 100 = 5.89$

    • or just $\frac{31 \times 19}{100} = 5.89$

  • The table below shows all the expected values for the example above

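| Preferred subject | 9th grade | 10th grade | 11th grade | 12th grade | Total |
|---|---|---|---|---|---|
| Math/Science | 10.5 | 7.98 | 12.6 | 10.92 | 42 |
| Humanities | 6.75 | 5.13 | 8.1 | 7.02 | 27 |
| Languages | 7.75 | 5.89 | 9.3 | 8.06 | 31 |
| Total | 25 | 19 | 30 | 26 | 100 |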
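The expected values can be reproduced with a few lines of code. This is a minimal sketch of the "row total × column total ÷ grand total" rule, applied to the grade level and subject preference data. The function name `expected_counts` is an illustrative choice.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed counts from the grade level example (totals are left out)
observed = [
    [11, 7, 15, 9],   # Math/Science
    [8, 8, 6, 5],     # Humanities
    [6, 4, 9, 12],    # Languages
]

expected = expected_counts(observed)
for row in expected:
    print([round(value, 2) for value in row])
```

Only the body of the table is passed in. The row, column and grand totals are recomputed from the observed counts.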

Examiner Tips and Tricks

Expected values do not need to be integer values, so leave them unrounded to avoid calculation errors!

What are degrees of freedom?

  • The number of degrees of freedom, 'dof', is equal to

    • $(\text{number of rows} - 1)(\text{number of columns} - 1)$

    • e.g. dof for the contingency table above is $(3 - 1)(4 - 1) = 2 \times 3 = 6$
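As a quick check, the same rule can be written as a short function. This is a sketch only, and it assumes the table is passed without its totals row and totals column, since those do not count towards the degrees of freedom.

```python
def degrees_of_freedom(observed):
    """(number of rows - 1) * (number of columns - 1), totals excluded."""
    n_rows = len(observed)
    n_cols = len(observed[0])
    return (n_rows - 1) * (n_cols - 1)


# Grade level example: 3 subject rows and 4 grade columns
grade_table = [
    [11, 7, 15, 9],
    [8, 8, 6, 5],
    [6, 4, 9, 12],
]
print(degrees_of_freedom(grade_table))  # (3 - 1) * (4 - 1) = 6
```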

How do I use the chi-square distribution table?

  • You can use the chi-square tables given to you in the exam to find the critical value

    • This is the threshold value that determines whether you reject the null hypothesis or not

  • To find the critical value from the tables, you need the significance level, $\alpha\%$, and the dof

    • The critical value is located in the cell where the relevant row and column intersect
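The table lookup can be mimicked in code. The sketch below stores a small excerpt of standard chi-square critical values, rounded to 2 decimal places. The exam table may show more rows, more columns or more decimal places, and the names `CRITICAL_VALUES` and `critical_value` are illustrative.

```python
# Excerpt of a chi-square critical value table.
# Outer key: degrees of freedom. Inner key: significance level (tail probability).
CRITICAL_VALUES = {
    1: {0.10: 2.71, 0.05: 3.84, 0.01: 6.63},
    2: {0.10: 4.61, 0.05: 5.99, 0.01: 9.21},
    3: {0.10: 6.25, 0.05: 7.81, 0.01: 11.34},
    4: {0.10: 7.78, 0.05: 9.49, 0.01: 13.28},
    5: {0.10: 9.24, 0.05: 11.07, 0.01: 15.09},
    6: {0.10: 10.64, 0.05: 12.59, 0.01: 16.81},
}


def critical_value(dof, alpha):
    """Look up the cell where the dof row meets the alpha column."""
    try:
        return CRITICAL_VALUES[dof][alpha]
    except KeyError:
        raise ValueError("dof or alpha is not in this excerpt") from None


# Grade level example: dof = 6, tested at the 5% significance level
print(critical_value(6, 0.05))
```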

How do I conclude a hypothesis test?

  • Conclusions to a hypothesis test need to show two things:

    • a decision about the null hypothesis

    • an interpretation of this decision in the context of the question

  • To make the decision, compare the calculated chi-square value, $\chi^2$, to the critical value from the table

    • If $\chi^2 > \text{critical value}$ then the null hypothesis should be rejected

      • There is sufficient evidence that the two categorical variables are not independent

    • If $\chi^2 < \text{critical value}$ then the null hypothesis should not be rejected

      • There is not enough evidence to say that the two categorical variables are not independent
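The decision rule above can be sketched as a small function. The chi-square value used below is made up purely for illustration, and `decide` is a hypothetical helper name. The critical value 12.59 is the standard table value for 6 degrees of freedom at the 5% level.

```python
def decide(chi_square, critical_value):
    """Return the decision and a generic interpretation of it."""
    if chi_square > critical_value:
        return ("Reject H0",
                "sufficient evidence that the variables are not independent")
    return ("Do not reject H0",
            "not enough evidence that the variables are not independent")


# Hypothetical chi-square value compared with the table value for dof = 6
decision, interpretation = decide(chi_square=10.2, critical_value=12.59)
print(decision + ": " + interpretation)
```

In an exam answer, the interpretation should then be reworded in the context of the question, naming the two variables.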

How can I perform an independence test on the calculator?

  • To complete an independence test on your calculator:

    • Create a matrix of the observed values

    • Perform a chi-square test

      • This is often called a $\chi^2$ two-way test on a calculator

    • Compare your calculated $\chi^2$ with the critical value from the chi-square tables

  • Alternatively, you can compare the given significance level, alpha, with the calculator's p-value

    • If using the p-value, remember

      • $p < \alpha$: reject the null hypothesis

      • $p > \alpha$: do not reject the null hypothesis
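To see where a p-value comes from, the sketch below computes the upper-tail probability of a chi-square distribution using only Python's standard library. It relies on a series for the regularized incomplete gamma function, which is one standard way to evaluate this probability and not necessarily what a calculator does internally. The function name `chi_square_p_value` and the test statistic in the usage lines are illustrative assumptions.

```python
import math


def chi_square_p_value(statistic, dof):
    """Upper-tail probability P(X >= statistic) for a chi-square variable X."""
    if statistic <= 0:
        return 1.0
    a = dof / 2
    x = statistic / 2
    # Series for the regularized lower incomplete gamma function:
    # P(a, x) = x^a * e^(-x) * sum over n >= 0 of x^n / Gamma(a + n + 1)
    term = 1.0 / math.gamma(a + 1)
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower_tail = total * math.exp(a * math.log(x) - x)
    return 1.0 - lower_tail


# Hypothetical test statistic with 2 degrees of freedom, tested at alpha = 0.05
alpha = 0.05
p = chi_square_p_value(5.0, 2)
print(round(p, 4))
print("reject H0" if p < alpha else "do not reject H0")
```

The series is adequate for the statistic sizes met in typical classroom problems. For very large statistics the subtraction `1.0 - lower_tail` loses precision, so a statistics library would be a better choice there.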

Examiner Tips and Tricks

Even if you perform the independence test on your calculator, it is still important to show all of your working to demonstrate full understanding. Depending on the question, you may need to show how the chi-square statistic is calculated in full or just how an expected value and the degrees of freedom are calculated.

If you compare the p-value with $\alpha$, don't forget that the inequalities are the opposite way round to those used when comparing the $\chi^2$ value with the critical value to decide whether or not to reject the null hypothesis!

Worked Example

A coffee company wanted to understand more about who their customers were. They took a random sample of 200 individuals to see if there was an association between an individual's relationship status and whether they were a coffee drinker or not.

The outcomes of their research are shown in the table below.

| | Single | Married / Cohabiting | Other | Total |
|---|---|---|---|---|
| Coffee drinker | 48 | 25 | 37 | 110 |
| Non-coffee drinker | 23 | 32 | 35 | 90 |
| Total | 71 | 57 | 72 | 200 |

Determine, at the 5% significance level, if there is an association between an individual's relationship status and whether they are a coffee drinker or not.

Write the null and alternative hypotheses

$H_0$: There is no association between an individual's relationship status and whether they drink coffee or not (they are independent)

$H_a$: There is an association between an individual's relationship status and whether they drink coffee or not (they are not independent)

State the type of test being used

The correct inference procedure is a chi-square test of independence at $\alpha = 0.05$

Calculate the expected values (by multiplying the row total by the column total and dividing by the grand total)

| | Single | Married / Cohabiting | Other | Total |
|---|---|---|---|---|
| Coffee drinker | 39.05 | 31.35 | 39.6 | 110 |
| Non-coffee drinker | 31.95 | 25.65 | 32.4 | 90 |
| Total | 71 | 57 | 72 | 200 |

Verify the conditions for the test

All conditions for inference have been met:

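  • The sample of individuals is randomly selected, and it is reasonable to assume that 200 individuals is less than 10% of the population being sampled, so the observed values can be treated as independent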

  • All expected values are greater than 5

Calculate the chi-square value, $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

$\chi^2 = \frac{(48 - 39.05)^2}{39.05} + \frac{(25 - 31.35)^2}{31.35} + \frac{(37 - 39.6)^2}{39.6} + \frac{(23 - 31.95)^2}{31.95} + \frac{(32 - 25.65)^2}{25.65} + \frac{(35 - 32.4)^2}{32.4}$

$\chi^2 = 7.795\ldots$
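This calculation can be checked with a short script. The sketch below recomputes the expected values from the observed counts in the coffee table and then sums the six contributions. The function name `chi_square_statistic` is an illustrative choice.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence for this cell
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic


# Observed counts from the worked example (totals are left out)
coffee = [
    [48, 25, 37],   # Coffee drinker
    [23, 32, 35],   # Non-coffee drinker
]
chi_square = chi_square_statistic(coffee)
print(round(chi_square, 3))
```

Working with unrounded expected values inside the loop avoids the rounding errors warned about earlier.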

State the number of degrees of freedom

degrees of freedom $= (2 - 1)(3 - 1) = 2$

Find the critical value from the chi-square tables

Find the row corresponding to 2 degrees of freedom and the column corresponding to $\alpha = 0.05$

critical value $= 5.99$

Compare the calculated X squared value to the critical value and state the conclusion of the test

$7.795\ldots > 5.99$, so $\chi^2 > \text{critical value}$

$H_0$ is rejected

Interpret the result in the context of the question

There is sufficient evidence to suggest that there is an association between an individual's relationship status and whether they are a coffee drinker or not, i.e. sufficient evidence that they are not independent

Naomi C

Author: Naomi C

Expertise: Maths

Naomi graduated from Durham University in 2007 with a Masters degree in Civil Engineering. She has taught Mathematics in the UK, Malaysia and Switzerland covering GCSE, IGCSE, A-Level and IB. She particularly enjoys applying Mathematics to real life and endeavours to bring creativity to the content she creates.

Dan Finlay

Author: Dan Finlay

Expertise: Maths Lead

Dan graduated from the University of Oxford with a First class degree in mathematics. As well as teaching maths for over 8 years, Dan has marked a range of exams for Edexcel, tutored students and taught A Level Accounting. Dan has a keen interest in statistics and probability and their real-life applications.