Tests for Homogeneity (College Board AP® Statistics)

Study Guide

Written by: Naomi C

Reviewed by: Dan Finlay

Test for homogeneity

What is a test for homogeneity?

  • A chi-square ($\chi^2$) test for homogeneity is used to determine whether the distribution of a single categorical variable is the same across different populations or groups

    • The test is conducted in exactly the same way as a chi-square test for independence

  • For example, you may have collected data on the preferred sport for students from a few different schools

    • A test for homogeneity will indicate whether the distribution of preferences for different sports is the same across the different schools

  • A chi-square test for homogeneity follows the same logic as a goodness of fit test

    • The observed values are compared with values that you would expect if the distribution of the variable is the same for all populations

    • The test for homogeneity is performed in the same way as the chi-square test for independence

  • Observed values can be shown in a contingency table (as they are for an independence test)

    • However, one variable is treated as separate populations rather than as a category

  • E.g. a contingency table for the sport preference for a sample of students in 3 different schools is shown below

    • By conducting a test for homogeneity, we are asking if the proportion of students who prefer the different sports is the same across each of the schools

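Observed values (columns show preferred sport)

| School | Football | Basketball | Swimming | Total |
|---|---|---|---|---|
| School 1 | 37 | 78 | 25 | 140 |
| School 2 | 25 | 36 | 19 | 80 |
| School 3 | 48 | 106 | 26 | 180 |
| Total | 110 | 220 | 70 | 400 |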
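To see what the test is comparing, the observed counts can be turned into proportions within each school. The following is a minimal sketch using the table above; the names `observed` and `row_proportions` are illustrative choices, not part of the guide.

```python
# Observed counts from the table above (rows: schools, columns: sports).
sports = ["Football", "Basketball", "Swimming"]
observed = {
    "School 1": [37, 78, 25],
    "School 2": [25, 36, 19],
    "School 3": [48, 106, 26],
}


def row_proportions(counts):
    """Return each count as a proportion of its row total."""
    total = sum(counts)
    return [count / total for count in counts]


proportions = {school: row_proportions(counts) for school, counts in observed.items()}

for school, props in proportions.items():
    summary = ", ".join(f"{sport}: {p:.3f}" for sport, p in zip(sports, props))
    print(f"{school} -> {summary}")
```

The test for homogeneity asks whether differences between these sample proportions are larger than would be expected from random sampling alone.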

Examiner Tips and Tricks

Because the chi-square test for independence and the chi-square test for homogeneity are performed in the same way, they are often confused with each other. The differences between the two tests are the sampling methods and the interpretation.

A test of independence typically uses one sample from a single population and determines if two variables have an association. A test for homogeneity uses multiple samples from different populations and compares the distribution of one variable across those populations.

What are the null and alternative hypotheses for a homogeneity test?

  • The null hypothesis, $H_0$, is the assumption that the proportions of the variable being tested are the same across all groups

    • e.g. $H_0$: The proportions of students who prefer football, basketball and swimming are the same among the populations from Schools 1, 2 and 3

      • It is assumed to be correct unless there is sufficient evidence to reject it

  • The alternative hypothesis, $H_a$, is the assumption that the proportions of the variable being tested are not the same across all groups

    • e.g. $H_a$: The proportions of students who prefer football, basketball and swimming are not the same among the populations from Schools 1, 2 and 3
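The two hypotheses for the schools example can also be written in symbols. The notation $p_{ij}$ is introduced here for illustration (it is not used in the guide) and stands for the proportion of all students in School $i$ who prefer sport $j$.

```latex
\begin{aligned}
H_0 &:\; p_{1j} = p_{2j} = p_{3j} \quad \text{for each sport } j \in \{\text{football}, \text{basketball}, \text{swimming}\} \\
H_a &:\; p_{ij} \neq p_{kj} \quad \text{for at least one sport } j \text{ and at least one pair of schools } i \neq k
\end{aligned}
```

Written this way, it is clear that the alternative hypothesis only claims that at least one proportion differs, not that every proportion differs.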

What are the conditions for a homogeneity test?

  • When performing a chi-square test for homogeneity:

    • Observed values must come from random samples

    • Observed values must be independent

      • Either they are sampled with replacement

      • or each sample size is less than 10% of its population size

    • Expected values must meet the large counts condition

      • Each expected value must be greater than or equal to 5
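The large counts condition can be checked quickly once the expected values are known. The sketch below assumes the schools table from earlier in this guide and uses the row total × column total ÷ grand total rule that is explained in the next section; the variable names are illustrative.

```python
# Observed counts for the schools example (rows: schools, columns: sports).
observed = [
    [37, 78, 25],    # School 1
    [25, 36, 19],    # School 2
    [48, 106, 26],   # School 3
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# The condition holds when even the smallest expected count is at least 5.
smallest = min(min(row) for row in expected)
large_counts_ok = smallest >= 5
print(f"Smallest expected count: {smallest}, condition met: {large_counts_ok}")
```

Only the smallest expected value needs checking, because if it is at least 5 then every other expected value is too.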

How do I calculate a chi-square value?

  • The chi-square value for the homogeneity test, $\chi^2$, can be calculated from the formula given to you in the exam

    • $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

  • The larger $\chi^2$ is, the more different the observed values are from the expected values

  • To be able to calculate the chi-square value, you therefore need to find the expected values first

  • To calculate the expected value for a particular cell, multiply together:

    • the probability of being in that particular row

    • by the probability of being in that particular column

    • by the total number in the sample

    • This is equivalent to simply multiplying the row total by the column total and dividing by the grand total

  • E.g. the expected value for the number of students from school 2 that prefer football for the example above is

    • $\frac{80}{400} \times \frac{110}{400} \times 400 = 22$

    • or just $\frac{80 \times 110}{400} = 22$

  • The table below shows all the expected values for the example above

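Expected values (columns show preferred sport)

| School | Football | Basketball | Swimming | Total |
|---|---|---|---|---|
| School 1 | 38.5 | 77 | 24.5 | 140 |
| School 2 | 22 | 44 | 14 | 80 |
| School 3 | 49.5 | 99 | 31.5 | 180 |
| Total | 110 | 220 | 70 | 400 |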
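The expected values in the table can be reproduced with a few lines of code. This is a sketch only; the function name `expected_counts` is a hypothetical helper, and the observed counts are those from the schools example.

```python
import math


def expected_counts(observed):
    """Expected counts for a two-way table if every row shares the same distribution."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [
    [37, 78, 25],    # School 1
    [25, 36, 19],    # School 2
    [48, 106, 26],   # School 3
]

expected = expected_counts(observed)
for row in expected:
    print(row)

# The two routes described above give the same value for School 2 and football.
via_probabilities = (80 / 400) * (110 / 400) * 400
via_totals = 80 * 110 / 400
print(math.isclose(via_probabilities, via_totals))
```

The row and column totals of the expected values match those of the observed values, which is a useful check when calculating them by hand.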

Examiner Tips and Tricks

Expected values do not need to be integer values, so leave them unrounded to avoid calculation errors!
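As an illustration of the formula, the sketch below sums (observed − expected)² ÷ expected over all nine cells of the schools example, using the unrounded expected values from the table. The function name `chi_square_statistic` is an illustrative choice.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


observed = [
    [37, 78, 25],    # School 1
    [25, 36, 19],    # School 2
    [48, 106, 26],   # School 3
]

# Unrounded expected values from the table above.
expected = [
    [38.5, 77, 24.5],
    [22, 44, 14],
    [49.5, 99, 31.5],
]

statistic = chi_square_statistic(observed, expected)
print(f"chi-square statistic: {statistic:.3f}")
```

For this table the statistic comes out at roughly 5.23. The largest contributions come from School 2, where the observed counts are furthest from the expected values.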

What are degrees of freedom?

  • The number of degrees of freedom, 'dof', is equal to

    • $(\text{number of rows} - 1)(\text{number of columns} - 1)$

    • e.g. dof for the contingency table above is $(3 - 1)(3 - 1) = 2 \times 2 = 4$

How do I use the chi-square distribution table?

  • You can use the chi-square tables given to you in the exam to find the critical value

    • This is the threshold value that determines whether you reject the null hypothesis or not

  • To find the critical value from the tables, you need the significance level, $\alpha\%$, and the dof

    • The critical value is located in the cell where the relevant row and column intersect
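In the exam the critical value is read from the table, but the same number can be approximated numerically. The sketch below is one possible approach using only the Python standard library: it evaluates the upper-tail probability of the chi-square distribution with a series expansion and then searches for the value that leaves a tail area of $\alpha$. The helper names `chi2_sf` and `critical_value` are hypothetical, and the 10% level is the one used in the worked example later in this guide.

```python
import math


def chi2_sf(x, dof):
    """P(chi-square variable with `dof` degrees of freedom exceeds x).

    Uses the series expansion of the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, z = dof / 2, x / 2
    term = 1 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower_tail = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1 - lower_tail)


def critical_value(alpha, dof):
    """Bisection search for the x with upper-tail area alpha (assumes 0 < x < 200)."""
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, dof) > alpha:
            lo = mid  # tail area still too large, so the critical value is higher
        else:
            hi = mid
    return (lo + hi) / 2


rows, columns = 3, 3
dof = (rows - 1) * (columns - 1)
critical = critical_value(0.10, dof)
print(f"dof = {dof}, critical value at the 10% level = {critical:.2f}")
```

For 4 dof at the 10% level this agrees with the table value of 7.78 used in the worked example.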

How do I conclude a hypothesis test?

  • Conclusions to a hypothesis test need to show two things:

    • a decision about the null hypothesis

    • an interpretation of this decision in the context of the question

  • To make the decision, compare the calculated test statistic, $\chi^2$, to the critical value from the table

    • If $\chi^2 > \text{critical value}$ then the null hypothesis should be rejected

      • There is sufficient evidence to suggest that the proportions of the variable being tested are not the same across all groups

    • If $\chi^2 < \text{critical value}$ then the null hypothesis should not be rejected

      • There is not sufficient evidence to suggest that the proportions of the variable being tested are not the same across all groups

How can I perform a homogeneity test on the calculator?

  • To complete a homogeneity test on your calculator:

    • Create a matrix of the observed values

    • Perform a chi-square test

      • This is often called a $\chi^2$ two-way test on a calculator

    • Compare your calculated $\chi^2$ with the critical value from the chi-square tables

  • Alternatively, you can compare the given significance level, $\alpha$, with the calculator's p-value

    • If using the p-value, remember

      • $p < \alpha$, reject the null hypothesis

      • $p > \alpha$, do not reject the null hypothesis
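The calculator steps above can be mirrored in code: pass in a matrix of observed values and get back the statistic, the dof and a p-value. This is a sketch under stated assumptions rather than what any particular calculator does internally. The p-value uses a closed form that is only valid when the dof is even (as it is for a 3 × 3 table), the function names are hypothetical, and the 5% significance level is chosen here purely for illustration.

```python
import math


def chi2_sf_even_dof(x, dof):
    """Upper-tail probability for a chi-square variable with an EVEN number of dof."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form only covers positive even dof")
    half = x / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(dof // 2))


def homogeneity_test(observed):
    """Return (statistic, dof, p_value) for a matrix of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / grand_total  # expected count for this cell
            statistic += (o - e) ** 2 / e
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, chi2_sf_even_dof(statistic, dof)


observed = [
    [37, 78, 25],    # School 1
    [25, 36, 19],    # School 2
    [48, 106, 26],   # School 3
]
alpha = 0.05  # illustrative significance level, not given in the guide

statistic, dof, p_value = homogeneity_test(observed)
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"statistic = {statistic:.3f}, dof = {dof}, p-value = {p_value:.3f}: {decision}")
```

For the schools example the p-value is well above 0.05, so at that illustrative level the null hypothesis would not be rejected.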

Examiner Tips and Tricks

Even if you perform the homogeneity test on your calculator, it is still important to show all of your working to demonstrate full understanding. Depending on the question, you may need to show how the chi-square statistic is calculated in full or just how an expected value and the degrees of freedom are calculated.

If you compare the p-value with $\alpha$, don't forget that the inequalities are the opposite way round to when you compare the $\chi^2$ value with the critical value to decide whether or not to reject the null hypothesis!

Worked Example

A political analyst wants to determine if voting preferences for three major political parties (Party S, Party M, and Party E) are consistent across three different cities (City X, City Y, and City Z) in a country. The analyst conducts a survey, randomly sampling 300 voters from each city. The results are shown in the table below.

| City | Party S | Party M | Party E | Total |
|---|---|---|---|---|
| City X | 122 | 97 | 81 | 300 |
| City Y | 91 | 109 | 100 | 300 |
| City Z | 137 | 84 | 79 | 300 |
| Total | 350 | 290 | 260 | 900 |

A chi-square test is conducted at the 10% significance level. All conditions for inference are met. The test statistic is calculated as $\chi^2 = 15.769$ and the p-value is 0.003.

What is the most appropriate conclusion to draw from this test?

(A) There is sufficient evidence that the proportions of voters for the political parties are the same for each city.

(B) There is sufficient evidence that the city a voter lives in is independent of their political party of choice.

(C) There is insufficient evidence that the city a voter lives in is independent of their political party of choice.

(D) This shows that the city you live in causes you to vote for a particular party.

(E) There is insufficient evidence that the proportions of voters for the political parties are the same for each city.

Answer:

This is a test for homogeneity, so the null hypothesis is that the proportions of voters for the different parties are the same in each city

The p-value is less than the significance level, 0.003 < 0.1

Or, alternatively

The test statistic is greater than the critical value for $(3 - 1)(3 - 1) = 2 \times 2 = 4$ dof, $15.769 > 7.78$

Therefore the null hypothesis should be rejected, so there is sufficient evidence that the proportions of voters for the different parties are not the same in each city

Consider option (A)

This is stating the null hypothesis, but the null hypothesis has been rejected, so option A is incorrect

Consider option (B)

The appropriate chi-square test for this situation is a homogeneity test

This is a conclusion for an independence test, so option B is incorrect

Consider option (C)

The appropriate chi-square test for this situation is a homogeneity test

This is a conclusion for an independence test, so option C is incorrect

Consider option (D)

The test for homogeneity is not a test for cause and effect, so option D is incorrect

Consider option (E)

This is consistent with rejecting the null hypothesis, so option E is correct

Option E
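The figures quoted in the question can be checked from the table. The sketch below recomputes the test statistic from the observed counts and then finds the p-value with the closed form $e^{-x/2}(1 + x/2)$, which is specific to a chi-square distribution with 4 dof. The variable names are illustrative.

```python
import math

# Observed counts from the worked example (rows: cities, columns: parties S, M, E).
observed = [
    [122, 97, 81],    # City X
    [91, 109, 100],   # City Y
    [137, 84, 79],    # City Z
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

statistic = 0.0
for row, r in zip(observed, row_totals):
    for o, c in zip(row, col_totals):
        e = r * c / grand_total  # expected count for this cell
        statistic += (o - e) ** 2 / e

# Upper-tail probability for 4 degrees of freedom only.
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

print(f"statistic = {statistic:.3f}, p-value = {p_value:.3f}")
print("reject H0 at the 10% level:", p_value < 0.10 and statistic > 7.78)
```

Both comparisons lead to the same decision, which matches the reasoning in the answer above.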

Naomi C

Author: Naomi C

Expertise: Maths

Naomi graduated from Durham University in 2007 with a Masters degree in Civil Engineering. She has taught Mathematics in the UK, Malaysia and Switzerland covering GCSE, IGCSE, A-Level and IB. She particularly enjoys applying Mathematics to real life and endeavours to bring creativity to the content she creates.

Dan Finlay

Author: Dan Finlay

Expertise: Maths Lead

Dan graduated from the University of Oxford with a First class degree in mathematics. As well as teaching maths for over 8 years, Dan has marked a range of exams for Edexcel, tutored students and taught A Level Accounting. Dan has a keen interest in statistics and probability and their real-life applications.