Tests for Homogeneity (College Board AP® Statistics)
Study Guide
Test for homogeneity
What is a test for homogeneity?
A chi-square test for homogeneity is used to determine whether the distribution of a single categorical variable is the same across different populations or groups
The test is conducted in exactly the same way as a chi-square test for independence
For example, you may have collected data on the preferred sport for students from a few different schools
A test for homogeneity will indicate whether the distribution of preferences for different sports is the same across the different schools
A chi-square test for homogeneity is a specific goodness of fit test
The observed values are compared with values that you would expect if the distribution of the variable is the same for all populations
The test for homogeneity is performed in the same way as the chi-square test for independence
Observed values can be shown in a contingency table (as they are for an independence test)
However, one variable is treated as separate populations rather than as a category
E.g. a contingency table for the sport preference for a sample of students in 3 different schools is shown below
By conducting a test for homogeneity, we are asking if the proportion of students who prefer the different sports is the same across each of the schools
Observed counts by school (rows) and preferred sport (columns):

| School | Football | Basketball | Swimming | Total |
|---|---|---|---|---|
| School 1 | 37 | 78 | 25 | 140 |
| School 2 | 25 | 36 | 19 | 80 |
| School 3 | 48 | 106 | 26 | 180 |
| Total | 110 | 220 | 70 | 400 |
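A quick way to see what the test is asking is to turn each school's counts into proportions and compare them side by side. The sketch below does this for the observed table above; the names `observed` and `row_proportions` are illustrative choices, not part of the course material.

```python
# Observed counts from the schools example: rows are schools, columns are sports.
sports = ["Football", "Basketball", "Swimming"]
observed = {
    "School 1": [37, 78, 25],
    "School 2": [25, 36, 19],
    "School 3": [48, 106, 26],
}


def row_proportions(counts):
    """Return each count as a proportion of its row total."""
    total = sum(counts)
    return [count / total for count in counts]


# Distribution of sport preference within each school's sample.
proportions = {school: row_proportions(counts) for school, counts in observed.items()}

for school, props in proportions.items():
    summary = ", ".join(f"{sport}: {p:.3f}" for sport, p in zip(sports, props))
    print(f"{school} -> {summary}")
```

The sample proportions will rarely match exactly; the test for homogeneity judges whether the differences are larger than random sampling alone would plausibly produce.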
Examiner Tips and Tricks
Because the chi-square test for independence and the chi-square test for homogeneity are performed in the same way, they are often confused with each other. The differences between the two tests are the sampling method and the interpretation.
A test of independence typically uses one sample from a single population and determines if two variables have an association. A test for homogeneity uses multiple samples from different populations and compares the distribution of one variable across those populations.
What are the null and alternative hypotheses for a homogeneity test?
The null hypothesis, $H_0$, is the assumption that the proportions of the variable being tested are the same across all groups
e.g. The proportions of students who prefer football, basketball and swimming are the same among the populations from Schools 1, 2 and 3
It is assumed to be correct unless the sample data provide sufficient evidence against it
The alternative hypothesis, $H_a$, is the assumption that the proportions of the variable being tested are not the same across all groups
e.g. The proportions of students who prefer football, basketball and swimming are not the same among the populations from Schools 1, 2 and 3
What are the conditions for a homogeneity test?
When performing a chi-square test for homogeneity:
Observed values must come from random samples
Observed values must be independent
They are sampled with replacement
or the sample size is less than 10% of the population size
Expected values must meet the large counts condition
Each expected value must be greater than or equal to 5
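The two numerical conditions can be checked mechanically; random sampling cannot, as it depends on how the data were collected. The sketch below is a minimal illustration using the schools example. The enrolment figures are hypothetical (the example does not give population sizes), and the function names are illustrative.

```python
def ten_percent_condition(sample_size, population_size):
    """True if the sample is less than 10% of its population."""
    return sample_size < 0.10 * population_size


def large_counts_condition(expected_counts, minimum=5):
    """True if every expected count is at least `minimum`."""
    return all(value >= minimum for row in expected_counts for value in row)


# Sample sizes from the schools example.
sample_sizes = {"School 1": 140, "School 2": 80, "School 3": 180}
# HYPOTHETICAL enrolments, assumed purely for illustration.
assumed_enrolments = {"School 1": 1500, "School 2": 900, "School 3": 2000}

# Expected counts for the schools example
# (row total x column total / grand total).
expected = [
    [38.5, 77.0, 24.5],
    [22.0, 44.0, 14.0],
    [49.5, 99.0, 31.5],
]

independence_ok = all(
    ten_percent_condition(sample_sizes[school], assumed_enrolments[school])
    for school in sample_sizes
)
large_counts_ok = large_counts_condition(expected)

print("10% condition met for every school:", independence_ok)
print("Large counts condition met:", large_counts_ok)
```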
How do I calculate a chi-square value?
The chi-square value for the homogeneity test, $\chi^2$, can be calculated from the formula given to you in the exam
$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
The larger $\chi^2$ is, the more different the observed values are from the expected values
To be able to calculate the chi-square value, you therefore need to find the expected values first
To calculate the expected value for a particular cell, multiply together:
the probability of being in that particular row
by the probability of being in that particular column
by the total number in the sample
This is equivalent to simply multiplying the row total by the column total and dividing by the grand total
E.g. the expected value for the number of students from school 2 that prefer football for the example above is $\frac{80}{400} \times \frac{110}{400} \times 400 = 22$
or just $\frac{80 \times 110}{400} = 22$
The table below shows all the expected values for the example above
Expected counts by school (rows) and preferred sport (columns):

| School | Football | Basketball | Swimming | Total |
|---|---|---|---|---|
| School 1 | 38.5 | 77 | 24.5 | 140 |
| School 2 | 22 | 44 | 14 | 80 |
| School 3 | 49.5 | 99 | 31.5 | 180 |
| Total | 110 | 220 | 70 | 400 |
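The expected values in this table can be reproduced with a few lines of code. The sketch below applies the row total × column total ÷ grand total rule to the observed counts from the schools example; `expected_counts` is an illustrative helper name.

```python
# Observed counts: rows are Schools 1-3, columns are Football, Basketball, Swimming.
observed = [
    [37, 78, 25],
    [25, 36, 19],
    [48, 106, 26],
]


def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [row_total * column_total / grand_total for column_total in column_totals]
        for row_total in row_totals
    ]


expected = expected_counts(observed)
for row in expected:
    print(row)
```

Note that each row and column of expected values has the same total as the corresponding row or column of observed values, which is a useful check on hand calculations.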
Examiner Tips and Tricks
Expected values do not need to be integer values, so leave them unrounded to avoid calculation errors!
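With unrounded expected values, the chi-square value for the schools example follows by summing (observed − expected)² ÷ expected over all nine cells. The sketch below is one way to do this; the function name is illustrative.

```python
# Observed and expected counts for the schools example
# (rows: Schools 1-3, columns: Football, Basketball, Swimming).
observed = [
    [37, 78, 25],
    [25, 36, 19],
    [48, 106, 26],
]
expected = [
    [38.5, 77.0, 24.5],
    [22.0, 44.0, 14.0],
    [49.5, 99.0, 31.5],
]


def chi_square_statistic(observed_table, expected_table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(observed_table, expected_table)
        for o, e in zip(observed_row, expected_row)
    )


chi_square = chi_square_statistic(observed, expected)
print(round(chi_square, 2))  # 5.23
```

Only the interior cells are included in the sum; the row and column totals are never part of the calculation.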
What are degrees of freedom?
The number of degrees of freedom, 'dof', is equal to
(number of rows − 1) × (number of columns − 1), not counting the totals
e.g. dof for the contingency table above is $(3 - 1) \times (3 - 1) = 4$
How do I use the chi-square distribution table?
You can use the chi-square tables given to you in the exam to find the critical value
This is the threshold value that determines whether you reject the null hypothesis or not
To find the critical value from the tables, you need the significance level and the dof
The critical value is located in the cell where the relevant row and column intersect
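In the exam the critical value comes from the printed table, but it can be checked numerically. The sketch below is an illustration under a stated restriction: it uses a closed-form expression for the upper-tail probability of the chi-square distribution that holds only when the dof is even (as it is for a 3 × 3 table), and finds the critical value by bisection. All function names are illustrative.

```python
import math


def degrees_of_freedom(n_rows, n_columns):
    """(rows - 1) * (columns - 1), not counting totals."""
    return (n_rows - 1) * (n_columns - 1)


def upper_tail_probability(x, dof):
    """P(chi-square > x). This closed form is valid for EVEN dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this sketch only handles positive, even dof")
    half = x / 2
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(dof // 2)
    )


def critical_value(alpha, dof, tolerance=1e-9):
    """Find x with P(chi-square > x) = alpha by bisection."""
    low, high = 0.0, 1000.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if upper_tail_probability(mid, dof) > alpha:
            low = mid  # tail area still too large, so move right
        else:
            high = mid
    return (low + high) / 2


dof = degrees_of_freedom(3, 3)
print(dof, round(critical_value(0.05, dof), 2))  # 4 9.49
```

The printed value should agree with the entry in the chi-square table for 4 degrees of freedom and a tail probability of 0.05.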
How do I conclude a hypothesis test?
Conclusions to a hypothesis test need to show two things:
a decision about the null hypothesis
an interpretation of this decision in the context of the question
To make the decision, compare the calculated chi-square value, $\chi^2$, to the critical value from the table
If $\chi^2 >$ critical value then the null hypothesis should be rejected
There is sufficient evidence to suggest that the proportions of the variable being tested are not the same across all groups
If $\chi^2 <$ critical value then the null hypothesis should not be rejected
There is not sufficient evidence to suggest that the proportions of the variable being tested are not the same across all groups
How can I perform a homogeneity test on the calculator?
To complete a homogeneity test on your calculator:
Create a matrix of the observed values
Perform a chi-square test
This is often called a two-way test on a calculator
Compare your calculated $\chi^2$ with the critical value from the chi-square tables
Alternatively, you can compare the given significance level, $\alpha$, with the calculator's $p$-value
If using the $p$-value, remember
$p < \alpha$, reject the null hypothesis
$p > \alpha$, do not reject the null hypothesis
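The $p$-value rule can be illustrated with the schools example. The sketch below assumes a 5% significance level (the example does not state one) and uses the chi-square value of about 5.23 that results from the observed and expected tables. The $p$-value formula used here is a closed form that is valid only for dof = 4; a calculator handles the general case.

```python
import math


def p_value_dof_4(chi_square):
    """P(chi-square > observed value) for dof = 4 only (closed form)."""
    half = chi_square / 2
    return math.exp(-half) * (1 + half)


def decision(p_value, alpha):
    """Apply the p-value rule: reject H0 only when p < alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"


chi_square = 5.2317  # schools example, from the observed and expected tables
alpha = 0.05         # ASSUMED significance level, for illustration only

p_value = p_value_dof_4(chi_square)
print(round(p_value, 3), "->", decision(p_value, alpha))
```

Here the $p$-value is well above 0.05, so under that assumed level there would not be sufficient evidence that the distribution of preferred sport differs between the three schools.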
Examiner Tips and Tricks
Even if you perform the homogeneity test on your calculator, it is still important to show all of your working to demonstrate full understanding. Depending on the question, you may need to show how the chi-square statistic is calculated in full or just how an expected value and the degrees of freedom are calculated.
If you compare the $p$-value with $\alpha$, don't forget that the inequality points the opposite way from when you compare $\chi^2$ with the critical value: a $p$-value smaller than $\alpha$, like a $\chi^2$ value larger than the critical value, means you reject the null hypothesis!
Worked Example
A political analyst wants to determine if voting preferences for three major political parties (Party S, Party M, and Party E) are consistent across three different cities (City X, City Y, and City Z) in a country. The analyst conducts a survey, randomly sampling 300 voters from each city. The results are shown in the table below.
| | Party S | Party M | Party E | Total |
|---|---|---|---|---|
| City X | 122 | 97 | 81 | 300 |
| City Y | 91 | 109 | 100 | 300 |
| City Z | 137 | 84 | 79 | 300 |
| Total | 350 | 290 | 260 | 900 |
A chi-square test is conducted at the 10% significance level. All conditions for inference are met. The test statistic is calculated as $\chi^2 \approx 15.77$ and the $p$-value is 0.003.
What is the most appropriate conclusion to draw from this test?
(A) There is sufficient evidence that the proportions of voters for the political parties are the same for each city.
(B) There is sufficient evidence that the city a voter lives in is independent of their political party of choice.
(C) There is insufficient evidence that the city a voter lives in is independent of their political party of choice.
(D) This shows that the city you live in causes you to vote for a particular party.
(E) There is insufficient evidence that the proportions of voters for the political parties are the same for each city.
Answer:
This is a test for homogeneity, so the null hypothesis is that the proportions of voters for the different parties are the same in each city
The $p$-value is less than the significance level, 0.003 < 0.1
Or, alternatively
The test statistic is greater than the critical value for dof = 4 at the 10% significance level, $15.77 > 7.78$
Therefore the null hypothesis should be rejected: there is sufficient evidence that the proportions of voters for the different parties are not the same in each city
Consider option (A)
This is stating the null hypothesis, but the null hypothesis has been rejected, so option A is incorrect
Consider option (B)
The appropriate chi-square test for this situation is a homogeneity test
This is a conclusion for an independence test, so option B is incorrect
Consider option (C)
The appropriate chi-square test for this situation is a homogeneity test
This is a conclusion for an independence test, so option C is incorrect
Consider option (D)
The test for homogeneity is not a test for cause and effect, so option D is incorrect
Consider option (E)
This is rejecting the null hypothesis, so option E is correct
Option E
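The figures quoted in this worked example can be checked from the table of observed votes. The sketch below recomputes the expected counts, the chi-square value, the dof and the $p$-value; the $p$-value line uses a closed form that holds only for dof = 4, and the function name is illustrative.

```python
import math

# Observed votes: rows are Cities X, Y, Z; columns are Parties S, M, E.
observed = [
    [122, 97, 81],
    [91, 109, 100],
    [137, 84, 79],
]


def chi_square_test(table):
    """Return the chi-square statistic and dof for a two-way table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


statistic, dof = chi_square_test(observed)

# Closed-form upper-tail probability, valid for dof = 4 only.
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

alpha = 0.10
reject_null = p_value < alpha
print(round(statistic, 2), dof, round(p_value, 3), reject_null)
```

The output agrees with the $p$-value of 0.003 given in the question, and the null hypothesis is rejected at the 10% level.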