Hypothesis Tests for Differences in Population Proportions (College Board AP® Statistics): Study Guide

Written by: Mark Curtis

Reviewed by: Dan Finlay

Updated on 17 September 2024

Two-sample z-test for difference in population proportions

What is a two-sample z-test for a difference in population proportions?

A two-sample z-test is used to test whether or not the population proportions of two independent populations, $p_{1}$ and $p_{2}$ , are equal
- One random sample of size $n_{1}$ is taken from the first population
- A different random sample of size $n_{2}$ is taken from the second population
- The sample proportions are ${\hat{p}}_{1}$ and ${\hat{p}}_{2}$
  - The difference in sample proportions is ${\hat{p}}_{1} - {\hat{p}}_{2}$

What are the hypotheses?

The null hypothesis, $H_{0}$ , is the assumption that there is no difference between the population proportions
- e.g. $H_{0} :$ The proportion of left-handed students at both schools is equal, $p_{1} = p_{2}$
  - It is assumed to be true, unless the sample data provide sufficient evidence against it
  - It can also be written as $p_{1} - p_{2} = 0$
The alternative hypothesis, $H_{a}$ , is how you think the population proportions might be different to each other
- e.g. $H_{a} :$ The proportion of left-handed students in School A is greater than in School B, $p_{1} > p_{2}$
- Remember that a z-test could be one-tailed ( $p_{1} > p_{2}$ or $p_{1} < p_{2}$ ) or two-tailed ( $p_{1} \neq p_{2}$ )

Examiner Tips and Tricks

When writing out your hypotheses, always fully define the symbol used for the population parameters in context, e.g. '... where $p_{1}$ is the proportion of left-handed students in School A and $p_{2}$ is the proportion of left-handed students in School B'.

What are the conditions required?

When performing a two-sample z-test for a difference in population proportions, you must show that it meets the following conditions:
- Items in the two samples (or experiment) must satisfy the independence condition
  - by verifying that data is collected by random sampling
  - or random assignment (in an experiment)
  - and, if sampling without replacement, showing that both sample sizes are less than 10% of their population size
- The sampling distribution of ${\hat{p}}_{1} - {\hat{p}}_{2}$ must be approximately normal
  - by first calculating the combined proportion, ${\hat{p}}_{c}$ , given by ${\hat{p}}_{c} = \frac{X_{1} + X_{2}}{n_{1} + n_{2}}$ (which assumes the null hypothesis, $p_{1} = p_{2}$ ) where $X_{1} = n_{1} {\hat{p}}_{1}$ (the number of successes in the first sample) and $X_{2} = n_{2} {\hat{p}}_{2}$ (the number of successes in the second sample)
  - then using ${\hat{p}}_{c}$ to verify that $n_{1} {\hat{p}}_{c} \geq 10$ , $n_{1} (1 - {\hat{p}}_{c}) \geq 10$ , $n_{2} {\hat{p}}_{c} \geq 10$ and $n_{2} (1 - {\hat{p}}_{c}) \geq 10$
- The combined proportion, ${\hat{p}}_{c}$ , is also called the pooled proportion
  - It can only be used when $p_{1} = p_{2}$ is assumed (as it is under the null hypothesis)
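
The normality check above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `pooled_proportion` and `large_counts_ok` and the sample counts are hypothetical, not part of the exam formula sheet.

```python
def pooled_proportion(x1, n1, x2, n2):
    """Combined (pooled) proportion: total successes over total sample size."""
    return (x1 + x2) / (n1 + n2)


def large_counts_ok(n1, n2, p_c, threshold=10):
    """Check the four expected-count conditions using the pooled proportion."""
    counts = [n1 * p_c, n1 * (1 - p_c), n2 * p_c, n2 * (1 - p_c)]
    return all(count >= threshold for count in counts)


# Hypothetical data: 45 successes out of 120, and 30 successes out of 80
p_c = pooled_proportion(45, 120, 30, 80)
print(p_c)                             # 0.375
print(large_counts_ok(120, 80, p_c))   # True (counts are 45, 75, 30 and 50)
```

The `threshold` argument defaults to 10, but can be set to 5 if a question states that version of the condition.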

Examiner Tips and Tricks

The formula for the combined proportion, ${\hat{p}}_{c} = \frac{X_{1} + X_{2}}{n_{1} + n_{2}}$ , is given in the exam, but you need to learn that $X_{1} = n_{1} {\hat{p}}_{1}$ (the number of successes in the first sample) and $X_{2} = n_{2} {\hat{p}}_{2}$ (the number of successes in the second sample).

Examiner Tips and Tricks

Some exam questions may change the four $\geq 10$ conditions into four $\geq 5$ conditions (changing the 10 into a 5), though this will be made clear in the question.

How do I calculate the standardized test statistic?

The standardized test statistic for a difference in sample proportions is a z-score given by:
- $z = \frac{({\hat{p}}_{1} - {\hat{p}}_{2}) - 0}{\sqrt{{\hat{p}}_{c} (1 - {\hat{p}}_{c}) (\frac{1}{n_{1}} + \frac{1}{n_{2}})}}$
  - where ${\hat{p}}_{1}$ and ${\hat{p}}_{2}$ are the sample proportions
  - $n_{1}$ and $n_{2}$ are the sample sizes
  - ${\hat{p}}_{c}$ is the combined proportion given by ${\hat{p}}_{c} = \frac{X_{1} + X_{2}}{n_{1} + n_{2}}$ where $X_{1} = n_{1} {\hat{p}}_{1}$ and $X_{2} = n_{2} {\hat{p}}_{2}$
  - and the zero, 0, highlights that the difference in population proportions is zero under the null hypothesis, $p_{1} - p_{2} = 0$
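
As a rough check on hand calculations, the formula above can be written as a short Python function. The name `two_prop_z` and its arguments are assumptions made for this sketch; it takes the numbers of successes, $X_{1}$ and $X_{2}$ , and the sample sizes.

```python
import math


def two_prop_z(x1, n1, x2, n2):
    """Standardized test statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p1_hat = x1 / n1                      # sample proportion, first sample
    p2_hat = x2 / n2                      # sample proportion, second sample
    p_c = (x1 + x2) / (n1 + n2)           # combined (pooled) proportion
    # Standard error when p1 = p2 is assumed
    se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    return ((p1_hat - p2_hat) - 0) / se


# Hypothetical data: 60 successes out of 100, and 45 successes out of 100
print(round(two_prop_z(60, 100, 45, 100), 3))
```

Swapping the order of the two samples changes only the sign of the z-score, so keep the order consistent with how $p_{1}$ and $p_{2}$ are defined in the hypotheses.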

Examiner Tips and Tricks

The formula for the standardized test statistic is given in the exam, $\frac{\text{statistic} - \text{parameter}}{\text{standard error of the statistic}}$ , along with tables of parameters and standard errors.

There are two different standard errors for a difference in sample proportions given in the exam. For hypothesis testing, you need the second one where it says ' $p_{1} = p_{2}$ is assumed'!

How do I calculate the p-value?

The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed from the two samples, assuming the null hypothesis is true
Use the standard normal distribution, $Z$ , to calculate the probability of being in the extreme region (tail) that extends from the z-score given by the formula above
- You can use either the z-tables or a calculator to find this probability
For a two-tail test, remember to work out the total probability across both tails
- You can double the p-value from a one-tail test
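
For readers checking their table or calculator values, the tail probabilities can be sketched with `NormalDist` from Python's standard `statistics` module. The function name `p_value` and its `tail` labels are hypothetical choices for this sketch.

```python
from statistics import NormalDist


def p_value(z, tail="two"):
    """p-value from a z-score.

    tail: 'right' for Ha: p1 > p2, 'left' for Ha: p1 < p2, 'two' for Ha: p1 != p2.
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "two":
        # Total probability across both tails: double the one-tail probability
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left' or 'two'")


print(round(p_value(1.96, tail="right"), 3))  # 0.025
print(round(p_value(1.96, tail="two"), 3))    # 0.05
```

Values from this sketch may differ slightly in the fourth decimal place from z-tables, since tables round the z-score to two decimal places.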

How do I conclude a hypothesis test?

Conclusions to a hypothesis test need to show two things:
- a decision about the null hypothesis
- an interpretation of this decision in the context of the question
To make the decision, compare the p-value to the significance level, $α$
- If $p < α$ then the null hypothesis should be rejected
- If $p > α$ then the null hypothesis should not be rejected
In a two-tailed test, the p-value compared to $α$ is the total probability across both tails (double the one-tail probability)

Examiner Tips and Tricks

Remember that the test should be interpreted within the context of the question.

Use the same language in your conclusion that is used in the problem, e.g. 'The data provides sufficient evidence that the proportion of left-handed students in School A is greater than the proportion of left-handed students in School B'.

What are the steps on a calculator?

When using a calculator to conduct a z-test for a difference in population proportions, you must still write down all steps of the hypothesis testing process:
- State the null and alternative hypotheses and clearly define your parameter
- Describe the test being used and show that the situation meets the conditions required
- Calculate the standardized test statistic (z-score)
- Calculate the p-value using your calculator
- Compare the p-value to the significance level
- Write down the conclusion to the test and interpret it in the context of the problem

Examiner Tips and Tricks

Even if you perform a z-test for a difference in population proportions on your calculator, it is still important to show all of your working to demonstrate full understanding, including calculating the z-score.

Worked Example

Nova University and Terra University have over 10,000 students each. A random sample of 200 students at Nova University and a random sample of 150 students from Terra University were asked to complete a survey to measure their level of smartphone addiction. The results showed that 35% of the students sampled from Nova University were addicted to their smartphones, while 28% of the students sampled from Terra University were addicted to their smartphones.

Is there sufficient evidence, at a 0.05 level of significance, to conclude that there is a difference in the proportion of students addicted to smartphones at Nova University and Terra University?

Answer:

State the type of test being used and verify the conditions for the test

The correct inference procedure is a two-sample z-test for the difference in population proportions with $α = 0.05$

The independence condition is satisfied, as
- both samples were selected randomly
- the sample size from Nova University, 200, is less than 10% of the total number of students at Nova University (10% of 'over 10,000' is 'over 1000')
- the sample size from Terra University, 150, is less than 10% of the total number of students at Terra University (10% of 'over 10,000' is 'over 1000')
  - These conditions are required as sampling was conducted without replacement
The sample size is large enough for the sampling distribution of the difference in sample proportions to be approximately normally distributed, because
- the combined proportion is ${\hat{p}}_{c} = \frac{X_{1} + X_{2}}{n_{1} + n_{2}}$ where $X_{1} = n_{1} {\hat{p}}_{1}$ and $X_{2} = n_{2} {\hat{p}}_{2}$
  - giving ${\hat{p}}_{c} = \frac{200 \cdot 0.35 + 150 \cdot 0.28}{200 + 150} = 0.32$
- and the following conditions are satisfied
  - $n_{1} {\hat{p}}_{c} = 200 \cdot 0.32 = 64 \geq 10$
  - $n_{1} (1 - {\hat{p}}_{c}) = 200 \cdot (1 - 0.32) = 136 \geq 10$
  - $n_{2} {\hat{p}}_{c} = 150 \cdot 0.32 = 48 \geq 10$
  - $n_{2} (1 - {\hat{p}}_{c}) = 150 \cdot (1 - 0.32) = 102 \geq 10$

Define the population parameters, $p_{1}$ and $p_{2}$

Let $p_{1}$ be the proportion of all students at Nova University who are addicted to their smartphones

Let $p_{2}$ be the proportion of all students at Terra University who are addicted to their smartphones

Write the null and alternative hypotheses

This will be a two-tailed test, as the question asks whether there is a difference but does not specify a direction

$H_{0} : p_{1} = p_{2}$

$H_{a} : p_{1} \neq p_{2}$

Calculate the standardized test statistic

$\begin{array}{rcl} z & = & \frac{({\hat{p}}_{1} - {\hat{p}}_{2}) - 0}{\sqrt{{\hat{p}}_{c} (1 - {\hat{p}}_{c}) (\frac{1}{n_{1}} + \frac{1}{n_{2}})}} \\ & = & \frac{(0.35 - 0.28) - 0}{\sqrt{0.32 (1 - 0.32) (\frac{1}{200} + \frac{1}{150})}} \\ & = & 1.389297 \ldots \end{array}$

Find the p-value for one of the tails, $P (Z > 1.389297 \ldots)$ , e.g. from the z-tables (rounding the z-score to 1.39)

$1 - 0.9177 = 0.0823$

Double this probability to find the p-value for both tails

$p = 0.0823 \times 2 = 0.1646$

Compare this probability to the significance level and state the conclusion of the test

$\begin{array}{rcl} 0.1646 & > & 0.05 \\ p & > & α \end{array}$

$H_{0}$ is not rejected

Interpret this result in the context of the question

There is not sufficient evidence to conclude that there is a difference in the proportion of students addicted to smartphones at Nova University and Terra University
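
The numerical steps of this worked example can be reproduced with a short Python sketch, using only the values given in the question. The variable names are hypothetical, and the code is a way of checking the arithmetic, not a substitute for the written steps.

```python
import math
from statistics import NormalDist

n1, p1_hat = 200, 0.35   # Nova University sample
n2, p2_hat = 150, 0.28   # Terra University sample
alpha = 0.05

# Combined (pooled) proportion, with X1 = n1 * p1_hat and X2 = n2 * p2_hat
x1 = n1 * p1_hat
x2 = n2 * p2_hat
p_c = (x1 + x2) / (n1 + n2)

# The four expected-count conditions for approximate normality
counts = [n1 * p_c, n1 * (1 - p_c), n2 * p_c, n2 * (1 - p_c)]
conditions_met = all(count >= 10 for count in counts)

# Standardized test statistic, using the standard error that assumes p1 = p2
se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
z = ((p1_hat - p2_hat) - 0) / se

# Two-tailed p-value: double the one-tail probability
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_null = p_value < alpha

print(round(p_c, 2), conditions_met, round(z, 4), round(p_value, 3), reject_null)
```

This should give a z-score of about 1.3893 and a p-value of about 0.165, which is slightly different from the 0.1646 found above because the z-tables use the rounded value 1.39. The decision is the same either way: $H_{0}$ is not rejected.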
