Hypothesis Tests for Differences in Population Means (College Board AP® Statistics)

Revision Note

Author: Naomi C

Expertise: Maths

Two-sample t-test for difference in population means

What is a two-sample t-test?

  • A two-sample t-test is used to test whether or not the population means of two different groups, $\mu_1$ and $\mu_2$, are equal

    • You use a t-test when the population standard deviations, $\sigma_1$ and $\sigma_2$, are unknown

      • This requires using the t-distribution, which is similar to the normal distribution but has heavier tails

  • To look for evidence, you take a random sample from each of the populations, of sizes $n_1$ and $n_2$, and examine the difference between the sample means

    • e.g. you randomly sample 30 oak trees and 42 beech trees and examine the difference between the mean sample heights

What are the hypotheses for a two-sample t-test?

  • The null hypothesis, $H_0$, is the assumption that there is no difference between the means of the two populations

    • e.g. $H_0$: The mean height of all oak trees is the same as the mean height of all beech trees, $\mu_1 = \mu_2$

      • It is assumed to be correct unless the sample provides sufficient evidence against it

  • The alternative hypothesis, $H_a$, is how you think the population means might differ from each other

    • e.g. $H_a$: The mean height of all oak trees is greater than the mean height of all beech trees, $\mu_1 > \mu_2$

    • Remember that a t-test could be one-tailed or two-tailed; this will affect your alternative hypothesis

Exam Tip

When writing out your hypotheses, always fully define the symbols used for the population parameters in context, e.g. '... where $\mu_1$ is the mean height of all oak trees and $\mu_2$ is the mean height of all beech trees'.

What are the conditions for a two-sample t-test?

  • When performing a two-sample t-test, you must show that it meets the following conditions:

    • Items in the sample (or experiment) must satisfy the independence condition

      • by verifying that data is collected by random sampling

      • or random assignment (in an experiment)

      • and, if sampling without replacement, showing that the sample size is less than 10% of the population size

    • Both populations should be approximately normally distributed

      • The distribution of each sample needs to be approximately symmetric

      • There should be no outliers

    • If a population is very skewed, you can only do a t-test if the sample taken from it is large, $n \geq 30$

How do I calculate the standardized test statistic (t-value)?

  • The t-value for the difference of means is given by:

    • $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

    • where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $s_1$ and $s_2$ are the sample standard deviations, and $n_1$ and $n_2$ are the sample sizes
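As a rough illustration of the formula above, the following Python sketch computes the t-value from summary statistics. The function name `two_sample_t` and its parameter names are illustrative choices only; the numbers are the summary statistics from the worked example later in this note.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t-value for H0: mu1 - mu2 = 0 (hypothetical helper)."""
    # Standard error of the difference between the two sample means
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    # (statistic - parameter) / standard error, with parameter = 0 under H0
    return (mean1 - mean2 - 0) / standard_error

# Summary statistics from the worked example in this note
t_value = two_sample_t(13.07, 0.46, 22, 13.28, 0.32, 16)
print(round(t_value, 3))  # -1.659
```

A negative t-value here simply means the first sample mean is smaller than the second.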

Exam Tip

The formula for the standardized test statistic is given in the exam, $\dfrac{\text{statistic} - \text{parameter}}{\text{standard error of the statistic}}$, along with tables of parameters and standard errors.

You will need to apply this correctly to get the t-value. In the numerator, the statistic is the difference between the sample means and the parameter is the difference between the population means, which is 0 for the null hypothesis.

How do I calculate the p-value?

  • Work out the t-value

  • Find the appropriate number of degrees of freedom ('dof')

    • For a two-sample t-test done by hand, a conservative choice is dof $= n - 1$, where $n$ is the smaller of the two sample sizes

    • Using the smaller sample size gives fewer degrees of freedom, and so a larger (more conservative) p-value

  • Using the t-distribution table given to you:

    • find the row that corresponds to the dof

    • identify the t-value in the row that is closest to the absolute value of the calculated t-value

    • write down the tail probability in the corresponding column header

      • this is the approximate p-value

  • Note that the p-value from the t-table is for one tail
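The table lookup above approximates an area under the t-distribution curve. The sketch below, which is an illustration rather than anything required in the exam, computes that one-tail area directly by numerically integrating the t density with Simpson's rule. The helper names `t_pdf` and `one_tail_p` are hypothetical, and the t-value and sample sizes are taken from the worked example in this note.

```python
import math

def t_pdf(x, dof):
    """Density of the t-distribution with `dof` degrees of freedom."""
    log_const = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                 - 0.5 * math.log(dof * math.pi))
    return math.exp(log_const - (dof + 1) / 2 * math.log1p(x * x / dof))

def one_tail_p(t_value, dof, steps=2000):
    """Area in one tail beyond |t_value|, using Simpson's rule on [0, |t|]."""
    upper = abs(t_value)
    if upper == 0:
        return 0.5
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, dof) + t_pdf(upper, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area_from_zero = total * h / 3
    # Half of the distribution lies above 0; subtract the part between 0 and |t|
    return 0.5 - area_from_zero

n1, n2 = 22, 16                 # sample sizes from the worked example
dof = min(n1, n2) - 1           # conservative degrees of freedom: 15
p = one_tail_p(-1.659253, dof)
print(dof, 0.05 < p < 0.10)     # 15 True
```

The result falls between the two neighbouring column headers of the table, which is why a table lookup can only give an approximate p-value.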

How do I conclude a hypothesis test?

  • Conclusions to a hypothesis test need to show two things:

    • a decision about the null hypothesis

    • an interpretation of this decision in the context of the question

  • To make the decision, compare the p-value to the significance level, $\alpha$

    • If $p < \alpha$ then the null hypothesis should be rejected

    • If $p \geq \alpha$ then the null hypothesis should not be rejected

  • In a two-tailed test, double the one-tail p-value and compare this to $\alpha$
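The decision rule above can be sketched in a few lines of Python. The function name `decide` and the p-values passed to it are purely illustrative, not taken from any real data set.

```python
def decide(one_tail_p, alpha, two_tailed=False):
    """Return the decision about H0 (hypothetical helper)."""
    # A table p-value covers one tail, so double it for a two-tailed test
    p = 2 * one_tail_p if two_tailed else one_tail_p
    return "reject H0" if p < alpha else "fail to reject H0"

print(decide(0.04, 0.05))                   # reject H0
print(decide(0.04, 0.05, two_tailed=True))  # fail to reject H0
```

The same one-tail p-value can lead to different decisions depending on whether the alternative hypothesis is one-tailed or two-tailed, which is why the hypotheses must be fixed before looking at the data.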

Exam Tip

Remember that the test should be interpreted within the context of the question.

Use the same language in your conclusion that is used in the problem, e.g. 'The data provides sufficient evidence that the mean height of all oak trees is greater than the mean height of all beech trees'.

What are the steps for performing a two-sample t-test on a calculator?

  • When using a calculator to conduct a two-sample t-test, you must still write down all steps of the hypothesis testing process:

    • State the null and alternative hypotheses and clearly define your parameters

    • Describe the test being used and show that the situation meets the conditions required

    • Calculate the t-value and the degrees of freedom

    • Calculate the p-value using your calculator

      • select a two-sample t-test and enter the relevant summary statistics or data to generate the p-value

      • select the unpooled option

    • Compare the p-value to the significance level

    • Write down the conclusion to the test and interpret it in the context of the problem
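For reference, the unpooled option on a calculator typically does not use the conservative "smaller sample size minus 1" rule. It is usually based on the Welch-Satterthwaite approximation for the degrees of freedom, which is generally not a whole number. Assuming your calculator works this way (check its manual), the degrees of freedom are approximately:

```latex
\[
\text{dof} \approx
\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
     {\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2
      + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}
\]
```

This value is never smaller than the conservative value, so the calculator's p-value can differ slightly from one read off the t-table.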

Exam Tip

Even if you perform the two-sample t-test on your calculator, it is still important to show all of your working to demonstrate full understanding. Therefore you should still show workings for calculating the t-value and the degrees of freedom.

Worked Example

Juan is a sophomore at SME High, whilst his friend attends APS High. Juan believes that, on average, sophomore students from SME High are faster than those from APS High. He takes a random sample of sophomore students from both schools and records the time they take to run 100 meters. The summary of the results is shown in the table below.

|                    | SME High | APS High |
|--------------------|----------|----------|
| Mean               | 13.07    | 13.28    |
| Standard deviation | 0.46     | 0.32     |
| Sample size        | 22       | 16       |

Perform an appropriate hypothesis test at the 10% significance level. Assume all conditions for inference are met. Is Juan's belief supported by the test?

Method 1: Using the t-table

State the type of test being used and verify the conditions for the test

The correct inference procedure is a two-sample t-test for the difference in population means with $\alpha = 0.1$

All conditions for inference are met as stated in the question

Define the population parameters, $\mu_1$ and $\mu_2$

Let $\mu_1$ be the mean time to run 100 m for all sophomore students at SME High

Let $\mu_2$ be the mean time to run 100 m for all sophomore students at APS High

Write the null and alternative hypotheses

This will be a one-tailed test as Juan believes that those from SME High will be faster than those from APS High

Note that because it is assumed that SME High will be faster, their mean time should be lower than the mean time for APS High

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 < 0$

Calculate the standardized test statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{13.07 - 13.28}{\sqrt{\frac{0.46^2}{22} + \frac{0.32^2}{16}}} = -1.659253\ldots$$

Choose the smaller sample and calculate the number of degrees of freedom from it

degrees of freedom = 16 - 1 = 15

Find the p-value from the t-tables

Find the row corresponding to 15 degrees of freedom and identify the t-value that is closest to the absolute value of the calculated t-value (1.659)

closest t-value = 1.753

corresponding p-value is $p \approx 0.05$ (1.659 lies between the table values 1.341 and 1.753, so the one-tail p-value is between 0.05 and 0.10)

Compare the probability to the significance level and state the conclusion of the test

$0.05 < 0.1$, so $p < \alpha$

$H_0$ is rejected

Interpret the result in the context of the question

We have sufficient evidence to support Juan's belief that sophomore students from SME High are, on average, faster than sophomore students from APS High

Method 2: Using a calculator

State the type of test being used and verify the conditions for the test

The correct inference procedure is a two-sample t-test for the difference in population means with $\alpha = 0.1$

All conditions for inference are met as stated in the question

Define the population parameters, $\mu_1$ and $\mu_2$

Let $\mu_1$ be the mean time to run 100 m for all sophomore students at SME High

Let $\mu_2$ be the mean time to run 100 m for all sophomore students at APS High

Write the null and alternative hypotheses

This will be a one-tailed test as Juan believes that those from SME High will be faster than those from APS High

Note that because it is assumed that SME High will be faster, their mean time should be lower than the mean time for APS High

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 < 0$

Calculate the standardized test statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{13.07 - 13.28}{\sqrt{\frac{0.46^2}{22} + \frac{0.32^2}{16}}} = -1.659253\ldots$$

Choose the smaller sample and calculate the conservative number of degrees of freedom from it

degrees of freedom = 16 - 1 = 15

Write down the parameters for the t-test

$\bar{x}_1 = 13.07$
$s_1 = 0.46$
$n_1 = 22$
$\bar{x}_2 = 13.28$
$s_2 = 0.32$
$n_2 = 16$

Enter these into your calculator along with the correct alternative hypothesis and calculate the p-value

$p = 0.052881\ldots$

Compare the probability to the significance level and state the conclusion of the test

$0.052881\ldots < 0.1$, so $p < \alpha$

$H_0$ is rejected

Interpret the result in the context of the question

We have sufficient evidence to support Juan's belief that sophomore students from SME High are, on average, faster than sophomore students from APS High
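The calculator's p-value differs slightly from the table value in Method 1 because the unpooled calculator test is usually based on the Welch-Satterthwaite degrees of freedom rather than the conservative value of 15. The Python sketch below reproduces the calculation under that assumption. The helper names `welch_dof` and `one_tail_p` are hypothetical, and the tail area is found by simple numerical integration rather than a statistics library.

```python
import math

def welch_dof(sd1, n1, sd2, n2):
    """Welch-Satterthwaite degrees of freedom (assumed calculator behaviour)."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def one_tail_p(t_value, dof, steps=2000):
    """One-tail area beyond |t_value| via Simpson's rule on the t density."""
    log_c = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
             - 0.5 * math.log(dof * math.pi))
    pdf = lambda x: math.exp(log_c - (dof + 1) / 2 * math.log1p(x * x / dof))
    upper = abs(t_value)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

mean1, sd1, n1 = 13.07, 0.46, 22   # SME High
mean2, sd2, n2 = 13.28, 0.32, 16   # APS High

t_value = (mean1 - mean2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)
dof = welch_dof(sd1, n1, sd2, n2)
p = one_tail_p(t_value, dof)
print(round(dof, 1), round(p, 3))  # 36.0 0.053
```

With roughly 36 degrees of freedom instead of 15, the tail area is a little smaller, but both methods give a p-value below 0.1 and so lead to the same conclusion.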


Author: Naomi C

Naomi graduated from Durham University in 2007 with a Masters degree in Civil Engineering. She has taught Mathematics in the UK, Malaysia and Switzerland covering GCSE, IGCSE, A-Level and IB. She particularly enjoys applying Mathematics to real life and endeavours to bring creativity to the content she creates.