Hypothesis Testing using the Chi-squared Distribution (DP IB Applications & Interpretation (AI))


Cards in this collection (34)

  • What is a hypothesis test?

    A hypothesis test uses a sample of data in an experiment to test a statement made about the population.

    The statement is either about a population parameter or the distribution of the population.

  • Which hypothesis do you assume to be true when performing a hypothesis test: the null hypothesis or the alternative hypothesis?

    When performing a hypothesis test, you assume the null hypothesis to be true.

  • What is denoted by the notation $\mathrm{H}_0$?

    The null hypothesis is denoted by $\mathrm{H}_0$.

  • What is the notation for the alternative hypothesis?

    The notation for the alternative hypothesis is $\mathrm{H}_1$.

  • What is meant by the significance level of a hypothesis test?

    The significance level is the probability that a hypothesis test rejects the null hypothesis when it is true.

    The significance level sets the threshold for how unlikely a result must be, assuming the null hypothesis is true, before you reject the null hypothesis. A result with a probability smaller than the significance level is considered unlikely to have happened by chance.

  • An observation of the test statistic is taken for a hypothesis test.

    What is meant by the p-value of this observation?

    An observation of the test statistic is taken for a hypothesis test.

    The p-value is the probability of obtaining a value at least as extreme as the observation if the null hypothesis were true.

  • True or False?

    If the p-value is less than the significance level then there is evidence to reject the null hypothesis.

    True.

    If the p-value is less than the significance level then there is evidence to reject the null hypothesis.

  • What is meant by the critical region for a test statistic?

    The critical region is the set of values for a test statistic that would lead to the rejection of the null hypothesis.

    These are the values that are unlikely to be obtained if the null hypothesis were true.

  • What is meant by the critical value(s) for a test statistic?

    A critical value is a boundary of the critical region.

    It is the least extreme value that would lead to the rejection of the null hypothesis.

  • What is a chi squared test for independence used for?

    A chi squared test for independence is used to test whether two variables are statistically independent of each other.

  • What are contingency tables in a chi squared test for independence?

    A contingency table is a two-way table that shows the observed frequencies for each combination of the two variables.

    For example:

                          Eye colour
    Hair colour    Blue    Brown    Green
    Black           17      12       29
    Blonde          31      25       21

  • True or False?

    If a contingency table has m rows and n columns then the number of degrees of freedom is equal to m×n.

    False.

    If a contingency table has m rows and n columns then the number of degrees of freedom is equal to (m-1)×(n-1).
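
    As a quick check of the rule above, here is a minimal Python sketch. The function name independence_dof is made up for illustration; the 2 by 3 example is the hair colour and eye colour table.

```python
def independence_dof(rows: int, columns: int) -> int:
    """Degrees of freedom for a chi-squared test for independence."""
    # Only the categories count as rows and columns, not any totals.
    return (rows - 1) * (columns - 1)


# The hair colour / eye colour table has 2 rows and 3 columns.
dof = independence_dof(2, 3)
print(dof)
```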

  • True or False?

    For a chi squared test for independence, you reject the null hypothesis if the test statistic is greater than the critical value.

    True.

    For a chi squared test for independence, you reject the null hypothesis if the test statistic is greater than the critical value.

  • What is meant by the expected frequencies for a chi squared test for independence?

    The expected frequencies for a chi squared test for independence are the frequencies for each possible combination of outcomes of the two variables if they were independent.
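
    The sketch below shows one way to compute these by hand, using the standard rule that each expected frequency is (row total × column total) ÷ grand total. The observed values are the hair colour and eye colour table; the function name expected_frequencies is an illustrative choice.

```python
# Observed frequencies: rows are hair colour, columns are eye colour.
observed = [
    [17, 12, 29],  # Black hair: blue, brown, green eyes
    [31, 25, 21],  # Blonde hair: blue, brown, green eyes
]


def expected_frequencies(table):
    """Expected frequencies assuming the two variables are independent."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    # Each cell: row total * column total / grand total.
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(value, 2) for value in row])
```

    The expected frequencies always add up to the same total as the observed frequencies.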

  • How should you write the null hypothesis of a chi squared test for independence?

    For example, suppose you are testing whether hair colour and eye colour are independent.

    The null hypothesis of a chi squared test for independence should be of the form:

    $\mathrm{H}_0$: variable X is independent of variable Y.

    For example, $\mathrm{H}_0$: hair colour is independent of eye colour.

  • In an exam, how do you find the chi squared statistic for a chi squared test for independence?

    In an exam, to find the chi squared statistic for a chi squared test for independence you:

    • use the two-way test option on your GDC,

    • input the observed frequencies as a matrix,

    • run the test.

    Your GDC will give you the value of the chi squared statistic as well as the p-value and the expected frequencies.
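
    To see roughly what the GDC is doing, here is a hedged Python sketch for the hair colour and eye colour table. It assumes the usual statistic, the sum of (observed − expected)² ÷ expected over all cells. The p-value line uses a shortcut that is only valid because this table has exactly 2 degrees of freedom; it is not a general method.

```python
import math

# Observed frequencies from the hair colour / eye colour contingency table.
observed = [
    [17, 12, 29],  # Black hair: blue, brown, green eyes
    [31, 25, 21],  # Blonde hair: blue, brown, green eyes
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in column_totals] for r in row_totals]

# Chi-squared statistic: sum of (observed - expected)^2 / expected over all cells.
chi_squared = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)

dof = (len(observed) - 1) * (len(observed[0]) - 1)

# With exactly 2 degrees of freedom, P(chi-squared > x) = exp(-x / 2).
# This shortcut does not apply to other numbers of degrees of freedom.
if dof != 2:
    raise ValueError("closed-form p-value here only holds for 2 degrees of freedom")
p_value = math.exp(-chi_squared / 2)

print(round(chi_squared, 3), round(p_value, 4))
```

    In an exam you would still read these values from your GDC; the sketch is only there to show where they come from.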

  • If the null hypothesis is rejected for a chi squared test for independence, then what does this suggest about the two variables?

    If the null hypothesis is rejected for a chi squared test for independence, then this suggests that the two variables are not independent.

    However, this conclusion is not definitive as there will still be a small chance that they are independent.

  • If the null hypothesis is not rejected for a chi squared test for independence, then what does this suggest about the two variables?

    If the null hypothesis is not rejected for a chi squared test for independence, then there is insufficient evidence to suggest that the variables are not independent.

    Therefore, this suggests that the two variables could be independent.

    However, this conclusion is not definitive.

  • When using a chi-squared test, what number do the expected values need to be bigger than?

    When using a chi-squared test, the expected values need to be bigger than 5.

  • What do you need to do if an expected value is less than 5 in a contingency table?

    If an expected value is less than 5 in a contingency table, then you need to combine that row or column with an adjacent row or column.

    You also need to combine the corresponding rows or columns in the contingency table for the observed values.

  • What is a chi squared goodness of fit test used for?

    A chi squared goodness of fit test is used to test whether data can be modelled by a specified distribution.

  • True or False?

    For a chi squared goodness of fit test, you reject the null hypothesis if the test statistic is less than the critical value.

    False.

    For a chi squared goodness of fit test, you reject the null hypothesis if the test statistic is greater than the critical value.

  • What is meant by the expected frequencies for a chi squared goodness of fit test?

    The expected frequencies for a chi squared goodness of fit test are the frequencies for each outcome if the data follows the specified distribution.

  • How do you find the expected frequencies for a chi squared goodness of fit test?

    To find the expected frequencies for a chi squared goodness of fit test, you:

    • find the probability of each outcome assuming the data follows the specified distribution,

    • multiply the probabilities by the total frequency.
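
    The two steps above can be sketched in Python as follows. The distribution B(3, 0.1) and the total frequency of 200 are illustrative assumptions, not data from a real question.

```python
from math import comb


def binomial_probabilities(n, p):
    """P(X = k) for k = 0, 1, ..., n when X ~ B(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


total_frequency = 200  # hypothetical sample size, chosen for illustration

# Step 1: probability of each outcome under the specified distribution.
probabilities = binomial_probabilities(3, 0.1)

# Step 2: multiply each probability by the total frequency.
expected = [probability * total_frequency for probability in probabilities]

for outcome, value in enumerate(expected):
    print(outcome, round(value, 1))
```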

  • In an exam, how do you find the chi squared statistic for a chi squared goodness of fit test?

    In an exam, to find the chi squared statistic for a chi squared goodness of fit test you:

    • use the goodness of fit option on your GDC,

    • input the observed frequencies as a list,

    • input the expected frequencies as a separate list,

    • enter the number of degrees of freedom,

    • run the test.

    Your GDC will give you the value of the chi squared statistic as well as the p-value.
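
    As a rough illustration of what the GDC calculates, the sketch below finds the statistic as the sum of (observed − expected)² ÷ expected. The data are hypothetical: 60 rolls of a die tested against a fair-die model. The 5% critical value for 5 degrees of freedom is taken from a standard chi-squared table.

```python
# Hypothetical data: 60 rolls of a die, tested against a fair-die model.
observed = [8, 12, 9, 11, 6, 14]
expected = [10, 10, 10, 10, 10, 10]

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# No parameters were estimated, so degrees of freedom = classes - 1.
dof = len(observed) - 1

# 5% critical value for 5 degrees of freedom, from a chi-squared table.
critical_value = 11.070

reject_null = chi_squared > critical_value
print(round(chi_squared, 2), dof, reject_null)
```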

  • How should you write the null hypothesis of a chi squared goodness of fit test?

    For example, suppose you are testing whether the number of eggs in a nest can be modelled by B(3, 0.1).

    The null hypothesis of a chi squared goodness of fit test should be of the form:

    $\mathrm{H}_0$: variable X follows the distribution... (state the distribution)

    For example, $\mathrm{H}_0$: the number of eggs in a nest follows the binomial distribution B(3, 0.1).

  • Suppose you are performing a chi squared goodness of fit test to test whether the following data can be modelled by $X \sim \mathrm{N}(160, 20^2)$.

    What three probabilities would you need to calculate?

    Height                 Frequency
    $120 \le h < 150$      35
    $150 \le h < 180$      45
    $180 \le h < 200$      20

    Suppose you are performing a chi squared goodness of fit test to test whether the following data can be modelled by $X \sim \mathrm{N}(160, 20^2)$.

    You would need to calculate the following three probabilities.

    Height                 Probability
    $120 \le h < 150$      $\mathrm{P}(X < 150)$
    $150 \le h < 180$      $\mathrm{P}(150 < X < 180)$
    $180 \le h < 200$      $\mathrm{P}(X > 180)$
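
    A minimal sketch of this calculation, using NormalDist from Python's standard library in place of a GDC. The first and last classes are extended to cover the tails of the distribution so that the three probabilities add up to 1. The total frequency of 100 comes from the observed table (35 + 45 + 20).

```python
from statistics import NormalDist

# Model from the null hypothesis: X ~ N(160, 20^2), so sigma = 20.
model = NormalDist(mu=160, sigma=20)

p_low = model.cdf(150)                    # P(X < 150)
p_mid = model.cdf(180) - model.cdf(150)   # P(150 < X < 180)
p_high = 1 - model.cdf(180)               # P(X > 180)

observed = [35, 45, 20]
total = sum(observed)

# Expected frequency = probability * total frequency.
expected = [p * total for p in (p_low, p_mid, p_high)]

for p, e in zip((p_low, p_mid, p_high), expected):
    print(round(p, 4), round(e, 2))
```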

  • What is the conclusion if the null hypothesis is rejected for a chi squared goodness of fit test?

    If the null hypothesis is rejected for a chi squared goodness of fit test then there is sufficient evidence to suggest that the data does not follow the specified distribution.

  • What is the conclusion if the null hypothesis is not rejected for a chi squared goodness of fit test?

    If the null hypothesis is not rejected for a chi squared goodness of fit test then there is insufficient evidence to suggest that the data does not follow the specified distribution.

    Therefore, this suggests that the data could follow the specified distribution. However, this conclusion is not definitive.

  • Given observed data for a goodness of fit test, how do you estimate the value of $p$ for a binomial distribution?

    Given observed data for a goodness of fit test, you can estimate the value of $p$ for a binomial distribution by finding the mean of the observed data and dividing by the number of trials, $n$, of the binomial distribution.

    $p = \frac{\bar{x}}{n} = \frac{1}{n} \times \frac{\sum fx}{\sum f}$

    This formula is not given in your exam formula booklet.
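
    A short sketch of this estimate in Python. The frequency table is hypothetical (50 observations of a variable taking the values 0 to 3), and the name p_hat is just an illustrative label for the estimate.

```python
# Hypothetical frequency table for a variable with n = 3 trials.
values = [0, 1, 2, 3]
frequencies = [10, 20, 15, 5]
n = 3

# Mean of the observed data: sum of f*x divided by sum of f.
mean = sum(x * f for x, f in zip(values, frequencies)) / sum(frequencies)

# Estimate of p: mean divided by the number of trials.
p_hat = mean / n

print(round(mean, 3), round(p_hat, 3))
```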

  • Given observed data for a goodness of fit test, how do you estimate the value of $m$ for a Poisson distribution?

    Given observed data for a goodness of fit test, you can estimate the value of $m$ for a Poisson distribution by finding the mean of the observed data.

    $m = \bar{x} = \frac{\sum fx}{\sum f}$

    This formula is not given in your exam formula booklet.
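
    The sketch below estimates m from a hypothetical frequency table and then uses it to find expected frequencies. It assumes the final class is treated as "4 or more" when finding probabilities, so that they add up to 1; the name m_hat is an illustrative label.

```python
from math import exp, factorial

# Hypothetical frequency table; the last class is treated as "4 or more".
values = [0, 1, 2, 3, 4]
frequencies = [12, 18, 11, 6, 3]

total = sum(frequencies)

# Estimate of m: the mean of the observed data.
m_hat = sum(x * f for x, f in zip(values, frequencies)) / total

# Poisson probabilities P(X = k) = e^(-m) * m^k / k! for k = 0, 1, 2, 3.
probabilities = [exp(-m_hat) * m_hat**k / factorial(k) for k in values[:-1]]

# The final class absorbs the rest of the tail: P(X >= 4).
probabilities.append(1 - sum(probabilities))

expected = [p * total for p in probabilities]
print(round(m_hat, 2), [round(e, 2) for e in expected])
```

    Because m was estimated from the data, one extra degree of freedom would be subtracted in the test.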

  • True or False?

    For a goodness of fit test, the number of degrees of freedom is always $k - 1$.

    False.

    For a goodness of fit test, the number of degrees of freedom is not always $k - 1$, where $k$ is the number of classes.

    You also need to subtract an additional 1 for every parameter that had to be estimated.

  • $\mathrm{H}_0$: X can be modelled by a normal distribution.

    How would you find the number of degrees of freedom for the goodness of fit test described by the null hypothesis above?

    $\mathrm{H}_0$: X can be modelled by a normal distribution.

    The number of degrees of freedom for the goodness of fit test is $k - 3$, where $k$ is the number of classes after combining if necessary.

    The mean and variance will need to be estimated.
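
    The rule can be written as a small Python helper: start from the number of classes minus 1, then subtract 1 for each estimated parameter. The function name and the example class counts are illustrative assumptions.

```python
def goodness_of_fit_dof(classes: int, estimated_parameters: int = 0) -> int:
    """Degrees of freedom for a chi-squared goodness of fit test."""
    # classes = number of classes after any combining.
    return classes - 1 - estimated_parameters


# Normal distribution with mean and variance both estimated, 6 classes: k - 3.
print(goodness_of_fit_dof(6, estimated_parameters=2))

# Fully specified distribution (no estimates), 3 classes: k - 1.
print(goodness_of_fit_dof(3))
```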

  • $\mathrm{H}_0$: X can be modelled by $\mathrm{B}(3, 0.4)$.

    The expected values for the hypothesis test are shown in the table.

    x            0     1     2     3
    Frequency    8    21    18     3

    What do you have to do with the expected values?

    You have to combine the final two categories (x = 2 and x = 3) as the last expected value is less than 5.

    x            0     1     2 or more
    Frequency    8    21    21
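
    The combining step can be sketched in Python as below. The helper combine_small_tail is a made-up name, and it only handles the case where the small expected values are at the end of the table. Any merge applied to the expected values must also be applied to the observed values.

```python
def combine_small_tail(values, expected, threshold=5):
    """Merge trailing classes until the last expected value reaches the threshold."""
    values = list(values)
    expected = list(expected)
    merged = False
    while len(expected) > 1 and expected[-1] < threshold:
        # Add the last class to the one before it, then drop its label.
        last = expected.pop()
        expected[-1] += last
        values.pop()
        merged = True
    labels = [str(v) for v in values]
    if merged:
        labels[-1] += " or more"
    return labels, expected


labels, combined = combine_small_tail([0, 1, 2, 3], [8, 21, 18, 3])
print(labels)
print(combined)
```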