Goodness of Fit Test (DP IB Maths: AI HL)

Revision Note

Author: Dan

Chi-Squared GOF: Uniform

What is a chi-squared goodness of fit test for a given distribution?

  • A chi-squared (χ²) goodness of fit test is used to test whether data from a sample is consistent with the population having a given distribution
  • This could be that: 
    • the proportions of the population in different categories follow a given ratio 
    • the population follows a uniform distribution
      • This means all outcomes are equally likely

What are the steps for a chi-squared goodness of fit test for a given distribution?

  • STEP 1: Write the hypotheses
    • H0 : Variable X can be modelled by the given distribution
    • H1 : Variable X  cannot be modelled by the given distribution
      • Make sure you clearly write what the variable is and don’t just call it X
  • STEP 2: Calculate the expected frequencies
    • Split the total frequency using the given ratio
    • For a uniform distribution: divide the total frequency N by the number of possible outcomes k
  • STEP 3: Calculate the degrees of freedom for the test
    • For k possible outcomes
    • Degrees of freedom is ν = k − 1
  • STEP 4: Enter the frequencies and the degrees of freedom into your GDC
    • Enter the observed and expected frequencies as two separate lists
    • Your GDC will then give you the χ² statistic and its p-value
    • The χ² statistic is denoted as χ²_calc
  • STEP 5: Decide whether there is evidence to reject the null hypothesis
    • EITHER compare the χ² statistic with the given critical value
      • If χ² statistic > critical value then reject H0
      • If χ² statistic < critical value then do not reject H0
    • OR compare the p-value with the given significance level
      • If p-value < significance level then reject H0
      • If p-value > significance level then do not reject H0
  • STEP 6: Write your conclusion
    • If you reject H0
      • There is sufficient evidence to suggest that variable X does not follow the given distribution
      • Therefore this suggests that the data is not distributed as claimed
    • If you do not reject H0
      • There is insufficient evidence to suggest that variable X does not follow the given distribution
      • The data is consistent with the claimed distribution (this does not prove that the claim is true)
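The arithmetic behind Steps 2 to 5 can be sketched in Python as a way to check GDC output. This is a minimal sketch under stated assumptions, not how any particular GDC works: `chi2_sf`, `chi2_statistic` and `given_ratio_gof` are hypothetical names, the degrees of freedom are assumed to be a positive integer, and the p-value uses the closed-form upper tail of the χ² distribution that exists for integer ν.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-squared with df degrees of freedom >= x).

    Assumes df is a positive integer, which allows closed forms:
      even df: e^(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
      odd df:  erfc(sqrt(x/2)) + sqrt(2/pi) e^(-x/2) * sum_{r=1}^{(df-1)/2} x^((2r-1)/2) / (1*3*...*(2r-1))
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    acc, term = 0.0, math.sqrt(x)  # first term of the odd-df sum: x^(1/2) / 1
    for r in range(1, (df - 1) // 2 + 1):
        acc += term
        term *= x / (2 * r + 1)  # the next term gains a factor x / (2r + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * acc


def chi2_statistic(observed, expected):
    """chi^2_calc = sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def given_ratio_gof(observed, ratio=None, alpha=0.05):
    """Chi-squared GOF test against a given ratio (uniform if ratio is None)."""
    k = len(observed)
    ratio = ratio or [1] * k                      # uniform: all outcomes equally likely
    n_total, ratio_total = sum(observed), sum(ratio)
    expected = [n_total * r / ratio_total for r in ratio]   # Step 2: split N in the ratio
    df = k - 1                                              # Step 3: nu = k - 1
    stat = chi2_statistic(observed, expected)               # Step 4: statistic ...
    p_value = chi2_sf(stat, df)                             # ... and its p-value
    reject_h0 = p_value < alpha                             # Step 5: compare with the level
    return expected, df, stat, p_value, reject_h0
```

For example, `given_ratio_gof([30, 20, 10], ratio=[3, 2, 1])` splits N = 60 in the ratio 3 : 2 : 1, giving expected frequencies 30, 20 and 10, so χ²_calc = 0, the p-value is 1 and H0 is not rejected.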

Worked example

A car salesman is interested in how his sales are distributed and records his sales results over a period of six weeks. The data is shown in the table.

Week | 1 | 2 | 3 | 4 | 5 | 6
Number of sales | 15 | 17 | 11 | 21 | 14 | 12

A χ² goodness of fit test is to be performed on the data at the 5% significance level to determine whether the data fits a uniform distribution.

a)
Find the expected frequency of sales for each week if the data were uniformly distributed.

b)
Write down the null and alternative hypotheses.

c)
Write down the number of degrees of freedom for this test.

d)
Calculate the p-value.

e)
State the conclusion of the test. Give a reason for your answer.

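Parts (a), (c), (d) and (e) can be checked with a short, self-contained Python sketch. It repeats a hypothetical `chi2_sf` helper (the closed-form χ² upper tail for integer ν) so that it runs on its own; treat it as a check, not as the official worked solution. With 90 sales over 6 weeks it should give an expected frequency of 15 per week, ν = 5, χ²_calc = 4.4 and a p-value of roughly 0.493, which is greater than 0.05, so H0 is not rejected.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(chi-squared_df >= x) for a positive integer df, using closed forms."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    acc, term = 0.0, math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        acc += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * acc


sales = [15, 17, 11, 21, 14, 12]                 # observed sales, weeks 1 to 6
k = len(sales)
expected = [sum(sales) / k] * k                  # (a) 90 / 6 = 15 per week
df = k - 1                                       # (c) nu = 6 - 1 = 5
stat = sum((o - e) ** 2 / e for o, e in zip(sales, expected))
p_value = chi2_sf(stat, df)                      # (d) p-value of the test
print(expected[0], df, round(stat, 3), round(p_value, 3))
print("reject H0" if p_value < 0.05 else "do not reject H0")   # (e) at the 5% level
```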

Chi-Squared GOF: Binomial

What is a chi-squared goodness of fit test for a binomial distribution?

  • A chi-squared (χ²) goodness of fit test is used to test whether data from a sample is consistent with the population having a binomial distribution
    • You will either be given a precise binomial distribution B(n, p) to test, with an assumed value for p
    • Or you will be asked to test whether a binomial distribution is suitable without being given an assumed value for p
      • In this case you will have to estimate the value of p for the binomial distribution
      • To do this, divide the sample mean by the value of n
      • p = x̄/n = (1/n) × (Σfx/Σf)
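The estimate of p above can be written as a small Python function. This is a minimal sketch; the name `estimate_binomial_p` is hypothetical, and the inputs are assumed to be the outcomes 0, 1, …, n together with their observed frequencies.

```python
def estimate_binomial_p(values, freqs, n):
    """Estimate p for B(n, p) as the sample mean divided by n."""
    mean = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)   # x-bar = sum(fx) / sum(f)
    return mean / n
```

For example, frequencies 1, 2, 1 for the outcomes 0, 1, 2 with n = 2 give x̄ = 1 and an estimate p = 0.5.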

What are the steps for a chi-squared goodness of fit test for a binomial distribution?

  • STEP 1: Write the hypotheses
    • H0 : Variable X can be modelled by a binomial distribution
    • H1 : Variable X cannot be modelled by a binomial distribution
      • Make sure you clearly write what the variable is and don’t just call it X
      • If you are given the assumed value of p then state the precise distribution B(n, p)
  • STEP 2: Calculate the expected frequencies
    • If you were not given the assumed value of p then you will first have to estimate it using the observed data
    • Find the probability of each outcome using the binomial distribution, P(X = x)
    • Multiply the probability by the total frequency: P(X = x) × N
    • You will have to combine rows/columns if any expected values are 5 or less
  • STEP 3: Calculate the degrees of freedom for the test
    • For k outcomes (after combining expected values if needed)
    • Degrees of freedom is 
      • ν = k − 1 if you were given the assumed value of p
      • ν = k − 2 if you had to estimate the value of p
  • STEP 4: Enter the frequencies and the degrees of freedom into your GDC
    • Enter the observed and expected frequencies as two separate lists
    • Your GDC will then give you the χ² statistic and its p-value
    • The χ² statistic is denoted as χ²_calc
  • STEP 5: Decide whether there is evidence to reject the null hypothesis
    • EITHER compare the χ² statistic with the given critical value
      • If χ² statistic > critical value then reject H0
      • If χ² statistic < critical value then do not reject H0
    • OR compare the p-value with the given significance level
      • If p-value < significance level then reject H0
      • If p-value > significance level then do not reject H0
  • STEP 6: Write your conclusion
    • If you reject H0
      • There is sufficient evidence to suggest that variable X does not follow the binomial distribution B(n, p)
      • Therefore this suggests that the data does not follow B(n, p)
    • If you do not reject H0
      • There is insufficient evidence to suggest that variable X does not follow the binomial distribution B(n, p)
      • The data is consistent with B(n, p) (this does not prove that it follows B(n, p))
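Steps 2 and 3 for a binomial model can be sketched in Python. This is an illustration under assumptions: the helper names are hypothetical, and `combine_small_classes` uses one simple strategy (merge classes from left to right until the running expected frequency is above 5, then fold any small leftover at the top end into the last class). In an exam you decide yourself which neighbouring classes to combine.

```python
from math import comb


def binomial_expected(n, p, total):
    """Expected frequencies N * P(X = x) for x = 0, 1, ..., n under B(n, p)."""
    return [total * comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def combine_small_classes(observed, expected, threshold=5.0):
    """Merge adjacent classes so that every expected frequency is above threshold."""
    obs_out, exp_out = [], []
    o_acc, e_acc = 0.0, 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc > threshold:          # this (possibly merged) class is big enough
            obs_out.append(o_acc)
            exp_out.append(e_acc)
            o_acc, e_acc = 0.0, 0.0
    if e_acc > 0 and exp_out:          # small leftover at the top end: fold into last class
        obs_out[-1] += o_acc
        exp_out[-1] += e_acc
    elif e_acc > 0:                    # every class was small: keep one combined class
        obs_out.append(o_acc)
        exp_out.append(e_acc)
    return obs_out, exp_out


def binomial_df(k, p_was_estimated):
    """nu = k - 1 if p was given, k - 2 if p was estimated from the data."""
    return k - 2 if p_was_estimated else k - 1
```

The number of classes k passed to `binomial_df` should be counted after combining, as in Step 3.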

Worked example

A stage in a video game has three boss battles. 1000 people try this stage of the video game and the number of bosses defeated by each player is recorded.

Number of bosses defeated | 0 | 1 | 2 | 3
Frequency | 490 | 384 | 111 | 15

A χ² goodness of fit test at the 5% significance level is used to decide whether the number of bosses defeated can be modelled by the binomial distribution B(3, 0.2), that is, with a 20% probability of defeating each boss.

a)
State the null and alternative hypotheses.

b)
Assuming the binomial distribution holds, find the expected number of people that would defeat exactly one boss.

c)
Calculate the p-value for the test.

d)
State the conclusion of the test. Give a reason for your answer.
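A self-contained Python sketch can check parts (b) to (d). It assumes the model B(3, 0.2) from the question and repeats a hypothetical closed-form `chi2_sf` helper; none of the expected frequencies (512, 384, 96, 8) is 5 or less, so no classes are combined and ν = 4 − 1 = 3. It should give 384 expected players defeating exactly one boss, χ²_calc ≈ 9.41 and a p-value of about 0.024, which is below 0.05, so H0 would be rejected. This is a check, not the official worked solution.

```python
import math
from math import comb


def chi2_sf(x: float, df: int) -> float:
    """P(chi-squared_df >= x) for a positive integer df, using closed forms."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    acc, term = 0.0, math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        acc += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * acc


n, p, total = 3, 0.2, 1000                       # model B(3, 0.2), 1000 players
observed = [490, 384, 111, 15]                   # bosses defeated: 0, 1, 2, 3
expected = [total * comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
df = len(observed) - 1                           # p was given, so nu = k - 1
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = chi2_sf(stat, df)
print(round(expected[1], 2), round(stat, 3), round(p_value, 4))
print("reject H0" if p_value < 0.05 else "do not reject H0")
```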

Chi-Squared GOF: Normal

What is a chi-squared goodness of fit test for a normal distribution?

  • A chi-squared (χ²) goodness of fit test is used to test whether data from a sample is consistent with the population having a normal distribution
    • You will either be given a precise normal distribution N(μ, σ²) to test, with assumed values for μ and σ
    • Or you will be asked to test whether a normal distribution is suitable without being given assumed values for μ and/or σ
      • In this case you will have to estimate the value of μ and/or σ for the normal distribution
      • Either use your GDC or use the formulae
      • x̄ = Σfx/Σf and sₙ₋₁² = (n/(n − 1)) × sₙ²
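The estimates above can be computed in Python from a frequency table. This is a minimal sketch: the function name `normal_estimates` is hypothetical, and for grouped data it assumes each class is represented by its mid-interval value, which is the usual approximation.

```python
import math


def normal_estimates(values, freqs):
    """Return x-bar, s_n and s_{n-1} from values (or class midpoints) and frequencies."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n                 # x-bar = sum(fx) / sum(f)
    var_n = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n  # s_n^2
    var_n1 = n / (n - 1) * var_n                                         # s_{n-1}^2 = n/(n-1) * s_n^2
    return mean, math.sqrt(var_n), math.sqrt(var_n1)
```

For example, the values 1, 2, 3 each with frequency 1 give x̄ = 2, sₙ² = 2/3 and sₙ₋₁² = 1.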

What are the steps for a chi-squared goodness of fit test for a normal distribution?

  • STEP 1: Write the hypotheses
    • H0 : Variable X can be modelled by a normal distribution
    • H1 : Variable X cannot be modelled by a normal distribution
      • Make sure you clearly write what the variable is and don’t just call it X
      • If you are given the assumed values of μ and σ then state the precise distribution N(μ, σ²)
  • STEP 2: Calculate the expected frequencies
    • If you were not given the assumed values of μ or σ then you will first have to estimate them
    • Find the probability of each class interval using the normal distribution, P(a < X < b)
      • Beware of the class intervals at the 'ends', which are unbounded: use P(X < b) or P(X > a) for these
    • Multiply the probability by the total frequency: P(a < X < b) × N
    • You will have to combine rows/columns if any expected values are 5 or less
  • STEP 3: Calculate the degrees of freedom for the test
    • For k class intervals (after combining expected values if needed)
    • Degrees of freedom is
      • ν = k − 1 if you were given the assumed values for both μ and σ
      • ν = k − 2 if you had to estimate either μ or σ but not both
      • ν = k − 3 if you had to estimate both μ and σ
  • STEP 4: Enter the frequencies and the degrees of freedom into your GDC
    • Enter the observed and expected frequencies as two separate lists
    • Your GDC will then give you the χ² statistic and its p-value
    • The χ² statistic is denoted as χ²_calc
  • STEP 5: Decide whether there is evidence to reject the null hypothesis
    • EITHER compare the χ² statistic with the given critical value
      • If χ² statistic > critical value then reject H0
      • If χ² statistic < critical value then do not reject H0
    • OR compare the p-value with the given significance level
      • If p-value < significance level then reject H0
      • If p-value > significance level then do not reject H0
  • STEP 6: Write your conclusion
    • If you reject H0
      • There is sufficient evidence to suggest that variable X does not follow the normal distribution N(μ, σ²)
      • Therefore this suggests that the data does not follow N(μ, σ²)
    • If you do not reject H0
      • There is insufficient evidence to suggest that variable X does not follow the normal distribution N(μ, σ²)
      • The data is consistent with N(μ, σ²) (this does not prove that it follows N(μ, σ²))
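Steps 2 and 3 for a normal model can be sketched with Python's standard-library `statistics.NormalDist`. This is a sketch under assumptions: the helper names are hypothetical, the class intervals are described by their inner boundaries, and the two end classes are treated as unbounded, using P(X < b) and P(X > a) as described in Step 2.

```python
import math
from statistics import NormalDist


def normal_expected(boundaries, mu, sigma, total):
    """Expected frequencies for classes split at the given boundaries.

    The first class is X < boundaries[0] and the last is X > boundaries[-1],
    so the probabilities cover the whole real line and sum to 1.
    """
    dist = NormalDist(mu, sigma)
    edges = [-math.inf] + list(boundaries) + [math.inf]
    return [total * (dist.cdf(hi) - dist.cdf(lo)) for lo, hi in zip(edges, edges[1:])]


def normal_df(k, mu_estimated, sigma_estimated):
    """nu = k - 1, minus one more for each parameter estimated from the data."""
    return k - 1 - int(mu_estimated) - int(sigma_estimated)
```

For example, `normal_expected([0], 0, 1, 100)` splits 100 observations at the mean of N(0, 1²) and returns two expected frequencies of 50.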

Worked example

300 marbled ducks in Quacktown are weighed and the results are shown in the table below.

Mass, m (g) | Frequency
m < 450 | 1
450 ≤ m < 470 | 9
470 ≤ m < 520 | 158
520 ≤ m < 570 | 123
m ≥ 570 | 9

A χ² goodness of fit test at the 10% significance level is used to decide whether the mass of a marbled duck can be modelled by a normal distribution with mean 520 g and standard deviation 30 g.

a)
Explain why it is necessary to combine the groups m < 450 and 450 ≤ m < 470 to create the group m < 470 with frequency 10.

b)
Calculate the expected frequencies, giving your answers correct to 2 decimal places.

c)
Write down the null and alternative hypotheses.

d)
Calculate the χ² statistic.

e)
Given that the critical value is 6.251, state the conclusion of the test. Give a reason for your answer.

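A self-contained Python sketch of parts (a), (b), (d) and (e), assuming the model N(520, 30²) from the question and using `statistics.NormalDist` for the normal probabilities. It should show that the expected frequency for m < 450 is only about 2.9, which is why that group is combined. After combining, the expected frequencies are about 14.34, 135.66, 135.66 and 14.34; both parameters were given, so ν = 4 − 1 = 3, and χ²_calc ≈ 8.16 exceeds the critical value 6.251, so H0 is rejected at the 10% level. This is a check, not the official worked solution.

```python
import math
from statistics import NormalDist

dist = NormalDist(520, 30)                 # model N(520, 30^2), masses in grams
total = 300

# (a) expected frequency of the original first group m < 450
e_below_450 = total * dist.cdf(450)

# (b) combined groups: m < 470, 470-520, 520-570, m >= 570
edges = [-math.inf, 470, 520, 570, math.inf]
expected = [total * (dist.cdf(hi) - dist.cdf(lo)) for lo, hi in zip(edges, edges[1:])]
observed = [10, 158, 123, 9]

# (d) chi-squared statistic
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(e_below_450, 2), [round(e, 2) for e in expected], round(stat, 3))
# (e) compare with the critical value given in the question (10% level, nu = 3)
print("reject H0" if stat > 6.251 else "do not reject H0")
```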

Chi-Squared GOF: Poisson

What is a chi-squared goodness of fit test for a Poisson distribution?

  • A chi-squared (χ²) goodness of fit test is used to test whether data from a sample is consistent with the population having a Poisson distribution
    • You will either be given a precise Poisson distribution Po(m) to test, with an assumed value for m
    • Or you will be asked to test whether a Poisson distribution is suitable without being given an assumed value for m
      • In this case you will have to estimate the value of m for the Poisson distribution
      • To do this, calculate the sample mean
      • m = Σfx/Σf
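The estimate of m is a weighted mean, so a minimal Python sketch is very short. The function name is hypothetical, and it assumes every class is a single value x; an open-ended class such as "6 or more" needs a representative value, which makes no difference when its observed frequency is 0.

```python
def estimate_poisson_mean(values, freqs):
    """Estimate m for Po(m) as the sample mean sum(fx) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)
```

For example, frequencies 1, 2, 1 for the values 0, 1, 2 give m = 4/4 = 1.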

What are the steps for a chi-squared goodness of fit test for a Poisson distribution?

  • STEP 1: Write the hypotheses
    • H0 : Variable X can be modelled by a Poisson distribution
    • H1 : Variable X cannot be modelled by a Poisson distribution
      • Make sure you clearly write what the variable is and don’t just call it X
      • If you are given the assumed value of m then state the precise distribution Po(m)
  • STEP 2: Calculate the expected frequencies
    • If you were not given the assumed value of m then you will first have to estimate it using the observed data
    • Find the probability of each outcome using the Poisson distribution, P(X = x)
    • Multiply the probability by the total frequency: P(X = x) × N
      • If a is the smallest value in the table then calculate P(X ≤ a) instead of P(X = a)
      • If b is the largest value in the table then calculate P(X ≥ b) instead of P(X = b), so that the probabilities add up to 1
    • You will have to combine rows/columns if any expected values are 5 or less
  • STEP 3: Calculate the degrees of freedom for the test
    • For k outcomes (after combining expected values if needed)
    • Degrees of freedom is
      • ν = k − 1 if you were given the assumed value of m
      • ν = k − 2 if you had to estimate the value of m
  • STEP 4: Enter the frequencies and the degrees of freedom into your GDC
    • Enter the observed and expected frequencies as two separate lists
    • Your GDC will then give you the χ² statistic and its p-value
    • The χ² statistic is denoted as χ²_calc
  • STEP 5: Decide whether there is evidence to reject the null hypothesis
    • EITHER compare the χ² statistic with the given critical value
      • If χ² statistic > critical value then reject H0
      • If χ² statistic < critical value then do not reject H0
    • OR compare the p-value with the given significance level
      • If p-value < significance level then reject H0
      • If p-value > significance level then do not reject H0
  • STEP 6: Write your conclusion
    • If you reject H0
      • There is sufficient evidence to suggest that variable X does not follow the Poisson distribution Po(m)
      • Therefore this suggests that the data does not follow Po(m)
    • If you do not reject H0
      • There is insufficient evidence to suggest that variable X does not follow the Poisson distribution Po(m)
      • The data is consistent with Po(m) (this does not prove that it follows Po(m))
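Steps 2 and 3 for a Poisson model can be sketched in Python. This is a sketch under assumptions: the helper names are hypothetical, the classes are assumed to be consecutive integers a, a + 1, …, b, and the end classes use P(X ≤ a) and P(X ≥ b) as in Step 2 so that the expected frequencies add up to N.

```python
import math


def poisson_pmf(x, m):
    """P(X = x) for X ~ Po(m)."""
    return math.exp(-m) * m**x / math.factorial(x)


def poisson_expected(values, m, total):
    """Expected frequencies; the first class is X <= a and the last is X >= b."""
    a, b = values[0], values[-1]
    expected = []
    for x in values:
        if x == a:
            prob = sum(poisson_pmf(i, m) for i in range(a + 1))    # P(X <= a)
        elif x == b:
            prob = 1 - sum(poisson_pmf(i, m) for i in range(b))    # P(X >= b)
        else:
            prob = poisson_pmf(x, m)                               # P(X = x)
        expected.append(total * prob)
    return expected


def poisson_df(k, m_was_estimated):
    """nu = k - 1 if m was given, k - 2 if m was estimated from the data."""
    return k - 2 if m_was_estimated else k - 1
```

If any expected frequency returned is 5 or less, combine it with a neighbouring class before counting k.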

Worked example

A parent claims the number of messages they receive from their teenage child within an hour can be modelled by a Poisson distribution. The parent collects data from 100 one hour periods and records the observed frequencies of the messages received from the child. The parent calculates the mean number of messages received from the sample and uses this to calculate the expected frequencies if a Poisson model is used.

Number of messages | Observed frequency | Expected frequency
0 | 9 | 7.28
1 | 16 | a
2 | 23 | 24.99
3 | 22 | 21.82
4 | 16 | 14.29
5 | 14 | 7.49
6 or more | 0 | b

A χ² goodness of fit test at the 10% significance level is used to test the parent’s claim.

a)
Write down the null and alternative hypotheses to test the parent’s claim.

b)
Show that the mean number of messages received per hour for the sample is 2.62.

c)
Calculate the values of a and b, giving your answers to 2 decimal places.

d)
Perform the hypothesis test.

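A self-contained Python sketch can check parts (b) to (d). It assumes the model Po(2.62) with m estimated from the sample, so ν = k − 2, and repeats a hypothetical closed-form `chi2_sf` helper. It should give m = 2.62, a ≈ 19.07 and b ≈ 5.05. Following the rule in Step 2, no expected frequency is 5 or less (b is only just above 5), so all 7 classes are kept and ν = 7 − 2 = 5; the sketch then gives χ²_calc ≈ 11.98 and a p-value of about 0.035, which is below 0.10, so H0 would be rejected. Treat this as a check under those assumptions rather than the official worked solution.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(chi-squared_df >= x) for a positive integer df, using closed forms."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    acc, term = 0.0, math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        acc += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * acc


messages = [0, 1, 2, 3, 4, 5, 6]           # 6 stands for the class "6 or more"
observed = [9, 16, 23, 22, 16, 14, 0]
total = sum(observed)                      # 100 one-hour periods

# (b) sample mean; the "6 or more" class has frequency 0, so its value does not matter
m = sum(x * f for x, f in zip(messages, observed)) / total

# (c) expected frequencies, using P(X >= 6) for the last class
pmf = [math.exp(-m) * m**x / math.factorial(x) for x in range(6)]
expected = [total * p for p in pmf] + [total * (1 - sum(pmf))]
a, b = expected[1], expected[6]

# (d) m was estimated from the data, so nu = k - 2
df = len(observed) - 2
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = chi2_sf(stat, df)
print(round(m, 2), round(a, 2), round(b, 2), round(stat, 2), round(p_value, 3))
print("reject H0" if p_value < 0.10 else "do not reject H0")
```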

Author: Dan

Expertise: Maths

Dan graduated from the University of Oxford with a First class degree in mathematics. As well as teaching maths for over 8 years, Dan has marked a range of exams for Edexcel, tutored students and taught A Level Accounting. Dan has a keen interest in statistics and probability and their real-life applications.