Chi Squared Tests for Standard Distributions (Edexcel A Level Further Maths: Further Statistics 1)

Revision Note

Roger

Author

Chi Squared for Discrete Uniform

How do I do a chi-squared test with a discrete uniform distribution?

  • A chi-squared ($\chi^2$) goodness of fit test can be used to test whether data from a sample is consistent with a population that has a discrete uniform distribution
  • For a random variable $X$ with the discrete uniform distribution
    • $X$ can take a finite number $k$ of distinct values
    • each value is equally likely
      • $P(X = x) = \dfrac{1}{k}, \quad x = 1, 2, \ldots, k$
  • There will never be any parameters to estimate for a discrete uniform goodness of fit test

What are the steps?

  • STEP 1: Write the hypotheses
    • $H_0$: A discrete uniform distribution is a suitable model for Variable X
    • $H_1$: A discrete uniform distribution is not a suitable model for Variable X
      • The hypotheses should always be stated in the context of the question
      • Make sure you clearly write what the variable is and don’t just call it 'Variable X'
  • STEP 2: Calculate the expected frequencies
    • each expected frequency is the same
    • divide the total frequency N by the number of possible outcomes k
  • STEP 3: Calculate the degrees of freedom for the test
    • For $k$ possible outcomes
    • degrees of freedom is $\nu = k - 1$
  • STEP 4: Calculate $X^2$ using either version of the formula

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \left( \sum_{i=1}^{k} \frac{O_i^2}{E_i} \right) - N$$

    • Determine the appropriate $\chi^2$ critical value
      • $\chi^2_\nu(\alpha\%)$ is the critical value with $\nu$ degrees of freedom for significance level $\alpha$
      • use the 'Percentage Points of the $\chi^2$ Distribution' table in the exam formula booklet
    • Or, alternatively, use a calculator to find the p-value from the $\chi^2_\nu$ distribution
      • This is the probability of obtaining a chi-squared value of $X^2$ or more
  • STEP 5: Decide whether there is evidence to reject the null hypothesis
    • Compare the statistic with the critical value you have determined
      • If $X^2$ is greater than the critical value (or the p-value is less than $\alpha$) then there is sufficient evidence to reject $H_0$
      • If $X^2$ is less than the critical value (or the p-value is greater than $\alpha$) then there is insufficient evidence to reject $H_0$
  • STEP 6: Write your conclusion
    • If you reject $H_0$
      • A discrete uniform distribution is not a suitable model
    • If you do not reject $H_0$
      • A discrete uniform distribution is a suitable model
    • Be sure to state your conclusion in the context of the question
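
The two versions of the formula in STEP 4 give the same value because the observed frequencies and the expected frequencies both add up to $N$. A brief sketch of the algebra, with every sum taken over all the categories:

```latex
\begin{aligned}
\sum \frac{(O_i - E_i)^2}{E_i}
  &= \sum \frac{O_i^2 - 2 O_i E_i + E_i^2}{E_i} \\
  &= \sum \frac{O_i^2}{E_i} - 2 \sum O_i + \sum E_i \\
  &= \sum \frac{O_i^2}{E_i} - 2N + N \\
  &= \sum \frac{O_i^2}{E_i} - N
\end{aligned}
```

The second version is often quicker by hand, since it avoids finding each difference $O_i - E_i$.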

Worked example

A car salesperson is interested in how her sales are distributed and records her sales results over a period of six weeks. The data is shown in the table.

| Week | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Number of sales | 15 | 17 | 11 | 21 | 14 | 12 |

Test, at the 5% significance level, whether or not the observed frequencies could be modelled by a discrete uniform distribution.
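
One way to check the arithmetic for this question is sketched below in Python. It simply follows the steps listed earlier. The variable names are illustrative, and the critical value 11.070 is the 5% value for 5 degrees of freedom read from the percentage points table, hard-coded rather than computed.

```python
# Observed sales for weeks 1 to 6, taken from the table in the question
observed = [15, 17, 11, 21, 14, 12]

N = sum(observed)          # total frequency
k = len(observed)          # number of possible outcomes
expected = [N / k] * k     # STEP 2: every expected frequency is N / k

# STEP 3: no parameters are estimated, so nu = k - 1
nu = k - 1

# STEP 4: both versions of the formula should agree
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
x2_alt = sum(o ** 2 / e for o, e in zip(observed, expected)) - N

# 5% critical value for nu = 5, taken from tables (assumed, not computed)
critical_value = 11.070

# STEP 5: reject H0 only if the statistic exceeds the critical value
reject_h0 = x2 > critical_value

print(f"N = {N}, each expected frequency = {expected[0]}, nu = {nu}")
print(f"X^2 = {x2:.3f} (second version of the formula: {x2_alt:.3f})")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

On this working the statistic is 4.4, which is below 11.070, so there is insufficient evidence at the 5% level to reject the hypothesis that a discrete uniform distribution is a suitable model for the weekly sales.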

Chi Squared for Binomial

How do I do a chi-squared test with a binomial distribution?

  • A chi-squared ($\chi^2$) goodness of fit test can be used to test whether data from a sample is consistent with a population that has a binomial distribution
  • For a random variable $X$ to have a binomial distribution:
    • the number of trials ($n$) must be fixed in each observation
    • the trials must be independent
    • each trial can have only two outcomes (success and failure)
    • the probability of success ($p$) must be constant
  • A question may give a precise binomial distribution $B(n, p)$ to test
    • with an assumed value for $p$
  • Or you may be asked to test whether a binomial distribution is suitable without being given an assumed value for $p$
    • In this case you will have to calculate an estimate for the value of $p$ for the binomial distribution
    • For N observations of the variable

$$p = \frac{\text{total number of successes}}{\text{number of trials} \times N} = \frac{\sum (x \times f)}{n \times N}$$

      • $f$ is the frequency for each value of $x$ (these are given in a table in the question)
      • $n$ is from $B(n, p)$ and $N$ is the sum of the observed frequencies
    • Remember that estimating this parameter uses up one degree of freedom
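
As an illustration of the estimate for $p$, the sketch below applies the formula to the frequency table from the video game worked example later in this section. That question actually supplies $p = 0.2$, so the estimate here is only a demonstration of the arithmetic, and the variable names are illustrative.

```python
# Frequency table from the video game worked example in this section:
# x = number of bosses defeated (out of n = 3), f = number of players
n = 3
table = {0: 490, 1: 384, 2: 111, 3: 15}

N = sum(table.values())                                  # number of observations
total_successes = sum(x * f for x, f in table.items())   # sum of x * f

# p = total number of successes / (number of trials * N)
p_estimate = total_successes / (n * N)

print(f"N = {N}, sum of x*f = {total_successes}, p = {p_estimate:.3f}")
```

If an estimate like this were used in the test, the degrees of freedom would be $k - 2$ rather than $k - 1$.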

What are the steps?

  • STEP 1: Write the hypotheses
    • $H_0$: A binomial distribution is a suitable model for Variable X
    • $H_1$: A binomial distribution is not a suitable model for Variable X
      • The hypotheses should always be stated in the context of the question
      • Make sure you clearly write what the variable is and don’t just call it 'Variable X'
      • If you are given the assumed value of $p$ then state the precise distribution $B(n, p)$
  • STEP 2: Calculate the expected frequencies
    • If you were not given the assumed value of $p$ then you will first have to estimate it using the observed data
    • Find the probability of each outcome, $P(X = x)$, using the binomial distribution
    • Multiply the probability by the total number of observations: $P(X = x) \times N$
    • You will have to combine rows/columns if any expected values are less than 5, until every expected value is at least 5
  • STEP 3: Calculate the degrees of freedom for the test
    • For $k$ outcomes (after combining expected values if needed)
    • Degrees of freedom is
      • $\nu = k - 1$ if you were given the assumed value of $p$ in the question
      • $\nu = k - 2$ if you had to estimate the value of $p$ using data in the question
  • STEP 4: Calculate $X^2$ using either version of the formula

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \left( \sum_{i=1}^{k} \frac{O_i^2}{E_i} \right) - N$$

    • Determine the appropriate $\chi^2$ critical value
      • $\chi^2_\nu(\alpha\%)$ is the critical value with $\nu$ degrees of freedom for significance level $\alpha$
      • use the 'Percentage Points of the $\chi^2$ Distribution' table in the exam formula booklet
    • Or, alternatively, use a calculator to find the p-value from the $\chi^2_\nu$ distribution
      • This is the probability of obtaining a chi-squared value of $X^2$ or more
  • STEP 5: Decide whether there is evidence to reject the null hypothesis
    • Compare the statistic with the critical value you have determined
      • If $X^2$ is greater than the critical value (or the p-value is less than $\alpha$) then there is sufficient evidence to reject $H_0$
      • If $X^2$ is less than the critical value (or the p-value is greater than $\alpha$) then there is insufficient evidence to reject $H_0$
  • STEP 6: Write your conclusion
    • If you reject $H_0$
      • A binomial distribution is not a suitable model
    • If you do not reject $H_0$
      • A binomial distribution is a suitable model
    • Be sure to state your conclusion in the context of the question

Worked example

A stage in a video game has three boss battles. 1000 people try this stage of the video game and the number of bosses defeated by each player is recorded.

| Number of bosses defeated | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Frequency | 490 | 384 | 111 | 15 |

It is suggested that the distribution can be modelled by a binomial distribution with $p = 0.2$.

Test, at the 5% significance level, whether or not a binomial distribution is a good model.
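
A possible way to check the working for this question is sketched below, following the steps listed earlier with the model $B(3, 0.2)$. The variable names are illustrative, and the critical value 7.815 is the 5% value for 3 degrees of freedom read from tables, not computed.

```python
from math import comb

n, p = 3, 0.2                       # the suggested model B(3, 0.2)
observed = [490, 384, 111, 15]      # frequencies for x = 0, 1, 2, 3
N = sum(observed)

# STEP 2: expected frequency = P(X = x) * N
probabilities = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
expected = [prob * N for prob in probabilities]

# Classes only need combining if an expected frequency is below 5
needs_combining = min(expected) < 5

# STEP 3: p was given in the question, so nu = k - 1
nu = len(observed) - 1

# STEP 4: test statistic
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 5% critical value for nu = 3, taken from tables (assumed, not computed)
critical_value = 7.815
reject_h0 = x2 > critical_value

print("Expected:", [round(e, 1) for e in expected])
print(f"nu = {nu}, X^2 = {x2:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

On this working the expected frequencies are 512, 384, 96 and 8, so nothing needs combining, and the statistic is about 9.414. This exceeds 7.815, so there is sufficient evidence at the 5% level to reject $H_0$: the distribution $B(3, 0.2)$ does not appear to be a suitable model for the number of bosses defeated.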

Chi Squared for Poisson

How do I do a chi-squared test with a Poisson distribution?

  • A chi-squared ($\chi^2$) goodness of fit test can be used to test whether data from a sample is consistent with a population that has a Poisson distribution
  • For a random variable $X$ to have a Poisson distribution:
    • events must occur independently of each other
    • events must occur singly and randomly
    • events must occur at a constant rate (in space or time)
    • the mean and the variance must be equal
  • You will either be given a precise Poisson distribution $\text{Po}(\lambda)$ to test
    • with an assumed value for $\lambda$
  • Or you will be asked to test whether a Poisson distribution is suitable without being given an assumed value for $\lambda$
    • In this case you will have to calculate an estimate for the value of $\lambda$ for the Poisson distribution
    • The estimate for $N$ observations is just the mean of the observed sample:

$$\lambda = \frac{\sum (x \times f)}{N}$$

      • $f$ is the frequency for each value of $x$ (these are given in a table in the question)
      • $N$ is the sum of the observed frequencies
    • Remember that estimating this parameter uses up one degree of freedom

What are the steps?

  • STEP 1: Write the hypotheses
    • $H_0$: A Poisson distribution is a suitable model for Variable X
    • $H_1$: A Poisson distribution is not a suitable model for Variable X
      • The hypotheses should always be stated in the context of the question
      • Make sure you clearly write what the variable is and don’t just call it 'Variable X'
      • If you are given the assumed value of $\lambda$ then state the precise distribution $\text{Po}(\lambda)$
  • STEP 2: Calculate the expected frequencies
    • If you were not given the assumed value of $\lambda$ then you will first have to estimate it using the observed data
    • Find the probability of each outcome, $P(X = x)$, using the Poisson distribution
    • Multiply the probability by the total number of observations: $P(X = x) \times N$
    • Poisson variables start at $X = 0$ and have no upper limit
      • If $a$ is the smallest observed value in the table then calculate $P(X \leq a)$ for that column
      • If $b$ is the largest observed value in the table then calculate $P(X \geq b)$ for that column
        • $P(X \geq b) = 1 - P(X \leq b - 1)$
    • You will have to combine rows/columns if any expected values are less than 5, until every expected value is at least 5
  • STEP 3: Calculate the degrees of freedom for the test
    • For $k$ outcomes (after combining expected values if needed)
    • Degrees of freedom is
      • $\nu = k - 1$ if you were given the assumed value of $\lambda$
      • $\nu = k - 2$ if you had to estimate the value of $\lambda$
  • STEP 4: Calculate $X^2$ using either version of the formula

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \left( \sum_{i=1}^{k} \frac{O_i^2}{E_i} \right) - N$$

    • Determine the appropriate $\chi^2$ critical value
      • $\chi^2_\nu(\alpha\%)$ is the critical value with $\nu$ degrees of freedom for significance level $\alpha$
      • use the 'Percentage Points of the $\chi^2$ Distribution' table in the exam formula booklet
    • Or, alternatively, use a calculator to find the p-value from the $\chi^2_\nu$ distribution
      • This is the probability of obtaining a chi-squared value of $X^2$ or more
  • STEP 5: Decide whether there is evidence to reject the null hypothesis
    • Compare the statistic with the critical value you have determined
      • If $X^2$ is greater than the critical value (or the p-value is less than $\alpha$) then there is sufficient evidence to reject $H_0$
      • If $X^2$ is less than the critical value (or the p-value is greater than $\alpha$) then there is insufficient evidence to reject $H_0$
  • STEP 6: Write your conclusion
    • If you reject $H_0$
      • A Poisson distribution is not a suitable model
    • If you do not reject $H_0$
      • A Poisson distribution is a suitable model
    • Be sure to state your conclusion in the context of the question
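
The treatment of the open-ended final class in STEP 2 can be written out as follows. This is a sketch that assumes the table has classes $0, 1, \ldots, b - 1$ and a final class "$b$ or more". The symbols $E_x$ and $E_{\geq b}$ for the expected frequencies are introduced here for illustration.

```latex
\begin{aligned}
P(X = x) &= \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots \\
E_x &= N \times P(X = x), \qquad x = 0, 1, \ldots, b - 1 \\
E_{\geq b} &= N \times P(X \geq b)
            = N \left( 1 - \sum_{x=0}^{b-1} \frac{e^{-\lambda} \lambda^{x}}{x!} \right)
            = N - \sum_{x=0}^{b-1} E_x
\end{aligned}
```

Treating the last class as a tail probability means the expected frequencies add up to $N$, matching the total of the observed frequencies.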

Worked example

A parent claims that the number of messages they receive from their teenage child within an hour can be modelled by a Poisson distribution. The parent collects data from 100 one hour periods and records the observed frequencies of the messages received from the child. The parent calculates the mean number of messages received from the sample and uses this to calculate the expected frequencies if a Poisson model is used.

| Number of messages | Observed frequency | Expected frequency |
|---|---|---|
| 0 | 9 | 7.28 |
| 1 | 16 | $a$ |
| 2 | 23 | 24.99 |
| 3 | 22 | 21.82 |
| 4 | 16 | 14.29 |
| 5 | 14 | 7.49 |
| 6 or more | 0 | $b$ |

A goodness of fit test at the 10% significance level is to be used to test the parent’s claim.

a)
Write down null and alternative hypotheses to test the parent’s claim.

b)
Show that the mean number of messages received per hour for the sample is 2.62.

c)
Calculate the values of a and b, giving your answers to 2 decimal places.

d)
Perform the hypothesis test.
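
Parts b), c) and d) can be checked numerically with a sketch like the one below. It is one possible working under the steps described earlier, not necessarily the layout of a model solution. The variable names are illustrative, and the critical value 9.236 is the 10% value for 5 degrees of freedom read from tables, not computed.

```python
from math import exp, factorial

# Observed frequencies for 0, 1, 2, 3, 4, 5 and "6 or more" messages
observed = [9, 16, 23, 22, 16, 14, 0]
N = sum(observed)

# b) sample mean; the "6 or more" class has frequency 0, so it adds nothing
mean = sum(x * f for x, f in enumerate(observed)) / N
lam = mean   # estimate of lambda


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)


# c) expected frequencies: exact classes 0..5, then the open-ended tail
expected = [N * poisson_pmf(x, lam) for x in range(6)]
expected.append(N - sum(expected))   # N * P(X >= 6) = N * (1 - P(X <= 5))
a, b = expected[1], expected[6]

# d) classes only need combining if an expected frequency is below 5
needs_combining = min(expected) < 5

# lambda was estimated from the data, so nu = k - 2
nu = len(observed) - 2

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 10% critical value for nu = 5, taken from tables (assumed, not computed)
critical_value = 9.236
reject_h0 = x2 > critical_value

print(f"mean = {mean:.2f}, a = {a:.2f}, b = {b:.2f}")
print(f"nu = {nu}, X^2 = {x2:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

On this working $a \approx 19.07$ and $b \approx 5.05$, so every expected frequency is at least 5 and no classes are combined. The statistic is roughly 11.98 with 5 degrees of freedom, which exceeds 9.236, so there is sufficient evidence at the 10% level to reject $H_0$: a Poisson distribution does not appear to be a suitable model for the number of messages received per hour.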

Chi Squared for Geometric

How do I do a chi-squared test with a geometric distribution?

  • A chi-squared ($\chi^2$) goodness of fit test can be used to test whether data from a sample is consistent with a population that has a geometric distribution
  • For a random variable $X$ to have a geometric distribution:
    • the trials must be independent
    • each trial can have only two outcomes (success and failure)
    • trials are repeated until the first success
    • the probability of success ($p$) must be constant
    • the value of the variable is the number of trials until the first success
  • A question may give a precise geometric distribution $\text{Geo}(p)$ to test
    • with an assumed value for $p$
  • Or you may be asked to test whether a geometric distribution is suitable without being given an assumed value for $p$
    • In this case you will have to calculate an estimate for the value of $p$ for the geometric distribution
    • For $N$ observations of the variable

$$p = \frac{\text{total number of successes}}{\text{total number of trials}} = \frac{N}{\sum (x \times f)}$$

      • $f$ is the frequency for each value of $x$ (these are given in a table in the question)
      • $N$ is the sum of the observed frequencies
    • Remember that estimating this parameter uses up one degree of freedom
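
Another way to see why this estimate is reasonable is through the sample mean $\bar{x}$. The sketch below relies on the standard result $E(X) = \frac{1}{p}$ for a geometric distribution, which is not stated in the list above.

```latex
\begin{aligned}
\bar{x} &= \frac{\sum (x \times f)}{N} \\
E(X) = \frac{1}{p} \quad &\Rightarrow \quad p \approx \frac{1}{\bar{x}} = \frac{N}{\sum (x \times f)}
\end{aligned}
```

Each of the $N$ observations ends in exactly one success, and $\sum (x \times f)$ counts every trial that was needed, so this agrees with "total number of successes over total number of trials".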

What are the steps?

  • STEP 1: Write the hypotheses
    • $H_0$: A geometric distribution is a suitable model for Variable X
    • $H_1$: A geometric distribution is not a suitable model for Variable X
      • The hypotheses should always be stated in the context of the question
      • Make sure you clearly write what the variable is and don’t just call it 'Variable X'
      • If you are given the assumed value of $p$ then state the precise distribution $\text{Geo}(p)$
  • STEP 2: Calculate the expected frequencies
    • If you were not given the assumed value of $p$ then you will first have to estimate it using the observed data
    • Find the probability of each outcome, $P(X = x)$, using the geometric distribution
    • Multiply the probability by the total number of observations: $P(X = x) \times N$
    • Geometric variables start at $X = 1$ and have no upper limit
      • If $a$ is the smallest observed value in the table then calculate $P(X \leq a)$ for that column
      • If $b$ is the largest observed value in the table then calculate $P(X \geq b)$ for that column
        • The formulae $P(X \leq x) = 1 - (1 - p)^{x}$ and $P(X \geq x) = (1 - p)^{x - 1}$ can help
    • You will have to combine rows/columns if any expected values are less than 5, until every expected value is at least 5
  • STEP 3: Calculate the degrees of freedom for the test
    • For $k$ outcomes (after combining expected values if needed)
    • Degrees of freedom is
      • $\nu = k - 1$ if you were given the assumed value of $p$
      • $\nu = k - 2$ if you had to estimate the value of $p$
  • STEP 4: Calculate $X^2$ using either version of the formula

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \left( \sum_{i=1}^{k} \frac{O_i^2}{E_i} \right) - N$$

    • Determine the appropriate $\chi^2$ critical value
      • $\chi^2_\nu(\alpha\%)$ is the critical value with $\nu$ degrees of freedom for significance level $\alpha$
      • use the 'Percentage Points of the $\chi^2$ Distribution' table in the exam formula booklet
    • Or, alternatively, use a calculator to find the p-value from the $\chi^2_\nu$ distribution
      • This is the probability of obtaining a chi-squared value of $X^2$ or more
  • STEP 5: Decide whether there is evidence to reject the null hypothesis
    • Compare the statistic with the critical value you have determined
      • If $X^2$ is greater than the critical value (or the p-value is less than $\alpha$) then there is sufficient evidence to reject $H_0$
      • If $X^2$ is less than the critical value (or the p-value is greater than $\alpha$) then there is insufficient evidence to reject $H_0$
  • STEP 6: Write your conclusion
    • If you reject $H_0$
      • A geometric distribution is not a suitable model
    • If you do not reject $H_0$
      • A geometric distribution is a suitable model
    • Be sure to state your conclusion in the context of the question
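
The two cumulative formulae quoted in STEP 2 follow from the definition of the geometric distribution. A short sketch of the reasoning:

```latex
\begin{aligned}
P(X = x) &= p (1 - p)^{x - 1}, \qquad x = 1, 2, 3, \ldots \\
P(X \geq x) &= P(\text{first } x - 1 \text{ trials all fail}) = (1 - p)^{x - 1} \\
P(X \leq x) &= 1 - P(X \geq x + 1) = 1 - (1 - p)^{x}
\end{aligned}
```

So if the last column of a table is treated as "$b$ or more", its expected frequency is $N (1 - p)^{b - 1}$.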

Worked example

Mercurio is a door-to-door salesman. Over the course of a week he records the number of doors he needs to knock on each time before getting an answer.

| Number of doors | 1 | 2 | 3 | 4 | 5 | Total |
|---|---|---|---|---|---|---|
| Frequency | 205 | 61 | 22 | 8 | 4 | 300 |

Mercurio thinks he can model the number of doors he needs to knock on each time using a geometric random variable $X \sim \text{Geo}(p)$.

a)
Using the observed frequencies, find an estimate for p.

b)
Conduct a goodness of fit test at the 10% significance level, and say whether a geometric random variable is a good model for the data.
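
Both parts can be checked numerically with a sketch like the one below. It assumes, in line with the steps above, that the final column is treated as "5 or more" when finding expected frequencies, and it only combines classes from the top end of the table. The variable names are illustrative, and the critical value 4.605 is the 10% value for 2 degrees of freedom read from tables, not computed.

```python
observed = [205, 61, 22, 8, 4]          # number of doors = 1, 2, 3, 4, 5
N = sum(observed)

# a) p = total successes / total trials = N / sum(x * f)
total_trials = sum(x * f for x, f in enumerate(observed, start=1))
p = N / total_trials

# b) expected frequencies for x = 1..4, then the tail class X >= 5
expected = [N * p * (1 - p) ** (x - 1) for x in range(1, 5)]
expected.append(N * (1 - p) ** 4)        # P(X >= 5) = (1 - p)^(5 - 1)

# Combine classes from the top end while the last expected value is below 5
comb_obs, comb_exp = observed[:], expected[:]
while len(comb_exp) > 1 and comb_exp[-1] < 5:
    last_exp, last_obs = comb_exp.pop(), comb_obs.pop()
    comb_exp[-1] += last_exp
    comb_obs[-1] += last_obs

# p was estimated from the data, so nu = k - 2 (k counted after combining)
nu = len(comb_obs) - 2

x2 = sum((o - e) ** 2 / e for o, e in zip(comb_obs, comb_exp))

# 10% critical value for nu = 2, taken from tables (assumed, not computed)
critical_value = 4.605
reject_h0 = x2 > critical_value

print(f"p = {p:.4f}")
print("Observed after combining:", comb_obs)
print("Expected after combining:", [round(e, 2) for e in comb_exp])
print(f"nu = {nu}, X^2 = {x2:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

On this working $p = \frac{300}{445} \approx 0.674$. The expected frequency for "5 or more" is about 3.38, so it is combined with the class for 4 doors, leaving 4 classes and 2 degrees of freedom. The statistic is roughly 0.67, well below 4.605, so there is insufficient evidence at the 10% level to reject $H_0$: a geometric distribution appears to be a suitable model for the number of doors Mercurio needs to knock on.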

Author: Roger

Expertise: Maths

Roger's teaching experience stretches all the way back to 1992, and in that time he has taught students at all levels between Year 7 and university undergraduate. Having conducted and published postgraduate research into the mathematical theory behind quantum computing, he is more than confident in dealing with mathematics at any level the exam boards might throw at you.