Goodness of Fit (Edexcel A Level Further Maths: Further Statistics 1)

Revision Note

Author: Mark

Goodness of Fit

What is the difference between observed values and expected values?

  • Goodness of fit is a measure of how well real-life observed data fits a theoretical model
    • For example, modelling a coin as fair then flipping it 20 times  
      • You may observe 13 heads
      • You would expect 10 heads
  • Observed ($O_i$) and expected ($E_i$) values can be shown in a table
    • For example, rolling a fair die 60 times ($N = 60$)
      • Outcome     1   2   3   4   5   6
        $O_i$      12   7   8  10  14   9
        $E_i$      10  10  10  10  10  10
      • Note that $\sum O_i = \sum E_i = N = 60$
  • How different do observed and expected values need to be before the model is not a good fit?
    • You can do a hypothesis test to reach a conclusion
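
As a rough illustration of the die example above, the expected values can be generated from the model probabilities and checked against the observed total. This is a minimal sketch; the variable names are illustrative and not part of the original note.

```python
from fractions import Fraction

# Observed frequencies O_i for outcomes 1 to 6 (from the table above)
observed = [12, 7, 8, 10, 14, 9]
N = sum(observed)                      # total number of rolls

# Model: a fair die, so each outcome has probability 1/6
probabilities = [Fraction(1, 6)] * 6

# Expected frequency E_i = N * P(outcome i)
expected = [N * p for p in probabilities]

# The totals of the observed and expected values should both equal N
print([int(e) for e in expected], sum(observed), sum(expected))
```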

What are the null and alternative hypotheses?

  • $\mathrm{H}_0$: There is no difference between the observed and the expected distribution
  • $\mathrm{H}_1$: The observed distribution cannot be modelled by the expected distribution
  • Let $\alpha\%$ be the significance level

How do I calculate the goodness of fit?

  • First, combine any columns for which expected values are less than 5 until every expected value is at least 5
    • For example
      • Score       1   2   3   4
        $O_i$      15   6   4   1
        $E_i$      12   8   4   2
      • The expected values of 4 and 2 are both less than 5 so combine the last two columns
      • Score       1   2   3+
        $O_i$      15   6   5
        $E_i$      12   8   6
  • Then calculate the goodness of fit, $X^2$, from the formula
    • $X^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
  • An alternative version of the formula that can be easier to calculate is
    • $X^2 = \sum \frac{O_i^2}{E_i} - N$
      • Where $N$ is the sum of all observed values
      • $N = \sum O_i$
      • This is also the same as the sum of all expected values
  • The larger $X^2$ is, the more different the observed values are from the expected values
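
The steps above can be sketched in code using the score table from this section. The helper names below are hypothetical, and the merging helper assumes the small expected values sit at the right-hand end of the table, as they do in the example; it is not a general rule for which columns to combine.

```python
def combine_small_columns(observed, expected, minimum=5):
    """Merge columns from the right-hand end until the last expected value is at least `minimum`."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        # Replace the last two columns with a single combined column
        exp[-2:] = [exp[-2] + exp[-1]]
        obs[-2:] = [obs[-2] + obs[-1]]
    return obs, exp

def goodness_of_fit(observed, expected):
    """X^2 = sum of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def goodness_of_fit_shortcut(observed, expected):
    """Alternative form: X^2 = sum of O_i^2 / E_i, minus N."""
    return sum(o ** 2 / e for o, e in zip(observed, expected)) - sum(observed)

obs, exp = combine_small_columns([15, 6, 4, 1], [12, 8, 4, 2])
x_squared = goodness_of_fit(obs, exp)
x_squared_shortcut = goodness_of_fit_shortcut(obs, exp)
print(obs, exp, round(x_squared, 4), round(x_squared_shortcut, 4))
```

Both versions of the formula should agree up to rounding, which makes the shortcut a useful check on the main calculation.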

What are degrees of freedom?

  • The number of degrees of freedom, $\nu$, is equal to
    • The number of columns (after combining to get $E_i \geq 5$) subtract 1
  • If you also use the observed data to estimate a parameter, then you subtract 2 instead
    • For example, estimating $p$ when comparing to a $\mathrm{B}(n, p)$ distribution
  • You are subtracting the number of constraints (or restrictions)
    • This is the number of times you use the observed data to help form the expected data
      • This is always 1 from ensuring their totals match, $\sum E_i = \sum O_i$
      • Then another 1 for each parameter estimated
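
The counting rule above can be written as a one-line calculation. This is a small sketch; the function name and the column counts in the usage lines are made up for illustration.

```python
def degrees_of_freedom(columns_after_combining, parameters_estimated=0):
    """Columns minus one constraint for matching totals, minus one per estimated parameter."""
    constraints = 1 + parameters_estimated
    return columns_after_combining - constraints

# Fair die with 6 columns and no estimated parameters
nu_die = degrees_of_freedom(6)

# Hypothetical binomial fit with 4 columns where p was estimated from the data
nu_binomial = degrees_of_freedom(4, parameters_estimated=1)

print(nu_die, nu_binomial)
```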

How do I use the chi-squared distribution?

  • Once you have calculated the goodness of fit, $X^2$
    • Compare it to the critical value $\chi^2_\nu$ from the chi-squared distribution
      • $\nu$ is the number of degrees of freedom
      • Tables of critical values are provided in the exam
      • You need the significance level, $\alpha\%$
      • All chi-squared tests are one-tailed
    • If $X^2 < \chi^2_\nu(\alpha\%)$ then there is insufficient evidence to reject $\mathrm{H}_0$
      • This means there is no significant difference between the observed and expected distributions
      • In other words, "the expected distribution is a suitable model for the data"
    • If $X^2 > \chi^2_\nu(\alpha\%)$ then there is sufficient evidence to reject $\mathrm{H}_0$
      • The expected distribution is not a suitable model for the data
  • Alternatively, you can use your calculator to find the p-value from the $\chi^2_\nu$ distribution
    • This is the probability of obtaining a chi-squared value of $X^2$ or more
    • If $p < \alpha\%$ then the result is significant (reject $\mathrm{H}_0$)
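
Both decision methods can be sketched for the fair-die data from earlier in this note, tested at the 5% level. In the exam you would use the provided tables or your calculator; the `chi2_upper_tail` function below is a hypothetical stand-in built from the standard closed-form tail probabilities for whole-number degrees of freedom, and the dictionary holds the usual tabulated 5% critical values.

```python
from math import erfc, exp, gamma, sqrt

def chi2_upper_tail(x, df):
    """P(chi-squared with `df` degrees of freedom >= x), for x >= 0 and integer df >= 1."""
    z = x / 2
    if df % 2 == 0:
        a, q = 1.0, exp(-z)          # 2 degrees of freedom: tail is exp(-x/2)
    else:
        a, q = 0.5, erfc(sqrt(z))    # 1 degree of freedom: tail is erfc(sqrt(x/2))
    # Each extra 2 degrees of freedom adds one more term to the tail probability
    while a < df / 2:
        q += z ** a * exp(-z) / gamma(a + 1)
        a += 1
    return q

# 5% critical values as printed in standard chi-squared tables
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

observed = [12, 7, 8, 10, 14, 9]
expected = [10] * 6
x_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1               # no columns combined, no parameters estimated
alpha = 0.05

critical_value = CRITICAL_5_PERCENT[df]
p_value = chi2_upper_tail(x_squared, df)

# Method 1: compare the statistic with the critical value
reject_h0 = x_squared > critical_value
# Method 2: compare the p-value with the significance level
reject_h0_by_p = p_value < alpha

print(round(x_squared, 2), critical_value, reject_h0, reject_h0_by_p)
```

The two methods should always lead to the same conclusion, since the p-value drops below the significance level exactly when the statistic passes the critical value.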

Examiner Tip

  • The alternative formula $X^2 = \sum \frac{O_i^2}{E_i} - N$ is not given in the Formulae Booklet
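
Since the alternative formula has to be remembered, it may help to see where it comes from. The short derivation below is added as a check and is not part of the original note; it expands the square and uses the fact that the observed and expected totals both equal $N$.

```latex
\begin{align*}
X^2 &= \sum \frac{(O_i - E_i)^2}{E_i}
     = \sum \frac{O_i^2 - 2 O_i E_i + E_i^2}{E_i} \\
    &= \sum \frac{O_i^2}{E_i} - 2\sum O_i + \sum E_i \\
    &= \sum \frac{O_i^2}{E_i} - 2N + N
     = \sum \frac{O_i^2}{E_i} - N
\end{align*}
```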

Worked example

A game is meant to award points according to the probability distribution below.

Points            2     4     8    10
Probability     0.6   0.2  0.15  0.05

The game is played by 40 people, giving the results below.

Points            2     4     8    10
Frequency        28     5     4     3

Test, at the 5% level of significance, whether or not the game is operating correctly.

[Worked solution shown as two images: goodness-of-fit-1, goodness-of-fit-2]
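
One way the calculation for this question could be set out is sketched below. It follows the method described in this note rather than reproducing the original worked solution, and it assumes the 5% critical value for 2 degrees of freedom is 5.991, as given in standard chi-squared tables.

```python
from fractions import Fraction

n = 40                                   # number of people who played
probabilities = [Fraction("0.6"), Fraction("0.2"), Fraction("0.15"), Fraction("0.05")]
observed = [28, 5, 4, 3]                 # frequencies for 2, 4, 8 and 10 points

# Expected frequencies under H0 (the game follows the stated distribution)
expected = [n * p for p in probabilities]    # 24, 8, 6, 2

# The last expected value is below 5, so combine columns from the right-hand end
while len(expected) > 1 and expected[-1] < 5:
    expected[-2:] = [expected[-2] + expected[-1]]
    observed[-2:] = [observed[-2] + observed[-1]]

# Goodness of fit: X^2 = sum of (O_i - E_i)^2 / E_i
x_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Columns after combining, minus 1 (no parameters were estimated)
df = len(expected) - 1

critical_value = 5.991                   # chi-squared table value, 2 d.f., 5% level
reject_h0 = x_squared > critical_value

print(observed, [int(e) for e in expected], float(x_squared), df, reject_h0)
```

With these figures $X^2$ is below the critical value, so there is insufficient evidence at the 5% level to reject $\mathrm{H}_0$; the data are consistent with the game operating correctly.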

Author: Mark

Expertise: Maths

Mark graduated twice from the University of Oxford: once in 2009 with a First in Mathematics, then again in 2013 with a PhD (DPhil) in Mathematics. He has had nine successful years as a secondary school teacher, specialising in A-Level Further Maths and running extension classes for Oxbridge Maths applicants. Alongside his teaching, he has written five internal textbooks, introduced new spiralling school curriculums and trained other Maths teachers through outreach programmes.