Goodness of Fit (Edexcel A Level Further Maths): Revision Note


Goodness of Fit

What is the difference between observed values and expected values?

  • Goodness of fit is a measure of how well real-life observed data fits a theoretical model

    • For example, model a coin as fair and then flip it 20 times

      • You may observe 13 heads

      • You would expect 10 heads, since $20 \times 0.5 = 10$

  • Observed ($O_i$) and expected ($E_i$) values can be shown in a table

    • For example, rolling a fair die 60 times ($N = 60$)

      | Outcome | 1 | 2 | 3 | 4 | 5 | 6 |
      |---|---|---|---|---|---|---|
      | $O_i$ | 12 | 7 | 8 | 10 | 14 | 9 |
      | $E_i$ | 10 | 10 | 10 | 10 | 10 | 10 |

      • Note that $\sum O_i = \sum E_i = N = 60$

  • How different do the observed and expected values need to be before the model is judged not to be a good fit?

    • You can do a hypothesis test to reach a conclusion
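The expected values in these examples can be reproduced with a few lines of Python. This is a minimal sketch. It assumes the model gives a probability $p_i$ for each outcome, so that $E_i = N p_i$. The helper name `expected_counts` is hypothetical and does not come from any library.

```python
def expected_counts(probabilities, n):
    """Expected frequency for each outcome: E_i = N * p_i (hypothetical helper)."""
    return [n * p for p in probabilities]


# Coin modelled as fair, flipped 20 times
print(expected_counts([0.5, 0.5], 20))      # [10.0, 10.0]

# Fair die rolled 60 times (observed counts from the table above)
observed = [12, 7, 8, 10, 14, 9]
N = sum(observed)                           # 60
expected = expected_counts([1 / 6] * 6, N)  # each is 10, up to floating-point rounding
print(N, [round(e, 6) for e in expected])   # 60 [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
print(abs(sum(expected) - N) < 1e-9)        # True: the totals agree
```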

What are the null and alternative hypotheses?

  • $\mathrm{H}_0$: There is no difference between the observed and the expected distribution

  • $\mathrm{H}_1$: The observed distribution cannot be modelled by the expected distribution

  • Let $\alpha\%$ be the significance level

How do I calculate the goodness of fit?

  • First, combine any columns whose expected values are less than 5, until every expected value is at least 5

    • For example

      | Score | 1 | 2 | 3 | 4 |
      |---|---|---|---|---|
      | $O_i$ | 15 | 6 | 4 | 1 |
      | $E_i$ | 12 | 8 | 4 | 2 |

      • The expected values 4 and 2 are both less than 5, so combine the last two columns: $O = 4 + 1 = 5$ and $E = 4 + 2 = 6$

      | Score | 1 | 2 | 3+ |
      |---|---|---|---|
      | $O_i$ | 15 | 6 | 5 |
      | $E_i$ | 12 | 8 | 6 |

  • Then calculate the goodness of fit statistic, $X^2$, from the formula

    • $X^2 = \sum \frac{(O_i - E_i)^2}{E_i}$

  • An alternative version of the formula that can be easier to calculate is

    • $X^2 = \sum \frac{O_i^2}{E_i} - N$

      • where $N$ is the sum of all observed values

      • $N = \sum O_i$

      • This is also equal to the sum of all expected values, $\sum E_i$

  • The larger X squared is, the more different the observed values are from the expected values
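The two calculation steps above can be sketched in Python. This is a minimal illustration under two assumptions. First, the columns are listed in their natural order. Second, a group that is still below 5 at the end is merged into its left-hand neighbour, which is one reasonable choice, but other groupings can also be valid. The function names `combine_small`, `chi_squared_stat` and `chi_squared_stat_alt` are hypothetical.

```python
def combine_small(observed, expected, threshold=5):
    """Merge neighbouring columns (left to right) until every expected value is >= threshold.

    Assumption: a final group still below the threshold is merged into the previous
    group, as when the last two columns are combined in the example above.
    """
    obs_out, exp_out = [], []
    o_acc, e_acc = 0, 0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= threshold:          # this group is big enough: close it
            obs_out.append(o_acc)
            exp_out.append(e_acc)
            o_acc, e_acc = 0, 0
    if o_acc or e_acc:                  # leftover group still below the threshold
        if exp_out:
            obs_out[-1] += o_acc
            exp_out[-1] += e_acc
        else:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
    return obs_out, exp_out


def chi_squared_stat(observed, expected):
    """X^2 = sum of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_squared_stat_alt(observed, expected):
    """X^2 = sum of O_i^2 / E_i - N, valid only when sum(O_i) == sum(E_i) == N."""
    n = sum(observed)
    return sum(o ** 2 / e for o, e in zip(observed, expected)) - n


obs_c, exp_c = combine_small([15, 6, 4, 1], [12, 8, 4, 2])
print(obs_c, exp_c)                                  # [15, 6, 5] [12, 8, 6]
print(round(chi_squared_stat(obs_c, exp_c), 4))      # 1.4167
print(round(chi_squared_stat_alt(obs_c, exp_c), 4))  # 1.4167
```

Both formulas give the same value, $X^2 = \frac{9}{12} + \frac{4}{8} + \frac{1}{6} = \frac{17}{12} \approx 1.417$. The alternative form simply avoids computing each difference $O_i - E_i$.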

What are degrees of freedom?

  • The number of degrees of freedom, $\nu$, is equal to

    • The number of columns (after combining so that every $E_i \ge 5$) minus 1

  • If you also use the observed data to estimate one parameter, then subtract 2 instead

    • For example, estimating $p$ from the data when comparing with a $\mathrm{B}(n, p)$ distribution

  • You are subtracting the number of constraints (or restrictions)

    • This is the number of times you use the observed data to help form the expected data

      • This is always 1 from ensuring the totals match, $\sum E_i = \sum O_i$

      • Then another 1 for each parameter estimated
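The counting rule above can be written as a one-line formula. Here is a minimal sketch, using a hypothetical function name. It counts one constraint for matching totals plus one for each parameter estimated from the data.

```python
def degrees_of_freedom(num_columns, num_estimated_params=0):
    """nu = columns (after combining) - 1 for matching totals - 1 per parameter estimated."""
    constraints = 1 + num_estimated_params
    nu = num_columns - constraints
    if nu < 1:
        raise ValueError("too few columns left for a goodness of fit test")
    return nu


print(degrees_of_freedom(6))     # 5: fair die, 6 columns, nothing estimated
print(degrees_of_freedom(3))     # 2: scores example after combining to 3 columns
print(degrees_of_freedom(5, 1))  # 3: hypothetical binomial fit, p estimated, 5 columns
```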

How do I use the chi-squared distribution?

  • Once you have calculated the goodness of fit statistic $X^2$

    • Compare it to the critical value $\chi^2_\nu$ from the chi-squared distribution

      • nu is the number of degrees of freedom

      • Tables of critical values are provided in the exam

      • You need the significance level $\alpha\%$

      • Chi-squared goodness of fit tests are always one-tailed (you only reject $\mathrm{H}_0$ for large values of $X^2$)

    • If $X^2 < \chi^2_\nu(\alpha\%)$ then there is insufficient evidence to reject $\mathrm{H}_0$

      • This means there is not enough evidence of a difference between the observed and expected distributions

      • In other words, "the expected distribution is a suitable model for the data"

    • If $X^2 > \chi^2_\nu(\alpha\%)$ then there is sufficient evidence to reject $\mathrm{H}_0$

      • The expected distribution is not a suitable model for the data

  • Alternatively, you can use your calculator to find the p-value from the $\chi^2_\nu$ distribution

    • This is the probability, assuming $\mathrm{H}_0$ is true, of obtaining a chi-squared value of $X^2$ or more

    • If the p-value is less than $\alpha\%$ (for example, less than 0.05 at the 5% level), then the result is significant, so reject $\mathrm{H}_0$
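In the exam you use tables or a calculator. The sketch below just shows where the critical value and the p-value come from. It relies on the standard result that $P(\chi^2_\nu \le x)$ is the regularised lower incomplete gamma function $P(\nu/2, x/2)$, which is evaluated here with a simple power series. Critical values are then found by bisection. The function names are hypothetical. In practice, a library routine such as `scipy.stats.chi2.sf` or `chi2.ppf` would normally be used instead.

```python
import math


def chi2_sf(x, nu):
    """P(chi^2_nu >= x): the upper-tail probability (p-value) for a chi-squared value x.

    Uses P(chi^2_nu <= x) = P(nu/2, x/2), the regularised lower incomplete gamma
    function, evaluated by its power series (adequate for moderate x and nu).
    """
    if x <= 0:
        return 1.0
    a, z = nu / 2, x / 2
    term = 1.0 / a
    total = term
    n = 1
    while True:
        term *= z / (a + n)             # next series term: z^n / (a (a+1) ... (a+n))
        total += term
        if term < total * 1e-16:
            break
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_critical(alpha, nu):
    """Upper-tail critical value c with P(chi^2_nu >= c) = alpha, found by bisection."""
    lo, hi = 0.0, nu + 10.0
    while chi2_sf(hi, nu) > alpha:      # widen the bracket if needed
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, nu) > alpha:    # tail still too heavy: critical value lies further right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Fair-die example from this note: X^2 = 3.4 with nu = 6 - 1 = 5, tested at the 5% level
x2, nu = 3.4, 5
alpha = 0.05                            # significance level as a decimal (5% -> 0.05)
critical = chi2_critical(alpha, nu)
p_value = chi2_sf(x2, nu)
print(f"critical value = {critical:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if x2 > critical else "Do not reject H0")
```

For the die data, $X^2 = \frac{4 + 9 + 4 + 0 + 16 + 1}{10} = 3.4$. This is well below $\chi^2_5(5\%) \approx 11.07$, so $\mathrm{H}_0$ would not be rejected.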

Examiner Tips and Tricks

  • The alternative formula $X^2 = \sum \frac{O_i^2}{E_i} - N$ is not given in the Formulae Booklet
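Since the alternative formula is not in the booklet, it is worth seeing why it holds. It follows from the booklet formula by expanding the square and using $\sum O_i = \sum E_i = N$:

$$
\begin{aligned}
X^2 &= \sum \frac{(O_i - E_i)^2}{E_i} = \sum \frac{O_i^2 - 2O_iE_i + E_i^2}{E_i} \\
    &= \sum \frac{O_i^2}{E_i} - 2\sum O_i + \sum E_i \\
    &= \sum \frac{O_i^2}{E_i} - 2N + N = \sum \frac{O_i^2}{E_i} - N
\end{aligned}
$$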

Worked Example

A game is meant to award points according to the probability distribution below.

| Points | 2 | 4 | 8 | 10 |
|---|---|---|---|---|
| Probability | 0.6 | 0.2 | 0.15 | 0.05 |

The game is played by 40 people, giving the results below.

| Points | 2 | 4 | 8 | 10 |
|---|---|---|---|---|
| Frequency | 28 | 5 | 4 | 3 |

Test, at the 5% level of significance, whether or not the game is operating correctly.
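The steps in this note can be applied to the question as follows. This is a minimal Python sketch of the arithmetic:

* Expected values are $E_i = 40p_i$, giving 24, 8, 6 and 2.
* The last two columns are combined because $E_i = 2 < 5$.
* $X^2$ is compared with the 5% critical value for $\nu = 2$.

The sketch uses the fact that $P(\chi^2_2 \ge x) = e^{-x/2}$. This closed form holds only for $\nu = 2$; for other values of $\nu$, use tables or a calculator.

```python
import math

# Model and data from the question (points 2, 4, 8, 10)
model_probs = [0.6, 0.2, 0.15, 0.05]
observed = [28, 5, 4, 3]

N = sum(observed)                                      # 40 players
expected = [N * p for p in model_probs]                # 24, 8, 6, 2 (up to float rounding)

# E = 2 for 10 points is below 5, so merge the "8" and "10" columns
observed = observed[:2] + [observed[2] + observed[3]]  # [28, 5, 7]
expected = expected[:2] + [expected[2] + expected[3]]  # 24, 8, 8

# Goodness of fit statistic
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 3 columns; only the total is constrained (no parameters estimated)
nu = len(observed) - 1                                 # 2

# For nu = 2 only: P(chi^2_2 >= x) = exp(-x / 2), so the critical value is -2 ln(alpha)
alpha = 0.05
critical = -2 * math.log(alpha)
p_value = math.exp(-x2 / 2)

print(f"X^2 = {x2:.3f}, nu = {nu}, critical value = {critical:.3f}, p = {p_value:.3f}")
# X^2 = 1.917, nu = 2, critical value = 5.991, p = 0.384
print("Reject H0" if x2 > critical else "Do not reject H0")
# Do not reject H0
```

The statistic is $X^2 = \frac{16}{24} + \frac{9}{8} + \frac{1}{8} = \frac{23}{12} \approx 1.917$. Since $1.917 < 5.991$, there is insufficient evidence at the 5% level to reject $\mathrm{H}_0$. There is no evidence that the game is not operating correctly.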
