Goodness of Fit (Edexcel A Level Further Maths) : Revision Note

Last updated

7 January 2025

Goodness of Fit

What is the difference between observed values and expected values?

Goodness of fit is a measure of how well real-life observed data fits a theoretical model
- For example, modelling a coin as fair then flipping it 20 times
  - You may observe 13 heads
  - You would expect 10 heads
Observed ( $O_{i}$ ) and expected ( $E_{i}$ ) values can be shown in a table
- For example, rolling a fair die 60 times ( $N = 60$ )
  Outcome	1	2	3	4	5	6
  $O_{i}$	12	7	8	10	14	9
  $E_{i}$	10	10	10	10	10	10
  - Note that $\sum O_{i} = \sum E_{i} = N = 60$
How different do observed and expected values need to be before the model is not a good fit?
- You can do a hypothesis test to reach a conclusion

What are the null and alternative hypotheses?

$H_{0} :$ There is no difference between the observed and the expected distribution
$H_{1} :$ The observed distribution cannot be modelled by the expected distribution
Let $α %$ be the significance level

How do I calculate the goodness of fit?

First, combine any columns whose expected values are less than 5 until every expected value is at least 5
- For example
  Score	1	2	3	4
  $O_{i}$	15	6	4	1
  $E_{i}$	12	8	4	2
  - The expected values of 4 and 2 are both less than 5, so combine the last two columns
  Score	1	2	3+
  $O_{i}$	15	6	5
  $E_{i}$	12	8	6
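The combining step can be sketched in Python. This is a minimal illustration and not part of the original note: the function name `combine_small_cells` is hypothetical, and it assumes the small expected values sit at the right-hand end of the table (as in the example above), so it only merges the last column into its neighbour.

```python
def combine_small_cells(observed, expected, minimum=5):
    """Merge the last column into its neighbour while its expected value is too small.

    Assumption: the small expected values are at the right-hand end of the table.
    """
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        # Remove the last column, then add its values to the new last column
        last_o = obs.pop()
        last_e = exp.pop()
        obs[-1] += last_o
        exp[-1] += last_e
    return obs, exp


# The 'Score' table from the example above
print(combine_small_cells([15, 6, 4, 1], [12, 8, 4, 2]))
```

For the table above this returns the observed values 15, 6, 5 and the expected values 12, 8, 6, matching the combined table.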
Then calculate the goodness of fit, $X^{2}$, from the formula
- $X^{2} = \sum \frac{{(O_{i} - E_{i})}^{2}}{E_{i}}$
An alternative version of the formula that can be easier to calculate is
- $X^{2} = \sum \frac{{O_{i}}^{2}}{E_{i}} - N$
  - Where $N$ is the sum of all observed values
  - $N = \sum O_{i}$
  - This is also the same as the sum of all expected values
The larger $X^{2}$ is, the more different the observed values are from the expected values
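As a check that the two versions of the formula agree, here is a short Python sketch applied to the fair die data from earlier (60 rolls, expected value 10 for each outcome). The function names are hypothetical and chosen only for this illustration.

```python
def chi_squared_statistic(observed, expected):
    """X^2 = sum of (O - E)^2 / E over all columns."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_squared_shortcut(observed, expected):
    """Alternative form: X^2 = sum of O^2 / E, minus N (the total observed)."""
    n = sum(observed)
    return sum(o ** 2 / e for o, e in zip(observed, expected)) - n


observed = [12, 7, 8, 10, 14, 9]
expected = [10] * 6

print(chi_squared_statistic(observed, expected))
print(chi_squared_shortcut(observed, expected))
```

Both calculations give $X^{2} = 3.4$ (up to floating-point rounding): the squared differences are 4, 9, 4, 0, 16, 1, which sum to 34, and 34 ÷ 10 = 3.4.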

What are degrees of freedom?

The number of degrees of freedom, $ν$, is equal to
- The number of columns (after combining so that no $E_{i}$ is less than 5) subtract 1
If you also use the observed data to estimate one parameter, then you subtract 2 instead
- For example, estimating $p$ from the data when comparing to a $B (n, p)$ distribution
You are subtracting the number of constraints (or restrictions)
- This is the number of times you use the observed data to help form the expected data
  - This is always 1 from ensuring their totals match, $\sum E_{i} = \sum O_{i}$
  - Then another 1 for each parameter estimated
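The counting rule above can be written as a one-line calculation. The sketch below is only an illustration; the function and parameter names are hypothetical.

```python
def degrees_of_freedom(num_columns, num_estimated_parameters=0):
    """Columns (after combining) minus the number of constraints."""
    # One constraint always comes from making the totals match;
    # each parameter estimated from the data adds one more.
    constraints = 1 + num_estimated_parameters
    return num_columns - constraints


print(degrees_of_freedom(6))     # fair die, nothing estimated
print(degrees_of_freedom(4, 1))  # 4 columns, one parameter estimated from the data
```

The fair die table has 6 columns and no estimated parameters, so $ν = 5$. A table with 4 columns where one parameter was estimated from the data has $ν = 2$.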

How do I use the chi-squared distribution?

Once you have calculated the goodness of fit, $X^{2}$
- Compare it to the critical value $χ_{ν}^{2}$ from the chi-squared distribution
  - $ν$ is the number of degrees of freedom
  - Tables of critical values are provided in the exam
  - You need the significance level, $α %$
  - All chi-squared tests are one-tailed
- If $X^{2} < χ_{ν}^{2} (α %)$ then there is insufficient evidence to reject $H_{0}$
  - This means there is no significant evidence of a difference between the observed and expected distributions
  - In other words, "the expected distribution is a suitable model for the data"
- If $X^{2} > χ_{ν}^{2} (α %)$ then there is sufficient evidence to reject $H_{0}$
  - The expected distribution is not a suitable model for the data
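The comparison with a critical value can be sketched as a simple table lookup. This is an illustration only: the dictionary below holds the standard 5% critical values of the chi-squared distribution for 1 to 6 degrees of freedom, and the names `CRITICAL_5_PERCENT` and `reject_h0` are hypothetical.

```python
# 5% critical values of the chi-squared distribution (from standard tables)
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def reject_h0(x2, nu, critical_values=CRITICAL_5_PERCENT):
    """Return True if X^2 exceeds the critical value for nu degrees of freedom."""
    return x2 > critical_values[nu]


# Fair die example: X^2 = 3.4 with nu = 5
print(reject_h0(3.4, 5))
```

For the fair die data, 3.4 is less than 11.070, so there is insufficient evidence to reject $H_{0}$ at the 5% level.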
Alternatively, you can use your calculator to find the p-value from the $χ_{ν}^{2}$ distribution
- This is the probability of obtaining a chi-squared value of $X^{2}$ or more, assuming $H_{0}$ is true
- If the p-value is less than the significance level $α %$ then the result is significant (reject $H_{0}$ )
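To see what a calculator is doing when it reports a p-value, here is a hedged Python sketch using only the standard library. It relies on the standard closed-form series for the upper tail of a chi-squared distribution with a whole number of degrees of freedom; the function name `chi2_p_value` is hypothetical, and in an exam you would use your calculator or the tables instead.

```python
import math


def chi2_p_value(x2, nu):
    """P(chi-squared with nu degrees of freedom >= x2), for a positive integer nu."""
    half = x2 / 2
    if nu % 2 == 0:
        # Even nu: exp(-x/2) * sum of (x/2)^r / r! for r = 0 .. nu/2 - 1
        term, total = 1.0, 1.0
        for r in range(1, nu // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd nu: two-sided normal tail plus a finite correction series
    chi = math.sqrt(x2)
    tail = math.erfc(chi / math.sqrt(2))               # equals 2 * P(Z >= chi)
    density = math.exp(-half) / math.sqrt(2 * math.pi)  # standard normal pdf at chi
    term, total = chi, 0.0
    for r in range(1, (nu - 1) // 2 + 1):
        total += term
        term *= x2 / (2 * r + 1)
    return tail + 2 * density * total


# Fair die example: X^2 = 3.4 with nu = 5
p = chi2_p_value(3.4, 5)
print(p, p < 0.05)
```

For the fair die data the p-value is well above 0.05, so the result is not significant, which agrees with the critical value method.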

Examiner Tips and Tricks

The alternative formula $X^{2} = \sum \frac{{O_{i}}^{2}}{E_{i}} - N$ is not given in the Formulae Booklet

Worked Example

A game is meant to award points according to the probability distribution below.

Points	2	4	8	10
Probability	0.6	0.2	0.15	0.05

The game is played by 40 people, giving the results below.

Points	2	4	8	10
Frequency	28	5	4	3

Test, at the 5% level of significance, whether or not the game is operating correctly.
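One way this test could be carried out, following the steps in this note, is sketched below in Python. It is an illustration rather than the official solution: the variable names are hypothetical, and the 5% critical value of 5.991 for 2 degrees of freedom is taken from standard chi-squared tables.

```python
# H0: the points can be modelled by the given probability distribution
# H1: the points cannot be modelled by the given probability distribution
probabilities = [0.6, 0.2, 0.15, 0.05]   # for 2, 4, 8 and 10 points
observed = [28, 5, 4, 3]

n = sum(observed)                          # 40 people played
expected = [n * p for p in probabilities]  # 24, 8, 6, 2

# The expected value for 10 points is 2 (less than 5),
# so combine the last two columns into "8 or 10 points"
observed_combined = observed[:2] + [sum(observed[2:])]   # 28, 5, 7
expected_combined = expected[:2] + [sum(expected[2:])]   # 24, 8, 8

x2 = sum((o - e) ** 2 / e for o, e in zip(observed_combined, expected_combined))
nu = len(observed_combined) - 1   # 3 columns, no parameters estimated
critical_value = 5.991            # chi-squared, nu = 2, 5% level
reject = x2 > critical_value

print(round(x2, 3), nu, reject)   # 1.917 2 False
```

Here $X^{2} = \frac{16}{24} + \frac{9}{8} + \frac{1}{8} = \frac{23}{12} \approx 1.917$, which is less than 5.991. There is insufficient evidence at the 5% level to reject $H_{0}$, so the data are consistent with the game operating correctly.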


