Estimation from Statistical Data (Edexcel GCSE Statistics)
Revision Note
Written by: Roger B
Reviewed by: Dan Finlay
Estimating Population Characteristics
How can I use samples to estimate population characteristics?
Summary statistics calculated from a sample can be used to estimate the same statistics for the population as a whole
e.g. the mean, median, range, quartiles and interquartile range
Remember that a sample is a selection of members drawn from the population, whereas the population is all the members
Statistics calculated from a sample will usually not be exactly the same as the statistics for the whole population (different mean, etc.)
Statistics calculated from two different samples will also not usually be the same
As long as the sample is representative of the population
then you can assume that the statistics for the population are approximately the same as those for the sample
e.g. assume the population mean is about equal to the sample mean
This can be used to make predictions about the population
About half (50%) of the population will be above the sample median
and about half will be below
About a quarter (25%) of the population will be below the sample lower quartile
and about a quarter will be above the sample upper quartile
About half (50%) of the population will be between the sample upper and lower quartiles
Sample size has an impact on the reliability of estimates made about the population
In general a larger sample size will lead to more reliable conclusions
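The effect of sample size can be illustrated with a small simulation. This is only a sketch: the population of 600 weights is made up (drawn from a normal distribution with an assumed mean of 1.65 kg and an assumed standard deviation of 0.5 kg), and the function name spread_of_sample_means is hypothetical. It repeatedly takes random samples of one size and measures how much the sample means vary from one sample to the next.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 600 weights in kg (assumed values, not real data)
population = [random.gauss(1.65, 0.5) for _ in range(600)]


def spread_of_sample_means(sample_size, repeats=200):
    """Take many random samples of one size and return the standard
    deviation of their means (smaller = more consistent estimates)."""
    means = [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


small = spread_of_sample_means(10)
large = spread_of_sample_means(100)
print(f"spread of sample means, n = 10:  {small:.3f} kg")
print(f"spread of sample means, n = 100: {large:.3f} kg")
```

The means of the larger samples should cluster more tightly, which is what "more reliable" means here: a single large sample is less likely to give an estimate far from the population value.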
Examiner Tips and Tricks
If a question asks you about improving the reliability of estimates made from samples
the answer will almost always have to do with increasing the sample size
Worked Example
Paul has been studying a population of rabbits in Lopital Woods. He captured a sample of 50 rabbits and weighed each of the rabbits before releasing them again. He recorded the following data for his sample:
total weight of rabbits: 82.5 kg
lower quartile: 1.2 kg
median: 1.6 kg
upper quartile: 2.1 kg
(a) Calculate an estimate for the mean weight of the population of rabbits in Lopital Woods.
Assume that the population mean is the same as the sample mean
Calculate the sample mean by dividing the total weight by the number in the sample (50)
82.5 ÷ 50 = 1.65
1.65 kg
It is assumed that there are a total of 600 rabbits living in Lopital Woods.
(b) Use Paul's data to estimate how many rabbits in Lopital Woods weigh between 1.2 kg and 2.1 kg.
Those values are the lower and upper quartiles of the data set
Half of the data values fall between the lower and upper quartiles
Assume the same is true for the population as a whole
600 ÷ 2 = 300
Approximately 300 rabbits
(c) Suggest a way that the reliability of Paul's results could be improved.
Use a larger sample of rabbits to calculate the statistics from
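The arithmetic in parts (a) and (b) can be checked with a few lines of Python. This is a minimal sketch using the figures from the question; the variable names are chosen here for illustration.

```python
sample_total_kg = 82.5   # total weight of the sample
sample_size = 50         # rabbits in the sample
population_size = 600    # assumed size of the whole population

# (a) the sample mean is used as the estimate of the population mean
mean_estimate_kg = sample_total_kg / sample_size

# (b) about half of a data set lies between its lower and upper quartiles,
# and the same is assumed to hold for the population
between_quartiles = 0.5 * population_size

print(f"estimated mean weight: {mean_estimate_kg:.2f} kg")
print(f"estimated rabbits between 1.2 kg and 2.1 kg: {between_quartiles:.0f}")
```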
Petersen Capture Recapture Formula
What is the capture recapture method?
The capture recapture method is a way to estimate the size of a population
It is used when it is either impossible or impractical to count the whole population
e.g. too expensive or too time-consuming
Common examples include
the population of fish in a river/lake/sea
the population of wild animals in a natural habitat
The capture recapture method is based on proportion
A first sample of the population is captured
and each member is given an identifiable marker/tag
All members of the sample are then replaced
i.e. released back into the population
At a later time, a second sample of the population is taken
The proportion of the second sample that is tagged is assumed to be the same as the proportion of the whole population that is tagged (i.e. that was captured in the first sample)
How do I use the Petersen capture recapture formula?
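The Petersen capture recapture formula is used to find an estimate for the population size using a capture recapture experiment
population size = (size of first sample × size of second sample) ÷ (number in second sample that are marked/tagged)
This formula is not on the exam formula sheet, so you need to remember it
A shorter version is N = Mn/m, where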
N is the size of the population
M is the size of the first sample
n is the size of the second sample
m is the number in the second sample that have markers/tags
It can be easier to remember the formula if you understand where it comes from
The proportion of the total population captured (and marked/tagged) in the first sample is M/N
The proportion of the second sample that is marked/tagged is m/n
We assume that those two proportions are equal, so M/N = m/n
Rearrange that equation to get the formula N = Mn/m
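The rearranged formula can be written as a short Python function. This is a sketch only: the function name petersen_estimate and the figures in the usage line (40, 60 and 12) are made up for illustration and do not come from a real experiment.

```python
def petersen_estimate(first_sample, second_sample, tagged_in_second):
    """Estimate the population size N = (M * n) / m.

    first_sample      -- M, the size of the first (tagged) sample
    second_sample     -- n, the size of the second sample
    tagged_in_second  -- m, the number in the second sample with tags
    """
    if tagged_in_second <= 0:
        # With no tagged members recaptured, m = 0 and N cannot be estimated
        raise ValueError("no tagged members were recaptured")
    if tagged_in_second > min(first_sample, second_sample):
        # m cannot exceed the number tagged or the size of the second sample
        raise ValueError("tagged count is larger than a sample size")
    return first_sample * second_sample / tagged_in_second


# Hypothetical figures: M = 40, n = 60, m = 12
print(petersen_estimate(40, 60, 12))  # 200.0
```

The check for m = 0 reflects a practical limit of the method: if no tagged members turn up in the second sample, the proportion m/n is zero and no estimate can be made.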
What assumptions need to be made for the capture recapture method to be valid?
Each member (element) of the population must have an equal chance of being selected
This applies to both the first and second samples
This means that random sampling is used
Between the first and second samples
The tagged members must have had sufficient time and opportunity to mix with the rest of the population
The population remains the same size (broadly speaking)
No (significant number of) births/deaths
No (significant number of) members leave the population
e.g. migration
No marks/tags have been removed or destroyed
e.g. taken off, worn off
The process of tagging the members of the first sample must not affect the likelihood of them being recaptured
When is the capture recapture method reliable?
The sample sizes used must be large enough to be representative of the population for the method to be reliable
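A simulation can show why sample size matters for capture recapture. The sketch below assumes a hypothetical population of exactly 600 animals in which every animal is equally likely to be caught and nothing changes between samples; the function names one_estimate and average_error are made up for illustration. It compares how far the Petersen estimate is from the true size, on average, for small and large samples.

```python
import random

random.seed(7)  # fixed seed so the run is repeatable

TRUE_SIZE = 600  # hypothetical true population size


def one_estimate(first_sample, second_sample):
    """Simulate one capture recapture experiment.

    Returns the Petersen estimate, or None if no tagged
    animals appear in the second sample."""
    tagged = set(random.sample(range(TRUE_SIZE), first_sample))
    recaptured = random.sample(range(TRUE_SIZE), second_sample)
    m = sum(1 for animal in recaptured if animal in tagged)
    if m == 0:
        return None
    return first_sample * second_sample / m


def average_error(first_sample, second_sample, repeats=500):
    """Mean absolute difference between the estimate and the true size,
    ignoring experiments that could not produce an estimate."""
    estimates = [one_estimate(first_sample, second_sample) for _ in range(repeats)]
    usable = [e for e in estimates if e is not None]
    return sum(abs(e - TRUE_SIZE) for e in usable) / len(usable)


small_error = average_error(20, 20)
large_error = average_error(150, 150)
print(f"average error with samples of 20:  {small_error:.0f}")
print(f"average error with samples of 150: {large_error:.0f}")
```

With samples of 20, only zero, one or two tagged animals are typically recaptured, so each estimate is far from 600 or cannot be made at all. With samples of 150 the estimates should sit much closer to the true size.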
Worked Example
Roger captures 50 rabbits from Lopital Woods and marks them by putting a small safe tag on their ears. Roger then releases the rabbits back into the woods.
Some time later, Roger captures 100 rabbits and finds that 8 of the rabbits have the tag on their ears.
(a) Use this information to estimate the size of the population of rabbits in Lopital Woods.
If you remember the formula you can use it directly here
M = 50, n = 100, m = 8
N = (50 × 100) ÷ 8 = 625
If you don't remember the formula, use the 'equal proportions' idea
The proportion of the population captured (and tagged) in the first sample is 50/N
The proportion of the second sample with tags is 8/100
Set those two proportions equal to each other: 50/N = 8/100
Solve that equation for N: 8N = 5000, so N = 625
Answer the question in context
There are approximately 625 rabbits in Lopital Woods
(b) State one assumption that would have to be made for the estimate to be valid.
Roger allowed enough time for the tagged rabbits to mix with the rest of the population between samples
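The 'equal proportions' route in part (a) can be checked with exact fractions. This is a minimal sketch using Python's standard fractions module and the figures from the question; the variable name tagged_share is chosen here for illustration.

```python
from fractions import Fraction

M, n, m = 50, 100, 8  # first sample, second sample, tagged in second sample

# Proportion of the second sample that is tagged: 8/100
tagged_share = Fraction(m, n)

# M/N = m/n rearranges to N = M ÷ (m/n)
N = M / tagged_share

print(N)  # 625
```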