Population & Sampling (Edexcel GCSE Statistics)
Revision Note
Written by: Roger B
Reviewed by: Dan Finlay
Population & Sample Types
What are populations, samples and sampling frames?
The population refers to the whole set of things which you are interested in
e.g. if a vet wanted to know how long a typical French bulldog sleeps for in a day
then the population would be all the French bulldogs in the world
Be careful - the word 'population' can mean different things in different contexts
e.g. 'the population of the UK' is usually used to refer to everyone in the UK
But if you're studying UK dentists then the 'population' for your study would be restricted to all the dentists in the UK
A sample is a subset of the population from which data is collected
e.g. out of all the French bulldogs in the world (the population)
a vet might take a sample of French bulldogs from different cities and record how long they sleep in a day
A sampling frame (or sample frame) is a list of all members of the population
For example, a list of employees’ names within a company
Not every population will have an easily-accessible sampling frame
What's the difference between a census and a sample?
A census collects data about all the members of a population
e.g. the government in the UK does a national census every 10 years to collect data about every person living in the UK at the time
The main advantage of a census is that it gives fully accurate results
The disadvantages of a census are:
It is time consuming and expensive to carry out
It can destroy or use up all the members of a population (imagine a company testing every single firework it produces)
Sampling is used to collect data from a subset of the population
The advantages of sampling are:
It is quicker and cheaper than a census
It leads to less data needing to be analysed
The disadvantages of sampling are:
It might not represent the population accurately
It could introduce bias, if some parts of the population are more represented in the sample than others
What different sampling techniques do I need to know?
Random sampling methods
Simple random sampling: here every member of the population has an equal probability of being selected for the sample
To select a simple random sample of members of the population
Uniquely number every member of the population
Then randomly select different numbers using a random number generator (or other form of random selection)
Stratified sampling: the population is divided into separate groups (called strata) and then a random sample is taken from each group (stratum)
The proportion of a sample that belongs to a stratum is equal to the proportion of the population as a whole that belongs to that stratum
e.g. if 1/20 of the population belongs to a particular stratum
then 1/20 of the sample should come from that stratum
A population could be split into strata by age ranges, gender, occupation, etc.
See the spec points on 'Random Samples' and 'Stratified Samples' for more info on these two methods
Non-random sampling methods
Note: some of these methods include random elements, but the samples as a whole are not random
Judgement sampling: here you simply use your judgement to choose a sample of the population
You should attempt to make sure that the sample is representative of the population as a whole
Opportunity (convenience) sampling: a sample is formed using available members of the population who fit the study criteria
e.g. for a study of UK consumers you could stand on a street corner and interview the first 50 people who walk by
Cluster sampling: the population is divided into sensible 'clusters' and then a number of clusters are chosen at random to form the sample
e.g. a study of UK education might use schools as the clusters
then select 50 schools at random and use the people in those schools as the sample
Systematic sampling: a sample is formed by choosing members of a population at regular intervals using a list (sampling frame)
e.g. to select 1/10 of the students in a school as a sample
Start with a list of all students
Select one student at random as a 'starting point'
Then also select every 10th student on the list after that starting point
(If necessary, wrap back around to the start of the list when you get to the end)
Quota sampling: the population is split into groups (like in stratified sampling) and a quota is specified for each group
The quota specifies how many members of the population are to be selected from each group
This will often be done in the same way as selecting the sizes of the strata in stratified sampling
Or other criteria could be used to set the quota for each group
Members of the population are selected until each quota is filled
If a member does not want to be included then another member is chosen instead
The members do not have to be selected randomly
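The systematic sampling steps described above (a random starting point, then every 10th student, wrapping around if necessary) could be sketched in Python roughly as follows. This is only an illustrative sketch: the function name `systematic_sample` and the list of 200 made-up student labels are assumptions, not part of the specification.

```python
import random

def systematic_sample(frame, interval, rng=random):
    """Pick a random starting point, then every `interval`-th member,
    wrapping back around to the start of the list when the end is reached."""
    sample_size = len(frame) // interval
    start = rng.randrange(len(frame))          # random starting position
    # Step through the list in jumps of `interval`; the modulo does the wrapping
    positions = [(start + i * interval) % len(frame) for i in range(sample_size)]
    return [frame[p] for p in positions]

# Hypothetical sampling frame: "Student 1" ... "Student 200"
students = [f"Student {n}" for n in range(1, 201)]
sample = systematic_sample(students, interval=10)   # selects 1/10 of the students
print(len(sample), sample[:3])
```

Only the starting point is random here; every other choice is fixed by the order of the list, which is why the method counts as non-random unless the list itself is in random order.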
What are the advantages and disadvantages of different sampling techniques?
In general
Most sampling techniques can be improved by taking a larger sample
You want to minimise the bias within a sample
This occurs when the sample is not representative of the population
The best way to avoid bias (when possible) is to use a random method
Sometimes the 'best' method would cost too much or take too much time
So you need to choose the 'best method you can afford (or have the time for)'
A sample only gives information about the members in the sample
A different sample from the same population could lead to different conclusions about the population!
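To see how two samples from the same population can point to different conclusions, here is a small simulation sketch. The population of sleep times is invented purely for illustration (it is not real data), and the seed is fixed only so that the run is repeatable.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the illustration is repeatable

# Invented population: daily sleep times (hours) for 1000 imaginary dogs
population = [round(rng.uniform(10, 16), 1) for _ in range(1000)]

# Two different simple random samples of 20 from the same population
sample_a = rng.sample(population, 20)
sample_b = rng.sample(population, 20)

print("Population mean:", round(statistics.mean(population), 2))
print("Sample A mean:  ", round(statistics.mean(sample_a), 2))
print("Sample B mean:  ", round(statistics.mean(sample_b), 2))
```

The two sample means will usually differ from each other and from the population mean, even though both samples were chosen fairly.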
Simple random sampling:
This is the best sampling method for avoiding bias
Although it is possible that members of some groups in the population will not be represented in the sample
To avoid this stratified sampling can be used instead
Most useful when you have a small population or want a small sample
e.g. children in a class
This cannot be used if it is not possible to number or list all the members of the population
e.g. the fish in a lake
Stratified sampling:
This should be used when the population can be split into obvious groups
Useful when there are very different groups of members within a population
The sample will be representative of the population structure
Members of every group (stratum) are guaranteed to be included in the sample
The members selected from each stratum are chosen randomly
This helps to avoid bias
This cannot be used
if the population cannot be split into groups
or if the groups overlap
Systematic sampling:
This is useful when you want a sample from a large population
You need access to a sampling frame (list of the population)
If the order of the sampling frame is random then the sample will also be random
This cannot be used if it is not possible to number or list all the members of the population
e.g. penguins in Antarctica
Be careful of periodic (i.e. regularly recurring) patterns in the sampling frame
e.g. a list of names where the names are grouped by 5-person teams with the team captain appearing first
If you selected every 5th name in the list you would end up with either all captains or no captains in your sample
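The team-captain problem can be checked with a short Python sketch. The list of 6 teams with made-up names is an assumption for illustration; the point is only that the sampling interval (5) matches the repeating pattern in the list.

```python
# Hypothetical sampling frame: 6 teams of 5, captain listed first in each team
frame = []
for team in range(1, 7):
    frame.append(f"Team {team} captain")
    frame.extend(f"Team {team} member {m}" for m in range(1, 5))

captain_counts = []
# Try every possible starting point and take every 5th name (wrapping around)
for start in range(len(frame)):
    sample = [frame[(start + 5 * i) % len(frame)] for i in range(len(frame) // 5)]
    captains = sum("captain" in name for name in sample)
    captain_counts.append(captains)
    print(start, captains, "captains out of", len(sample))

# Every possible sample is either all captains or contains no captains
print(sorted(set(captain_counts)))   # [0, 6]
```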
Quota sampling:
This is useful when a small sample needs to be representative of the population structure
Useful when collecting data by asking people who walk past you in a public place or when a sampling frame is not available
Just keep asking people until the quota is filled for each group
This can introduce bias as some members of the population might choose not to be included in the sample
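As a rough sketch of the "keep asking until each quota is filled" idea, the Python below works through a made-up stream of passers-by. The function name `quota_sample`, the tuple format and the two age groups are all assumptions for illustration.

```python
def quota_sample(passers_by, quotas):
    """Take people in the order they arrive until every group's quota is full.

    passers_by: iterable of (name, group, willing) tuples (hypothetical format)
    quotas: dict mapping group -> number wanted from that group
    """
    chosen = {group: [] for group in quotas}
    for name, group, willing in passers_by:
        if group not in quotas or not willing:
            continue                      # not in the study, or declined: ask the next person
        if len(chosen[group]) < quotas[group]:
            chosen[group].append(name)
        if all(len(chosen[g]) == quotas[g] for g in quotas):
            break                         # every quota is filled, so stop
    return chosen

# Made-up stream of people walking past, for illustration only
people = [
    ("Ana", "under 30", True), ("Ben", "30 and over", True),
    ("Cal", "under 30", False), ("Dee", "under 30", True),
    ("Eli", "30 and over", True), ("Fay", "under 30", True),
]
result = quota_sample(people, {"under 30": 2, "30 and over": 2})
print(result)
```

Who ends up in the sample depends on who happens to walk past and who agrees to take part, which is where the possible bias comes from.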
Cluster sampling:
This will usually require less time and be less expensive than simple random sampling or stratified sampling
e.g. if your clusters are schools, you will only need to collect data from the people in some of those schools
instead of having to collect data from a few people in every school in the country
However the clusters may not be representative of the population structure as a whole
This can make the sample biased
Opportunity (convenience) sampling:
This should be used when a sample is needed quickly
Useful when a list of the population is not available
But the sample is unlikely to be representative of the population structure
This can make the sample biased
Judgement sampling:
This can be used when a sample is needed quickly
The person choosing the sample should try to make it representative of the population
But intentionally or unintentionally the sample can end up being biased
Therefore this is rarely a preferred method
Worked Example
Aaron, Belinda and Charlotte are writing an article about school uniforms for their school newsletter. They want to interview a sample of 30 students to find out their opinions about school uniforms.
(a) Write down the population for the survey, and suggest a possible sampling frame.
Be careful with the population here
They only want to interview students, so the population for their survey is only the students in the school
It does not include teachers or other staff members
The population is all the students in the school.
A sampling frame could be an alphabetical list of all the students in the school.
Aaron suggests that he could stand by the school gates in the morning and interview the first 30 students that come past him.
(b) Name this type of sampling and suggest a possible disadvantage.
Opportunity sampling
The sample could be biased. For example, Aaron could end up interviewing all people who have just arrived on the same bus, or groups of friends or siblings arriving at school together.
Belinda suggests that instead Aaron should interview students at the school gates until he has interviewed exactly 6 students from each of the school's year groups (years 7 through 11).
(c) Name this type of sampling and suggest a reason why it would be an improvement over Aaron's original plan.
Quota sampling
The sample would probably be more representative of all the students in the school, because it would be certain to include students from each year group.
In the end, Aaron, Belinda and Charlotte decide to use systematic sampling to select their sample.
(d) Given that there are 480 students in the school, suggest how they might go about choosing their sample.
They are going to need to select names from a list
But first we need to know what proportion of the students in the school they want to interview
Divide the number in their sample (30) by the total number of students (480)
$\frac{30}{480} = \frac{1}{16}$
So they want their sample to contain 1/16 of the students in the school
This means they need to choose every 16th name in the list (after the random starting point)
They will need a list of all the students in the school to use as a sampling frame.
They need to randomly select one student from the list as a starting point, then also select every 16th student from the list after that.
They may need to 'wrap back around' to the start of the list to get all 30 names for their sample.
Random Samples
What do I need to know about random sampling?
In a simple random sample every member of the population has an equal probability of being selected for the sample
This means that the sample selection is fair and unbiased
Therefore the sample is likely to be representative of the population
To minimise bias this will usually be the best method
But it can also be expensive and time-consuming
And some groups in the population may end up not being represented in the sample
Some other sampling methods also include random selection
In stratified sampling, the members of the population chosen from each stratum (group) are chosen randomly
A 'simple random sample' is taken from each stratum
This leads to relatively unbiased samples that also reflect the population structure
In cluster sampling, the clusters to include in the sample are selected randomly
This can give good results if the clusters are representative of the population as a whole
In systematic sampling the 'starting point' member in the population list is chosen randomly
This is not considered a 'random sample' unless the ordering of the list is also random
How is a random sample selected?
To take a simple random sample you need to have access to a list of all members of the population (i.e. a sampling frame)
Every member in the sampling frame must be assigned a number
Usually this will mean starting at 1
and numbering the rest of the list in order: 2, 3, 4, etc.
To select a random sample of members of the population
random numbers must be generated
The members in the list with those numbers are then selected for the sample
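The three steps above (number the list, generate random numbers, select the matching members) might look like this in Python. It is a minimal sketch only: the sampling frame of 25 made-up member names is an assumption, and `random.sample` is used here because it never repeats a number.

```python
import random

# Hypothetical sampling frame: "Member A" ... "Member Y" (25 members)
frame = [f"Member {chr(65 + i)}" for i in range(25)]

# Step 1: number every member, starting at 1
numbered = {number: member for number, member in enumerate(frame, start=1)}

# Step 2: generate 5 different random numbers between 1 and 25
chosen_numbers = random.sample(range(1, len(frame) + 1), 5)

# Step 3: select the members with those numbers
sample = [numbered[n] for n in chosen_numbers]
print(chosen_numbers, sample)
```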
You should be familiar with the different options for choosing random numbers
Random numbers can be selected from a random number table
The numbers in the table may have more digits than you need
e.g. you want 2-digit numbers but the table shows
469066 155387 172419 953505
In this case you can break the table numbers into smaller numbers
So here read the numbers in the table as
46 90 66 15 53 87 17 24 19 95 35 05
'05' is just a two-digit way of writing '5'
You could use a random number generator on a calculator
This may give you random 3-digit decimals between 0 and 1
e.g. 0.541, 0.414, 0.929
These can be multiplied by 1000 to give you integer answers: 541, 414, 929
Or you may be able to ask for random integers between two values
e.g. a random integer between 1 and 6
This would be just like rolling a fair 6-sided dice
Apps on a computer or online can also generate random numbers
These apps usually let you specify what you want
how many numbers
between what values
You can select random numbers by rolling dice
A fair 10-sided dice can give values between 0 and 9
Use two 10-sided dice (or roll one dice twice) to get numbers between 0 and 99
i.e., one dice for the tens (10, 20, 30, ...) and one dice for the units (1, 2, 3, ...)
Or use three 10-sided dice to get numbers between 0 and 999
i.e., one for hundreds, one for tens, and one for units
You could also put all the numbers into a hat (or bag, etc.)
And draw numbers out at random
This is less easy to do with a lot of numbers!
Similarly, numbers might be drawn at random from a deck of cards with the numbers written on them
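Two of the ideas above, breaking random number table entries into smaller numbers and turning calculator decimals into integers, can be sketched in Python as follows. The helper name `split_digits` is an assumption for illustration; the numbers are the ones used in the examples above.

```python
def split_digits(table_numbers, width):
    """Split each table entry (a string of digits) into chunks of `width` digits."""
    chunks = []
    for entry in table_numbers:
        for i in range(0, len(entry) - width + 1, width):
            chunks.append(int(entry[i:i + width]))   # int("05") gives 5
    return chunks

# Table entries kept as strings so that leading zeros are not lost
table = ["469066", "155387", "172419", "953505"]
two_digit = split_digits(table, 2)
print(two_digit)   # [46, 90, 66, 15, 53, 87, 17, 24, 19, 95, 35, 5]

# Calculator-style 3-digit decimals turned into integers by multiplying by 1000
decimals = [0.541, 0.414, 0.929]
integers = [round(d * 1000) for d in decimals]
print(integers)    # [541, 414, 929]
```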
You should know how to deal with problems that occur when choosing random numbers
You may get a random number that does not match any of the items in your list
e.g. if you have 80 items in a list numbered 1 to 80
but get the random number 93
If this happens, simply ignore any numbers that don't match
and keep generating random numbers until you have enough that do match items in the list
You may get a random number that occurs more than once
In this case keep the first version of the number
and ignore any repeated versions
Keep generating random numbers until you have enough unique numbers
i.e. ones that only occur one time each
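The two rules above (ignore numbers that do not match the list, and ignore repeats) can be written as a short Python sketch. The function name `pick_valid_numbers` and the stream of 2-digit numbers are made up for illustration.

```python
def pick_valid_numbers(random_numbers, list_size, how_many):
    """Keep numbers from 1 to list_size, ignoring out-of-range values and repeats."""
    picked = []
    for n in random_numbers:
        if n < 1 or n > list_size:
            continue            # does not match any item in the list: ignore it
        if n in picked:
            continue            # already chosen: keep the first, ignore the repeat
        picked.append(n)
        if len(picked) == how_many:
            break               # enough unique numbers have been found
    return picked

# Made-up stream of random 2-digit numbers for a list numbered 1 to 80
stream = [46, 93, 66, 15, 46, 87, 17, 24, 0, 19]
picked = pick_valid_numbers(stream, list_size=80, how_many=5)
print(picked)   # [46, 66, 15, 17, 24]
```

If the stream runs out before enough numbers are found, this sketch simply returns the ones it has; in practice you would keep generating more random numbers.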
Worked Example
Florence has a list of her company's 832 customers. She would like to choose a simple random sample of 12 customers to survey about some new changes she was thinking of making to the company website.
(a) State an advantage of using simple random sampling to choose the 12 people to interview.
Using simple random sampling means every customer has an equal chance of being chosen. This should minimise possible bias in the sample.
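Florence finds the following list of random numbers in a table of random numbers.
855737 648311 989903 068440 922412 748392
445546 862885 418648 010910 148805 533291
927476 920027 688416 013932 766179 811230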
(b) Explain how Florence can use her customer list along with those random numbers to select the 12 customers to interview. In using the numbers from the table, you should start at the top left and work across from left to right.
First Florence will need to prepare her sampling frame (i.e. her customer list)
Florence should start by numbering the customers in her list from 1 to 832.
To get random numbers up to 832 Florence will need random 3-digit numbers
To do this she can think of each number in the table as being two separate 3-digit numbers:
855 737 648 311 989 903 068 440 922 412 748 392
445 546 862 885 418 648 010 910 148 805 533 291
927 476 920 027 688 416 013 932 766 179 811 230
Starting with the first row, ignore any numbers that are greater than 832:
855 737 648 311 989 903 068 440 922 412 748 392
Use the numbers that are left
Note that '068' is the 3-digit version of 68
From the first row: 737, 648, 311, 68, 440, 412, 748, 392
That's 8 numbers, so she needs 4 more
Continue with the second row:
445 546 862 885 418 648 010 910 148 805 533 291
The first 4 numbers are 445, 546, 418 and 648
But 648 has been chosen already
So ignore that and use the next number instead ('010'=10)
From the second row: 445, 546, 418, 10
From the numbered customer list she should choose the customers with the following numbers for her sample:
737, 648, 311, 68, 440, 412, 748, 392, 445, 546, 418, 10
Florence could also, for example, have just used the first 3 digits in each of the numbers in the table. This would have given her the following numbers for her sample:
648, 68, 748, 445, 418, 10, 148, 533, 688, 13, 766, 811
Stratified Samples
How is a stratified sample selected?
To take a stratified sample, the population must first be divided into a number of groups (strata)
Every member of the population must belong to one group
No member of the population can belong to more than one group
i.e. the groups cannot overlap
The strata could be based on age ranges, gender, occupation, etc.
The number of members chosen from each stratum corresponds to the proportion of the population that belongs to that stratum
e.g. if 1/20 of the population belongs to a particular stratum
then 1/20 of the sample will be chosen from that stratum
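To find the number to be chosen from each stratum use the formula:
$\text{number from stratum} = \frac{\text{size of stratum}}{\text{size of population}} \times \text{size of sample}$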
i.e. divide the size of the stratum by the size of the population
then multiply by the size of the sample
the size of the stratum just means how many total members are in the stratum
Once you know how many members to choose from each stratum
those members should be chosen randomly from all the members of the stratum
i.e. take a 'simple random sample' of the correct size from each stratum
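The calculation described above (size of stratum divided by size of population, multiplied by size of sample), followed by a random selection within the stratum, could be sketched like this. The function name `stratum_sample_size` and the numbers used (a stratum of 50 in a population of 1000, with a sample of 40) are made up for illustration.

```python
import random
from fractions import Fraction

def stratum_sample_size(stratum_size, population_size, sample_size):
    """(size of stratum / size of population) x size of sample, as an exact fraction."""
    return Fraction(stratum_size, population_size) * sample_size

# Made-up example: the stratum is 50/1000 = 1/20 of the population
n = stratum_sample_size(50, 1000, 40)
print(n)   # 2, i.e. 1/20 of the sample of 40

# Then take a simple random sample of that size from the stratum's members
stratum_members = [f"Person {i}" for i in range(1, 51)]   # hypothetical members
chosen = random.sample(stratum_members, int(n))
print(chosen)
```

This sketch assumes the calculation gives a whole number, as it does here.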
How do I choose a stratified sample based on more than one category?
It is possible that the strata to be used will be based on more than one category
For example the 900 people working for a large company
could be divided into managers and employees
but could also be divided according to whether they usually walk or bike to work, drive to work, or use public transport
| | walk or bike | drive | public transport |
| employees | 180 | 225 | 414 |
| managers | 27 | 45 | 9 |
In a case like this each stratum for a sample could be based on two categories
e.g. 'employees who drive to work', 'managers who use public transport', etc.
Once the strata have been decided, find the number in each stratum in the usual way
e.g. if you want a total sample of 100 people (out of the 900 people in the table)
the number of employees who walk or bike in the sample would be
$\frac{180}{900} \times 100 = 20$
the number of managers who walk or bike in the sample would be
$\frac{27}{900} \times 100 = 3$
etc.
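For the 900 people in the table, the same calculation can be run for all six strata at once. The sketch below is illustrative only (the dictionary layout and variable names are assumptions); it also carries out the check that the stratum numbers add up to the total sample size.

```python
from fractions import Fraction

# Numbers of people in each stratum, taken from the table
strata = {
    ("employees", "walk or bike"): 180,
    ("employees", "drive"): 225,
    ("employees", "public transport"): 414,
    ("managers", "walk or bike"): 27,
    ("managers", "drive"): 45,
    ("managers", "public transport"): 9,
}
population_size = sum(strata.values())   # 900
sample_size = 100

# (size of stratum / size of population) x size of sample, for every stratum
allocation = {
    stratum: Fraction(size, population_size) * sample_size
    for stratum, size in strata.items()
}
for stratum, number in allocation.items():
    print(stratum, number)

# Check: the stratum numbers should add up to the total sample size
assert sum(allocation.values()) == sample_size
```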
Examiner Tips and Tricks
After you calculate the numbers to be chosen from each stratum
add them up and make sure they equal the total size of the sample you were looking for
This is a good way to spot possible mistakes in your working
Worked Example
In Dafydd's school there are 636 students, 36 teachers, and 48 non-teaching staff. For a research project he is working on, Dafydd wishes to choose a stratified sample of 60 people from the students and staff at the school.
Calculate the numbers of students, teachers and non-teaching staff that Dafydd should include in his sample.
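First we need to find the total number of people in the population
(Here the population is all the students and staff in the school)
636 + 36 + 48 = 720
To find the number from each group use
$\frac{\text{size of stratum}}{\text{size of population}} \times \text{size of sample}$
Students: $\frac{636}{720} \times 60 = 53$
Teachers: $\frac{36}{720} \times 60 = 3$
Non-teaching staff: $\frac{48}{720} \times 60 = 4$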
Check to make sure those numbers add up to 60:
53+3+4=60
53 students, 3 teachers and 4 non-teaching staff