Compression - Huffman Coding (AQA GCSE Computer Science)
Revision Note
Written by: Robert Hampton
Reviewed by: James Woodhouse
Huffman Coding
What is huffman coding?
Huffman coding is a method of lossless compression primarily used on text based data (documents)
A huffman coding tree is used to compress the data whilst keeping all the data so that it can be uncompressed back to its original state
Example (step 1)
A text file contains the following characters
Text file | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
I |
| L | O | V | E |
| C | O | M | P | U | T | E | R | S |
The text file contain 16 characters including spaces
A frequency analysis shows there is some repetition in the characters, for example there are 2 x E's
Frequency | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
I | space | L | O | V | E | C | M | P | U | T | R | S |
1 | 2 | 1 | 2 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Huffman Trees
What is a huffman tree?
A huffman tree, also known a binary tree is used to lossless compress text based data as part of huffman coding
A huffman tree consists of nodes which can have either 0, 1 or 2 child nodes
Binary trees are covered in much more detail at A-Level
Example (step 2)
The frequency analysis data is ordered from least to most frequent
The 2 least frequent characters (I, L) are joined together into a node as part of a binary tree as below
Update the frequency data to show the combined characters
Repeat steps until all characters are combined
Next, follow each branch and place a 0 on left branches and a 1 on right branches, this creates a unique code for each character
This allows unique bit patterns for each character to be created and characters that are more frequent have a smaller bit pattern making it more efficient
Character | Bit pattern | Times used | Total bits |
---|---|---|---|
space |
| 2 | 3*2=6 |
O |
| 2 | 3*2=6 |
E |
| 2 | 3*2=6 |
V |
| 1 | 4*1=4 |
C |
| 1 | 4*1=4 |
M |
| 1 | 4*1=4 |
P |
| 1 | 4*1=4 |
U |
| 1 | 4*1=4 |
T |
| 1 | 4*1=4 |
R |
| 1 | 4*1=4 |
S |
| 1 | 4*1=4 |
|
| Total number of bits | 50 |
In this example, using huffman coding we have compressed the original text files to 50 bits
Uncompressed, using ASCII, the text file would be 16 characters x 7 bits per character (16*7=112 bits) which means huffman coding has saved 62 bits of storage (112-50=62)
Last updated:
You've read 0 of your 5 free revision notes this week
Sign up now. It’s free!
Did this page help you?