Compression - Huffman Coding (AQA GCSE Computer Science)

Revision Note

Robert Hampton

Written by: Robert Hampton

Reviewed by: James Woodhouse

Huffman Coding

What is huffman coding?

  • Huffman coding is a method of lossless compression primarily used on text based data (documents)

  • A huffman coding tree is used to compress the data whilst keeping all the data so that it can be uncompressed back to its original state

Example (step 1)

  • A text file contains the following characters

Text file

I

 

L

O

V

E

 

C

O

M

P

U

T

E

R

S

  • The text file contain 16 characters including spaces

  • A frequency analysis shows there is some repetition in the characters, for example there are 2 x E's

Frequency

I

space

L

O

V

E

C

M

P

U

T

R

S

1

2

1

2

1

2

1

1

1

1

1

1

1

Huffman Trees

What is a huffman tree?

  • A huffman tree, also known a binary tree is used to lossless compress text based data as part of huffman coding

  • A huffman tree consists of nodes which can have either 0, 1 or 2 child nodes

  • Binary trees are covered in much more detail at A-Level 

Example (step 2)

  • The frequency analysis data is ordered from least to most frequent

Table with two rows; the first row has the letters "I L V C M P U T R S" and "space O E," and the second row numbers "1 1 1 1 1 1 1 1 1 1 2 2 2".
  • The 2 least frequent characters (I, L) are joined together into a node as part of a binary tree as below 

huffman-coding-image1
  • Update the frequency data to show the combined characters

Table with letters "V C M P U T R S I L space O E" in the first row and corresponding numbers "1 1 1 1 1 1 1 1 2 2 2 2 2" in the second row.
  • Repeat steps until all characters are combined

huffman-coding-image2
  • Next, follow each branch and place a 0 on left branches and a 1 on right branches, this creates a unique code for each character

huffman-coding-image3
  • This allows unique bit patterns for each character to be created and characters that are more frequent have a smaller bit pattern making it more efficient

Character

Bit pattern

Times used

Total bits

space

000

2

3*2=6

O

001

2

3*2=6

E

010

2

3*2=6

V

1000

1

4*1=4

C

1001

1

4*1=4

M

1010

1

4*1=4

P

1011

1

4*1=4

U

1100

1

4*1=4

T

1101

1

4*1=4

R

1110

1

4*1=4

S

1111

1

4*1=4

 

 

Total number of bits

50

  • In this example, using huffman coding we have compressed the original text files to 50 bits

  • Uncompressed, using ASCII, the text file would be 16 characters x 7 bits per character (16*7=112 bits) which means huffman coding has saved 62 bits of storage (112-50=62)

Last updated:

You've read 0 of your 5 free revision notes this week

Sign up now. It’s free!

Join the 100,000+ Students that ❤️ Save My Exams

the (exam) results speak for themselves:

Did this page help you?

Robert Hampton

Author: Robert Hampton

Expertise: Computer Science Content Creator

Rob has over 16 years' experience teaching Computer Science and ICT at KS3 & GCSE levels. Rob has demonstrated strong leadership as Head of Department since 2012 and previously supported teacher development as a Specialist Leader of Education, empowering departments to excel in Computer Science. Beyond his tech expertise, Robert embraces the virtual world as an avid gamer, conquering digital battlefields when he's not coding.

James Woodhouse

Author: James Woodhouse

Expertise: Computer Science

James graduated from the University of Sunderland with a degree in ICT and Computing education. He has over 14 years of experience both teaching and leading in Computer Science, specialising in teaching GCSE and A-level. James has held various leadership roles, including Head of Computer Science and coordinator positions for Key Stage 3 and Key Stage 4. James has a keen interest in networking security and technologies aimed at preventing security breaches.