Revision Notes Exam Questions Past Papers

GCSEComputer ScienceAQARevision Notes3. Fundamentals of Data RepresentationUnits of InformationCompression - Huffman Coding

Compression - Huffman Coding (AQA GCSE Computer Science): Revision Note

Exam code: 8525

Written by: Robert Hampton

Reviewed by: James Woodhouse

Updated on 13 August 2024

Huffman Coding

What is huffman coding?

Huffman coding is a method of lossless compression primarily used on text based data (documents)
A huffman coding tree is used to compress the data whilst keeping all the data so that it can be uncompressed back to its original state

Example (step 1)

A text file contains the following characters

Text file
I		L	O	V	E		C	O	M	P	U	T	E	R	S

The text file contain 16 characters including spaces
A frequency analysis shows there is some repetition in the characters, for example there are 2 x E's

Frequency
I	space	L	O	V	E	C	M	P	U	T	R	S
1	2	1	2	1	2	1	1	1	1	1	1	1

Huffman Trees

What is a huffman tree?

A huffman tree, also known a binary tree is used to lossless compress text based data as part of huffman coding
A huffman tree consists of nodes which can have either 0, 1 or 2 child nodes
Binary trees are covered in much more detail at A-Level

Example (step 2)

The frequency analysis data is ordered from least to most frequent

Table with two rows; the first row has the letters "I L V C M P U T R S" and "space O E," and the second row numbers "1 1 1 1 1 1 1 1 1 1 2 2 2".

The 2 least frequent characters (I, L) are joined together into a node as part of a binary tree as below

huffman-coding-image1

Update the frequency data to show the combined characters

Table with letters "V C M P U T R S I L space O E" in the first row and corresponding numbers "1 1 1 1 1 1 1 1 2 2 2 2 2" in the second row.

Repeat steps until all characters are combined

huffman-coding-image2

Next, follow each branch and place a 0 on left branches and a 1 on right branches, this creates a unique code for each character

huffman-coding-image3

This allows unique bit patterns for each character to be created and characters that are more frequent have a smaller bit pattern making it more efficient

Character	Bit pattern	Times used	Total bits
space	`000`	2	3*2=6
O	`001`	2	3*2=6
E	`010`	2	3*2=6
V	`1000`	1	4*1=4
C	`1001`	1	4*1=4
M	`1010`	1	4*1=4
P	`1011`	1	4*1=4
U	`1100`	1	4*1=4
T	`1101`	1	4*1=4
R	`1110`	1	4*1=4
S	`1111`	1	4*1=4
		Total number of bits	50

In this example, using huffman coding we have compressed the original text files to 50 bits
Uncompressed, using ASCII, the text file would be 16 characters x 7 bits per character (16*7=112 bits) which means huffman coding has saved 62 bits of storage (112-50=62)

For teachers

Ready to test your students on this topic?

Create exam-aligned tests in minutes
Differentiate easily with tiered difficulty
Trusted for all assessment types

Explore Test Builder

Test Builder in a diagram showing questions being picked from different difficulties and topics, and being downloaded as a shareable format.

Did this page help you?

Previous:CompressionNext:Compression - Run Length Encoding

Author: Robert Hampton

Expertise: Computer Science Content Creator

Rob has over 16 years' experience teaching Computer Science and ICT at KS3 & GCSE levels. Rob has demonstrated strong leadership as Head of Department since 2012 and previously supported teacher development as a Specialist Leader of Education, empowering departments to excel in Computer Science. Beyond his tech expertise, Robert embraces the virtual world as an avid gamer, conquering digital battlefields when he's not coding.

Reviewer: James Woodhouse

Expertise: Computer Science & English Subject Lead

James graduated from the University of Sunderland with a degree in ICT and Computing education. He has over 14 years of experience both teaching and leading in Computer Science, specialising in teaching GCSE and A-level. James has held various leadership roles, including Head of Computer Science and coordinator positions for Key Stage 3 and Key Stage 4. James has a keen interest in networking security and technologies aimed at preventing security breaches.