Run Length Encoding & Dictionary Coding (OCR A Level Computer Science)
Revision Note
Run Length Encoding & Dictionary Coding
What is Run-Length Encoding?
Run-Length Encoding (RLE) is a form of data compression that condenses identical elements into a single value with a count.
For a text file, "AAAABBBCCDAA" is compressed to "4A3B2C1D2A"
The string has four 'A's, followed by three 'B's, two 'C's, one 'D', and two 'A's. RLE shows this as "4A3B2C1D2A"
It is used in bitmap images to compress sequences of the same colour
For example, a line in an image with 5 red pixels followed by 3 blue pixels could be represented as "5R3B"
What is Dictionary Coding?
Dictionary coding replaces recurring sequences with shorter, unique codes
A 'dictionary' is compiled to map original sequences to special codes
This method is effective for both text and binary data
The phrase "for example" could be coded as 'FE' if 'FE' doesn't appear in the original text
A sequence of binary numbers '1010' could be replaced by a shorter unique code
Example of Dictionary Coding
Consider this sentence where some algorithm names are repeatedly mentioned:
QuickSort is faster than BubbleSort but MergeSort is more stable than QuickSort and BubbleSort.
Here, the names
QuickSort
,BubbleSort
, andMergeSort
are repeated.Create a dictionary:
Start with an empty dictionary.
Scan the sentence for recurring sequences.
The dictionary might look like this:
{
QuickSort: Q
BubbleSort: B
MergeSort: M
}
Replace the sequences
The repetitive words in the sentence can be replaced with the dictionary values:
Original:
QuickSort is faster than BubbleSort but MergeSort is more stable than QuickSort and BubbleSort.
Compressed:
Q is faster than B but M is more stable than Q and B.
The original string was 95 characters long, and the dictionary-coded example is 53 characters long
The shorter string will require less space in memory or storage
Examiner Tips and Tricks
RLE and Dictionary Coding serve different needs and have their advantages and disadvantages
RLE: More effective when data has lots of repetition
Dictionary Coding: More versatile but may require more computational resources
Worked Example
A survey focuses on the kinds of vehicles travelling on a motorway. For each vehicle that passes, a letter is noted:
For a car, 'C' is entered.
For a motorbike, 'M' is entered.
For a lorry, 'L' is entered.
For any other vehicle, 'O' is entered.
It's decided to compress the data generated.
Run Length Encoding has compressed the following sequence:
3C3M4C
Show the result of decompressing the sequence.
[2]
How to answer this question:
Run Length Encoding (RLE) shows the number of consecutive occurrences of a character in a sequence. To decompress, repeat each character the number of times indicated by the preceding number.
Answer:
Example answer that gets full marks:CCCMMMCCCC
Worked Example
The following sequence represents the raw data collected during the survey:CCCCOLLLCCCCCMOCCCCC
Show the result of compressing the sequence.
[2]
How to answer this question:
To compress using RLE, count consecutive occurrences of each character and append the count before the character itself.
Answer:
Example answer that gets full marks:4C1O3L5C1M1O5C
Worked Example
Write pseudocode for the function "longest", which takes in a string of characters as an argument and returns an integer representing the longest continuous sequence of 'C's.
[6]
How to answer this question
Write pseudocode for a function that iterates through the string, counting continuous sequences of 'C's and keeping track of the longest such sequence.
For example, in ABCCCBAACC
the longest consecutive string of 'C's is 3.
Answer:
Example answer that gets full marks
How this answer might look in pseudocode:
Function longest(s: String) -> Integer:
max_count = 0
current_count = 0
For each character in s:
If char equals 'C':
Increment current_count by 1
If current_count > max_count:
Set max_count to current_count
Else:
Set current_count to 0
Return max_count
How this answer might look in Python:
def longest(s):
max_count = 0
current_count = 0
for char in s:
if char == 'C':
current_count += 1
else:
current_count = 0
if current_count > max_count:
max_count = current_count
return max_count
How this answer might look in Java:
public class Main {
public static void main(String[] args) {
System.out.println(longest("ABCACCCCCABAC")); // Output should be 5
}
public static int longest(String s) {
int maxCount = 0;
int currentCount = 0;
for (int i = 0; i < s.length(); i++) {
char currentChar = s.charAt(i);
if (currentChar == 'C') {
currentCount++;
if (currentCount > maxCount) {
maxCount = currentCount;
}
} else {
currentCount = 0;
}
}
return maxCount;
}
}
Acceptable answers you could have given instead:
You could use different variable names or loop structures, but the logic should be the same to get full marks.
You've read 0 of your 5 free revision notes this week
Sign up now. It’s free!
Did this page help you?