Run Length Encoding & Dictionary Coding (OCR A Level Computer Science) : Revision Note

Written by: Callum Davies

Reviewed by: James Woodhouse

Updated on 5 February 2025

Run Length Encoding & Dictionary Coding

What is run length encoding?

In A Level Computer Science, run length encoding (RLE) is a form of data compression that condenses identical elements into a single value with a count.
For a text file, "AAAABBBCCDAA" is compressed to "4A3B2C1D2A"
The string has four 'A's, followed by three 'B's, two 'C's, one 'D', and two 'A's. RLE shows this as "4A3B2C1D2A"
It is used in bitmap images to compress sequences of the same colour
For example, a line in an image with 5 red pixels followed by 3 blue pixels could be represented as "5R3B"

What is dictionary coding?

Dictionary coding replaces recurring sequences with shorter, unique codes
A 'dictionary' is compiled to map original sequences to special codes
This method is effective for both text and binary data
The phrase "for example" could be coded as 'FE' if 'FE' doesn't appear in the original text
A sequence of binary numbers '1010' could be replaced by a shorter unique code

Example of dictionary coding

Consider this sentence where some algorithm names are repeatedly mentioned:
- QuickSort is faster than BubbleSort but MergeSort is more stable than QuickSort and BubbleSort.

Here, the names QuickSort, BubbleSort, and MergeSort are repeated.
Create a dictionary:
- Start with an empty dictionary.
- Scan the sentence for recurring sequences.
- The dictionary might look like this:

{
QuickSort: Q
BubbleSort: B
MergeSort: M
}

Replace the sequences
- The repetitive words in the sentence can be replaced with the dictionary values:
  - Original: QuickSort is faster than BubbleSort but MergeSort is more stable than QuickSort and BubbleSort.
  - Compressed: Q is faster than B but M is more stable than Q and B.

The original string was 95 characters long, and the dictionary-coded example is 53 characters long
- The shorter string will require less space in memory or storage

Examiner Tips and Tricks

RLE and Dictionary Coding serve different needs and have their advantages and disadvantages
RLE: More effective when data has lots of repetition
Dictionary Coding: More versatile but may require more computational resources

Worked Example

A survey focuses on the kinds of vehicles travelling on a motorway. For each vehicle that passes, a letter is noted:

For a car, 'C' is entered.
For a motorbike, 'M' is entered.
For a lorry, 'L' is entered.
For any other vehicle, 'O' is entered.

It's decided to compress the data generated.

Run Length Encoding has compressed the following sequence:

3C3M4C

Show the result of decompressing the sequence.

[2]

How to answer this question:
Run Length Encoding (RLE) shows the number of consecutive occurrences of a character in a sequence. To decompress, repeat each character the number of times indicated by the preceding number.

Answer:

Example answer that gets full marks:
CCCMMMCCCC

Worked Example

The following sequence represents the raw data collected during the survey:
CCCCOLLLCCCCCMOCCCCC

Show the result of compressing the sequence.

[2]

How to answer this question:
To compress using RLE, count consecutive occurrences of each character and append the count before the character itself.

Answer:

Example answer that gets full marks:
4C1O3L5C1M1O5C

Worked Example

Write pseudocode for the function "longest", which takes in a string of characters as an argument and returns an integer representing the longest continuous sequence of 'C's.

[6]

How to answer this question
Write pseudocode for a function that iterates through the string, counting continuous sequences of 'C's and keeping track of the longest such sequence.

For example, in ABCCCBAACC the longest consecutive string of 'C's is 3.

Answer:

Example answer that gets full marks

How this answer might look in pseudocode:

Function longest(s: String) -> Integer:
   max_count = 0
   current_count = 0
    
    For each character in s:
        If char equals 'C':
            Increment current_count by 1
            If current_count > max_count:
                Set max_count to current_count
        Else:
            Set current_count to 0
    
    Return max_count

How this answer might look in Python:

def longest(s): 
      max_count = 0
      current_count = 0
      
      for char in s:
            if char == 'C':
                  current_count += 1
            else:
                  current_count = 0
                  if current_count > max_count:
                        max_count = current_count
      
      return max_count

How this answer might look in Java:

public class Main {
    public static void main(String[] args) {
        System.out.println(longest("ABCACCCCCABAC"));  // Output should be 5
    }

    public static int longest(String s) {
        int maxCount = 0;
        int currentCount = 0;

        for (int i = 0; i < s.length(); i++) {
            char currentChar = s.charAt(i);

            if (currentChar == 'C') {
                currentCount++;
                if (currentCount > maxCount) {
                    maxCount = currentCount;
                }
            } else {
                currentCount = 0;
            }
        }

        return maxCount;
    }
}

Acceptable answers you could have given instead:
You could use different variable names or loop structures, but the logic should be the same to get full marks.

You've read 0 of your 5 free revision notes this week

Sign up now. It’s free!

Join the 100,000+ Students that ❤️ Save My Exams

the (exam) results speak for themselves:

Test yourself

Did this page help you?

Previous:Lossy & Lossless CompressionNext:Symmetric vs Asymmetric Encryption

Run Length Encoding & Dictionary Coding (OCR A Level Computer Science) : Revision Note

Run Length Encoding & Dictionary Coding

What is run length encoding?

What is dictionary coding?

Example of dictionary coding

You've read 0 of your 5 free revision notes this week

Sign up now. It’s free!

Join the 100,000+ Students that ❤️ Save My Exams

1. The Characteristics of Contemporary Processors, Input, Output & Storage Devices

1.1 Structure & Function of the Processor

Components of the CPU

Fetch-Decode-Execute Cycle

CPU Performance

Pipelining

Von Neumann & Harvard Architecture

1.2 Types of Processor

RISC vs CISC

Graphics Processing Unit (GPU)

Multicore & Parallel Processors

1.3 Input, Output & Storage

Input, Output & Storage

RAM & ROM

2. Software & Software Development

2.1 Systems Software

Operating Systems

Memory Management

System Interrupts

Scheduling

Types of Operating System

BIOS

Device Drivers

Virtual Machines

2.2 Applications Generation

Application Software

Utility Software

Open Source & Closed Source Software

Translators

Stages of Compilation

Libraries, Linkers & Loaders

2.3 Software Development

Waterfall Lifecycle

Agile Programming

Spiral Model

Rapid Application Development (RAD)

Comparing Software Development Models

2.4 Types of Programming Language

Programming Paradigms

Procedural Programming

Assembly Language & Little Man Computer

Modes of Addressing

2.5 Object Oriented Languages

Classes (OOP)

Objects (OOP)

Methods (OOP)

Attributes (OOP)

Inheritance (OOP)

Encapsulation (OOP)

Polymorphism (OOP)

3. Exchanging Data

3.1 Compression, Encryption & Hashing

Lossy & Lossless Compression

Run Length Encoding & Dictionary Coding

Symmetric vs Asymmetric Encryption

Hashing

3.2 Databases

Relational Databases

Data Normalisation

Entity Relationship Diagrams

Capturing, Selecting, Managing & Exchanging Data

SQL

Referential Integrity

Transaction Processing

3.3 Networks

Networks

TCP/IP

Domain Name System (DNS)

LAN & WAN

Packet & Circuit Switching

Network Security & Threats

Network Hardware