Stages of Compilation (OCR A Level Computer Science)
Revision Note
Written by: Callum Davies
Reviewed by: James Woodhouse
Stages of Compilation
What is Compilation?
Compilation is a process that translates a program written in a high-level programming language into machine code
Only machine code can be executed by a computer
There are four stages involved in this process:
Lexical Analysis
Syntax Analysis
Code Generation
Optimisation
Lexical Analysis
Lexical analysis means studying the words or vocabulary of a language
This stage involves identifying lexical 'tokens' in the code
Tokens represent small meaningful units in the programming language, such as:
Keywords
var, const, function, for, while, if
Identifiers
Variable names, function names
Operators
'+', '++', '-', '*'
Separators
',' ';', '{', '}', '(', ')'
During this stage, unnecessary elements like comments and whitespace are ignored
For example, if the following code is being compiled:
var x = function(x,y) {
if(x>2) {
return x*y;
}
return x+y;
}
The result of lexical analysis is a token table
| Token | Type |
---|---|---|
1 | var | Keyword |
2 | x | Identifier |
3 | = | Operator |
4 | function | Keyword |
5 | ( | Separator |
6 | x | Identifier |
7 | , | Separators |
8 | y | Identifier |
9 | ) | Separator |
10 | { | Separator |
11 | return | Keyword |
12 | x | Identifier |
13 | * | Operator |
14 | y | Identifier |
15 | ; | Separator |
16 | } | Separator |
Syntax Analysis
Now that tokens have been identified, syntax analysis makes sure they all adhere to the syntax rules of the programming language
A symbol, e.g. '$' could be a valid token but not a valid character according to particular programming languages
The dollar symbol would be flagged as breaking the syntax rules
Other syntax errors programmers commonly make include mismatched parentheses or missing semicolons
If the code passes the syntax analysis, the compiler can create an Abstract Syntax Tree (AST)
An AST is a graph-based representation of the code being compiled
An AST is an efficient way to represent the code for the next step
Example abstract syntax tree
For the same code as above, the following abstract syntax tree can be created
Abstract syntax tree
Code Generation
This step takes the AST and traverses it to generate object code that can be executed by the computer
Optimisation
This step modifies the code to make it more efficient without changing its functionality
This is important to attempt because it reduces the memory required to run the code, which leads to faster execution
A common optimisation action is removing duplicate code
If an 'add' function is written twice in the source code, a sophisticated compiler will notice this and include it only once in the object code
Worked Example
Imogen is writing application software for her company and is ready to compile it.
const celsius = (fahrenheit) => {
return (5/9) * (fahrenheit-32);
}
Referring to the example above, explain what happens during Lexical Analysis.
[2]
How to answer this question:
Recall the purpose of lexical analysis and what it aims to produce
Recall the different types of tokens that can be identified in the code
Use examples from the code block to write your answer
Answer:
Answer that gets full marks:
'Lexical analysis' breaks the code into tokens, ignoring whitespace and comments. Tokens are identified by their type:
keyword: 'return'
operator: '*', '-'
identifier: celsius, fahrenheit
delimiter: ';', '(', ')' etc
When all tokens have been identified, a token table is produced for the next step in the compilation.
Acceptable answers you could have given instead:
The compiler will look at all the lexical tokens in the code e.g. 'fahrenheit' is an identifier, 'return' is a keyword. All the tokens are placed into a tokens table for the next step.
Worked Example
State the name of the stage of compilation that directly follows Lexical Analysis.
[1]
Answer:
Answer that gets full marks:
Syntax analysis.
Last updated:
You've read 0 of your 5 free revision notes this week
Sign up now. It’s free!
Did this page help you?