Search Engine Indexing (OCR A Level Computer Science): Revision Note

Exam code: H446

Written by: Jamie Wood

Reviewed by: James Woodhouse

Updated on 30 June 2025

Search Engine Indexing

How do search engines work?

Search engines work in several stages:
- Crawling - think of this as gathering all of the books within a library
- Indexing - think of this as reading the books and making a structured list of the information within the books
- Ranking - think of this as recommending books to the reader

Crawling

Web pages are discovered by search engines through software programs called crawlers (or spiders, bots, or robots)
Crawlers follow links from one webpage to another, systematically visiting pages on the web
They start from a set of seed URLs and visit other pages linked from those URLs
Website crawlers follow rules and guidelines established by website owners, using mechanisms like the robots.txt file. These guidelines direct crawlers on which areas of a website to explore or avoid, respecting website preferences and ensuring privacy
Once a crawler reaches a webpage, it fetches the HTML content of that page
The crawler examines the HTML structure and retrieves information, such as text content, headings, links, and metadata
To understand the structure of the webpage, the HTML that was retrieved is broken down into individual components
This process involves identifying elements, tags, and attributes that hold valuable information like titles and headings

Indexing

The data extracted from the webpage is indexed, which involves storing the collected information in a structured manner within a search engine's database
Each word in the document is included in the page's index as an entry, along with the word’s position on the page
The index allows for quick retrieval and ranking of relevant web pages in response to user queries

Ranking

When a user enters a query, the search engine searches the index for matching pages and returns the results they believe are the highest quality and most relevant to the user's query

Benefits of search engine crawling & indexing

The process of search engine indexing is essential for search engines to collect, examine and arrange online content
It involves collecting and storing information from web pages in a searchable index
There are many reasons for search engine crawling and indexing to happen:
- Improved search results
- Efficient retrieval
- Ranking and relevance
- Freshness and updates

Improved search results

Indexing webpages means search engines can:
- Provide users with relevant and up-to-date search results
- Match user searches with content which increases the chances of accurate and valuable results
This means the user is more likely to find what they're looking for quickly, ideally on the first page of search results, without having to go to additional pages

Efficient retrieval

Indexing enables efficient retrieval of information
Search engines don't need to scan the entire web for every search query. They can just search their indexed data to produce search results quickly

Ranking & relevance

Indexing enables search engines to assess the relevance and quality of web pages
Search result rankings are determined by various ranking algorithms that analyse indexed data. These algorithms consider factors such as keyword relevance, backlinks, and user engagement

Freshness & updates

Search engine crawlers periodically revisit indexed web pages to detect updates and changes
This process guarantees that the search results display the latest content that is currently accessible on the Internet
If a webpage has been updated and not re-crawled, the page may no longer be relevant for the user's search

Unlock more, it's free!

Join the 100,000+ Students that ❤️ Save My Exams

the (exam) results speak for themselves:

Test yourself

Did this page help you?

Previous:Client Server & Peer to PeerNext:PageRank Algorithm

Search Engine Indexing (OCR A Level Computer Science): Revision Note

Search Engine Indexing

How do search engines work?

Crawling

Indexing

Ranking

Benefits of search engine crawling & indexing

Improved search results

Efficient retrieval

Ranking & relevance

Freshness & updates

Unlock more, it's free!

Join the 100,000+ Students that ❤️ Save My Exams

1. The Characteristics of Contemporary Processors, Input, Output & Storage Devices

1.1 Structure & Function of the Processor

Components of the CPU

Fetch-Decode-Execute Cycle

CPU Performance

Pipelining

Von Neumann & Harvard Architecture

1.2 Types of Processor

RISC vs CISC

Graphics Processing Unit (GPU)

Multicore & Parallel Processors

1.3 Input, Output & Storage

Input, Output & Storage

RAM & ROM

2. Software & Software Development

2.1 Systems Software

Operating Systems

Memory Management

System Interrupts

Scheduling

Types of Operating System

BIOS

Device Drivers

Virtual Machines

2.2 Applications Generation

Application Software

Utility Software

Open Source & Closed Source Software

Translators

Stages of Compilation

Libraries, Linkers & Loaders

2.3 Software Development

Waterfall Lifecycle

Agile Programming

Spiral Model

Rapid Application Development (RAD)

Comparing Software Development Models

2.4 Types of Programming Language

Programming Paradigms

Procedural Programming

Assembly Language & Little Man Computer

Modes of Addressing

2.5 Object Oriented Languages

Classes (OOP)

Objects (OOP)

Methods (OOP)

Attributes (OOP)

Inheritance (OOP)

Encapsulation (OOP)

Polymorphism (OOP)

3. Exchanging Data

3.1 Compression, Encryption & Hashing

Lossy & Lossless Compression

Run Length Encoding & Dictionary Coding

Symmetric vs Asymmetric Encryption

Hashing

3.2 Databases

Relational Databases

Data Normalisation

Entity Relationship Diagrams

Capturing, Selecting, Managing & Exchanging Data

SQL

Referential Integrity

Transaction Processing

3.3 Networks

Networks

TCP/IP