From Computer Science Wiki
Web Science[1]
Indexing is the process by which search engines organise information before a search to enable super-fast responses to queries.[2]

A search index is a body of structured data that a search engine refers to when looking for results that are relevant to a specific query. Indexes are a critical piece of any search system, since they must be tailored to the specific information retrieval method of the search engine’s algorithm. In this manner, the algorithm and the index are inextricably linked to one another. Index can also be used as a verb (indexing), referring to the process of collecting unstructured website data in a structured format that is tailored for the search engine algorithm.

One way to think about indices is through an analogy between a search infrastructure and an office filing system. Imagine you hand an intern a stack of thousands of pieces of paper (documents) and ask them to organize the papers in a filing cabinet (index) so the company can find information more efficiently. The intern first has to sort through the papers and get a sense of the information they contain. Next, they have to decide on a system for arranging the papers in the filing cabinet. Finally, they need to decide on the most effective way to search through and select from the files once they are in the cabinet. In this example, the process of organizing and filing the papers corresponds to the process of indexing website content, and the method for searching across these organized files and finding those that are most relevant corresponds to the search algorithm.[3]

Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process in the context of search engines designed to find web pages on the Internet is web indexing.[4]
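The collect, parse, and store steps can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the HTML string stands in for a page fetched by a crawler, the `PageParser` class, `parse_page` function, and example URL are hypothetical names, and the tokenizer simply lowercases and splits on whitespace.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects the <title> text and the words of one HTML page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data.strip()
        else:
            # Lowercase and split on whitespace: a deliberately crude tokenizer.
            self.words.extend(data.lower().split())


def parse_page(url, html):
    """Turn raw HTML into a structured record ready to be stored."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title, "words": parser.words}


# "Collect": in a real system this HTML would come from a crawler (assumed here).
raw_html = (
    "<html><head><title>Web Indexing</title></head>"
    "<body><p>Search engines index pages</p></body></html>"
)

# "Parse" and "store": keep the structured record in an in-memory dict keyed by URL.
store = {}
record = parse_page("https://example.org/indexing", raw_html)
store[record["url"]] = record
print(record["title"], record["words"])
```

A production indexer would also handle scripts, styles, punctuation, and word stemming, and would write to persistent storage, but the division into three steps stays the same.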

Search engines index websites in order to respond to search queries with relevant information as quickly as possible. To do so, they store information about indexed web pages, such as keywords, titles, or descriptions, in a database. This way, search engines can quickly identify pages relevant to a search query.
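A common structure for this kind of lookup is an inverted index, which maps each keyword to the pages that contain it. The Python sketch below is a simplified illustration, not a description of any particular search engine: the sample pages and the `build_index` and `lookup` functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample pages; real entries would come from crawled sites.
pages = {
    "page1": {"title": "Web indexing basics", "description": "How search engines index pages"},
    "page2": {"title": "Cooking pasta", "description": "Simple pasta recipes"},
    "page3": {"title": "Search algorithms", "description": "Ranking pages for a search query"},
}


def build_index(pages):
    """Map each keyword to the set of page ids whose title or description contains it."""
    index = defaultdict(set)
    for page_id, fields in pages.items():
        text = fields["title"] + " " + fields["description"]
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def lookup(index, query):
    """Return the ids of pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(pages)
print(sorted(lookup(index, "search pages")))  # ['page1', 'page3']
```

The expensive work of reading every page happens once, when the index is built. Answering a query then only requires a few dictionary lookups and set intersections, which is why indexing before the search makes responses fast.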

Indexing has the additional purpose of giving each page a certain weight, as described in the search algorithms. Once pages have been indexed, these weights allow the search results to be ranked.[5]
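As a rough illustration of how stored weights can drive ranking, the Python sketch below assigns each word on a page a term-frequency weight at indexing time and sums those weights at query time. Term frequency is an assumption chosen for simplicity. Real search algorithms combine many more signals, and the sample pages and the `rank` function are hypothetical.

```python
from collections import Counter

# Hypothetical indexed pages, already reduced to lowercase words.
indexed_pages = {
    "page1": "search engines index pages so search is fast".split(),
    "page2": "an index helps search".split(),
    "page3": "pasta recipes".split(),
}

# At indexing time, store a weight for every (page, word) pair.
# Assumption: the weight is the term frequency, occurrences / page length.
weights = {
    page_id: {word: count / len(words) for word, count in Counter(words).items()}
    for page_id, words in indexed_pages.items()
}


def rank(query):
    """Score each page by summing the stored weights of the query words, highest first."""
    scores = {}
    for page_id, page_weights in weights.items():
        score = sum(page_weights.get(word, 0.0) for word in query.lower().split())
        if score > 0:
            scores[page_id] = score
    return sorted(scores, key=scores.get, reverse=True)


print(rank("search index"))  # ['page2', 'page1']
```

Here page2 ranks first because the query words make up a larger share of its text (0.5 versus 0.375 for page1), and page3 is omitted because it contains none of the query words.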

When you look at a search results page, you are looking at the results of your search query against the web index for that search engine.

This image is used with permission from Stone Temple.[6]


Knowledge Graph


These standards are taken from the IB Computer Science Subject Guide.[7]

  • Outline the purpose of web-indexing in search engines.