Search engines, web crawling and web graph

From Computer Science Wiki
Web Science[1]

Please refer to Principles of searching algorithms used by search engines as you reflect on the learning standard below.

Once we have a graph that describes the relationships between nodes (web pages), we can use that data to rank pages by the number of links pointing to them. PageRank builds on this idea, but it is considerably more sophisticated: it also weighs how important each linking page is.
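As a rough illustration of ranking by link count, the sketch below counts incoming links in a tiny made-up graph. The page names and the `rank_by_inlinks` helper are assumptions invented for this example; this is the naive idea only, not PageRank itself.

```python
from collections import Counter

# Hypothetical toy web graph: each page maps to the pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def rank_by_inlinks(graph):
    """Return (page, incoming link count) pairs, most linked-to first."""
    counts = Counter({page: 0 for page in graph})
    for source, targets in graph.items():
        for target in set(targets):  # count each linking page only once
            counts[target] += 1
    # Sort by count (descending), then by name to make ties predictable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

ranking = rank_by_inlinks(links)
print(ranking)
```

Page C comes first here because three other pages link to it, while D comes last because nothing links to it.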

Of course, computers don't use a visual directed graph. One common way to represent a graph in memory is an adjacency matrix: a table in which the entry at row i, column j records whether node i links to node j. From that table a program can work out which nodes are related and how many links each node has. Please watch this excellent video to better understand adjacency matrices. In the linked video, the commentator rightly critiques the big O running time of an adjacency matrix. However, for the purpose of this article and concept, you should simply understand how a computer might represent a graph.
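To make the adjacency matrix idea concrete, here is a minimal sketch using plain Python lists. The four pages and their links are invented for illustration, and `build_adjacency_matrix` is a hypothetical helper name.

```python
# Hypothetical pages and directed links (source, target) for illustration.
pages = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "C")]

def build_adjacency_matrix(pages, edges):
    """Return an n x n matrix where matrix[i][j] == 1 if page i links to page j."""
    index = {page: i for i, page in enumerate(pages)}
    n = len(pages)
    matrix = [[0] * n for _ in range(n)]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix

matrix = build_adjacency_matrix(pages, edges)

# Row sums give outgoing links; column sums give incoming links.
out_degree = [sum(row) for row in matrix]
in_degree = [sum(row[j] for row in matrix) for j in range(len(pages))]

for page, row in zip(pages, matrix):
    print(page, row)
```

Note that the matrix always stores n × n entries, even when most pages do not link to each other, which is one reason this representation becomes expensive for very large graphs.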

Search engines and web crawling use the web graph to access and index information on the World Wide Web. The web graph is a network of interconnected web pages and links that make up the World Wide Web, and search engines and web crawlers use this network to discover and access new web pages and to keep track of the relationships between different pages.

Search engines answer a search query by looking it up in an index of web pages rather than by searching the live web. That index is built by crawling: the crawl starts with a seed set of web pages (such as the pages already in the index) and follows the links between pages to discover new ones. As the web graph is crawled, the content of each discovered page is indexed, and this information is used to create a searchable index of the web.

Web crawlers are automated programs that discover and index web pages. Starting from a seed set of pages, a crawler repeatedly visits a page, records its content in the index, and adds any links it has not seen before to its list of pages to visit.
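The crawling process described above can be sketched as a breadth-first traversal. To keep the example self-contained, it assumes a small in-memory "web" (the `TOY_WEB` dictionary, with invented page names and text) instead of fetching real pages over the network; a real crawler would also need to handle things such as politeness rules and failed requests.

```python
from collections import deque

# Hypothetical in-memory web: page -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("home page about graphs", ["b.example", "c.example"]),
    "b.example": ("notes on web crawling", ["c.example"]),
    "c.example": ("graphs and ranking", ["a.example", "d.example"]),
    "d.example": ("crawling policies", []),
    "e.example": ("unlinked page", ["a.example"]),
}

def crawl(seeds, web):
    """Breadth-first crawl from the seeds; return visit order and a word index."""
    index = {}             # word -> set of pages containing that word
    visited = set(seeds)   # pages already queued, so none is crawled twice
    queue = deque(seeds)
    order = []
    while queue:
        page = queue.popleft()
        if page not in web:  # a broken link: nothing to fetch
            continue
        text, outlinks = web[page]
        order.append(page)
        for word in text.split():
            index.setdefault(word, set()).add(page)
        for link in outlinks:
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order, index

order, index = crawl(["a.example"], TOY_WEB)
print(order)
print(sorted(index["graphs"]))
```

In this toy web, `e.example` is never discovered because no crawled page links to it, which illustrates why a page with no incoming links can be hard for a crawler to find.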

In summary, search engines and web crawling use the web graph to access and index information on the World Wide Web. They do this by following the links between web pages to discover new pages and index their content, creating a searchable index of the web.


Do you understand this?

From the IB: Students should be aware of the PageRank algorithm and be able to explain how it works.
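To help with explaining how PageRank works, here is a simplified sketch of the commonly described iterative version of the algorithm. It is an illustration under stated assumptions, not the implementation used by any real search engine: the toy graph is invented, the damping factor of 0.85 is the value usually quoted in descriptions of PageRank, and the fixed iteration count is an arbitrary choice. The core idea is that each page repeatedly shares its current score equally among the pages it links to, so a link from an important page is worth more than a link from an unimportant one.

```python
# Hypothetical toy web graph: each page maps to the pages it links to.
# Every link target must also appear as a key.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(graph, damping=0.85, iterations=50):
    """Simplified PageRank by repeated score sharing (power iteration)."""
    pages = list(graph)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # Share this page's score equally among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links shares with every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

rank = pagerank(links)
for page, score in sorted(rank.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this example C ends up with the highest score because it receives links from three pages, and A comes second even though it has only one incoming link, because that link comes from the highly ranked page C.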


Standards

These standards are taken from the IB Computer Science Subject Guide[2]

  • Explain that search engines and web crawling use the web graph to access information.



References

  1. http://www.flaticon.com/
  2. IB Diploma Programme Computer science guide (first examinations 2014). Cardiff, Wales, United Kingdom: International Baccalaureate Organization. January 2012.