Principles of searching algorithms used by search engines

From Computer Science Wiki

Revision as of 09:14, 5 January 2018

Web Science[1]
Many popular search algorithms assign each page a "page rank" based on how many other pages link to it. They also weight the links between pages. Other things being equal, a page with 10 inbound links ranks higher than a page with 2. However, not all links are equal: a link from an important page counts for more than a link from an obscure one.
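A minimal sketch of the naive "count the links" idea follows. It assumes the web is represented as a Python dictionary that maps each page to the pages it links to. The page names and the helper `inbound_link_counts` are hypothetical. Because the sketch only counts inbound links, it treats every link as equal, which is exactly the weakness that weighting addresses.

```python
from collections import Counter

def inbound_link_counts(links: dict[str, list[str]]) -> Counter:
    """Count how many distinct pages link to each page (self-links ignored)."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # a repeated link from one page counts once
            if target != source:
                counts[target] += 1
    return counts

# Hypothetical toy web: each key links to the pages in its list.
web = {
    "home": ["news", "about"],
    "news": ["home"],
    "about": ["home"],
    "blog": ["home", "news"],
}
print(inbound_link_counts(web).most_common())
```

Here "home" has the most inbound links, but this ranking would not change even if every link to it came from unimportant pages.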

Note from the IB: students are expected to understand only the principles of the PageRank and HITS algorithms.

PageRank (an algorithm used by Google Search to rank pages) works by counting the number and quality of links to a page to estimate roughly how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.[2]
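The "number and quality" idea can be sketched with a simplified textbook form of PageRank, repeated until the scores settle. Each page's score is (1 - d)/N plus d times the sum, over every page q linking to it, of q's score divided by q's number of outgoing links. This is a minimal illustration, not Google's production ranking system. It assumes a damping factor d = 0.85 (the value suggested in the original PageRank paper) and a fixed number of iterations. It also assumes that a page with no outgoing links spreads its score evenly over all pages. The function name and page names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank computed by repeated updates (power iteration).

    links maps each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts with the "random jump" share, (1 - d) / N.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to,
                # so one link from a high-scoring page passes on a lot.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical toy web: A and C each have exactly one inbound link.
web = {"A": ["B"], "B": ["C"], "C": ["B"], "D": ["A", "B"], "E": ["B"]}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(score, 3))
```

In this toy graph, A and C each have one inbound link. C's link comes from the highly ranked page B, which passes on all of its share, so C ends up with a much higher score than A.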
The image below is a graphical representation of PageRank. Note that circle B is very large because many other pages link to it. Now look at circle C: why is it so large when so few pages link to it? (The answer is below.)

[Image: PageRanks-Example.png, a graphical representation of PageRank]

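HITS (Hyperlink-Induced Topic Search) assigns two scores to each page. Its authority score estimates the value of the page's content, and its hub score estimates the value of its links to other pages. The two scores reinforce each other: a good hub links to many good authorities, and a good authority is linked to by many good hubs.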
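A minimal sketch of the HITS update loop follows. It assumes the graph is a dictionary of outgoing links, and the function name and page names are hypothetical. In each round, a page's authority score becomes the sum of the hub scores of the pages linking to it. Its hub score becomes the sum of the authority scores of the pages it links to. Both vectors are then scaled to unit length so they stay bounded. Unit-length scaling and the fixed iteration count are assumptions of this sketch; other normalisations are also used.

```python
import math

def hits(links, iterations=50):
    """Simplified HITS: returns (hub, authority) score dicts.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of every page linking here.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub update: sum the authority scores of every page linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Scale each vector to unit length so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical toy web.
web = {
    "portal": ["wiki", "docs", "news"],
    "list": ["wiki", "docs"],
    "wiki": ["docs"],
    "docs": [],
    "news": [],
}
hub, auth = hits(web)
print("best hub:", max(hub, key=hub.get))
print("best authority:", max(auth, key=auth.get))
```

In this toy graph, "docs" comes out as the strongest authority because three pages link to it. "portal" comes out as the strongest hub because it links to all three content pages.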


Do you understand this?

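Circle C is large because it is linked to from an authoritative (highly ranked) page, and in PageRank a link from an important page carries more weight than a link from an obscure one. Compare this with circle A, which is not linked to from an authoritative page.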

Standards

  • Outline the principles of searching algorithms used by search engines.

References