Assignment Data Mining

Category: Science and Technology

Date Submitted: 07/19/2016 05:55 PM

1. Google uses its proprietary algorithm called "PageRank" to measure the standing of a web page. How does PageRank work and how is data mining used in this technique?

PageRank measures the prestige of a web page or website. In short, it helps separate informative, authoritative pages or sites from unauthoritative ones.

It works by counting the number and quality of links to a page to produce a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
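One common way to write this idea down is the normalized PageRank formula below. It is offered as a sketch for illustration, not as Google's production formula. The symbols are notation introduced here rather than taken from the essay: N is the number of pages in the collection, M(p_i) is the set of pages that link to p_i, L(p_j) is the number of outgoing links on p_j, and d is a damping factor, often set to about 0.85 in published descriptions.

```latex
PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}
```

The sum captures both the number of links (one term per linking page) and their quality (each term is weighted by the linking page's own PageRank, divided among all of its outgoing links).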

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search.

The key point to understand is that a combination of variables determines how well a site performs in Google. These are the most important variables to consider:

a. Incoming links to your site.

b. The relevancy (to your site’s theme) of the pages linking to your site and the PageRank of these pages.

c. The keywords that other sites use to link to your site.

d. The keywords on your website, particularly in places like page titles and headlines.

The PageRank algorithm outputs a probability distribution that represents the likelihood that a person randomly clicking on links will arrive at any particular page. PageRank can be calculated for collections of documents of any size. Several research papers assume that the distribution is evenly divided among all documents in the collection at the beginning of the computational process. The PageRank computation requires several passes, called "iterations", through the collection to adjust the approximate PageRank values so that they more closely reflect the theoretical true value.
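The iterative process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's actual implementation: the function name `pagerank`, the damping factor of 0.85, the fixed count of 50 iterations, the even redistribution of rank from pages with no outgoing links, and the four-page `toy_web` graph are all hypothetical choices made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a small link graph.

    links maps each page to the list of pages it links to.
    damping and iterations are assumed values chosen for illustration.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)

    # Start with the distribution evenly divided among all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share (the random jump).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C receives links from A, B and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)

for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

In this toy graph, page C ends up with the highest score because it receives the most incoming links, while page D, which nothing links to, keeps only the baseline share. The scores sum to 1, which matches the reading of PageRank as a probability distribution.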

A probability is expressed as a numeric value between 0 and 1. A 0.5 probability is commonly...