ISU535 09X2 Project Phase III

Created: Sun 26 Jul 2009
Last modified: 

Assigned: Mon 27 July 2009
Due: Thu 13 August 2009 (last day of classes;hard deadline)


Phase III (PageRank)

In this project, you will compute PageRank on a collection of 183,811 web documents. Consider the version of PageRank described in class and in HW03. PageRank can be computed iteratively as show in the following pseudocode:
// M(p) is a list of pages that link to page p
// L(q) is the number of out-links from page q
// d is the PageRank damping/teleportation factor; use d = 0.85 as is typical

foreach page p
  PR(p) = 1/N                          /* initial value */

while PageRank has not converged do
  foreach page p
    newPR(p) = (1-d)/N                 /* teleportation value */
    foreach page q in M(p)             /* pages pointing to p */
      newPR(p) += d*PR(q)/L(q)         /* add share of PageRank from in-links */
  foreach page p
    PR(p) = newPR(p)

return PR
In order to facilitate the computation of PageRank using the above pseudocode, one would ideally have access to an in-link respresentation of the web graph, i.e., for each page p, a list of the pages q that link to p. For example, given the graph shown in Problem 5 from HW03, one such representation would be as follows
A D E F
B A F
C A B D
D B C
E B C D F
F A B D
where the first line indicates that page A is linked from pages D, E, and F, and so on. Note that unlike the example in HW03, in a real web graph, not every page will have in-links, nor will every page have out-links.

The Project

What to Submit

In addition to the items mentioned above, you should hand in a copy of your code, which should hopefully be relatively short. (The pseudocode for the iterative PageRank algorithm is 10 lines long, as shown above.)

Grading

This project is worth a maximum of 150 points.

Switch to:


ekanou@ccs.neu.edu