IS1320 Sp03 Midterm review #1 - Prof. Futrelle

Midterm to be given on Thursday 8 May - Closed book/notes

This review document may be updated or augmented by another before the Midterm is given. But it is a good start, if not the final word.

  1. Google: There will be a general question concerning Google, which won't be too hard if you've read through Brin and Page's "The Anatomy of a Large-Scale Hypertextual Web Search Engine". This paper is available in hundreds, if not thousands, of places on the web, so you will have no problem finding a copy to print and read. Google for: brin page anatomy.

  2. The Page Rank algorithm: The notes here follow the web page "Link, Links Analysis and Page Rank Algorithm". You should print out that page and study it carefully in preparation for the Midterm. Given a set of pages and pointers in the form "U points to X and Y", etc., you will be asked to draw the corresponding diagram. Given the general formula and the values of E and F in the equation:
    I(Q) = E + F * (I(P1)/OP1 + ... + I(Pm)/OPm)
    you should be able to estimate by inspection which page will be ranked highest and which lowest, and demonstrate this by iterating the equation once. That is, you will need to do the numerical computation. It's so simple that you shouldn't need a calculator, but you can use one if you like. (Points will be deducted for reporting results to too many decimal places of precision.)
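
    To make the "iterate once" step concrete, here is a minimal sketch in Python. It assumes the usual reading of the formula: P1 ... Pm are the pages that point to Q, and the O term under each one is the number of links leaving that page. The three-page graph, the values E = 0.2 and F = 0.8, and the starting rank of 1.0 for every page are made up for illustration; the exam will supply its own.

```python
def iterate_once(links, ranks, E, F):
    """Apply I(Q) = E + F * (sum of I(P)/O(P) over pages P pointing to Q) once.

    links maps every page to the list of pages it points to;
    ranks maps every page to its current rank I(page).
    """
    new_ranks = {}
    for q in links:
        # Each page P that points to Q contributes its rank divided by
        # the number of its outgoing links.
        incoming = sum(ranks[p] / len(targets)
                       for p, targets in links.items() if q in targets)
        new_ranks[q] = E + F * incoming
    return new_ranks


# Hypothetical example: U points to X and Y, X points to Y, Y points to U.
links = {"U": ["X", "Y"], "X": ["Y"], "Y": ["U"]}
ranks = {page: 1.0 for page in links}  # assumed starting values

new_ranks = iterate_once(links, ranks, E=0.2, F=0.8)
for page in sorted(new_ranks):
    # Two decimal places is plenty for a hand computation.
    print(f"{page}: {new_ranks[page]:.2f}")
```

    In this made-up graph, Y has two incoming links and X has only half of U's vote, so by inspection Y should come out highest and X lowest, which is what one iteration gives.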

  3. XML: You should understand the basics of XML: that there is a schema that describes the structure of a class of XML documents, and that OO classes can be generated that are in one-to-one correspondence with the XML Schema, e.g., using JAXB. You will be given an example XML source document, and you should be able to explain that OO classes corresponding to the XML elements can be created with the corresponding set() and get() methods. For example, a document snippet such as:
         <PersonInfo gender="male">
    could lead to operations such as info.getName(), info.setAge(39) and info.getGender(), where info is an instance of a PersonInfo class created to correspond to the XML Schema. You should also be familiar with the concepts of marshalling, unmarshalling and validation. A basic synopsis of these terms and concepts is available online; you should read it, though you might prefer equivalent information from another source.
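
    The idea can be sketched as follows. This is a hand-written Python analogue, not JAXB and not generated from a schema: the Name and Age child elements, the sample values, and the unmarshal() and marshal() helper names are assumptions made for illustration. The check on the root element is only a stand-in for real validation against a schema.

```python
import xml.etree.ElementTree as ET


class PersonInfo:
    """Hand-written stand-in for a class a binding tool might generate."""

    def __init__(self, name, age, gender):
        self._name = name
        self._age = age
        self._gender = gender

    def getName(self):
        return self._name

    def getAge(self):
        return self._age

    def setAge(self, age):
        self._age = age

    def getGender(self):
        return self._gender


def unmarshal(xml_text):
    """Unmarshalling: turn XML text into an object."""
    root = ET.fromstring(xml_text)
    # Stand-in for validation: a real validator checks the whole
    # document against the schema, not just the root element name.
    if root.tag != "PersonInfo":
        raise ValueError("expected a PersonInfo element")
    return PersonInfo(name=root.findtext("Name"),
                      age=int(root.findtext("Age")),
                      gender=root.get("gender"))


def marshal(info):
    """Marshalling: turn the object back into XML text."""
    root = ET.Element("PersonInfo", gender=info.getGender())
    ET.SubElement(root, "Name").text = info.getName()
    ET.SubElement(root, "Age").text = str(info.getAge())
    return ET.tostring(root, encoding="unicode")


# Hypothetical source document; only the gender attribute is from the notes.
source = '<PersonInfo gender="male"><Name>Pat</Name><Age>38</Age></PersonInfo>'
info = unmarshal(source)
info.setAge(39)
result = marshal(info)
print(result)
```

    The point to take away is the round trip: unmarshal the document into an object, work with it through get and set methods, then marshal it back out as XML.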

  4. Inverted files/indexes and their use in searching: This is covered in Sec. 8.2 of the text. I will give you a few tiny documents and ask you to construct the inverted index and explain how it would be used in processing a query. [more on this topic shortly]
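
    As a rough illustration of what such a question involves, the sketch below builds an inverted index for three made-up documents and answers a query by intersecting the document lists of the query terms. The documents, the whitespace tokenizing, and the treatment of a query as "all terms must appear" are assumptions for the example, not necessarily the conventions used in the text.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return the sorted ids of documents containing every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical tiny documents.
docs = {1: "the cat sat", 2: "the dog sat", 3: "the cat ran"}
index = build_inverted_index(docs)

for term in sorted(index):
    print(term, sorted(index[term]))

hits = search_all(index, "cat sat")
print(hits)
```

    The query never scans the documents themselves: it looks up the list for "cat" and the list for "sat" in the index and keeps only the ids that appear in both.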
