IS4200/CS6200: Information Retrieval

Project 3


Assigned: Thursday, 15 November 2012
Due: Monday, 10 December 2012, 6pm

New! Extra credit points available.


In this project, you will replicate the functionality of the Lemur index used in Project 2. Combined with the retrieval functions you implemented for Project 2, your index will give you a fully functioning search engine.
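As a rough illustration of the kind of structure such an index provides, the sketch below builds a small in-memory inverted index in Python. It is not Lemur's format or API: the function name `build_index`, the whitespace tokenizer, and the sample documents are all assumptions made for this example, and a real index would also apply stopping and stemming and be stored on disk.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Build a toy inverted index (hypothetical helper, not part of Lemur).

    docs maps doc_id -> raw text. Returns (index, doc_lengths), where
    index maps term -> list of (doc_id, term_frequency) postings and
    doc_lengths maps doc_id -> number of tokens in that document.
    """
    index = defaultdict(list)
    doc_lengths = {}
    for doc_id, text in docs.items():
        # Assumption: lowercase and split on whitespace; no stopping or stemming here.
        tokens = text.lower().split()
        doc_lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term].append((doc_id, tf))
    return index, doc_lengths


# Made-up sample collection for illustration only.
docs = {
    1: "information retrieval systems",
    2: "retrieval of information from text",
    3: "text indexing",
}
index, doc_lengths = build_index(docs)
print(index["retrieval"])  # postings list for one term
print(doc_lengths)         # per-document token counts
```

Retrieval functions such as those from Project 2 typically need only these pieces: the postings for each query term, the term frequencies, and the document lengths.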

The Project

New: Extra Credit

For a maximum of 50 extra points, consider the following:

Many modern search engines end up indexing even stop words. Disk is cheap! But what are the tradeoffs? For extra credit, analyze the empirical time and space complexity of including stopword information in the index:

Note that you should still stem the document and query terms. Points will be assigned for a clear description of the approach and presentation of the results.
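One possible way to set up such a comparison is sketched below in Python: build the index twice, once keeping stop words and once dropping them, and record the build time and the number of terms and postings for each. Everything here is an assumption made for illustration: the tiny `STOPWORDS` set, the `measure` helper, the sample documents, and the identity `stem` placeholder, which you would replace with the stemmer you actually use.

```python
import time
from collections import Counter, defaultdict

# Assumption: a tiny illustrative stop list, not the one used in the course.
STOPWORDS = frozenset({"the", "of", "a", "and", "in", "to"})


def build_index(docs, stopwords=frozenset(), stem=lambda term: term):
    """Toy inverted index: term -> list of (doc_id, term_frequency).

    stem defaults to the identity function as a placeholder; substitute
    a real stemmer so that terms are still stemmed in both runs.
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        tokens = [stem(t) for t in text.lower().split() if t not in stopwords]
        for term, tf in Counter(tokens).items():
            index[term].append((doc_id, tf))
    return index


def measure(docs, stopwords=frozenset()):
    """Return build time plus simple size statistics for one configuration."""
    start = time.perf_counter()
    index = build_index(docs, stopwords)
    elapsed = time.perf_counter() - start
    return {
        "terms": len(index),
        "postings": sum(len(postings) for postings in index.values()),
        "seconds": elapsed,
    }


# Made-up sample collection for illustration only.
docs = {
    1: "the retrieval of information",
    2: "a study of the index",
    3: "information in the index",
}
with_stop = measure(docs)                # stop words kept in the index
without_stop = measure(docs, STOPWORDS)  # stop words removed
print("with stop words:   ", with_stop)
print("without stop words:", without_stop)
```

On a real collection, timings from a single run are noisy, so it is worth repeating each measurement several times. Measuring the on-disk size of the index files and the query processing time would also give a fuller picture than posting counts alone.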

What to Submit

The main assignment is worth 150 points. The extra credit portion is worth at most 50 extra points (for a maximum total of 200).