ISU535 Information Retrieval - Spring 2004 - Prof. Futrelle

Updated 16 February

The preferred type of project is a programming project done in Java. If you would like to pursue a plan other than Java programming, you will need to discuss it with me. You may also want to discuss your Java programming project plans with me to be sure your project is appropriate in topic and scope for this course.

Your twin goals will be content and presentation. That is, your project should have substantial content, and it must be presented well. Both are required. Java code should be well-documented and must run on our Sun Solaris systems. That is the platform I'll use for the grading. You may develop your code where and how you wish, but make sure it compiles and runs on the Suns.

A project report is required even if you are doing a programming project. Your report should be a minimum of 500 words and include a minimum of four references. It should explain what your code is about. Just turning in a program is inadequate. Absence of a proper report will result in the automatic loss of 30% of the project grade.

The reports for a non-programming project must be much more substantial. They should be a minimum of 2,000 words and include at least ten references, all properly formatted. (Start writing now!) All references should be discussed and referred to in the text. (I will be on the lookout for plagiarism and am quite good at detecting it. Plagiarism is a serious offense and could lead to a deduction of one or two letter grades in the course or worse. So don't go there.)

You may work with a partner if you are doing a programming project. You must be very clear about who did what. If your partner is not pulling his/her weight (not developing/testing code, not showing up for meetings, not answering email or phone messages promptly, etc.), speak to me immediately.

Some examples of programming projects

Some of the examples below involve Google Hacks. The O'Reilly book about Google Hacks by Calishain and Dornfest is on Reserve for this course. The book site is here. Your primary task for the Hacks would be to do a variation on one of the Hacks and to rewrite it in Java. Practically all the Hacks are in Perl and are all online, so you wouldn't be doing much if you simply ran them. Moving them to Java will force you to understand them and will also move them to a language that has strong support for building useful GUIs through Swing. If you can add some sort of GUI capability to one or more of the Hacks, more power to you! What many of the Hacks do is generate a web page on the client side that holds the results and shows them in a browser. That's a major reason the Hacks are so handy to work with. Of course, a serious GUI is far more powerful than trying to control the appearance of a browser and make it pretend to be a nice GUI application.

  1. Google hacking for proximity searching.
  2. Google hacking to reorder results by various criteria.
  3. Google hack that gets some images, decompresses them (using Java 2D) and gives back some color information for the bottom, middle and top of the image. (Hard) Or, you could skip the hack and focus on analyzing and classifying some images. (Not as hard)
  4. Google Hack applied to Google News to track how story popularity rises and falls over the course of a week or two. (You need to get this one running pretty early.)
  5. Use regular expressions to do simple stemming, using a subset of Porter's algorithm in the appendix of our textbook. Here's a tutorial on Java regular expressions.
  6. Build inverted indexes and a way to find words and phrases using them.
  7. Build a simple query tool and apply it to a small collection of small documents that you have ranked manually, to compute Recall and Precision for various queries.
  8. Use the Java HttpURLConnection class to connect to some non-Google sites to download some information. Then do some simple analysis/filtering of what the system returns.
  9. Devise a simple clustering algorithm to cluster documents based on their text (word) contents. I can give you specific guidance on this. See an example here.
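
To give a feel for the scale of item 6, here is a minimal sketch of a positional inverted index that can look up single words and exact phrases. It is only an illustration, not a required design: the class name PositionalIndex, its method names, and the crude tokenizer (lowercase, split on anything that is not a letter or digit) are all assumptions made for this example. A real project would add stemming, stop words, ranking, and reading documents from files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical example class: a tiny in-memory positional inverted index.
class PositionalIndex {
    // term -> (document id -> positions of the term in that document, ascending)
    private final Map<String, TreeMap<Integer, List<Integer>>> postings = new HashMap<>();

    // Assumed tokenizer: lowercase the text and split on non-alphanumeric runs.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Record every token of the document together with its word position.
    void addDocument(int docId, String text) {
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), k -> new TreeMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Ids of all documents that contain the word.
    Set<Integer> findWord(String word) {
        TreeMap<Integer, List<Integer>> docs = postings.get(word.toLowerCase());
        return docs == null ? new TreeSet<>() : new TreeSet<>(docs.keySet());
    }

    // Ids of all documents that contain the words of the phrase consecutively.
    Set<Integer> findPhrase(String phrase) {
        List<String> terms = tokenize(phrase);
        Set<Integer> result = new TreeSet<>();
        if (terms.isEmpty()) {
            return result;
        }
        TreeMap<Integer, List<Integer>> first = postings.get(terms.get(0));
        if (first == null) {
            return result;
        }
        // Try each occurrence of the first term as a possible phrase start.
        for (Map.Entry<Integer, List<Integer>> entry : first.entrySet()) {
            int docId = entry.getKey();
            for (int start : entry.getValue()) {
                if (matchesAt(docId, start, terms)) {
                    result.add(docId);
                    break;
                }
            }
        }
        return result;
    }

    // True if term i of the phrase occurs at position start + i for every i >= 1.
    private boolean matchesAt(int docId, int start, List<String> terms) {
        for (int i = 1; i < terms.size(); i++) {
            TreeMap<Integer, List<Integer>> docs = postings.get(terms.get(i));
            if (docs == null) {
                return false;
            }
            List<Integer> positions = docs.get(docId);
            if (positions == null || Collections.binarySearch(positions, start + i) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PositionalIndex index = new PositionalIndex();
        index.addDocument(0, "Information retrieval is fun");
        index.addDocument(1, "Retrieval of information");
        System.out.println(index.findWord("information"));
        System.out.println(index.findPhrase("information retrieval"));
    }
}
```

Storing word positions, rather than just document ids, is what makes phrase search possible: a phrase matches only when its words appear at consecutive positions in the same document. The same index could also serve as the query tool for the Recall and Precision measurements in item 7.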

For non-programming projects, bring your ideas to me and we'll discuss them.
