Assigned:
Sun 01 Jun 2008
Proposals Due:
Tue 10 Jun 2008
Presentations:
Wed/Thu 18/19 Jun 2008
Your task is to investigate an area of Information Retrieval research that you find interesting. You will read 2-5 research papers from (mostly) refereed publications in order to get a sense of what has been done. You may end up covering fewer of those papers in the presentation itself. Your in-class presentation should run approximately 15 minutes and should highlight the interesting or exciting parts of the work you explored.
You must submit a one-paragraph proposal describing the topic you intend to cover no later than Tue 10 Jun 2008. I strongly encourage you to see me during office hours to discuss your presentation as well.
An excellent presentation will make it clear that you have looked at the few papers in sufficient depth to have noticed something interesting and intriguing. The summary of the work will be succinct and demonstrate that you've thought about it sufficiently to distill it to its essence. Slides (if used) for the presentation will be well executed and easy to read. The presentation itself will be energetic and fun (this will not count as heavily as it might since not everyone is comfortable--let alone energetic and fun--in front of an audience). The audience should be left anxious to read your paper(s).
More thoughts on the content of the presentation and paper are listed below.
You should be using primarily refereed papers (e.g., conferences and journals). Here are some useful sources and how to get ahold of them:
Proceedings of the SIGIR, CIKM and ECIR conferences.
ACM Transactions on Information Systems journal.
Journal of the American Society for Information Science and Technology.
Information Processing and Management.
You may get some of your information from the TREC proceedings. However, TREC proceedings are not refereed and are often rather sparse in the details presented. If you use a TREC paper, you should find a refereed version of the results to confirm that what was presented is accurate. The proceedings are available from the TREC homepage.
Many papers are available via the ACM Digital Library, Google Scholar, and CiteSeer.
Here are some topics that could make good papers, roughly grouped into affinity areas. Some have been discussed in class, meaning that you'd have a better starting point. Others would be new to you if you don't have any additional source of information. You should not feel entirely constrained by this list, though most people will end up choosing from it.
Evaluation
Techniques for finding relevant documents more quickly using the pooled approach
Comparison of evaluation measures, their stability, how they scale
The TREC robust track (trying to get rid of or recognize poorly-performing queries)
Question answering
Methods for finding passages that might contain an answer
Description of some of the better QA systems
Dialogue in question answering
Other sources of data
Retrieving spoken documents (speech recognizer output) (an old TREC track)
Retrieving documents (an old TREC track)
Web retrieval (a TREC track)
Terabyte-scale retrieval (a TREC track)
Genomics retrieval (a TREC track)
Multimedia indexing and retrieval
Direct retrieval of images and/or video
Retrieval via surrounding or descriptive text
Retrieval via image/video annotation
Summarization
Summarizing a single document
Summarizing multiple documents
Headline-type summaries
Summarizing in other languages
Cross-language and multi-lingual retrieval
Research coming out of CLEF (European Cross-Language Evaluation Forum)
Interactive CLIR
Details on some techniques
Sparse language issues (recent special issues in ACM Transactions on Asian Language Information Processing)
Other interesting stuff not touched on in class
Semantic Web
Human interaction issues
Structured documents (XML)
String search algorithms
More in-depth look at some aspect of some topic from class
Topics are first-come, first-served. Two (or more) people can have the same topic only if they specify in advance how they will be specializing their presentations.
The goal of the presentation is to find and talk
about something that is intriguing. It could be something that runs
counter to something said in class, or that pushes an idea from class in an
interesting way. It could be something that was never mentioned in class,
but that is pretty cool and slick. It could be an outrageous claim that,
now that you're most of the way through the course, you don't believe. It
could be an open problem that you think would be exciting to tackle.
Remember that this is something you think is intriguing. Find some
way to make it clear that it is intriguing, so your audience
understands why you picked it. What makes it exciting?
You have about 15 minutes for a presentation. In that time, you'll
need to provide just enough background for your tidbit to make sense,
and to present your tidbit. A good rule of thumb is 2 to 3 minutes
per slide, so you shouldn't expect to use many more than five to seven slides
to fit within the time available. You should, of course, practice your presentation
to ensure that it runs roughly 15 minutes.
DO NOT PLAGIARIZE. If you copy any text from any other source, regardless of whether the source is one that you used, of whether you include it in your bibliography, of whether it is published, of whether it is readily available on the Web, of anything--if you copy any such text, you must put it in quotation marks and/or indent it and indicate exactly where it came from, including a complete citation and a page number.