Due: Tuesday, 10 December 2019, 11:59 p.m.
For the course project in CS6200, you will form teams of two to four people. The division of labor in the group will be described below.
If you are in IS4200, please contact me as soon as possible to discuss a project.
Although information retrieval systems perform well on many tasks, there is still plenty of room for new systems and applications. Your job will be to take the first steps in defining and evaluating a new task. The purpose of this project is thus to emphasize the centrality of problem definition and evaluation to information retrieval.
What kinds of tasks might you consider? In class, we have mostly discussed ad-hoc retrieval: the ranking of documents in response to user queries previously unseen by the search engine. For this project, you may consider either ad-hoc retrieval with particular types of queries that current search engines do not seem to handle well, or search tasks that involve other output modalities, such as document summaries, structured data, or clusters. It would be preferable if there were more than one correct answer for a given query. In other words, put more of your emphasis on information retrieval than on, e.g., question answering or summarization.
Here are some examples of tasks you might investigate:
The core of this project will be creating an evaluation set for the proposed task. Procedurally, creating the evaluation corpus will proceed as follows:
The evaluation set will therefore consist of a number of records, one for each query. These records are conventionally called topics in IR evaluation. Each topic contains:
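To make the record structure concrete, here is a minimal sketch of one topic, assuming each topic pairs a query with its candidate results and graded human relevance judgments; the field names and grading scale are illustrative, not prescribed by the assignment.

```python
# A hypothetical topic record. All field names and the 0-2 grading
# scale are assumptions for illustration only.
topic = {
    "id": "T001",
    "query": "example information need",
    "candidates": [
        {"doc_id": "D1", "text": "first candidate result"},
        {"doc_id": "D2", "text": "second candidate result"},
    ],
    # doc_id -> relevance grade (e.g., 0 = not relevant, 2 = highly relevant)
    "judgments": {"D1": 2, "D2": 0},
}
```

Whatever concrete format you choose (JSON, TSV, etc.), keeping queries, candidates, and judgments together in one record per topic makes the later reranking evaluation straightforward.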
On the basis of these relevance judgments, you should be able to estimate human performance on your chosen task. You should also evaluate a baseline model, which will give you an idea of how much progress remains to be made on the task. These baseline models do not need to be complex. You should evaluate baseline models using the evaluation set alone, without indexing and searching an entire collection. Instead, evaluate the baseline model on a reranking task: apply the model to the query and each candidate result in the evaluation set in turn, then rank the candidate results by the model's score and evaluate this ranking by comparison to the human judgments.
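The reranking evaluation described above can be sketched as follows. This is a hedged illustration, not a required implementation: the term-overlap baseline, the topic field names, and the use of average precision with binary judgments are all assumptions you should replace with your own model, format, and metric.

```python
# Sketch of reranking-style evaluation over one topic record,
# assuming binary relevance (grade > 0 means relevant).

def overlap_score(query, candidate_text):
    """Toy baseline: number of query terms appearing in the candidate."""
    q_terms = set(query.lower().split())
    c_terms = set(candidate_text.lower().split())
    return len(q_terms & c_terms)

def average_precision(ranked_ids, relevant_ids):
    """Average precision of a ranked list against a set of relevant ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0

def evaluate_topic(topic, score_fn):
    """Rerank a topic's candidates with score_fn and return AP."""
    ranked = sorted(topic["candidates"],
                    key=lambda c: score_fn(topic["query"], c["text"]),
                    reverse=True)
    relevant = {d for d, grade in topic["judgments"].items() if grade > 0}
    return average_precision([c["doc_id"] for c in ranked], relevant)

# Illustrative topic (hypothetical data):
topic = {
    "query": "penguin habitats",
    "candidates": [
        {"doc_id": "D1", "text": "a page about car engines"},
        {"doc_id": "D2", "text": "penguin habitats in Antarctica"},
    ],
    "judgments": {"D1": 0, "D2": 1},
}
print(evaluate_topic(topic, overlap_score))  # -> 1.0 (relevant doc ranked first)
```

Averaging the per-topic scores over all topics in your evaluation set gives a single summary number (here, mean average precision) that you can report for the baseline alongside your estimate of human performance.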
In your final submission, please include:
Implement a more specialized model to solve your task. Evaluate its performance compared to the baseline and to human performance.