CSG339 06F, Information Retrieval
Project 2

Created: November 8, 2006
Last modified: 

Assigned: Wed 15 Nov 2006
Due: Wed 6 Dec 2006

This project requires you to implement metasearch on the runs you produced for Project 1.

Metasearch

Metasearch is the process of combining multiple ranked result lists returned for the same query. There are many popular metasearch algorithms, including CombMNZ, CombSum, Borda Count, Condorcet, Online Allocation, Sampling, LC, Bayes, etc. The goal is to output a ranked list of results that is at least as good as the list of any underlying system used by the algorithm. See the lecture given in class for details.
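For reference, the two most common score-combination rules are the following (these are the standard formulations; consult the lecture for the exact variants required). For a document d that appears in some of n input lists with normalized scores s1(d), ..., sn(d):

    CombSum(d) = s1(d) + ... + sn(d)
    CombMNZ(d) = m(d) * CombSum(d)

where m(d) is the number of lists in which d appears, and a document missing from a list contributes a score of 0.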

The runs

In Project 1 you should have produced 16 runs, each containing results for all queries on a given database with a given retrieval formula. In Project 2, "same query" means the query with the same query number; the actual query text may differ slightly from run to run because of stopwording and stemming.
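As a minimal sketch (in Python; the function name and any file-layout assumptions are ours, not a required API), a run file in the six-column Project 1 output format can be read and grouped by query number like this:

    from collections import defaultdict

    def read_run(path):
        """Read one run file (query-number Q0 document-id rank score Exp)
        and group (doc_id, rank, score) tuples by query number."""
        by_query = defaultdict(list)
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) != 6:
                    continue  # skip blank or malformed lines
                qno, _q0, doc_id, rank, score, _tag = parts
                by_query[qno].append((doc_id, int(rank), float(score)))
        return by_query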

The metasearch systems you build

  1. First fix a database: use d=3 (stemmed, stopworded).
    Then consider the 4 runs you have for this database, corresponding to the 4 retrieval methods required in Project 1. Rank these 4 runs by MAP (mean average precision); we will call them run1 (best MAP), run2, run3, and run4 (worst MAP). For a given metasearch algorithm you will do two metasearch runs:
    - metasearch over run1 and run2
    - metasearch over run1, run2, run3, and run4

It is important to note that metasearch works on a query-by-query basis. Your metasearch implementation should therefore proceed sequentially over the queries: for each query, read the underlying runs' results for that query, use metasearch to produce a new ranking of those results (top 1000), and output them in the same format as before:

query-number    Q0   document-id    rank   score   Exp

where score and rank are those computed by metasearch.
Then move on to the next query, and so on, appending the results to the same metasearch output file. You can evaluate the metasearch output using trec_eval as before.
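A hedged sketch of that per-query loop, reusing the read_run helper above (again, all names here are illustrative, not a required API):

    def metasearch_to_file(run_paths, combine, out_path, tag="meta"):
        """For each query, combine the underlying runs and append the
        top 1000 results to out_path in the usual six-column format.
        combine() takes the per-run result lists for one query and
        returns a dict mapping doc_id to metasearch score."""
        runs = [read_run(p) for p in run_paths]
        # union of query numbers across runs; assumes numeric query ids
        queries = sorted(set().union(*(r.keys() for r in runs)), key=int)
        with open(out_path, "w") as out:
            for qno in queries:
                scores = combine([r.get(qno, []) for r in runs])
                ranked = sorted(scores.items(), key=lambda kv: -kv[1])[:1000]
                for rank, (doc_id, score) in enumerate(ranked, start=1):
                    out.write(f"{qno} Q0 {doc_id} {rank} {score:.6f} {tag}\n")

The combining function is the only piece that changes between algorithms; one such function, comb_sum, is sketched in the next section.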

The metasearch algorithms

You will implement 2 metasearch algorithms:

  1. CombSum. Consult the lecture for details; a sketch is given after this list.

  2. A metasearch algorithm of your own design. It does not have to be original: you can take a look at any metasearch algorithm available, or be as creative as you like. The sole purpose is to obtain good results on the underlying runs 1-4 mentioned above. It can use scores, ranks, or any combination of them. (One illustrative possibility is sketched after this list.)
    Restriction: the only restriction is that the algorithm must be different from CombSum, CombMNZ, etc.
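To make both items concrete, here is a hedged Python sketch: CombSum with min-max score normalization (one common formulation; consult the lecture for the exact variant expected), followed by a weighted variant of the kind that could serve as your own algorithm for item 2. All helper names are ours.

    def normalize(results):
        """Min-max normalize one run's scores for a single query:
        best score -> 1, worst -> 0."""
        if not results:
            return {}
        scores = [s for _doc, _rank, s in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # guard against all-equal scores
        return {doc: (s - lo) / span for doc, _rank, s in results}

    def comb_sum(per_run_results):
        """CombSum: a document's metasearch score is the sum of its
        normalized scores; a run that does not return the document
        contributes 0."""
        combined = {}
        for results in per_run_results:
            for doc, s in normalize(results).items():
                combined[doc] = combined.get(doc, 0.0) + s
        return combined

    def weighted_comb_sum(per_run_results, weights):
        """An illustrative 'own' algorithm (hypothetical, not required):
        weight each run's normalized scores, e.g. by that run's MAP,
        so that better runs count more. Merely one idea; anything
        other than CombSum, CombMNZ, etc. is acceptable."""
        combined = {}
        for w, results in zip(weights, per_run_results):
            for doc, s in normalize(results).items():
                combined[doc] = combined.get(doc, 0.0) + w * s
        return combined

With these pieces, a call such as metasearch_to_file(["run1", "run2"], comb_sum, "combsum.12") (hypothetical file names) would produce one of the four required output files.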

In total there will be 4 metasearch runs (2 algorithms x 2 sets of underlying systems).



What to hand in

Provide a short description of what you did and some analysis of what you learned. At a minimum, the writeup should describe your own metasearch algorithm and compare the evaluation results of your metasearch runs with those of the underlying runs.

Feel free to try other runs to better explore the issues.

Hand in a printed copy of the report. The report should not be much longer than a dozen pages (if that long).  In particular, do not include trec_eval output directly; instead, select the interesting information and put it in a nicely formatted table.