Created: November 8, 2006
Last modified:
This project requires that you implement metasearch on the runs you produced in Project 1.
Metasearch is the process of combining multiple ranked result lists for the same query. There are many popular metasearch algorithms, including CombMNZ, CombSum, Borda Count, Condorcet, Online Allocation, Sampling, LC, Bayes, etc. The goal is to output a ranked list of results that is at least as good as the list of any underlying system used by the algorithm. Here is the lecture given in class.
In Project 1 you should have produced 16 runs, each containing results for all queries on a given database with a given retrieval formula. In Project 2, by "same query" we mean the query represented by the same number, although the actual query text may differ slightly from run to run because of stopwording and stemming.
First fix a database: use d=3 (stemmed, stopworded). Then consider the 4 runs you have for this database, corresponding to the 4 retrieval methods required in Project 1. Rank these 4 runs by MAP (mean average precision); we will call them run1 (best MAP), run2, run3, and run4 (worst MAP). For a given metasearch algorithm you will do two metasearch runs:
- metasearch over run1 and run2
- metasearch over run1, run2, run3, and run4
It is important to note that metasearch works on a query-by-query basis. Your metasearch implementation should therefore proceed sequentially over queries: for each query, read the underlying runs for that query, use metasearch to produce a new ranking of those results (top 1000), and output it in the same format as before:
query-number Q0 document-id rank score Exp
where score and rank are those computed by metasearch.
Then move to the next query, and so on, appending the results to the same metasearch output file. You can evaluate the metasearch output using trec_eval as before.
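The query-by-query loop above can be sketched as follows. This is only an illustration, assuming the six-column run format shown earlier; the file names and the `combine` callback (which takes one result list per run for a single query and returns fused `(docid, score)` pairs sorted by descending score) are hypothetical, not prescribed by the assignment.

```python
# Sketch of the query-by-query metasearch driver (illustrative only).
from collections import defaultdict

def read_run(path):
    """Parse a TREC-format run file into {query: [(docid, rank, score), ...]}."""
    run = defaultdict(list)
    with open(path) as f:
        for line in f:
            qid, _q0, docid, rank, score, _tag = line.split()
            run[qid].append((docid, int(rank), float(score)))
    return run

def metasearch(run_paths, combine, out_path, tag="Exp", top=1000):
    runs = [read_run(p) for p in run_paths]
    # Union of query numbers across runs, sorted as strings for simplicity.
    queries = sorted(set().union(*(r.keys() for r in runs)))
    with open(out_path, "w") as out:
        for qid in queries:                      # one query at a time
            per_query = [r[qid] for r in runs]   # this query's lists only
            fused = combine(per_query)           # -> [(docid, score)], desc
            for rank, (docid, score) in enumerate(fused[:top], start=1):
                out.write(f"{qid} Q0 {docid} {rank} {score:.4f} {tag}\n")
```

The output file then has the same six-column layout as the Project 1 runs, so trec_eval can evaluate it unchanged.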
You will implement 2 metasearch algorithms:
- CombSum: consult the lecture for details.
- A metasearch algorithm of your own design. It does not have to be original: you can take a look at any metasearch algorithm available, or you can be as creative as you like. The sole purpose is to obtain good results on the underlying runs 1-4 mentioned above. It can use scores, ranks, or any combination of them. Restriction: the only restriction is that the algorithm must be different from CombSum, CombMNZ, etc.
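As a reference point, CombSum simply sums each document's scores across the underlying runs. A minimal sketch is below; the per-run min-max normalization step is a common companion to CombSum (scores from different retrieval formulas are not directly comparable), but the lecture's exact formulation is what you should follow.

```python
# Sketch of CombSum: sum each document's (normalized) scores across runs.
# Min-max normalization per run is an assumption here, not a requirement.
from collections import defaultdict

def comb_sum(per_query_lists):
    """per_query_lists: one [(docid, rank, score), ...] list per run,
    all for the same query. Returns [(docid, fused_score)] sorted desc."""
    fused = defaultdict(float)
    for lst in per_query_lists:
        scores = [s for _, _, s in lst]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0              # guard against constant scores
        for docid, _rank, score in lst:
            fused[docid] += (score - lo) / span
    # Tie-break by docid so the output ordering is deterministic.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

Note that a document missing from one run simply contributes nothing from that run; whether to treat absence differently is one of the design choices your own algorithm can revisit.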
In total there will be 4 runs (2 algorithms x 2 sets of underlying systems).
Provide a short description of what you did and some analysis of what you learned. This writeup should include at least the following information:
- Uninterpolated mean average precision numbers for all 4 runs.
- Precision at 10 documents retrieved for all 4 runs.
- An analysis of the benefits and disadvantages of metasearch.
- A description of your metasearch algorithm in enough detail that someone could replicate it, together with justification for your choices.
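To collect the MAP and precision-at-10 numbers for the report table, you can scrape them from trec_eval's output rather than copying by hand. The sketch below assumes the common three-column "measure query value" output lines; label names vary across trec_eval versions (e.g. "P_10" vs. "P10"), so check your own version's output.

```python
# Sketch: pull summary measures out of trec_eval output for the report table.
# Assumes lines of the form "measure <tab> all <tab> value"; measure labels
# differ between trec_eval versions, so both P_10 and P10 are accepted.
def extract_measures(trec_eval_output, measures=("map", "P_10", "P10")):
    vals = {}
    for line in trec_eval_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] in measures and parts[1] == "all":
            vals[parts[0]] = float(parts[2])
    return vals
```

Running this once per metasearch output file gives the numbers for all 4 runs in one place, ready to format as a table.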
Feel free to try other runs to better explore the
issues.
Hand in a printed copy of the report. The report should not be much longer
than a dozen pages (if that long). In particular, do not include trec_eval
output directly; instead, select the interesting information and put it in a
nicely formatted table.