Created: June 10 2005
Last modified:
Make sure you check the syllabus for the due date.
This project requires you to implement metasearch over the runs you produced in Project1.
Metasearch is the process of combining multiple result lists returned for the same query. There are many popular algorithms for metasearch, including CombMNZ, CombSum, BordaCount, Condorcet, Online Allocation, Sampling, LC, Bayes, etc. The purpose is to output a ranked list of results that is at least as good as the list of any underlying system used by the algorithm. Here is the lecture given in class.
In Project1, you should have produced 16 runs, each containing results for all queries on a given database with a given retrieval formula. In Project2, by "same query" we mean the query represented by the same number, although the actual queries might differ slightly from run to run because of stopwording and stemming.
First fix a database: use d=3 (stemmed, stopworded). Then consider the 4 runs you have for this database, corresponding to the 4 retrieval methods required in Project1. Rank these 4 runs by MAP (mean average precision); we will call them run1 (best MAP), run2, run3, run4 (worst MAP). For a given metasearch algorithm you will do two metasearch runs:
- metasearch over run1 and run2
- metasearch over run1, run2, run3 and run4.
It is important to note that metasearch works on a query-by-query basis. Your metasearch implementation should therefore proceed sequentially over the queries: for each query, read the underlying runs for that query, use metasearch to compute a new ranking of those results (top 1000), and then output it in the same format as before:
query-number Q0 document-id rank score Exp
where score and rank are the values computed by metasearch.
Then move on to the next query, and so on, appending the results to the same metasearch output file. You can evaluate the metasearch output using trec_eval as before.
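The query-by-query loop described above could be organized roughly as follows. This is only a sketch under stated assumptions: the helper names (`parse_run`, `format_results`, `metasearch`, `placeholder_fuse`) are hypothetical, ranks are assumed to start at 1, and the fusion step is a deliberately trivial placeholder that you would swap for a real metasearch algorithm.

```python
from collections import defaultdict

def parse_run(lines):
    """Group one run's results by query number.

    Each line is assumed to have the six fields
        query-number Q0 document-id rank score Exp
    Returns {query: {doc_id: score}}.
    """
    by_query = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        query, _q0, doc_id, _rank, score, _exp = parts
        by_query[query][doc_id] = float(score)
    return by_query

def format_results(query, fused_scores, tag="Exp", depth=1000):
    """Sort fused {doc_id: score} by descending score and format the top `depth`."""
    ranked = sorted(fused_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        f"{query} Q0 {doc_id} {rank} {score:.6f} {tag}"
        for rank, (doc_id, score) in enumerate(ranked[:depth], start=1)
    ]

def placeholder_fuse(per_run_scores):
    """Placeholder only: keep each document's highest score across runs."""
    fused = {}
    for scores in per_run_scores:
        for doc_id, score in scores.items():
            fused[doc_id] = max(score, fused.get(doc_id, score))
    return fused

def metasearch(runs, fuse, depth=1000):
    """Process queries one at a time and collect all output lines."""
    # Queries in order of first appearance across the underlying runs.
    queries = list(dict.fromkeys(q for run in runs for q in run))
    out = []
    for query in queries:
        per_run = [run.get(query, {}) for run in runs]
        out.extend(format_results(query, fuse(per_run), depth=depth))
    return out
```

In practice `parse_run` would be fed the lines of each run file, and the list returned by `metasearch` would be written, one line per entry, to a single output file for trec_eval.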
You will implement 2 metasearch algorithms:
- CombSum. Consult the lecture for details.
- A metasearch algorithm of your own design. It does not have to be original. You can take a look at any metasearch algorithm available; you can also be as creative as you like. The sole purpose is to obtain good results on the underlying runs 1-4 mentioned above. It can use scores, ranks, or any combination of them.
Restriction: the only restriction is that the algorithm has to be different from CombSum, CombMNZ, etc.
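As a rough illustration of CombSum for a single query, the sketch below sums each document's normalized scores over the systems that returned it. The min-max normalization and the function names (`min_max_normalize`, `comb_sum`) are assumptions made for this example; the lecture may prescribe a different normalization, so treat it as a starting point rather than the required implementation.

```python
def min_max_normalize(scores):
    """Rescale one system's {doc_id: score} for one query into [0, 1].

    Assumption: min-max scaling; other normalizations are possible.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Degenerate case: all scores equal, so treat them all as 1.0.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def comb_sum(per_run_scores):
    """Sum normalized scores per document.

    `per_run_scores` is a list with one {doc_id: score} dict per
    underlying run, all for the same query. A document missing from
    a run simply contributes nothing for that run.
    """
    fused = {}
    for scores in per_run_scores:
        for doc_id, s in min_max_normalize(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + s
    return fused
```

Normalizing before summing matters because different retrieval formulas produce scores on different scales; without it, the run with the largest raw scores would dominate the sum.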
In total there will be 4 runs (2 algorithms x 2 sets of underlying systems).
Provide a short description of what you did and some analysis of what you learned. This writeup should include at least the following information:
- Uninterpolated mean average precision numbers for all 4 runs.
- Precision at 10 documents retrieved for all 4 runs.
- An analysis of the benefits or disadvantages of metasearch.
- A description of your metasearch algorithm in enough detail that one can actually replicate it. Also include a justification for your choices.
Feel free to try other runs to better explore the issues.
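To make the two required measures concrete, here is a small sketch of uninterpolated average precision and precision at 10 for a single query. The function names and the toy data are hypothetical, and returning 0.0 for a query with no relevant documents is an assumption of this sketch. The numbers you report should still come from trec_eval; this is only meant to help you sanity-check them.

```python
def average_precision(ranked_docs, relevant):
    """Uninterpolated AP for one query.

    Sum the precision at the rank of each relevant document retrieved,
    then divide by the total number of relevant documents.
    """
    if not relevant:
        return 0.0  # assumption: queries with no relevant docs score 0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def precision_at(ranked_docs, relevant, k=10):
    """Fraction of the top k positions occupied by relevant documents."""
    return sum(1 for doc_id in ranked_docs[:k] if doc_id in relevant) / k

def mean_average_precision(rankings, judgments):
    """Average AP over queries; both arguments are keyed by query number."""
    aps = [average_precision(rankings[q], judgments.get(q, set()))
           for q in rankings]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, with the ranking D1, D2, D3, D4 and relevant documents D1, D3 and D9, the average precision is (1/1 + 2/3) / 3, and precision at 10 is 2/10.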
Hand in an electronic copy of the report. The report should not be much longer than a dozen pages (if that long). In particular, do not include trec_eval output directly; instead, select the interesting information and put it in a nicely formatted table.
This project is worth a maximum of 75 points.