Assigned:
Tue 30 Jun 2009
Due:
Wed 8 Jul 2009
Problem 1 (25 points)
Courtesy James Allan
The purpose of this exercise is to gain some "hands-on" experience in the process of evaluating information retrieval systems. Run the following two queries on Google (http://google.com) and on Bing (http://bing.com) (do not run the explanation; just run the query). You will judge 10 documents for relevance to each query. The explanation will help you decide whether or not something is relevant. When judging pages for relevance, please note the following:
Use the following format (one result per line):
G-or-B query-number doc's-rank R-or-N doc's-title
G-or-B query-number doc's-rank R-or-N doc's-title
. . . . . .
where G-or-B indicates whether you ran this on Google or Bing, query-number is 1 or 2 from above, doc's-rank is this document's rank (a number from 1 to 20, or more if appropriate), R-or-N is R if the page is relevant and N otherwise, and doc's-title is the title of the document (to help us verify that the results make sense). Since you are judging 10 pages from each of two queries on each of two search engines, the file you submit should have 40 lines.
Problem 2 (25 points)
Courtesy James Allan
(A) A retrieval system returns the following ranked list of 50 documents for a query, where R marks a relevant document and N a non-relevant one (rank 1 is leftmost):
RRNRNRNNNN RNNNRNNNNR RNNNNNNNRN NNNNNNNNNN RNNNNNNNNN
Based on that list, calculate the following measures:
(A.1) Mean Average Precision
(A.2) Precision at 50% recall
(A.3) Precision at 33% recall
(A.4) R-precision
(B) A second retrieval system returns the following ranked list for the same query:
RNNRNNNRNN NNNRNNNNNN NRNNNNNNRN NNRNNNNRNN NNNNRNNNNR
Repeat parts (A.1), (A.2), (A.3), and (A.4) for this ranked list. Then compare the two ranked lists on the basis of the four metrics you have computed; that is, if you were given only these four numbers (Mean Average Precision, Precision at 50% recall, Precision at 33% recall, and R-precision), what could you determine about the relative performance of the two systems in general?
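If you want to sanity-check your hand calculations, the following minimal Python sketch shows one way such rank-based measures can be computed from a judgment string. It is not part of the assignment; the function names and the toy judgment string are illustrative, it assumes every relevant document for the query appears somewhere in the string, and it uses the uninterpolated reading of "precision at X% recall" (precision at the first rank where that recall level is reached).

    import math

    # Sketch: rank-based measures from a judgment string such as "RNRNN...".
    # Assumes all relevant documents for the query appear in the string.

    def precision_at(judgments, k):
        """Fraction of the top k ranked documents that are relevant."""
        return judgments[:k].count('R') / k

    def average_precision(judgments):
        """Average of the precision values at the rank of each relevant document."""
        total_relevant = judgments.count('R')
        return sum(precision_at(judgments, rank)
                   for rank, j in enumerate(judgments, start=1) if j == 'R') / total_relevant

    def r_precision(judgments):
        """Precision at rank R, where R is the total number of relevant documents."""
        return precision_at(judgments, judgments.count('R'))

    def precision_at_recall(judgments, recall_level):
        """Precision at the first rank where the given recall level is reached."""
        needed = math.ceil(recall_level * judgments.count('R'))
        seen = 0
        for rank, j in enumerate(judgments, start=1):
            if j == 'R':
                seen += 1
                if seen >= needed:
                    return precision_at(judgments, rank)
        return 0.0

    # Toy list only; remove the spaces first if you paste in the lists from the problem.
    toy = "RNRNNRNNNN"
    print(average_precision(toy), r_precision(toy),
          precision_at_recall(toy, 0.50), precision_at_recall(toy, 1 / 3))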
Problem 3 (25 points)
Define 'perfect-retrieval' as the list of ranked documents where (a) all the relevant documents are retrieved, and (b) every relevant document is ranked higher than any non-relevant one.
Consider a typical precision-recall curve starting at A (RECALL=0, PREC=1) and ending at B (RECALL=1, PREC=0), as shown in plot (1) below.
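For intuition about how such a curve is traced out by a ranked list, here is a small Python sketch (not required for the problem; the helper name and judgment strings are illustrative, and the total number of relevant documents is assumed to be known from the judgments).

    # Sketch: the (recall, precision) point after each rank of a ranked list.

    def pr_points(judgments, total_relevant):
        points = []
        relevant_seen = 0
        for rank, j in enumerate(judgments, start=1):
            if j == 'R':
                relevant_seen += 1
            points.append((relevant_seen / total_relevant, relevant_seen / rank))
        return points

    # Under the definition above, a perfect-retrieval ranking keeps precision at 1.0
    # until recall reaches 1.0; precision only starts to drop after that point.
    print(pr_points("RRRNN", total_relevant=3))   # perfect retrieval of 3 relevant docs
    print(pr_points("RNRNR", total_relevant=3))   # a non-perfect ranking of the same docs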
Problem 5 - Extra Credit (15 points)
In this problem you are asked to implement Average Precision. The input to your program will be two files: (a) the ranked list of documents as returned by a retrieval system, and (b) the qrel file that contains, for each query, the set of all documents judged as relevant or non-relevant. The results file has the form,
query-number Q0 document-id rank score Exp
where query-number is the number of the query, document-id is the external ID for the retrieved document, rank is the document's position in the ranked list, and score is the score that the retrieval system assigns to that document for that query. Q0 (Q zero) and Exp are constants that are used by some evaluation software. You can download such a file for the READWARE retrieval system submitted to TREC 8 here.
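As a starting point, a results file in this format could be read with something like the sketch below (Python; the function name is mine, and splitting each line on whitespace is an assumption that matches the format shown above).

    from collections import defaultdict

    def read_results(path):
        """Read a results file (query-number Q0 document-id rank score Exp) into
        {query_number: [(document_id, rank, score), ...]} ordered by rank."""
        results = defaultdict(list)
        with open(path) as f:
            for line in f:
                query, _q0, doc_id, rank, score, _exp = line.split()
                results[query].append((doc_id, int(rank), float(score)))
        for docs in results.values():
            docs.sort(key=lambda entry: entry[1])   # order each query's list by rank
        return results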
The qrel file has the form,
query-number 0 document-id relevance
where query-number is the number of the query, document-id is the external ID for the judged document, 0 is a constant, and relevance is the relevance assigned to the document for that particular query: either 0 (non-relevant) or 1 (relevant). You can download the qrel file for TREC 8 here.
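The qrel file can be read in the same spirit; the sketch below (function name mine) keys the judgments by (query, document) pair so that a relevance score is always looked up for the correct query.

    def read_qrels(path):
        """Read a qrel file (query-number 0 document-id relevance) into
        {(query_number, document_id): relevance}."""
        qrels = {}
        with open(path) as f:
            for line in f:
                query, _zero, doc_id, relevance = line.split()
                qrels[(query, doc_id)] = int(relevance)
        return qrels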
For a given query, say query number 401, first calculate the total number of relevant documents, R, from the qrel file. Then read the results for that query from the input.READWARE file, find the corresponding documents for the query in the qrel file, and look up the associated relevance scores. Note that a document may appear more than once in the qrel file with different relevance scores for different queries, so make sure you use the relevance score associated with the correct query. Note also that if a document returned for a certain query in the input.READWARE file does not show up for the same query in the qrel file, you can consider it non-relevant and assign it a relevance score of 0.
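Putting those steps together, Average Precision for a single query might look like the sketch below. It reuses the hypothetical read_results / read_qrels helpers above, computes R from the qrel file, and treats documents missing from the qrels as non-relevant, as described.

    def average_precision_for_query(query, results, qrels):
        """Average Precision of one query's ranked list against the qrels."""
        # R: total number of documents judged relevant for this query in the qrel file
        total_relevant = sum(rel for (q, _doc), rel in qrels.items() if q == query)
        if total_relevant == 0:
            return 0.0
        relevant_seen = 0
        precision_sum = 0.0
        for position, (doc_id, _rank, _score) in enumerate(results[query], start=1):
            # Unjudged documents default to relevance 0 (non-relevant).
            if qrels.get((query, doc_id), 0) == 1:
                relevant_seen += 1
                precision_sum += relevant_seen / position
        return precision_sum / total_relevant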
Compute the Average Precision of the ranked list of documents returned by READWARE for each of the 50 queries, and the Mean Average Precision over all 50 queries. Submit the Average Precision values and the Mean Average Precision value you've calculated. So that you can check the correctness of your output, the Average Precision values for queries 401 and 402 are 0.0650 and 0.1704, respectively.
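A short driver along these lines (file names are assumptions; use whatever paths you saved the downloads under) would then print the per-query values and the MAP so you can compare against the figures above.

    results = read_results("input.READWARE")   # the READWARE results file from above
    qrels = read_qrels("qrels.trec8")          # assumed local name for the TREC 8 qrel file

    ap = {query: average_precision_for_query(query, results, qrels)
          for query in sorted(results)}
    for query, value in ap.items():
        print(f"{query}\tAP = {value:.4f}")    # query 401 should come out near 0.0650
    print(f"MAP = {sum(ap.values()) / len(ap):.4f}")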
DO NOT SUBMIT YOUR CODE, however proud of it you are :-)