Assigned: Thu 13 May 2010
Due: Thu 20 May 2010
Log in to the judging interface using your last name as your "username" and your NUID as your "password". You will then be given a fixed topic and a sequence of documents to judge; these documents correspond to the ranked list returned by a specific search engine run on a collection of AP newswire articles, using the topic in question as the search query.
For each document, you must decide whether or not it satisfies the information need specified by the topic. For those documents that you judge as "relevant" (i.e., satisfying the information need), you should select the sentence or collection of sentences that caused you to judge the document as relevant, and return those sentences in the response box provided. (These sentences need not be contiguous.) Depending on one's interpretation of the information need, it may be possible to argue that a document is either relevant or non-relevant. The important point is to be consistent in your judging: imagine that you had posed this information need, and consistently determine which documents would satisfy that information need for you.
You will have 50 documents to judge, and once you have completed your assessments, you will be given a summary page that contains your relevance assessments (and other information) in standard TREC qrel format. These relevance assessments are given in the same order in which you judged documents, and this ordering corresponds to the ranked order of documents as retrieved by the underlying search engine (top document first and so on).
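(For reference, each line of a TREC qrel has the form "topic-id iteration document-id judgment"; for example, with a hypothetical topic and document identifier,

    051 0 AP890101-0001 1

would indicate that document AP890101-0001 was judged relevant, with judgment 1, for topic 051.)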
Given your relevance assessments, compute the following measures:
Courtesy James Allan
A. The following list of R's and N's represents relevant (R) and non-relevant (N) documents in a ranked list of 50 documents. The top of the ranked list is on the left: the leftmost symbol represents the most highly weighted document, the one the system believes is most likely to be relevant, and the list runs across the page to the right. The list contains 10 relevant documents; assume that there are only 10 relevant documents for this query.

RRNRNRNNNN RNNNRNNNNR RNNNNNNNRN NNNNNNNNNN RNNNNNNNNN

Based on that list, calculate the following measures:

1. Average precision
2. Interpolated precision at 50% recall
3. Interpolated precision at 33% recall
4. R-precision
B. Now imagine another system retrieves the following ranked list for the same query:

RNNRNNNRNN NNNRNNNNNN NRNNNNNNRN NNRNNNNRNN NNNNRNNNNR

Repeat parts (A.1), (A.2), (A.3), and (A.4) for the above ranked list. Compare the two ranked lists on the basis of the four metrics you have computed: if you were given only these four numbers (average precision, interpolated precision at 50% recall, interpolated precision at 33% recall, and R-precision), what could you determine about the relative performance of the two systems in general?
C. Plot a recall/precision graph for the above two systems. Generate both an uninterpolated and an interpolated graph (probably as two separate graphs, to make the four plots easier to see). What do the graphs tell you about the system in A and the one in B?

D. Plot ROC curves for the above two systems.
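If you want to check your hand computations for parts A and B, the following is a minimal Python sketch of those measures. The string encoding of a run, the function names, and the interpolation convention (maximum precision at any cutoff whose recall meets or exceeds the target level) are our own assumptions, not specified by the assignment.

    # A run is a string such as "RRNRN...", read left to right from rank 1.

    def precision_at(run, c):
        """Fraction of the top c documents that are relevant."""
        return run[:c].count("R") / c

    def recall_at(run, c, R):
        """Fraction of the R relevant documents found in the top c."""
        return run[:c].count("R") / R

    def average_precision(run, R):
        """Mean of precision at the rank of each relevant document;
        relevant documents never retrieved contribute zero."""
        return sum(precision_at(run, i + 1)
                   for i, d in enumerate(run) if d == "R") / R

    def interpolated_precision(run, R, recall_level):
        """Max precision at any cutoff whose recall is >= recall_level."""
        return max((precision_at(run, c) for c in range(1, len(run) + 1)
                    if recall_at(run, c, R) >= recall_level), default=0.0)

    def r_precision(run, R):
        """Precision at rank R."""
        return precision_at(run, R)

    # Toy example with 3 relevant documents in a list of 5:
    print(average_precision("RNRNR", 3))   # (1/1 + 2/3 + 3/5) / 3, about 0.756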
Problem 3 (15 points)

Define 'perfect-retrieval' as a ranked list of documents in which
- all the relevant documents are retrieved, and
- every relevant document is ranked higher than any non-relevant one.
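(For reference, these are the standard definitions behind the notation used below; they are assumed here rather than given in the assignment. Write rel(i) = 1 if the document at rank i is relevant and 0 otherwise, and let R be the total number of relevant documents. Then

    \[
    \mathrm{PREC}(c) = \frac{1}{c} \sum_{i=1}^{c} \mathit{rel}(i), \qquad
    \mathrm{RECALL}(c) = \frac{1}{R} \sum_{i=1}^{c} \mathit{rel}(i),
    \]
    \[
    \mathrm{R\_PREC} = \mathrm{PREC}(R), \qquad
    \mathrm{AveragePrecision} = \frac{1}{R} \sum_{i \,:\, \mathit{rel}(i)=1} \mathrm{PREC}(i).
    \])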
A. Prove that a list demonstrates perfect-retrieval if and only if there exists a cutoff c such that the list has PREC(c) = 1 and RECALL(c) = 1.

B. Consider a particular cutoff c = 10.
B1. Give an example of a list that has PREC(10) = 1 and RECALL(10) < 1.
B2. Give an example of a list that has PREC(10) < 1 and RECALL(10) = 1.
C. Prove that a list demonstrates perfect-retrieval if and only if R_PREC = 1.

D. Prove that a list demonstrates perfect-retrieval if and only if AveragePrecision = 1.
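(One observation you may find useful, stated here as an aid rather than as part of the assignment: with R relevant documents in total, a list demonstrates perfect-retrieval exactly when its first R positions are all relevant. In the string encoding used in the sketch above:

    def is_perfect(run, R):
        """True iff all R relevant documents are retrieved and they
        occupy the top R ranks."""
        return run.count("R") == R and run[:R] == "R" * R
)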
Problem 4 (15 points)

Consider a typical precision-recall curve starting at A (RECALL = 0, PREC = 1) and ending at B (RECALL = 1, PREC = 0), as shown in plot (1) below.

A. Every point on the precision-recall curve corresponds to a particular rank (or cutoff). Intersect the curve with the main diagonal given by the equation y = x at point C = (rp, rp). If the query in question has R relevant documents, to what rank does this intersection point correspond?

B. As a consequence of the above, the quantity rp represents a standard retrieval metric. Explain which one and why.

C. As shown in plot (2), we now add the segments [AC] and [CB]. We are interested in the area under the blue precision-recall curve, and we can approximate it by the area under the red lines ACB. Compute this approximation area as a function of rp.

D. Explain how the average precision and R-precision measures are thus related.
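(If you would like to see the geometry, a short matplotlib sketch, using the same string encoding as above, draws the uninterpolated precision-recall points for a run together with the diagonal y = x; the plot labels and styling are our own.)

    import matplotlib.pyplot as plt

    def pr_points(run, R):
        """Uninterpolated (recall, precision) points at each cutoff."""
        rel, points = 0, []
        for c, d in enumerate(run, start=1):
            rel += (d == "R")
            points.append((rel / R, rel / c))
        return points

    run = ("RRNRNRNNNN" "RNNNRNNNNR" "RNNNNNNNRN"
           "NNNNNNNNNN" "RNNNNNNNNN")
    xs, ys = zip(*pr_points(run, 10))
    plt.plot(xs, ys, marker=".", label="system A")
    plt.plot([0, 1], [0, 1], "--", label="y = x")
    plt.xlabel("recall")
    plt.ylabel("precision")
    plt.legend()
    plt.show()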
Problem 5 (10 points)

Courtesy Huahai Yang

Which of the documents in Table 1 will be retrieved given the Boolean query below?

((chaucer OR milton) AND (NOT swift)) OR ((NOT chaucer) AND (swift OR shakespeare))

Table 1: Term-document weights in the Boolean model

          Chaucer  Milton  Shakespeare  Swift
    D1       0       0         0          0
    D2       0       0         0          1
    D3       0       0         1          0
    D4       0       0         1          1
    D5       0       1         0          0
    D6       0       1         0          1
    D7       0       1         1          0
    D8       0       1         1          1
    D9       1       0         0          0
    D10      1       0         0          1
    D11      1       0         1          0
    D12      1       0         1          1
    D13      1       1         0          0
    D14      1       1         0          1
    D15      1       1         1          0
    D16      1       1         1          1
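(The query can also be checked mechanically against each row of Table 1; here is a small Python sketch for doing so, in which the bit-pattern encoding of the table and the function name are our own.)

    # Incidence vectors from Table 1, as (chaucer, milton, shakespeare, swift);
    # D1..D16 enumerate the 16 binary patterns 0000..1111.
    docs = {f"D{i + 1}": ((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1)
            for i in range(16)}

    def matches(vec):
        chaucer, milton, shakespeare, swift = (bool(x) for x in vec)
        return ((chaucer or milton) and not swift) or \
               (not chaucer and (swift or shakespeare))

    print([name for name, vec in docs.items() if matches(vec)])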
Problem 6 (10 points)

Courtesy Huahai Yang

Given a query Q and a collection of documents A, B, and C shown in Table 2, rank the similarity of A, B, and C with respect to Q using the cosine similarity metric. The numbers in the table are the (e.g., tf-idf) weights between each term and the document or query; simply compute the cosine similarity using these weights.

Table 2: Term-document weights

         Cat  Food  Fancy
    Q     3    4      1
    A     2    1      0
    B     1    3      1
    C     0    2      2
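(Again, as a way to check a hand computation, a minimal cosine-similarity sketch in Python; the vector encoding of Table 2 is our own.)

    import math

    def cosine(u, v):
        """Cosine similarity between two weight vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    # Weights from Table 2, in term order (Cat, Food, Fancy).
    Q = (3, 4, 1)
    for name, doc in {"A": (2, 1, 0), "B": (1, 3, 1), "C": (0, 2, 2)}.items():
        print(name, round(cosine(Q, doc), 3))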
jaa@ccs.neu.edu