Information Retrieval

While working on the information extraction project, I was also taking Professor Javed Aslam's information retrieval class. The summer research project had shown me how painful it is to interpret data from documents manually. I found that Professor Aslam was doing very interesting research on evaluating IR systems when relevance judgments are incomplete.

Since September 2009, I have been working on IR evaluation under Professor Javed Aslam.

Buckley et al.[1] showed that average precision (AP) is not robust to incomplete relevance judgments. They proposed a measure, bpref, but it lacks strong theoretical support. Soon after, Yilmaz et al. studied how to estimate average precision when judgments are incomplete, both when the judged documents are sampled uniformly at random[7] and when the sampling is weighted[8]. They proposed several measures for this setting: indAP, subAP, infAP, and xinfAP. All of these methods rest on the assumption that AP is the ``gold standard'' for evaluation. On the other hand, nDCG (normalized Discounted Cumulative Gain) reflects other features of IR systems. According to Croft et al.[3], the nDCG formula has no theoretical support at all. We are trying to find connections between AP and nDCG.
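To make the two measures concrete, here is a minimal sketch of AP and nDCG computed from complete judgments. It assumes binary relevance for AP and the common log2-discounted form of nDCG; other discount and gain functions are in use. The function names and the toy data are illustrative only and are not taken from the cited papers.

```python
import math

def average_precision(ranking, relevant):
    """AP of a ranked list of doc ids, given the set of relevant doc ids."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    # Normalize by the total number of relevant documents, retrieved or not.
    return precision_sum / len(relevant) if relevant else 0.0

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount (one common choice)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg(ranking, grades):
    """nDCG of a ranked list; grades maps doc id -> relevance grade (unlisted docs count as 0)."""
    gains = [grades.get(doc, 0) for doc in ranking]
    # Ideal ordering: the highest grades first, truncated to the list length.
    ideal = sorted(grades.values(), reverse=True)[:len(ranking)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example (hypothetical data): two relevant documents, found at ranks 1 and 3.
ranking = ["d1", "d2", "d3", "d4"]
print(average_precision(ranking, {"d1", "d3"}))  # (1/1 + 2/3) / 2 = 5/6
print(ndcg(ranking, {"d1": 1, "d3": 1}))
```

Both functions treat any document without a judgment as non-relevant. That is the assumption that breaks down when judgments are incomplete, which is what the estimators above are designed to address.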

The sampling strategy is very important when evaluating IR systems. Three interesting papers that I am reading are Aslam et al.'s ``A Practical Sampling Strategy for Efficient Retrieval Evaluation'' (draft), Pavlu's PhD thesis ``Large Scale IR Evaluation''[6], and Carterette et al.'s ``If I Had a Million Queries''[2].

Wu Jiang 2009-11-05