CS6200: Information Retrieval

January 8, 2014

Lecture Notes – Introduction
CMS Chapter 1
extra As We May Think, Vannevar Bush, 1945.
extra The History of Information Retrieval Research, Croft and Sanderson, IEEE Xplore, 2012.

Topics covered: What is Information Retrieval?

January 15, 2014

Lecture Notes – Architecture of a Search Engine [audio]
Lecture Notes – Collecting Data [audio]
CMS Chapters 2, 3
MRS Chapters 19, 20
extra Google – Inside Search has a high-level discussion of their search engine.
extra The Anatomy of a Large-Scale Hypertextual Web Search Engine, Sergey Brin and Larry Page, 1998. Google's big debut.
extra Detecting Spammers on Twitter, Fabrício Benevenuto et al., 2010. Interesting to consider to what extent spammers can learn to avoid detection by reading papers like these.

Topics covered: Search engine architecture, crawling the web, processing and storing documents, detecting duplicates, handling noise.

January 22, 2014

Lecture Notes – Processing Text [audio, part 1] [audio, part 2]
Supplemental notes – Named Entity Recognition
Supplemental notes – PageRank
CMS Chapter 4
MRS Chapters 2, 21

Topics covered: Text statistics, document parsing, tokenizing, stemming, stopping, phrases, entities, internationalization.

January 29, 2014

Lecture Notes – Indexing and Ranking [audio, part 1] [audio, part 2]
CMS Chapter 5
MRS Chapters 4, 5
extra Scaling Up All-Pairs Similarity Search, R. J. Bayardo, Yiming Ma, Ramakrishnan Srikant. The paper I mention in lecture showing one way to take advantage of vector sparsity to speed up comparisons between queries and documents. Also useful in many other domains. See also this sample implementation.

Topics covered: Document structure, link extraction, ranking, indexes, query processing, structured queries, optimization, MapReduce, distributed evaluation, caching.

~~February 5, 2014~~

Class canceled

February 12, 2014

Lecture Notes – Queries and Interfaces
CMS Chapter 6
MRS Chapter 9
extra A survey on the use of relevance feedback for information access systems, I. Ruthven, M. Lalmas, 2003. Explains several techniques for relevance feedback, and their effectiveness.
extra Query expansion using local and global document analysis, J. Xu, W.B. Croft, 1996. Compares approaches to query expansion based on looking at all indexed documents ("global expansion") versus just looking at documents retrieved for a query ("local expansion").

Topics covered: Information needs; queries; query transformation and refinement; stopping and stemming revisited; spell checking; query expansion; relevance feedback; context and personalization; results pages and snippets; advertising; clustering results; user behavior analysis.

February 19, 2014

Lecture Notes – Retrieval Models, Part 1
CMS Chapter 7
MRS Chapters 11-12, and background in chapters 1 and 6

Topics covered: Overview of retrieval models; Boolean retrieval; vector space models; probabilistic models; classification; the BM25 ranking algorithm; ranking based on language models; query likelihood ranking.

February 26, 2014

Lecture Notes – Retrieval Models, Part 2
CMS Chapter 7
MRS Chapters 11-12, and background in chapters 1 and 6

Topics covered: Relevance models and pseudo-relevance feedback; complex queries and combining evidence; inference networks; the Galago query language; models for web search; machine learning and information retrieval.

~~March 5, 2014~~

Spring break! Have fun!

March 12, 2014

Lecture Notes – Evaluation, Part 1

Topics covered: Test collections; query logs; effectiveness metrics; recall and precision; averaging and interpolation.

March 19, 2014

Lecture Notes – Evaluation, Part 2

Topics covered: Focusing on top documents; training, testing, and statistics; significance tests; setting parameter values.

March 26, 2014

Lecture Notes – Commercial Search Engines
extra Counting Triangles and the Curse of the Last Reducer, Suri and Vassilvitskii, 2011. A MapReduce algorithm for computing the clustering coefficient, a common measure of how tightly knit the neighborhood around a node is in a social network.
extra Item-based collaborative filtering recommendation algorithms, Sarwar et al., 2001. An overview and comparison of several approaches to collaborative filtering.

Topics covered include: Tailoring IR for web businesses: Facebook, Amazon, LinkedIn, Twitter, etc.; practical aspects of implementing large-scale search engines; data distribution with BigTable/NoSQL; task distribution with MapReduce.

April 2, 2014

Lecture Notes – Machine Learning in IR
extra A Short Introduction to Learning to Rank, Hang Li, 2011.
extra Learning to Rank for Information Retrieval, Tie-Yan Liu, 2008.

Topics covered include: Probability; machine learning; learning to rank; features for document ranking.

April 9, 2014

Lecture Notes – Information Extraction

Topics covered include: Named entity recognition, relationship classification, question answering, summarization.

April 16, 2014

Lecture Notes – Open Questions in IR
extra Recommended Reading for IR Research Students, Alistair Moffat, Justin Zobel, David Hawking (eds.), 2004. Key papers in IR, selected by top researchers in the field.
extra Frontiers, Challenges, and Opportunities for Information Retrieval: Report from SWIRL 2012, James Allan, Bruce Croft, Alistair Moffat, and Mark Sanderson (eds.), 2012. Suggested areas for future research in IR.

Topics covered include: What are the major unsolved problems in IR that may be solvable in the near future?
