  • CMS refers to Search Engines by Croft, Metzler, and Strohman.
  • MRS refers to Introduction to Information Retrieval by Manning, Raghavan, and Schütze.
  • extra refers to supplemental readings offered for the curious.
January 8, 2014

Topics covered: What is Information Retrieval?

January 15, 2014

Topics covered: Search engine architecture, crawling the web, processing and storing documents, detecting duplicates, handling noise.

January 22, 2014

Topics covered: Text statistics, document parsing, tokenizing, stemming, stopping, phrases, entities, internationalization.

January 29, 2014

Topics covered: Document structure, link extraction, ranking, indexes, query processing, structured queries, optimization, MapReduce, distributed evaluation, caching.

February 5, 2014
  • Class canceled
February 12, 2014

Topics covered: Information needs; queries; query transformation and refinement; stopping and stemming revisited; spell checking; query expansion; relevance feedback; context and personalization; results pages and snippets; advertising; clustering results; user behavior analysis.

February 19, 2014

Topics covered: Overview of retrieval models; Boolean retrieval; vector space models; probabilistic models; classification; the BM25 ranking algorithm; ranking based on language models; query likelihood ranking.

February 26, 2014

Topics covered: Relevance models and pseudo-relevance feedback; complex queries and combining evidence; inference networks; the Galago query language; models for web search; machine learning and information retrieval.

March 5, 2014
  • Spring break! Have fun!
March 12, 2014

Topics covered: Test collections; query logs; effectiveness metrics; recall and precision; averaging and interpolation.

March 19, 2014

Topics covered: Focusing on top documents; training, testing, and statistics; significance tests; setting parameter values.

March 26, 2014

Topics covered include: Tailoring IR for web businesses (Facebook, Amazon, LinkedIn, Twitter, etc.); practical aspects of implementing large-scale search engines; data distribution with BigTable/NoSQL; task distribution with MapReduce.

April 2, 2014

Topics covered include: Probability; machine learning; learning to rank; features for document ranking.

April 9, 2014

Topics covered include: Named entity recognition, relationship classification, question answering, summarization.

April 16, 2014

Topics covered include: What are the major unsolved problems in IR that may be solvable in the near future?