CS6200: Information Retrieval
Fall 2014 Syllabus
This schedule is subject to change. Check back as the class progresses.
CMS refers to Search Engines by Croft, Metzler, and Strohman; MRS refers to Introduction to Information Retrieval by Manning, Raghavan, and Schütze.
- Overview of Information Retrieval (4 Sept. 2014)
- Architecture of a Search Engine (4 Sept. 2014)
- Acquiring Data (4, 11 Sept.)
  - Reading: CMS chap. 3; MRS chap. 19 and 20
  - Crawling the Web
  - Document Conversion
  - Storing the Documents
  - Detecting Duplicates
  - Noise Detection and Removal
- Processing Text (11, 18 Sept.)
  - Reading: CMS chap. 4; MRS chap. 2 and 21
  - Text Statistics
  - Document Parsing
- Ranking with Indexes (25 Sept.)
  - Reading: CMS chap. 5; MRS chap. 4-5
  - Abstract Model of Ranking
  - Inverted indexes
  - MapReduce
  - Query Processing
    - Document-at-a-time evaluation
    - Term-at-a-time evaluation
    - Optimization techniques
    - Structured queries
    - Distributed evaluation
    - Caching
- Queries and Interfaces (2, 9 Oct.)
  - Reading: CMS chap. 6
  - Information Needs and Queries
  - Query Transformation and Refinement
    - Stopping and Stemming Revisited
    - Spell Checking and Query Suggestions
    - Query Expansion
    - Relevance Feedback
    - Context and Personalization
  - Displaying the Results
    - Result Pages and Snippets
    - Advertising and Search
    - Clustering the Results
  - Translation
  - User Behavior Analysis
- Retrieval Models (9, 16 Oct.)
  - Reading: CMS chap. 7; MRS chap. 11-12 (and, for background, chap. 1 and 6)
  - Overview of Retrieval Models
    - Boolean Retrieval
    - The Vector Space Model
  - Probabilistic Models
    - Information Retrieval as Classification
    - The BM25 Ranking Algorithm
  - Ranking based on Language Models
    - Query Likelihood Ranking
    - Relevance Models and Pseudo-Relevance Feedback
  - Complex Queries and Combining Evidence
    - The Inference Network Model
    - The Galago Query Language
  - Models for Web search
  - Machine Learning and Information Retrieval
- Evaluating Search Engines (23, 30 Oct.)
  - Reading: CMS chap. 8; MRS chap. 8
  - Test collections
  - Query logs
  - Effectiveness Metrics
    - Recall and Precision
    - Averaging and interpolation
    - Focusing on the top documents
  - Training, Testing, and Statistics
    - Significance tests
    - Setting parameter values
- Classification and Clustering (see also further slides on clustering and classification) (6 Nov.)
- User Modeling (6 Nov.)
- Social Search: Networks of People and Search Engines (13 Nov.)
  - User tagging
  - Searching within Communities
  - Filtering and recommending
  - Metasearch
- Beyond Bag of Words (20 Nov.)