CS6120: Natural Language Processing

Spring 2016

Instructor: David Smith, Assistant Professor in Computer and Information Science (Office Hours: Wednesdays, 3-5; WVH 356)

TA: Maryam Aziz (Office Hours: TBA; WVH 472)

Class meeting: Wednesdays, 6-9 p.m., Hayden 424

Course Texts

This is a graduate/undergraduate course that introduces you to natural language processing; it is also an introduction to reading papers in natural language processing. In addition to reading and discussing papers from the NLP literature, you will, in the latter part of the course, write a review of the literature and open problems in an area of NLP. Fortunately, the NLP community has a robust tradition of open-access publication, primarily via the Association for Computational Linguistics' ACL Anthology.

Along with these readings, lectures will provide background in the fundamental linguistic concepts, statistical models, and algorithms used in NLP. These lectures will primarily draw on material from two textbooks, which are not required but provide useful additional detail:

Speech and Language Processing. Daniel Jurafsky and James H. Martin.
Linguistic Structure Prediction. Noah A. Smith.

Syllabus

Lecture notes and readings will be posted on the syllabus.

Assignments

Homework 1: assigned 9 Feb.; due 23 Feb., 11:59pm
Homework 2: assigned 29 Feb.; due 18 Mar., 11:59pm (extended from 15 Mar.)
Literature Review: one-paragraph pitch and two example papers due 16 Mar.
Homework 3: assigned 12 Apr.; due 27 Apr., 11:59pm
Literature Review: in-class talk on the highlights of your topic, 20 Apr.
Literature Review: Final paper due 28 Apr., 11:59pm

Course Policies

Discussion and Participation

You will read, on average, one paper a week. The goal is not necessarily to figure out every detail of each paper but rather to understand how it fits with what you've learned about NLP as a whole and what future questions it suggests. In other words, the process should mimic what you would do when conducting research in NLP or other areas of applied CS. Near the end of the course, you will also give a short presentation on your literature review (see below). This presentation will also count towards the participation score, which totals 20% of the course grade.

Homework Assignments

There will be three homework assignments for 30% of the total course grade. Assignments will mix written derivations and explanations with some programming problems. If you discuss a problem with others, you must note with whom you discussed it at the beginning of your solution write-up. Even if you acknowledge collaboration, you may not share the text of the actual write-up. Reusing text from published or online sources is likewise not permitted.

Literature Review

In the latter part of the course, you will write a review of the literature in an area of NLP, which will constitute 50% of the course grade. First, you will consult with the instructor about an appropriate scope for the review. For instance, “parsing”, “machine translation”, and “semantics” are far too broad. Then, you will hand in a first draft that is essentially an annotated bibliography of 5 papers (10 for graduate students), with roughly a paragraph about how each paper relates to the general topic. Finally, you will hand in the full review and give a short presentation on it (see above). The review should not be structured like an annotated bibliography, where each paper is simply discussed in turn; rather, you should structure the discussion around larger themes. We will also ask you to replicate the results of one paper. This will likely be a paper that you'd like to build on in your later research, though the clarity of its description and the availability of data will likely also affect your choice. (For undergraduates, the standard is the looser one of implementing an existing system.)