CS G224 Natural Language Processing
Assignment 4 (slightly revised after class on 2/1)
Spring 2006, Prof. Hafner
Due Date: Feb. 8, 2006

This assignment extends assignment 3.

Part I. Given your part-of-speech assignments for the words of the first Harry Potter sentence (with the punctuation removed), how many tag sequences will be considered by a naive algorithm for probabilistic POS tagging? If the Viterbi algorithm is used, what is the maximum number of active paths that will be considered at any step?

Part II. Add new capabilities to the lexicon program you wrote as follows:

1. Each word will have a "base" form, which is either "true" or another word in the lexicon. Words that are base forms for other words can have two sets of features: "own" and "inheritable". When a word is retrieved that has a different base form, the base form's inheritable features should be added to its own features. You can assume there are no chains of base-feature inheritance.

Example: The verb "eat" has the inheritable features TRANS and INTRANS, and the words "eats", "eaten", and "ate" inherit them. The verb "have" has the inheritable feature AUX, and the words "has" and "had" inherit that feature.

Modify the lexicon you built by putting in the base forms for all words in the sentence, and transferring any inheritable features to those base forms. Demonstrate that the correct set of features is returned by the lookup methods.
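
One possible shape for the Part II lookup is sketched below in Python. This is only an illustration, not the required design: the names Entry, Lexicon, add, and lookup are hypothetical, and the feature 3SG is invented for the demonstration. The sketch also assumes that a base word returns its own inheritable features when looked up, which the assignment does not state explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # base is True when the word is its own base form,
    # otherwise it holds the base word as a string.
    base: object = True
    own: set = field(default_factory=set)
    inheritable: set = field(default_factory=set)


class Lexicon:
    """Hypothetical lexicon with one level of base-form inheritance."""

    def __init__(self):
        self.entries = {}

    def add(self, word, base=True, own=(), inheritable=()):
        self.entries[word] = Entry(base, set(own), set(inheritable))

    def lookup(self, word):
        entry = self.entries[word]
        # Assumption: a word always reports its own inheritable features.
        features = entry.own | entry.inheritable
        if entry.base is not True:
            # Single hop only: the assignment rules out inheritance chains.
            features |= self.entries[entry.base].inheritable
        return features


# Demonstration using the words from the example above.
lex = Lexicon()
lex.add("eat", inheritable={"TRANS", "INTRANS"})
lex.add("eats", base="eat", own={"3SG"})  # 3SG is an invented feature
lex.add("ate", base="eat")
lex.add("have", inheritable={"AUX"})
lex.add("has", base="have")

for w in ("eat", "eats", "ate", "has"):
    print(w, sorted(lex.lookup(w)))
```

Because lookup builds a new set on every call, the stored own and inheritable sets are never modified, so changing a base word's inheritable features later is reflected in all of its derived forms.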