CS G224 Natural Language Processing
		         Assignment 3

Spring 2006, Prof. Hafner
Due Date:  Feb. 1, 2006

In parsing natural language, we normally separate the grammar rules
(for which parts of speech serve as the terminal symbols) from the
lexicon. This is illustrated in Figures 9.2 and 9.3 in your textbook.
After tokenizing (and possibly stemming), we look up the words in the
lexicon, ending up with a graph of possibilities. 

Your assignment for this week is to design a natural language lexicon 
program and implement it in an object-oriented language. You are
highly encouraged to learn a little Python and do this assignment in
Python.  However, if you don't have time, programs in Java, C++ or
C# will be accepted.  Be prepared to discuss your design and your
lexicon contents in class.

Your lexicon will have Word objects as its components, with each word
having one or more Def objects (i.e., definitions) attached to it. A 
definition consists of a lexical category (POS) and zero or more features, 
as shown in the table on page 65 of your text. (The + is not necessary.)
We will use Penn tags for our lexical categories.

The diagram below is intended to be suggestive.

Lexicon  --->   Word	
			  POS
			  Feature
		 	  Feature

			  POS	
			  Feature

		Word
		 . . .

		Word

For example, the word goose shown in the table has three POS values:
NN, VB, and VBP. It would therefore have three definitions.
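
As a minimal sketch of the Word/Def structure described above, the
following Python shows one possible layout. The attribute and method
names (spelling, defs, add_def) are assumptions made for illustration,
not a required design, and the goose entries carry no features here.

```python
from dataclasses import dataclass, field


@dataclass
class Def:
    """One definition: a Penn tag plus zero or more feature names."""
    pos: str
    features: tuple = ()


@dataclass
class Word:
    """A lexicon entry holding one Def per lexical category."""
    spelling: str
    defs: list = field(default_factory=list)

    def add_def(self, pos, *features):
        # Hypothetical helper: attach a new definition to this word.
        self.defs.append(Def(pos, tuple(features)))


# "goose" with the three POS values named in the text.
goose = Word("goose")
for tag in ("NN", "VB", "VBP"):
    goose.add_def(tag)
```

Keeping Def as a separate object leaves room to add more features
later without changing how Word stores its definitions.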

The features will be used for finer distinctions: initially nouns
(NN and NNS) should have the feature MASS or COUNT. The auxiliary
verbs (forms of be, do, and have) should have the feature AUX.  
Verbs should have the feature INTRANS if they can be used without
a direct object, and TRANS if they can be used with a direct object.
(Note that some verbs such as "eat" have both.)  
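
One way to keep feature assignments consistent is a small check run
while building the lexicon. The sketch below is only an illustration:
the function name missing_features is hypothetical, and the rule that
a verb needs at least one of TRANS, INTRANS, or AUX is an assumed
reading of the paragraph above, not a stated requirement.

```python
NOUN_TAGS = {"NN", "NNS"}
# The six Penn verb tags.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def missing_features(pos, features):
    """Return a short complaint string, or None if nothing is missing."""
    feats = set(features)
    if pos in NOUN_TAGS and not feats & {"MASS", "COUNT"}:
        return "noun needs MASS or COUNT"
    # Assumption: every verb carries at least one of these three.
    if pos in VERB_TAGS and not feats & {"TRANS", "INTRANS", "AUX"}:
        return "verb needs TRANS, INTRANS, or AUX"
    return None


# "eat" can be used with or without a direct object, so it has both.
eat_problem = missing_features("VB", ("TRANS", "INTRANS"))
bare_noun_problem = missing_features("NN", ())
```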

Two lookup methods should be provided. With one (string) argument,
all definitions for the word should be returned (as a list or tuple). With two
string arguments, the second argument represents a part of speech
tag, and the word's definition for that part of speech should be returned.
For purposes of this assignment, we will make the simplifying assumption
that a word has at most one definition in a given lexical category.
A method should also exist for printing definitions in a readable form.
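
The two lookups and the printing method could be sketched in Python as
follows. Python has no method overloading, so this sketch assumes a
single lookup method with an optional second argument. The class
layout, the names add and format_def, and the choice to return an
empty tuple or None on failure are all assumptions, not requirements.

```python
from collections import namedtuple

Def = namedtuple("Def", ["pos", "features"])


class Lexicon:
    def __init__(self):
        # spelling -> {pos: Def}; at most one Def per lexical category.
        self._words = {}

    def add(self, word, pos, *features):
        self._words.setdefault(word, {})[pos] = Def(pos, tuple(features))

    def lookup(self, word, pos=None):
        """One argument: tuple of all Defs. Two arguments: one Def or None."""
        defs = self._words.get(word, {})
        if pos is None:
            return tuple(defs.values())
        return defs.get(pos)

    @staticmethod
    def format_def(word, d):
        # Readable one-line form; "-" marks a definition with no features.
        feats = " ".join(d.features) if d.features else "-"
        return f"{word}: {d.pos} [{feats}]"


lex = Lexicon()
lex.add("eat", "VB", "TRANS", "INTRANS")
lex.add("eat", "VBP", "TRANS", "INTRANS")

for d in lex.lookup("eat"):
    print(Lexicon.format_def("eat", d))
```

Storing definitions in a dictionary keyed by POS makes the two-argument
lookup direct and enforces the one-definition-per-category assumption.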

Demonstrate your lexicon program by 

1. creating a lexicon containing the words from the first sentence of the
   Harry Potter extract we looked at for Assignment 1 (not including
   punctuation). Include the lexical categories that you think COULD be
   correct for each word in SOME CONTEXT. For example, "had" can *never*
   be a VBP (present tense verb) or an MD (modal verb), so those codes
   should not be included. On the other hand, "shared", which is VBD in
   the sample text, can also be VBN.  Take your best guess regarding the 
   features mentioned above.

2. running some test cases, invoking the two lookup methods (and printing the
   results) for a few selected words (at least one with more than one 
   lexical category), and include some failure cases also.  

Turn in a printout of your lexicon code, the test program, and its
output.

This program will be used later as part of a parser exercise.