CS G224 - Natural Language Processing
		     Spring 2006 - Prof. Hafner
		     Assignment 6 - Due March 3

You are to implement a chart parser that makes use of the the lexicon
you implemented for Assignment 5. (Therefore, those who did not create
a "lookup" function will need to do so.) 

The top level of your parser should be a read/parse/print-trees loop that
allows the user to enter sentences and see each of the maximal parses,
as described below. You can require that the input words be put into a 
list if that makes your programming job easier.  You need not handle 
punctuation.

Below you will find a context free grammar G that should work for most of 
the Harry Potter sentences given in Assignment 5. Conjunctions could not
be included because they give rise to left recursion.

You will also find advice on programming the chart parser, which,
combined with the course readings should make this assignment reasonably
straightforward.

To demonstrate your parser, run the following selected sentences, plus
two failure inputs:
  -- a sentence containing a word not in the dictionary
  -- a sentence whose words are known but which does not have a parse

Snakes bite
He looked at the grandfather clock
They took their usual places in Lockhart's classroom
Harry noticed that they were straining to escape the straps holding them
   inside the box

The "maximal parses" are all the maximal length edges starting at
Node 0, regardless of their grammatical category.
-----------------------------------------------------------------
Grammar G:
Terminal Symbols: 
Noun, ProperNoun, PersonalPronoun, PossessivePronoun, RelPro, Determiner, 
Adverb, Adjective, Number, Aux, Modal, Verb, Preposition, Conjunction, 's

Rules: (non-terminals can be inferred)

Note: Parentheses () mean a choice
      Angle brackets <> mean an optional constituent
      Both of these must be expanded into multiple rules

S --> NP VP

NP --> PersonalPronoun
NP --> ProperNoun
NP --> <DTP> <AP> Nominal    // expands into 4 rules

DTP --> Determiner
DTP --> PossessivePronoun
DTP --> ProperNoun 's
DTP --> Determiner <AP> Nominal 's  /// A general rule gives rise to
				    /// left recursion, so can't
				    /// be used in top down parsing
AP --> Adjective
AP --> Adverb Adjective
AP --> Number AP

Nominal --> Noun
Nominal --> Noun Nominal
Nominal --> Noun <Nominal> PP
Nominal --> Noun <Nominal> VP
Nominal --> Noun <Nominal> RelClause
RelClause --> RelPro VP             // Relative Pronouns: who, that

PP = Preposition NP

VP --> VBG <NP> <PP>   // VBG means "verb group"
VP --> VBG TO VP       // TO = the word to
VP --> VBG THAT S      // THAT = the word that

VBG --> Verb <Adverb>
VBG --> Modal <Adverb> Verb <Adverb>
VBG --> Aux <Adverb> Verb <Adverb>

/// Note that rules like NP --> NP Conjunction NP are not allowed
/// in top down parsing

------------------------------------------------------------------
Programming a chart parser for top-down parsing

I. Data structures:

Grammar --  a set of Rule objects.  A lookup-LHS function that returns a 
subset of the rules with the given LHS.

Chart --  a sequence of nodes.  Each node consists of two sets of edges: 
originating and terminating (at this node).

Edge --  a structure consisting of: originating node, terminating node, label.  
The label consists of a category symbol (usually called CAT), and some 
other elements.

For incomplete edges, the rest of the label consists of a sequence of 
remaining unparsed RHS symbols, followed by a sequence of already-
parsed RHS symbols.  An already-parsed RHS symbol combines its 
CAT with a pointer to the complete edge that satisfies the CAT.  (in 
other words, each already-parsed symbol is the head of a sub-tree).

For complete edges, the sequence of remaining RHS symbols is 
empty.

For lexical edges (which are a variety of complete edges), the rest of 
the label is just the original word that appeared in the input (usually a 
string).

II. Applying the fundamental rule:
When an incomplete edge E is added to the chart terminating at node i:
    for each complete edge originating at node i:
       	if its CAT matches the first remaining RHS symbol of E, then create 
	a 1-node extension of E and put it on the unprocessed stack
    end for
 When a complete edge E is added to the chart originating at node j:
    for each incomplete edge terminating at node j:
	if its first remaining RHS symbol matches CAT(E),  then create a 1- 
	node extension of the incomplete edge and put it on the unprocessed  
	stack.
    end for

III. Overall algorithm -- Top down, depth first, left to right parsing:

1. For each word in the input:
   Lookup the word's definitions, and for each definition, create a
   lexical edge and add it to the chart. (The fundamental rule will not 
   apply since there are no incomplete edges yet.)

2. lookup all grammar rules with LHS = S, create a "self-loop" edge for each 
one, originating and terminating at node 0, and put them on the unprocessed 
stack.

3. While the unprocessed stack is not empty: 

Remove the first edge E from the unprocessed stack.  If E is a 
complete edge, add it to the chart.

If E is an incomplete edge, before adding it to the chart check to see if 
there is an edge (complete or incomplete) originating at E's 
terminating node with a CAT that matches the first unparsed  RHS 
symbol of E. If not, create a self-loop edge for each grammar rule 
whose LHS matches the first unparsed RHS symbol of E, and put 
them on the unprocessed stack.

	 
4. Find the maximal length edges that originate at node 0 and display them 
in bracketed tree format.