Assigned: Friday, 22 February 2013
Due: 11:59pm, Wednesday, 13 March 2013
Generating sentences from a grammar [10 points]
First, download the initial grammar, grammar1. The file contains self-explanatory comments, which are delimited by #. The basic format is:

    1 VP Verb NP

Ignore the leading "1" for now. The first symbol is the left-hand side of a context-free rewrite rule; the remaining one or more symbols are right-hand-side terminals and non-terminals. There is no typographic distinction enforced between terminals and non-terminals; rather, any symbol that does not appear on a left-hand side is by definition a terminal.
Now let's get down to work:

    ./generate grammar1 5

In other words, print five random sentences from the grammar grammar1 to standard output. You should see output like:
the president ate every sandwich !
Your program should start with the grammar's start symbol and recursively choose rewrite rules until the sentence consists entirely of terminals. Where grammar1 gives three possible expansions of a non-terminal, you should have an equal chance of choosing each of them. Don't hardwire this grammar into your program. Read it anew each time it runs, so that you can modify the grammar without touching your code.
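The generation procedure above can be sketched as follows. This is only a sketch: it assumes the grammar's start symbol is named ROOT (a common convention; substitute whatever symbol your grammar1 actually uses), and it ignores the leading weights, as instructed for this first version.

```python
import random
import sys

def load_grammar(path):
    """Parse rules of the form `1 LHS RHS...`, stripping `#` comments.
    The leading weight is ignored in this unweighted version."""
    rules = {}  # LHS -> list of possible right-hand sides
    with open(path) as f:
        for line in f:
            line = line.split('#')[0].strip()
            if not line:
                continue
            tokens = line.split()
            lhs, rhs = tokens[1], tokens[2:]
            rules.setdefault(lhs, []).append(rhs)
    return rules

def generate(rules, symbol):
    """Recursively expand `symbol`; a symbol with no rules is a terminal."""
    if symbol not in rules:
        return [symbol]
    rhs = random.choice(rules[symbol])  # equal chance for each expansion
    return [word for sym in rhs for word in generate(rules, sym)]

if __name__ == '__main__':
    rules = load_grammar(sys.argv[1])
    for _ in range(int(sys.argv[2])):
        print(' '.join(generate(rules, 'ROOT')))  # 'ROOT' is an assumption
```

Invoked as `python generate.py grammar1 5`, this would print five random sentences, one per line.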
Save and hand in 10 random sentences generated by this first version of the program.
Copy your grammar to a new file, grammar2, and modify it. Instead of just "1", allow any non-negative real number at the beginning of a rule. For instance, you can assert that "the" is a more common determiner than "a" or "every" like so:
    4 Det the
    1.5 Det a
    0.5 Det every

In particular, they would be distributed (2/3, 1/4, 1/12).
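One way to implement the weighted choice, sketched in Python: the weights need not sum to 1, so either normalize them yourself or let `random.choices` do it. The rule list below just mirrors the Det example above.

```python
import random

# The weighted Det expansions from the example above, as (weight, rhs) pairs.
det_rules = [(4.0, ['the']), (1.5, ['a']), (0.5, ['every'])]

def choose_rhs(weighted_rules):
    """Pick a right-hand side with probability proportional to its weight."""
    weights = [w for w, rhs in weighted_rules]
    options = [rhs for w, rhs in weighted_rules]
    return random.choices(options, weights=weights, k=1)[0]

# Dividing each weight by the total recovers the probabilities quoted
# above: 4/6 = 2/3, 1.5/6 = 1/4, 0.5/6 = 1/12.
total = sum(w for w, rhs in det_rules)
probabilities = [w / total for w, rhs in det_rules]
```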
Play around with the weights in grammar2 and see how your generated sentences change. Can you make the average sentence length much longer or shorter? Hand in the new probabilized generation program, your modified grammar2, and ten random sentences it generates.
Describing English with context-free rules [15 points]
Copy grammar2 to a new file, grammar3. Modify the latter so that, in addition to the strings of grammar2, it can also generate phenomena like:
    Alex kissed a sandwich .
    that Alex kissed a sandwich perplexed the president .
    the president thought that Alex kissed a sandwich .
    the president smiled .
    Alex ate a sandwich and wanted a pickle .
    the president and the chief of staff understood that Alex smiled .

In addition to new terminals, like Alex, you'll need to add new non-terminals. Of course, you don't want to generate just the six sentences above, but others with the same constructions.
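One possible shape for such rules, as a sketch only — the non-terminal names SBAR, Comp, and Conj here are illustrative choices of ours, not required ones, and the fragment assumes the S, NP, VP, and Verb non-terminals of the earlier grammars:

```
1 VP Verb SBAR     # thought that Alex kissed a sandwich
1 S SBAR VP        # that Alex kissed a sandwich perplexed ...
1 SBAR Comp S
1 Comp that
1 NP NP Conj NP    # the president and the chief of staff
1 VP VP Conj VP    # ate a sandwich and wanted a pickle
1 Conj and
1 NP Alex
```

Note that recursive rules like `NP → NP Conj NP` can make your generator produce arbitrarily long sentences; lowering their weights keeps generation from running away.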
Hand in your expanded grammar.
Copy grammar3 to a new file, grammar4. Modify it to model some new constructions of English that you find interesting. Explain in comments in your grammar file what you are doing.
Context-free parsing [25 points]
Using the programming language of your choice, write a recognizer based on Earley's algorithm as discussed in class. The recognizer should read grammars in the format you've been writing above, and input sentences in the same space-separated format your generator outputs; in other words, you should be able to parse the sentences generated by your own grammar. To get sufficient speed, you may want to implement the left-corner indexing trick discussed in class. (Other tricks may not offer as much benefit per unit of implementation time within the scope of this assignment.)
For sentences in the language defined by the grammar, output true; otherwise, false.
Since this is an unweighted parser, you don't have to do the bookkeeping of weights or decide among competing alternatives when attaching completed rules.
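A minimal recognizer along these lines might look like the following Python sketch. It assumes the start symbol is named ROOT (adjust to your grammar), uses a set-based chart, and omits the left-corner indexing optimization for clarity:

```python
import sys

def load_grammar(path):
    """Read rules of the form `weight LHS RHS...`; `#` starts a comment.
    Weights are ignored, since recognition is unweighted."""
    rules = {}
    with open(path) as f:
        for line in f:
            line = line.split('#')[0].strip()
            if not line:
                continue
            tokens = line.split()
            rules.setdefault(tokens[1], []).append(tuple(tokens[2:]))
    return rules

def recognize(rules, words, start='ROOT'):
    """Earley recognizer: True iff `words` is in the grammar's language.
    A state is (lhs, rhs, dot, origin); chart[i] holds states ending at i."""
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in rules.get(start, [])}
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in rules:
                    # Predict: start every rule for the expected non-terminal.
                    for r in rules[nxt]:
                        state = (nxt, r, 0, i)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)
                elif i < n and words[i] == nxt:
                    # Scan: the expected terminal matches the next word.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every state waiting for this lhs.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        state = (l2, r2, d2 + 1, o2)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)
    return any(dot == len(rhs) and origin == 0 and lhs == start
               for lhs, rhs, dot, origin in chart[n])

if __name__ == '__main__':
    grammar = load_grammar(sys.argv[1])
    for line in sys.stdin:
        print('true' if recognize(grammar, line.split()) else 'false')
```

Because the grammar format requires at least one right-hand-side symbol, there are no epsilon rules, so the simple completion step above is safe; a grammar with nullable non-terminals would need extra care.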
Extra credit: Competitive grammar writing
This exercise is based on Jason Eisner and Noah Smith's article
of that name. Using the grammar you handed in as
grammar4, we will generate 10 new sentences and
judge whether they are in fact grammatical English. (You can't
win this competition by generating Chinese or Finnish or
Navajo.) Then, your grammatical sentences will be sent to be
parsed by everyone else's Earley recognizers and their
grammars. You get five points for each grammatical
sentence you generate that no one else's grammar can parse.