CSG 224 Natural Language Processing
		       Project Ideas

For each project, you are required to hand in a report and provide
copies of all software and any data created in doing the project.

Your group must give a presentation of your project to the class
on the last day, April 19.

You are encouraged to hand in your report and other materials on
the same day; however, the absolute final due date is Monday, April 24.

For each of these projects, Prof. Hafner will assist you in creating
a specific plan and in finding additional reference materials to help you, 
beyond our course textbook.

Each project report must include a section titled Background and Related 
Work, giving an academic context to your project, and citing the
appropriate references. (This is not a thorough literature review but
a brief description of the underlying theory, applications (if there are any), 
most important issues and most important accomplishments of the area you are 
working in.) Although the paper is written for a class, it should "stand
alone" and be readable by other CS academics with related backgrounds. 

I. Parsing Project

Implement a unification parser or a probabilistic parser.

Modify the context free grammar that we used for our chart parser to 
reflect the more powerful formalism. (In the case of the probabilistic 
parser, this will require a training step.)
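
As one illustration of what the training step for a probabilistic parser
might look like, the sketch below estimates rule probabilities by relative
frequency from a handful of parsed sentences. The nested-tuple tree format,
the toy trees, and the function names are assumptions made for this example;
they are not part of the chart parser assignment.

```python
from collections import Counter

def rules(tree):
    """Yield (lhs, rhs) productions from a nested-tuple tree.

    A tree is (label, child, child, ...); a preterminal is (label, word).
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield (label, (children[0],))          # lexical rule, e.g. N -> dog
        return
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from rules(child)

def estimate_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = Counter(r for t in trees for r in rules(t))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Hypothetical toy treebank, for illustration only.
TOY_TREES = [
    ("S", ("NP", ("N", "dogs")), ("VP", ("V", "bark"))),
    ("S", ("NP", ("Det", "the"), ("N", "dog")), ("VP", ("V", "barks"))),
]

probs = estimate_pcfg(TOY_TREES)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

A real project would need far more training data and some treatment of
rules and words never seen in training.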

Compare the performance of your parser on a variety of sentences that
you will devise for this purpose, including but not limited to the ones 
assigned for the chart parser.

Write a report describing what you did and what you discovered, and
evaluating the capabilities, strengths, and weaknesses of your parser.

II. Text Classification Project

Given a set of categories assigned to documents in the CCIS Jobs database,
implement a text classification system to "learn" a set of classification
rules, and evaluate your system.
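
To make the learn-then-evaluate loop concrete, here is a minimal sketch using
a multinomial Naive Bayes classifier with add-one smoothing. Naive Bayes is
only one possible choice, picked here for brevity; the project does not
require it. The example documents and category names are invented and are
not taken from the CCIS Jobs database.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, category) pairs. Returns a simple model dict."""
    cat_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, cat in docs:
        cat_counts[cat] += 1
        for w in text.lower().split():
            word_counts[cat][w] += 1
            vocab.add(w)
    return {"cats": cat_counts, "words": word_counts, "vocab": vocab}

def classify(model, text):
    """Return the category with the highest log posterior score."""
    total_docs = sum(model["cats"].values())
    v = len(model["vocab"])
    best, best_score = None, -math.inf
    for cat, n in model["cats"].items():
        score = math.log(n / total_docs)
        denom = sum(model["words"][cat].values()) + v   # add-one smoothing
        for w in text.lower().split():
            if w in model["vocab"]:                      # ignore unseen words
                score += math.log((model["words"][cat][w] + 1) / denom)
        if score > best_score:
            best, best_score = cat, score
    return best

def accuracy(model, docs):
    """Fraction of held-out (text, category) pairs classified correctly."""
    return sum(classify(model, t) == c for t, c in docs) / len(docs)

# Hypothetical training and test documents, for illustration only.
TRAIN = [
    ("java developer needed for web applications", "software"),
    ("teaching assistant for intro programming course", "teaching"),
    ("software engineer python backend", "software"),
    ("tutor wanted for calculus course", "teaching"),
]
TEST = [("python developer", "software"), ("calculus course tutor", "teaching")]

model = train(TRAIN)
print(accuracy(model, TEST))
```

Whatever learning method you choose, evaluate it on documents that were not
used for training.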

Write a report describing what you did and what you discovered.

III. Word Sense Ambiguity Experiment 

Using the SEMCOR database, decide whether to concentrate on nouns or verbs.
Analyze the ambiguity of the words in that lexical category.

Create a simple collocation-based program for sense assignment,
and evaluate how well it does compared to the "most frequent"
sense assignment.
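
The comparison described above could be set up along the following lines.
This is a sketch under simplifying assumptions: a collocation is taken to be
the word immediately to the left or right of the target, the classifier backs
off to the most frequent sense when no known collocation is present, and the
sense-tagged instances are invented rather than drawn from SEMCOR.

```python
from collections import Counter, defaultdict

def train(instances):
    """instances: list of (tokens, target_index, sense) for one target word."""
    sense_counts = Counter()
    colloc = defaultdict(Counter)        # (offset, neighbor word) -> sense counts
    for tokens, i, sense in instances:
        sense_counts[sense] += 1
        for j in (i - 1, i + 1):
            if 0 <= j < len(tokens):
                colloc[(j - i, tokens[j])][sense] += 1
    return sense_counts, colloc

def most_frequent(sense_counts):
    """Baseline: always choose the sense seen most often in training."""
    return sense_counts.most_common(1)[0][0]

def predict(sense_counts, colloc, tokens, i):
    """Vote with the adjacent-word collocations; back off to the baseline."""
    votes = Counter()
    for j in (i - 1, i + 1):
        if 0 <= j < len(tokens):
            votes.update(colloc.get((j - i, tokens[j]), {}))
    if votes:
        return votes.most_common(1)[0][0]
    return most_frequent(sense_counts)

# Hypothetical sense-tagged examples for the noun "bank".
TRAIN = [
    (["the", "river", "bank", "was", "muddy"], 2, "shore"),
    (["she", "robbed", "the", "bank", "yesterday"], 3, "finance"),
    (["the", "bank", "approved", "the", "loan"], 1, "finance"),
    (["a", "steep", "river", "bank"], 3, "shore"),
    (["my", "bank", "charges", "fees"], 1, "finance"),
]

sense_counts, colloc = train(TRAIN)
print(most_frequent(sense_counts))
print(predict(sense_counts, colloc, ["the", "river", "bank", "flooded"], 2))
```

Scoring both `predict` and `most_frequent` against the same held-out
instances gives the two accuracy figures to compare.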

Write a report describing what you did and what you discovered.

IV. Word Sense Ambiguity Experiment 2

Investigate the WordNet noun or verb network, and propose a system to
reduce the number of word senses. Implement this algorithm and
apply it to a set of at least 20 selected words with many senses.

Write a report describing what you did and what you discovered.

V. Tagging Experiment

Select two different genres of text, and find samples of each on the Web to
analyze.

Apply an existing tagger (pre-trained) to both samples, and compare the
results:
  -- to each other (in distribution of tags, etc.) 
  -- to a subset of each sample that is hand-tagged by your group members
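
Both comparisons reduce to simple counting once the tagger output is
available as (word, tag) pairs. The sketch below assumes that format and uses
a made-up four-word sample; it does not call any particular tagger, and the
tag names are only illustrative.

```python
from collections import Counter

def tag_distribution(tagged):
    """Relative frequency of each tag in a list of (word, tag) pairs."""
    counts = Counter(tag for _, tag in tagged)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

def accuracy(auto, gold):
    """Fraction of tokens whose automatic tag matches the hand-assigned tag."""
    if len(auto) != len(gold):
        raise ValueError("automatic and hand-tagged samples must align token by token")
    correct = sum(a[1] == g[1] for a, g in zip(auto, gold))
    return correct / len(gold)

# Hypothetical tagger output and hand-tagged version of the same tokens.
auto = [("the", "DT"), ("dog", "NN"), ("barks", "NNS"), ("loudly", "RB")]
gold = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ"), ("loudly", "RB")]

print(tag_distribution(auto))
print(accuracy(auto, gold))
```

Computing `tag_distribution` for each genre and comparing the results side by
side covers the first comparison; `accuracy` on the hand-tagged subsets
covers the second.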

For extra credit:
Train the tagger on each text sample separately, and see how that affects 
the performance.