CSG 224 Natural Language Processing
Project Ideas

For each project, you are required to hand in a report and provide copies of all software and any data created in doing the project. Your group must give a presentation of your project to the class on the last day, April 19. You are encouraged to hand in your report and other materials on the same day; however, the absolute final due date is Monday, April 24.

For each of these projects, Prof. Hafner will assist you in creating a specific plan and in finding additional reference materials, beyond our course textbook, to help you.

Each project report must include a section titled Background and Related Work, giving an academic context to your project and citing the appropriate references. (This is not a thorough literature review but a brief description of the underlying theory, the applications (if there are any), and the most important issues and accomplishments of the area you are working in.) Although the paper is written for a class, it should "stand alone" and be readable by other CS academics with related backgrounds.

I. Parsing Project

Implement a unification parser or a probabilistic parser. Modify the context-free grammar that we used for our chart parser to reflect the more powerful formalism. (In the case of the probabilistic parser, this will require a training step.) Compare the performance of your parser on a variety of sentences that you will devise for this purpose, including but not limited to the ones assigned for the chart parser. Write a report describing what you did and what you discovered, and evaluating the capabilities and the pros and cons of your parser.

II. Text Classification Project

Given a set of categories assigned to documents in the CCIS Jobs database, implement a text classification system to "learn" a set of classification rules, and evaluate your system. Write a report describing what you did and what you discovered.

III. Word Sense Ambiguity Experiment

Using the SEMCOR database, decide whether to concentrate on nouns or verbs. Analyze the ambiguity of the words in that lexical category. Create a simple collocation-based program for sense assignment, and evaluate how well it does compared to the "most frequent" sense assignment. Write a report describing what you did and what you discovered.

IV. Word Sense Ambiguity Experiment 2

Investigate the WordNet noun or verb network, and propose a method to reduce the number of word senses. Implement this method and apply it to a set of at least 20 selected words with many senses. Write a report describing what you did and what you discovered.

V. Tagging Experiment

Select two different genres of text, and find samples of each on the Web to analyze. Apply an existing (pre-trained) tagger to both samples, and compare the results:
-- to each other (in distribution of tags, etc.)
-- to a subset of each sample that is hand-tagged by your group members

For "extra credit": train the tagger on each text sample separately, and see how that affects the performance.
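
The two comparisons named in the Tagging Experiment can be sketched roughly as follows. This is only a minimal illustration in Python, not a required approach. It assumes each tagged sample is already available as a list of (word, tag) pairs; the function names, the choice of total variation distance as the measure of difference, and the toy data are all hypothetical and stand in for whatever tagger and samples your group selects.

```python
from collections import Counter


def tag_distribution(tagged):
    """Relative frequency of each tag in a list of (word, tag) pairs."""
    counts = Counter(tag for _, tag in tagged)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def distribution_difference(dist_a, dist_b):
    """Total variation distance between two tag distributions.

    0.0 means the distributions are identical; 1.0 means they share no tags.
    (One possible measure among many; chosen here for simplicity.)
    """
    tags = set(dist_a) | set(dist_b)
    return 0.5 * sum(abs(dist_a.get(t, 0.0) - dist_b.get(t, 0.0)) for t in tags)


def tagging_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the hand-assigned tag."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and aligned")
    correct = sum(1 for (_, p), (_, g) in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Hypothetical toy data: tagger output for two genres, plus a hand-tagged subset.
news_auto = [("The", "DT"), ("market", "NN"), ("fell", "VBD"), ("sharply", "JJ")]
news_gold = [("The", "DT"), ("market", "NN"), ("fell", "VBD"), ("sharply", "RB")]
blog_auto = [("I", "PRP"), ("love", "VBP"), ("this", "DT"), ("movie", "NN")]

# Comparison 1: the two genres against each other.
genre_gap = distribution_difference(
    tag_distribution(news_auto), tag_distribution(blog_auto)
)

# Comparison 2: tagger output against the hand-tagged subset.
news_accuracy = tagging_accuracy(news_auto, news_gold)

print(f"tag distribution difference: {genre_gap:.2f}")
print(f"accuracy on hand-tagged news subset: {news_accuracy:.2f}")
```

On real samples, a per-tag breakdown of the disagreements with the hand-tagged subset would likely be more informative than a single accuracy figure.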