Dan Mauer's Projects Page

Here are a few interesting projects I've worked on (or am currently working on), other than those visible on the Relational Agents Group page. This page is under construction.

Annotated Congressional Record and Ideology Detection Engine
- This is a work in progress, in two parts. The first part is nearly complete: I have taken ten years of the Congressional Record, which is currently available only in human-readable format, parsed the contents to extract each statement made by a member of Congress, and built a large SQL database of floor statements, each associated with the senator or representative who spoke. Each senator and representative is annotated in the database with CommonSpace scores and other data.
- The second part uses this data to train a classifier, based on a bigram/trigram language model, to recognize terms and phrases which are significantly more likely to be uttered by a liberal than by a conservative, and vice versa. The full project proposal can be viewed here.
- The full database, in SQL dump format, can be downloaded here [.tar.gz, 232MB]. Be sure to read the included README file. Note that the data is somewhat noisy, as most of it has been automatically parsed from an imperfect source (e.g. there are certainly some typos, there may be a few misattributed statements, and some statements are missing). I will document the shortcomings of the data sometime or other.
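
To make the second part more concrete, here is a minimal sketch of how bigram and trigram counts can be turned into per-phrase scores and a simple classifier. It is not the project's actual code: the function names, the add-one smoothing, the whitespace tokenizer, and the toy statements for the placeholder groups A and B are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(text, n):
    """Return the word n-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(statements, sizes=(2, 3)):
    """Count bigrams and trigrams over a collection of statements."""
    counts = Counter()
    for statement in statements:
        for n in sizes:
            counts.update(ngrams(statement, n))
    return counts


def log_odds(counts_a, counts_b):
    """Smoothed log-ratio of each n-gram's relative frequency in A versus B.

    Positive scores mean the phrase is more typical of group A,
    negative scores mean it is more typical of group B.
    """
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + len(vocab)  # add-one smoothing
    total_b = sum(counts_b.values()) + len(vocab)
    return {
        gram: math.log((counts_a[gram] + 1) / total_a)
        - math.log((counts_b[gram] + 1) / total_b)
        for gram in vocab
    }


def classify(statement, scores, sizes=(2, 3)):
    """Sum the scores of the statement's known n-grams; the sign picks the group."""
    total = sum(scores.get(g, 0.0) for n in sizes for g in ngrams(statement, n))
    return "A" if total > 0 else "B"


# Toy, made-up statements standing in for two groups of speakers.
group_a = ["we must cut taxes now", "cut taxes and cut spending"]
group_b = ["we must expand health care", "expand health care for families"]

scores = log_odds(count_ngrams(group_a), count_ngrams(group_b))
print(classify("they want to cut taxes", scores))
print(classify("better health care", scores))
```

With real data, the two statement lists would instead be drawn from the database, grouped by the speakers' scores, and phrases with the largest positive or negative values would be the candidates for ideologically distinctive language.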
Synchronous, Probabilistic, Lexicalized Tree Insertion Grammars for Machine Translation
- I worked on this project while a Special Student at Harvard. It began as part of a seminar on natural language processing. I took part in the initial discussions and in the design and implementation of an early version of the system, as detailed in this report.
- The work begun in that seminar has since been significantly furthered, primarily by Stuart Shieber, Rebecca Nesson and Alexander Rush. These researchers recently published a long paper on this work at AMTA 2006.