== README for relational Congressional Record database ==

[ Daniel Mauer, dmauer@mitre.org ]

The accompanying SQL file contains a relational database consisting of
the contents of the United States Congressional Record from 1996 through
2006 (105th through 108th Congresses), paired with CommonSpace ideological
scores (2-dimensional scale, liberal/conservative and social issues) for
elected representatives.  For details on the scores, see the VoteView
website, http://www.voteview.com, built and maintained by Prof. Keith Poole
(UC San Diego) and Prof. Howard Rosenthal (NYU).

If you intend to use the data for a project, I would appreciate it if you
let me know.  That said, it is free to use for any purpose; if you
redistribute it, though, please include this README file.

It is organized into the following tables:

icpsr: Each elected representative's name and CommonSpace scores, indexed
		by ICPSR ID.  This table generally contains exactly one row per
		person, with the exception of some representatives who switched
		parties.  Such representatives have a separate ID for their time
		in each party (and, therefore, separate CommonSpace scores for
		each time period).  Specifically, this includes the following
		representatives:
			- Rodney Alexander
			- Michael Forbes
			- Virgil Goode
			- Ralph Hall
			- James Jeffords
			- Matthew Martinez
		It would be worthwhile to add a column for "first ICPSR ID" or similar
		to make it clear when an individual has two records here.
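
The "first ICPSR ID" idea above can be sketched as follows.  This is only
an illustration: it uses Python's sqlite3 module as a stand-in for
whatever database you load the SQL file into, and the column names
(icpsr_id, name, first_icpsr_id) and the rows are made up for the example.
They are not the actual schema or real ICPSR IDs.

```python
import sqlite3


def people_with_multiple_ids(conn):
    """Return (first_icpsr_id, name, id_count) for people with more than one row."""
    return conn.execute(
        """
        SELECT first_icpsr_id, MIN(name), COUNT(*) AS id_count
        FROM icpsr
        GROUP BY first_icpsr_id
        HAVING COUNT(*) > 1
        ORDER BY first_icpsr_id
        """
    ).fetchall()


# Toy table: the column names and the IDs below are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE icpsr ("
    "icpsr_id INTEGER PRIMARY KEY, name TEXT, first_icpsr_id INTEGER)"
)
conn.executemany(
    "INSERT INTO icpsr VALUES (?, ?, ?)",
    [
        (1, "Member A", 1),  # never switched parties: one row
        (2, "Member B", 2),  # first party
        (3, "Member B", 2),  # second party, pointing back at the first ID
    ],
)

switchers = people_with_multiple_ids(conn)
```

With such a column in place, grouping on it would collapse a party
switcher's rows back into one person without relying on name matching.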

speakers: A given individual (as listed in the icpsr table) will have one
		entry in the speakers table for each term in office (or partial term,
		in the case of resignation, party switch, death, etc.).  Speakers
		with a Speaker ID greater than 10000 are referenced only by title,
		such as "President Pro Tempore", and have a name given in the
		"special" field instead of an ICPSR ID.  For senators and the
		president/vice president, "district" is 0.
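
As a rough sketch of the Speaker ID rule described above, the helper below
decides whether a row should be identified by its title or by its ICPSR
ID.  Only "special" is a column name taken from this README; speaker_id
and icpsr_id are assumed names, and the sample rows are invented.

```python
from typing import NamedTuple, Optional

# Speaker IDs greater than this value are title-only speakers.
TITLE_ONLY_MIN_ID = 10000


class SpeakerRow(NamedTuple):
    speaker_id: int            # assumed column name
    icpsr_id: Optional[int]    # assumed column name; empty for title-only rows
    special: Optional[str]     # title, e.g. "President Pro Tempore"


def describe(row: SpeakerRow) -> str:
    """Identify a speaker by title if title-only, otherwise by ICPSR ID."""
    if row.speaker_id > TITLE_ONLY_MIN_ID:
        return f"title: {row.special}"
    return f"icpsr: {row.icpsr_id}"


# Invented sample rows, for illustration only.
member = SpeakerRow(speaker_id=42, icpsr_id=12345, special=None)
presiding = SpeakerRow(speaker_id=10001, icpsr_id=None,
                       special="President Pro Tempore")
```

Note that the comparison is strictly "greater than", so a Speaker ID of
exactly 10000 is treated as an ordinary speaker here.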

files: The electronic (plain text) version of the CR is split into files.
		This table contains information about each such file, including
		the date of transcription, chamber and congress number.  The "file"
		column refers to the plain text files, which are also available.

statements: This table contains the full text of each floor statement made
		in the House or Senate.  Each statement is indexed by speaker, file
		ID and "linenum".  The line number is not a literal line
		number, but rather an index into each file to be used strictly
		to order statements chronologically.  The "ambiguity" field
		is set to 1 if the speaker's identity could not be ascertained in
		the CR document (e.g., only the speaker's last name was given,
		and more than one representative in the same chamber shared that
		last name).  In cases where there is ambiguity, the "Detail" field
		is generally used to give what information was available about the
		speaker, who may or may not be an elected representative.
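
To illustrate how "linenum" and "ambiguity" are meant to be used, here is
a minimal sketch that lists one file's statements in chronological order
and, by default, skips the ambiguous ones.  It runs against a toy
in-memory sqlite3 table.  The column names other than linenum and
ambiguity (speaker_id, file_id, detail, text) are assumptions, as is the
convention that unambiguous rows have ambiguity = 0 and ambiguous rows
have no speaker; the rows themselves are invented.

```python
import sqlite3


def ordered_statements(conn, file_id, include_ambiguous=False):
    """Return (linenum, speaker_id, text) for one file, in linenum order."""
    sql = "SELECT linenum, speaker_id, text FROM statements WHERE file_id = ?"
    if not include_ambiguous:
        # Assumes unambiguous statements are stored with ambiguity = 0.
        sql += " AND ambiguity = 0"
    # linenum is only an ordering key, not a literal line number.
    sql += " ORDER BY linenum"
    return conn.execute(sql, (file_id,)).fetchall()


# Toy table with hypothetical column names and invented rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statements ("
    "speaker_id INTEGER, file_id INTEGER, linenum INTEGER, "
    "ambiguity INTEGER, detail TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO statements VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 7, 30, 0, None, "third"),
        (2, 7, 10, 0, None, "first"),
        (None, 7, 20, 1, "Mr. SMITH", "second"),  # speaker not resolved
        (1, 8, 5, 0, None, "other file"),
    ],
)

clear = ordered_statements(conn, 7)
everything = ordered_statements(conn, 7, include_ambiguous=True)
```

Dropping ambiguous rows is the conservative choice when statements are
being attributed to a specific person's CommonSpace scores; pass
include_ambiguous=True when only the order of the text matters.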

states: I doubt this table requires extended explanation.
