The Literature -- Books and journals
The ACL-MIT Series in Natural Language Processing contains a number of
books, mostly on advanced topics.
The Center for the Study of Language and Information (CSLI) at Stanford publishes an extensive
collection of books on advanced topics, many of them covering linguistics and computational
linguistics (or at least the theory thereof).
Their catalogue is available online.
Morphology deals with aspects of word formation, such as plurals, hyphenated forms, and various
affixes. These are quite common in biology, so it is useful to understand this field.
Examples of common prefixes and suffixes are ortho-, poly-, micro-,
-ase, -some, -cin, -ine, -globin, and -genic, among many others. Simple "stemming" chops off
suffixes, typically to turn plurals into singulars, but more careful morphological analysis
is also possible.
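The difference between simple stemming and more careful analysis can be seen in a toy example. The sketch below is a hypothetical illustration, not a published stemmer such as Porter's: `naive_stem` applies a few hand-picked plural rules (first match wins), and `find_affixes` checks a word against the prefixes and suffixes listed above. The rule list and function names are assumptions made for this example.

```python
# Toy affix analysis and plural "stemming". The rule list is a hypothetical
# illustration, not a published stemmer (e.g. not Porter's algorithm).

PREFIXES = ("ortho", "poly", "micro")
SUFFIXES = ("ase", "some", "cin", "ine", "globin", "genic")

# (plural ending, singular replacement), tried in order; first match wins.
PLURAL_RULES = (
    ("sses", "ss"),  # classes    -> class
    ("ies", "y"),    # antibodies -> antibody
    ("ss", "ss"),    # mass       -> mass (not a plural; left unchanged)
    ("us", "us"),    # locus      -> locus (not a plural; left unchanged)
    ("s", ""),       # enzymes    -> enzyme
)


def naive_stem(word: str) -> str:
    """Strip a plural ending using the first matching rule."""
    w = word.lower()
    for ending, replacement in PLURAL_RULES:
        # Require at least two stem characters to remain, so "is" survives.
        if w.endswith(ending) and len(w) - len(ending) >= 2:
            return w[: len(w) - len(ending)] + replacement
    return w


def find_affixes(word: str) -> tuple[list[str], list[str]]:
    """Return the listed prefixes and suffixes that the word carries."""
    w = word.lower()
    prefixes = [p for p in PREFIXES if w.startswith(p)]
    suffixes = [s for s in SUFFIXES if w.endswith(s)]
    return prefixes, suffixes


if __name__ == "__main__":
    for w in ["Polymerases", "microsomes", "antibodies", "species"]:
        stem = naive_stem(w)
        print(w, "->", stem, find_affixes(stem))
```

Note that `naive_stem("species")` returns `"specy"`: rules that only look at word endings cannot tell a real plural from a word that merely ends like one, which is why more careful morphological analysis pays off.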
Sproat, Richard.
Morphology and Computation. Cambridge:
MIT Press, 1992. ISBN 0-262-19314-0. 313 pp., 49 illus.
$50.00/£34.50 (cloth). This is an excellent book on this topic.
The books below all require some competence in mathematics, particularly discrete mathematics,
probability, statistics, and some information theory.
Foundations of Statistical Natural Language Processing
by Christopher D. Manning and Hinrich Schütze
Cambridge: MIT Press, 1999. ISBN 0-262-13360-1. 620 pp. $64.95/£44.95 (cloth).
More details about the book, sample chapters, a list of courses around the country
that use it, etc., are available online.
Elements of Information Theory by Thomas M.
Cover and Joy A. Thomas, published by John Wiley, 1991.
The authors maintain their own home page for the book. It is truly an excellent
book on the topics underlying much of statistical natural language processing.
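A central quantity in that book is Shannon entropy, H(X) = -sum over x of p(x) log2 p(x), measured in bits. As a small illustration (an assumption made here, not an example taken from the book), the sketch below estimates p(x) by relative frequency over a list of tokens and reports the entropy together with the perplexity 2^H, a quantity commonly used to evaluate language models. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(tokens) -> float:
    """Shannon entropy H = -sum p(x) log2 p(x), in bits, of the empirical
    (relative-frequency) distribution of the tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no data: treat the entropy as zero
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def perplexity(tokens) -> float:
    """Perplexity 2**H: the effective number of equally likely outcomes."""
    return 2 ** entropy_bits(tokens)


if __name__ == "__main__":
    text = "the cat sat on the mat".split()
    print(f"H = {entropy_bits(text):.3f} bits, perplexity = {perplexity(text):.3f}")
```

A uniform distribution over four symbols gives exactly 2 bits and a perplexity of 4; a sequence that repeats one symbol gives 0 bits.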
Finite-State Language Processing
by Emmanuel Roche and Yves Schabes (eds.)
MIT Press, 1997. ISBN 0-262-18182-7. 464 pp. $60.00/£41.50 (cloth).
Finite-state methods, corresponding roughly to the regular expressions
used in Perl, the Java ORO tools, etc., are less powerful than more
general parsing techniques, but they are faster and more efficient,
both in theory and in practice. By carefully crafting finite-state
analyses and combining them, practical and fast systems can be built.
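To make the connection with regular expressions concrete, here is a minimal sketch (hypothetical names, not code from Roche and Schabes) that compiles the pattern [a-z]*ase, i.e. lowercase words ending in the enzyme suffix -ase, into a deterministic finite automaton. The transition table is built once using the standard string-matching automaton construction; afterwards each word is checked in a single left-to-right pass with one table lookup per character, which is the source of the speed of finite-state methods. The pattern is strictly regular; features such as backreferences in Perl-style regexes fall outside finite-state power. Real finite-state toolkits go much further (transducers, composition, minimization).

```python
import re
import string

ALPHABET = string.ascii_lowercase


def build_suffix_dfa(suffix: str, alphabet: str = ALPHABET) -> list[dict[str, int]]:
    """Build a DFA accepting words over `alphabet` that end in `suffix`.

    State q means: the longest prefix of `suffix` that is also a suffix of
    the input read so far has length q. State len(suffix) is accepting.
    """
    m = len(suffix)
    delta = []
    for q in range(m + 1):
        row = {}
        for c in alphabet:
            seen = suffix[:q] + c
            k = min(m, len(seen))
            # Fall back to the longest prefix of `suffix` that ends `seen`.
            while k > 0 and not seen.endswith(suffix[:k]):
                k -= 1
            row[c] = k
        delta.append(row)
    return delta


def dfa_accepts(delta: list[dict[str, int]], word: str) -> bool:
    """Run the DFA in one pass: one table lookup per character."""
    state = 0
    for c in word:
        if c not in delta[state]:
            return False  # character outside the alphabet: reject
        state = delta[state][c]
    return state == len(delta) - 1


if __name__ == "__main__":
    ase = build_suffix_dfa("ase")
    pattern = re.compile(r"[a-z]*ase")
    for w in ["polymerase", "kinases", "lipase", "Lipase", "base", "as"]:
        # Cross-check the hand-built automaton against Python's regex engine.
        assert dfa_accepts(ase, w) == bool(pattern.fullmatch(w))
        print(w, dfa_accepts(ase, w))
```

Note that "base" is also accepted: a suffix automaton knows nothing about morphology on its own, which is why practical systems combine several finite-state analyses.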
Statistical Language Learning
by Eugene Charniak. MIT Press, 1996. ISBN 0-262-53141-0. 192 pp., 80 illus. $17.95/£12.50 (paper).
A short book, full of interesting topics.