
Software tools

Updated May 23, 2003

A major source of software tools is the Natural Language Software Registry (NLSR). It is not so much a direct source as a collection of links to providers of NLP software tools of all sorts.

Dan Melamed at NYU has quite a large collection of text-processing tools, mostly written in Perl 5. (October 4, 2002: links have been updated to reflect the new locations of his tools.)

Ted Pedersen at Minnesota has developed the N-gram Statistics Package (NSP). It identifies word n-grams in large corpora using standard tests of association such as Fisher's exact test, the log-likelihood ratio, Pearson's chi-squared test, and the Dice coefficient. NSP is designed so that users can add their own tests with minimal effort.
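Association tests like these are computed from a 2x2 contingency table of bigram counts. The Java sketch below is not NSP's code (NSP itself is written in Perl); it is a minimal illustration of how three of the listed scores (Dice, Pearson's chi-squared, and the log-likelihood ratio G^2) follow from four counts. The names n11 (bigram count), n1p (bigrams starting with w1), np1 (bigrams ending with w2) and npp (total bigrams), as well as the class and method names, are assumptions chosen for illustration. Fisher's exact test is left out because it needs a hypergeometric sum that does not fit a short example.

```java
/**
 * Illustrative sketch: bigram association scores from a 2x2 contingency table.
 * Assumed notation: n11 = count of (w1 w2), n1p = bigrams whose first word is w1,
 * np1 = bigrams whose second word is w2, npp = total bigrams.
 * Assumes 0 < n1p < npp and 0 < np1 < npp so no expected count is zero.
 */
public class BigramScores {

    /** Dice coefficient: 2 * n11 / (n1p + np1). */
    public static double dice(long n11, long n1p, long np1) {
        return 2.0 * n11 / (n1p + np1);
    }

    /** Observed cells in row-major order: n11, n12, n21, n22. */
    private static double[] observed(long n11, long n1p, long np1, long npp) {
        return new double[] {
            n11,
            n1p - n11,              // w1 followed by something other than w2
            np1 - n11,              // w2 preceded by something other than w1
            npp - n1p - np1 + n11   // neither w1 first nor w2 second
        };
    }

    /** Expected cells if w1 and w2 occurred independently. */
    private static double[] expected(long n1p, long np1, long npp) {
        double n = npp;
        double n2p = npp - n1p; // bigrams whose first word is not w1
        double np2 = npp - np1; // bigrams whose second word is not w2
        return new double[] {
            n1p * (double) np1 / n,
            n1p * np2 / n,
            n2p * np1 / n,
            n2p * np2 / n
        };
    }

    /** Pearson's chi-squared: sum over cells of (O - E)^2 / E. */
    public static double chiSquared(long n11, long n1p, long np1, long npp) {
        double[] o = observed(n11, n1p, np1, npp);
        double[] e = expected(n1p, np1, npp);
        double sum = 0.0;
        for (int i = 0; i < 4; i++) {
            double d = o[i] - e[i];
            sum += d * d / e[i];
        }
        return sum;
    }

    /** Log-likelihood ratio: G^2 = 2 * sum over cells of O * ln(O / E). */
    public static double logLikelihood(long n11, long n1p, long np1, long npp) {
        double[] o = observed(n11, n1p, np1, npp);
        double[] e = expected(n1p, np1, npp);
        double sum = 0.0;
        for (int i = 0; i < 4; i++) {
            if (o[i] > 0) { // an empty cell contributes 0 (0 * ln 0 taken as 0)
                sum += o[i] * Math.log(o[i] / e[i]);
            }
        }
        return 2.0 * sum;
    }

    public static void main(String[] args) {
        // Hypothetical counts: the bigram occurs 30 times among 1000 bigrams;
        // w1 starts 60 bigrams and w2 ends 40.
        long n11 = 30, n1p = 60, np1 = 40, npp = 1000;
        System.out.printf("Dice = %.3f%n", dice(n11, n1p, np1));
        System.out.printf("chi-squared = %.3f%n", chiSquared(n11, n1p, np1, npp));
        System.out.printf("G^2 = %.3f%n", logLikelihood(n11, n1p, np1, npp));
    }
}
```

All three scores read off the same observed and expected cells, so another measure can be added as one more method over those arrays.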


Most people think of Perl for patterns and Java for browsers. But Java is a platform, with over 80 libraries covering database connectivity, networking, graphics, image processing, GUI building (Swing), and much more. Java's primary use these days is server-side, middleware software. (My lab, the BKL, uses Java for essentially all its work.)

Regular Expressions in Java 2, v1.4:

Much of the power of the regular expressions available in Perl is now built into Java 1.4. A good overview of regular expressions, including the facilities in Java, can be found in the latest edition of Jeffrey E. F. Friedl's book, Mastering Regular Expressions (2nd edition, July 2002). Friedl has also written a discussion of these issues, with pointers to his book.

Sun's own documentation of the java.util.regex package refers specifically to the book above.
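As a minimal sketch of how familiar Perl idioms carry over to java.util.regex, the class below uses Pattern and Matcher for a Perl-style global match (a while (/.../g) loop reading $1) and String.replaceAll for a global substitution (s/.../.../g). The class name, method names, and the acronym pattern are illustrative assumptions; because the code uses generics, it assumes Java 5 or later rather than 1.4 itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {

    // Hypothetical pattern: two or more capital letters inside parentheses,
    // e.g. "(NLSR)". Group 1 captures the letters without the parentheses.
    private static final Pattern ACRONYM = Pattern.compile("\\(([A-Z]{2,})\\)");

    /** Perl equivalent: while ($text =~ /\(([A-Z]{2,})\)/g) { push @out, $1; } */
    public static List<String> acronyms(String text) {
        List<String> out = new ArrayList<String>();
        Matcher m = ACRONYM.matcher(text);
        while (m.find()) {       // each call resumes after the previous match
            out.add(m.group(1)); // group(1) plays the role of Perl's $1
        }
        return out;
    }

    /** Perl equivalent: $text =~ s/\s+/ /g; (collapse runs of whitespace) */
    public static String squeezeSpaces(String text) {
        return text.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String text = "The Natural Language Software Registry (NLSR) and the\n"
                + "N-gram Statistics Package   (NSP).";
        System.out.println(acronyms(text)); // prints [NLSR, NSP]
        System.out.println(squeezeSpaces(text));
    }
}
```

The acronym pattern is compiled once and kept in a static field, since each call to String.replaceAll compiles its expression anew.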