Knowledge issues -- Ontologies, semantics, AI
Grammars and parsing
Techniques -- Pattern matching, parsing, statistical approaches
Machine-readable dictionaries, lexicons
Contact Bob Futrelle using email.
Natural language processing of biology text
Founding editor: Bob Futrelle (2001)
News, 7/5/2010: Here is the updated link to the collection BioNLP Resources created by Alex Morgan. Though it is six years old, it can be a useful starting point.
News, 10/31/06: The Association for Computational Linguistics has created a Wiki, so I've added a prominent link to it in the column to the left.
News, 7/20/05: Additional search options have been added to the search page, namely: Google Scholar, Citeseer, and BLIMP. A search on BLIMP for "2005" is impressive, returning 67 hits, as of today.
News, 6/20/05: You might be interested in a paper we prepared for use in our own research. It is an Open Access Biology paper in which we've numbered every sentence and every token within each sentence. This makes it possible for people working at a distance to discuss various items and constructions of interest to parsing, text mining, etc. Access it here.
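As a rough illustration of what numbering every sentence and every token can look like, here is a minimal Python sketch. The splitting rules, the function names `number_sentences_and_tokens` and `format_labels`, and the `S<n>.T<m>` label format are all assumptions made for this example; they are not the scheme used to prepare the paper mentioned above.

```python
import re

def number_sentences_and_tokens(text):
    """Return [(sentence_number, [(token_number, token), ...]), ...].

    Hypothetical helper: uses a naive sentence split and a naive tokenizer.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    numbered = []
    for s_num, sentence in enumerate(sentences, start=1):
        # A token is a run of word characters or a single punctuation mark.
        tokens = re.findall(r"\w+|[^\w\s]", sentence)
        numbered.append((s_num, list(enumerate(tokens, start=1))))
    return numbered

def format_labels(numbered):
    """Render one line per sentence, labelling each token as S<n>.T<m>:token."""
    return [
        " ".join(f"S{s_num}.T{t_num}:{tok}" for t_num, tok in tokens)
        for s_num, tokens in numbered
    ]

if __name__ == "__main__":
    sample = "The p53 protein binds DNA. It regulates the cell cycle."
    for line in format_labels(number_sentences_and_tokens(sample)):
        print(line)
```

With labels like these, two people can refer to "S2.T3" and know they mean the same token. Real biology text (abbreviations, decimal numbers, gene names with punctuation) would need a more careful sentence splitter and tokenizer than this sketch provides.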
News, 6/20/05: Google Scholar has grown to the point that it is a useful tool for finding BioNLP-related papers. A mailing list note about Google Scholar is here.
News, 4/13/05: The volume of material in our field continues to increase. So rather than trying to cache a large number of papers on the BioNLP site, the information on new papers, conferences, etc., is being distributed primarily through the BIONLP mailing list and is available in the publicly readable archives. Since you can search the archives using Google, that makes the information reasonably available.
News, 10/7/04: Two papers devoted to biology text analysis and mining have just been published in the IBM Systems Journal. Access them at the top of our Articles page.
News, 9/18/04: Abstracts for six papers on Biomed text mining from PAKDD 2004 are available. Follow the Articles link on the left or go directly here.
I've added Google phrase search of PubMed to the search page. It adds some capability missing in PubMed itself. There is also a link there to some notes about it. Follow the "Search" link at the top left of this page.
News, 7/16/04: Alexander Morgan has produced a quite useful page of information about and links to a variety of freely available BioNLP resources. It is located at: http://www.tufts.edu/~amorga02/bcresources.html. It is divided into the following sections:
News, 3/19/04: CALL FOR PAPERS: IEEE Transactions on Knowledge and Data Engineering (TKDE)
Special Issue on Mining Biological Data, including: Literature Extraction, Text Mining, and Ontologies. Submission due date 15 July 2004. Details in this PDF extracted from the latest issue of TKDE. This news item was also sent to the BioNLP mailing list, as most of these items are. So joining the mailing list will get such information to you in a timely way.
News, 2/2/04: A lengthy new review on biomedical text mining by Shatkay and Feldman was just published. See the link on the Articles page.
News, 11/30/03: New OUP book on computational linguistics: info, table of contents.
News, 11/17/03: The much-heralded new Open Access journal, PLOS Biology, has a new issue out, Vol 1, No 2, which has a feature article, "Tough Mining -- The challenges of searching the scientific literature", about NLP for biology. It's a news feature rather than a technical article, but it's interesting. Access it here.
News, 11/6/03: I have created a search facility, using standard Google hacks, that allows you to search the bionlp.org site, the mail archives, or the huge ACL Anthology (at http://acl.ldc.upenn.edu/). Use the Search link at the top left or right here. (You'll notice that a number of the search results start with "Return to BIONLP.ORG home page" -- that's something I need to fix.)
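For readers curious how a site-restricted Google search can be put together, here is a minimal Python sketch. It assumes the "Google hack" in question is the `site:` query operator, which is a guess rather than something documented on this page. The function name `google_site_search_url` is hypothetical, and only the bionlp.org and ACL Anthology hosts named above are used.

```python
from urllib.parse import urlencode

def google_site_search_url(terms, site):
    """Build a Google search URL restricted to one site.

    Assumption: the restriction is done with Google's `site:` operator.
    """
    query = f"{terms} site:{site}"
    # urlencode percent-encodes the query and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})

if __name__ == "__main__":
    # Search the BioNLP site and the ACL Anthology host for the same terms.
    for host in ("bionlp.org", "acl.ldc.upenn.edu"):
        print(google_site_search_url("protein name recognition", host))
```

Wrapping the terms in double quotes before passing them in would turn this into a phrase search.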
News, 8/19/03: The sixteen papers from the 2003 ACL Workshop on Natural Language Processing in Biomedicine are available online at http://acl.ldc.upenn.edu/acl2003/nlbio/index.htm in PDF and PostScript formats, each with a BibTeX entry. A quick list of paper titles and authors is available in a BioNLP mailing list archive posting here.
BioMed Central research article corpus available for data mining.
BioMed Central has published more than 2400 peer reviewed research articles, all of which are covered by BioMed Central's open access license policy: Unlike a traditional journal's license agreement, BioMed Central's license allows completely free reuse and redistribution of the content by anyone. Note that these are full-text articles, not abstracts. Further details are available here.
The deadline for the SIGIR'03 Workshop on Text Analysis for Bioinformatics has been extended to June 27, 2003. We seek short papers on preliminary and recent work. Authors will retain copyright ownership and are free to submit their papers for publication elsewhere after the workshop. The workshop will be held on August 1, 2003 in Toronto. See http://www.sigir2003.org/biotextCFP.pdf for more information.
News, 6/4/03: Five papers on NLP from the ISMB 2003 meeting, June 29 - July 3 are now posted on the BioNLP site in PDF format via this page.
CALL FOR PAPERS - Submit abstracts by May 15, 2003.
BioLINK has announced the meeting of the
Special Interest Group in Text Mining at this year's ISMB:
Mailing list and website
From their website,
News, 4/8/03: The SIGIR'03 Workshop on Text Analysis for Bioinformatics will be held August 1st, in Toronto, Canada. Paper submission by June 16th. Click for details in the BioNLP archive.
News, 3/13/03: The IEEE Computer Society Bioinformatics Conference is looking for papers on NLP in Biology. The paper deadline, April 1, is coming up soon, but there is a May 22 deadline for Poster Abstracts that will be published in the Proceedings. The conference will be held at Stanford, August 11-14, 2003. More information here: http://conferences.computer.org/bioinformatics/. And here is a two-page PDF version of the call for papers.
News, 12/24/2002: TREC2003, the Text Retrieval Conference, has a Genomics track. Click for details in the BioNLP archive.
News, 12/23/2002: A very useful new review paper on biology text data mining has just been published by Hirschman et al. The citation, abstract and references are available here.
Motivation for this site
The literature of the field of biology is the largest of all the sciences. The volume of biology literature each year, measured in bytes, is about fifty times the size of the entire human genome, junk and all. But locked in this literature is an enormous amount of information that can tell us much about the structure and function of genes, proteins, cells and organisms -- how they work as well as how they can fail.
The newly emergent interest in natural language processing for biology has been christened "Information Extraction". But work in this area has been going on for many decades under different names and this site includes a good deal of information about past and current work in NLP and in information extraction for biology in particular. The other major descriptor of the general field is "Computational Linguistics".
The goals for this site include providing material and links in the following areas:
Activities in this community could include:
The site was created by Bob Futrelle, February 27, 2001.
Earlier News (as of 11/28/2002)
News, 12/20/2002: Computational linguist Daniel Jurafsky received a MacArthur "Genius Award" in 2002. Though Dan focuses primarily on speech, it's nice to know that one of our own has been so highly honored. His book with Martin is listed on our Books and Journals page.
A challenge -- BioNLP is not easy (by RPF 11/02)
News, 11/28/2002: PSB 2003 Linking Biomedical Language, Information and Knowledge, January 3-7, 2003. Papers now online.
More news, 11/28/2002: ACL 2002 Workshop on Natural Language Processing in the Biomedical Domain. Papers now online.
11/28/2002: There will be a special session at PSB 2003, "Linking Biomedical Language, Information and Knowledge". The session is part of the Pacific Symposium on Biocomputing 2003, January 3-7, 2003, at the Kauai Marriott Resort and Beach Club.
There was a Workshop on Natural Language Processing in the Biomedical Domain at ACL 2002 in Philadelphia. I have placed a mirror of the web pages for the workshop here, which includes online copies of the twelve papers in PDF and PostScript formats. Be warned that some of the links there are not operational, since I have not copied the entire ACL CD contents to the bionlp.org site(!).
11/28/2002: There was a text mining workshop at ISMB 2002 in Edmonton, Alberta, Canada on August 2nd, 2002. Here is the initial announcement. When the workshop has its own page, or I can otherwise get copies of or links to the papers, there'll be a link here.
Archives of even earlier News - Archives.
CONTRIBUTIONS: Send me your papers and reports or links to them. This site will improve primarily through contributions from researchers and practitioners around the world. I would be happy to add links to any on-line papers and reports you have or are aware of, or to cache them on this site for easy access. Any links to other resources would also be most welcome.