Welcome to the EBM-NLP corpus

We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. Annotations include demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured (the 'PICO' elements). These spans are further annotated at a more granular level, e.g., individual interventions within them are marked and mapped onto a structured medical vocabulary.

The complete details are described in our ACL 2018 publication.


The corpus creation project is led by Ani Nenkova at UPenn and Byron Wallace at Northeastern University. Other members of the project include Ben Nye, Jessy Li, Roma Patel, Yinfei Yang, and Iain Marshall.


This work was supported in part by the National Cancer Institute (NCI) of the National Institutes of Health (NIH), award number UH2CA203711.