import ebm_nlp_demo as e

Task 1: Loading annotations

The annotations are subdivided into several categories. Let's start by loading the annotations from the second phase (detailed hierarchical labels) for the RCTs' interventions.

The first annotations we consider here are the individual expert-provided gold standard labels.

worker_map, doc_map = e.read_anns('hierarchical_labels', 'interventions',
                                  ann_type='individual', model_phase='test/gold')

Loaded annotations for 200 documents from 3 workers

# Show the first three documents; list() keeps the slice valid on Python 3 as well
for pmid, doc in list(doc_map.items())[:3]:
    print('PMID: %s' % pmid)
    print(doc.text)
    e.print_labeled_spans(doc)
    print('')

PMID: 15119720
Bitewing film quality: a clinical comparison of the loop vs. holder techniques.

OBJECTIVE To compare in vivo bitewing film quality using the holder versus the paper loop technique.
METHOD AND MATERIALS Four bitewing films were taken from the right and left premolar and molar regions of 45 dental students using both the bitewing holder and paper loop techniques. A total of 360 films were taken and assessed by an experienced practitioner not apprised of the bitewing technique used. Of interest were: (1) the number of overlaps and the percentage of teeth showing the alveolar crest; (2) proper film positioning; and (3) the percentage of cone cutting. A Poisson regression using generalized estimating equations (GEEs) was used to estimate the difference in overlap between the two techniques. For proper positioning and cone cutting, logistic regressions using GEEs were used.
RESULTS The average number of horizontal overlaps for the loop and holder techniques at the right premolar, right molar, left premolar, and left molar were 1.64, 2.11, 2.16, 2.78, and 1.64, 2.00, 2.00, 2.18, respectively. The loop technique was 1.11 times more likely to cause overlapping than the holder technique. The highest percentage of teeth showing the alveolar crest by the loop technique was 97.8% in the mandibular second premolar and first molar. With respect to film positioning, the loop technique was 1.12 times more likely to cause improper positioning than the holder technique. Both techniques demonstrated minimal cone cutting (1 in the loop versus 0 in the holder).
CONCLUSION The quality of bitewing films taken by the loop and holder techniques was not significantly different.


Label spans for wid = 00001
[Surgical]: loop 
[Surgical]: holder techniques 
[Surgical]: holder versus the paper loop technique 
[Surgical]: holder and paper loop techniques 
[Surgical]: bitewing technique 
[Surgical]: loop 
[Surgical]: holder techniques 
[Surgical]: loop technique 
[Surgical]: holder technique . 
[Surgical]: loop technique 
[Surgical]: loop technique 
[Surgical]: holder technique . 
[Surgical]: loop and holder techniques 

Label spans for wid = 00003
[Other]: loop vs. holder techniques . 
[Other]: holder versus the paper loop technique . 
[Other]: bitewing holder and paper loop techniques . 
[Other]: bitewing technique 
[Other]: loop 
[Other]: holder techniques 
[Other]: loop technique 
[Other]: holder technique . 
[Other]: loop technique 
[Other]: loop technique 
[Other]: holder technique . 
[Other]: loop and holder techniques 


PMID: 20406576
The ScanBrit randomised, controlled, single-blind study of a gluten- and casein-free dietary intervention for children with autism spectrum disorders.

There is increasing interest in the use of gluten- and casein-free diets for children with autism spectrum disorders (ASDs). We report results from a two-stage, 24-month, randomised, controlled trial incorporating an adaptive 'catch-up' design and interim analysis. Stage 1 of the trial saw 72 Danish children (aged 4 years to 10 years 11 months) assigned to diet (A) or non-diet (B) groups by stratified randomisation. Autism Diagnostic Observation Schedule (ADOS) and the Gilliam Autism Rating Scale (GARS) were used to assess core autism behaviours, Vineland Adaptive Behaviour Scales (VABS) to ascertain developmental level, and Attention-Deficit Hyperactivity Disorder - IV scale (ADHD-IV) to determine inattention and hyperactivity. Participants were tested at baseline, 8, and 12 months. Based on per protocol repeated measures analysis, data for 26 diet children and 29 controls were available at 12 months. At this point, there was a significant improvement to mean diet group scores (time*treatment interaction) on sub-domains of ADOS, GARS and ADHD-IV measures. Surpassing of predefined statistical thresholds as evidence of improvement in group A at 12 months sanctioned the re-assignment of group B participants to active dietary treatment. Stage 2 data for 18 group A and 17 group B participants were available at 24 months. Multiple scenario analysis based on inter- and intra-group comparisons showed some evidence of sustained clinical group improvements although possibly indicative of a plateau effect for intervention. Our results suggest that dietary intervention may positively affect developmental outcome for some children diagnosed with ASD. In the absence of a placebo condition to the current investigation, we are, however, unable to disqualify potential effects derived from intervention outside of dietary changes. Further studies are required to ascertain potential best- and non-responders to intervention. 
The study was registered with ClincialTrials.gov, number NCT00614198.


Label spans for wid = 00001
[Other]: gluten- and casein-free dietary intervention 
[Other]: gluten- and casein-free diets 
[Other]: diet 
[Other]: non-diet 
[Other]: dietary intervention 
[Control]: placebo 

Label spans for wid = 00003
[Other]: gluten- and casein-free dietary intervention 
[Other]: gluten- and casein-free diets 
[Control]: non-diet 
[Other]: dietary intervention 
[Control]: placebo 

Label spans for wid = 00002
[Pharmacological]: gluten- and casein-free dietary intervention 
[Pharmacological]: gluten- and casein-free diets 
[Pharmacological]: diet 
[Control]: non-diet 
[Pharmacological]: dietary intervention 


PMID: 21902704
Effects of three oral analgesics on postoperative pain following root canal preparation: a controlled clinical trial.

AIM To compare the effects of single doses of three oral medications on postoperative pain following instrumentation of root canals in teeth with irreversible pulpitis.
METHODOLOGY In this double-blind clinical trial, 100 patients who had anterior or premolar teeth with irreversible pulpitis without any signs and symptoms of acute or chronic apical periodontitis and moderate to severe pain were divided by balanced block random allocation into four groups of 25 each, a control group receiving a placebo medication, and three experimental groups receiving a single dose of either Tramadol (100 mg), Novafen (325 mg of paracetamol, 200 mg ibuprofen and 40 mg caffeine anhydrous) or Naproxen (500 mg) immediately after the first appointment where the pulp was removed, and the canals were fully prepared. The intensity of pain was scored based on 10-point VAS before and after treatment for up to 24 h postoperatively. Data were submitted to repeated analysis of variance.
RESULTS At the 6, 12 and 24 h postoperative intervals after drug administration, the intensity of pain was significantly lower in the experimental groups than in the placebo group (P < 0.01). Tramadol was significantly less effective (P < 0.05) than Naproxen, and Novafen that were similar to each other (P > 0.05).
CONCLUSION A single oral dose of Naproxen, Novafen and Tramadol taken immediately after treatment reduced postoperative pain following pulpectomy and root canal preparation of teeth with irreversible pulpitis.


Label spans for wid = 00001
[Pharmacological]: analgesics 
[Control]: control group 
[Control]: placebo 
[Pharmacological]: Tramadol 
[Pharmacological]: Novafen 
[Pharmacological]: paracetamol 
[Pharmacological]: ibuprofen 
[Pharmacological]: caffeine anhydrous 
[Pharmacological]: Naproxen 
[Pharmacological]: placebo 
[Pharmacological]: Tramadol 
[Pharmacological]: Naproxen 
[Pharmacological]: Novafen 
[Pharmacological]: Naproxen 
[Pharmacological]: Novafen 
[Pharmacological]: Tramadol 

Label spans for wid = 00003
[Pharmacological]: analgesics 
[Control]: placebo 
[Pharmacological]: Tramadol 
[Pharmacological]: Novafen ( 325 mg of paracetamol , 200 mg ibuprofen and 40 mg caffeine anhydrous ) or Naproxen 
[Control]: placebo 
[Pharmacological]: Naproxen , and Novafen 
[Pharmacological]: Naproxen , Novafen and Tramadol
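
The span listings above can be thought of as runs of consecutive tokens that share a label. The following is a minimal sketch of that grouping, assuming each annotation is a list with one label per token and None for unlabeled tokens. That layout is an assumption made for illustration; the function labeled_spans is hypothetical and is not part of ebm_nlp_demo.

```python
def labeled_spans(tokens, labels, outside=None):
    """Group consecutive tokens that share a non-outside label into spans."""
    spans = []
    start = 0
    for i in range(1, len(labels) + 1):
        # A run ends at the end of the sequence or where the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != outside:
                spans.append((labels[start], ' '.join(tokens[start:i])))
            start = i
    return spans


# Toy input written by hand for this sketch, not taken from the corpus files.
tokens = ['a', 'placebo', 'or', 'a', 'single', 'dose', 'of', 'Tramadol']
labels = [None, 'Control', None, None, None, None, None, 'Pharmacological']
for label, text in labeled_spans(tokens, labels):
    print('[%s]: %s' % (label, text))
```

Adjacent tokens with different labels end up in separate spans, which is one plausible reason two workers can mark the same phrase with different span boundaries.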

Task 2: Measuring agreement

Now that we have some annotations from the three expert annotators, we can evaluate how well they agree across all 200 documents.

kappas = e.compute_worker_kappas(worker_map, doc_map)

Pairwise Cohen's Kappa:
       00001 00002 00003
00001:        0.63  0.62
00002:  0.63        0.52
00003:  0.62  0.52
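
For reference, Cohen's kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance from their label frequencies: kappa = (p_o - p_e) / (1 - p_e). Below is a minimal sketch of that formula, assuming each annotator assigns one label per token and the two sequences are aligned. The function name cohen_kappa is illustrative; this walkthrough does not show how compute_worker_kappas is implemented or which unit it compares.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two aligned label sequences of equal length."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError('need two non-empty sequences of equal length')
    n = float(len(labels_a))
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    # Chance agreement: product of each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example: the annotators agree on 3 of 4 tokens.
a = ['Other', 'Other', 'Control', 'Control']
b = ['Other', 'Other', 'Control', 'Other']
print(cohen_kappa(a, b))  # 0.5
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean agreement worse than chance.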

In addition to how well the experts agree with one another, we may also wonder whether they match the crowd-sourced annotations. Let's load the aggregated Mechanical Turk labels and compare them.

worker_map_agg, doc_map_agg = e.read_anns('hierarchical_labels', 'interventions',
                                          ann_type='aggregated', model_phase='test/crowd')

Loaded annotations for 200 documents from 1 worker

We can add these new annotations to our existing document and worker maps under the new worker ID ('AGGREGATED'):

# Iterate through the new document objects and copy over each annotation
for pmid, doc_agg in doc_map_agg.items():
    for wid, labels in doc_agg.anns.items():
        doc_map[pmid].anns[wid] = labels

# Note: there is only one item pair in the new worker map
for wid, worker_agg in worker_map_agg.items():
    worker_map[wid] = worker_agg
    
kappas = e.compute_worker_kappas(worker_map, doc_map)

Pairwise Cohen's Kappa:
                 00001      00002      00003 AGGREGATED
     00001:                  0.63       0.62       0.62
     00002:       0.63                  0.52       0.59
     00003:       0.62       0.52                  0.71
AGGREGATED:       0.62       0.59       0.71
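
One rough way to read this table is to average each row, giving every annotator's mean kappa against the other three. The sketch below does that arithmetic on values copied by hand from the table above. The dictionary layout and the helper mean_agreement are assumptions made for illustration and do not reflect the return type of compute_worker_kappas.

```python
# Values hand-copied from the printed table; each unordered pair appears once.
pairwise_kappa = {
    ('00001', '00002'): 0.63,
    ('00001', '00003'): 0.62,
    ('00001', 'AGGREGATED'): 0.62,
    ('00002', '00003'): 0.52,
    ('00002', 'AGGREGATED'): 0.59,
    ('00003', 'AGGREGATED'): 0.71,
}


def mean_agreement(pairwise):
    """Average each worker's kappa over all pairs that include that worker."""
    totals, counts = {}, {}
    for (wid_a, wid_b), kappa in pairwise.items():
        for wid in (wid_a, wid_b):
            totals[wid] = totals.get(wid, 0.0) + kappa
            counts[wid] = counts.get(wid, 0) + 1
    return dict((wid, totals[wid] / counts[wid]) for wid in totals)


means = mean_agreement(pairwise_kappa)
for wid in sorted(means):
    print('%s: %.2f' % (wid, means[wid]))
```

Run as written, this gives a mean of about 0.64 for AGGREGATED and between 0.58 and 0.62 for the three experts, so by this rough summary the aggregated crowd labels agree with the experts about as well as the experts agree with one another.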