Mining the Bibliome: NLP to Extract Facts from the Biomedical Literature

The Computational Biology Initiative at the Center for Biomedical Informatics (CBI at HMS) is developing advanced natural language processing techniques to extract facts from the biomedical literature. One goal is to build large graphs of drug and protein interactions, disease pathways, etc. Another goal is to combine these extracted predicates with other data, such as co-expression. Unlike many commercial and academic projects, we focus on producing high-confidence results for a few selected disorders, such as autism and Asperger syndrome.

We are starting a collaborative annotation effort, in conjunction with researchers from HMS and the Countway Library, on a new corpus of selected texts from MEDLINE focused on autism. We aim to create a “flat” annotation of the biomedical texts with tags for “genes”, “diseases” and their relationships. Below we give some examples of snippets with color-coded objects:

Based on these findings, we hypothesize that rare mutations occur in the WNT2 gene that significantly increase susceptibility to autism even when present in single copies, while a more common WNT2 allele (or alleles) not yet identified may exist that contributes to the disorder to a lesser degree.

We report a positive association between autism and two HRAS markers.

These findings support a role for genetic variants within the GABA receptor gene complex in 15q11-13 in autistic disorder.
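
As an illustration only, a “flat” annotation of the second snippet above could be represented roughly as follows. This is a minimal sketch and not the project's actual schema or pipeline: the Span and Relation classes, the tag names, the toy lexicon, the exact-string matching and the "association" label are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A non-nested ("flat") tagged region of the text (hypothetical schema)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    tag: str    # assumed tag set: "gene" or "disease"


@dataclass(frozen=True)
class Relation:
    """A link between one gene span and one disease span (hypothetical schema)."""
    gene: Span
    disease: Span
    label: str


def tag_mentions(text, lexicon):
    """Tag every exact occurrence of a lexicon term; a stand-in for human annotators."""
    spans = []
    for term, tag in lexicon.items():
        start = text.find(term)
        while start != -1:
            spans.append(Span(start, start + len(term), tag))
            start = text.find(term, start + len(term))
    return sorted(spans, key=lambda s: s.start)


def pair_relations(spans, label):
    """Naively relate every gene span to every disease span in the same snippet."""
    genes = [s for s in spans if s.tag == "gene"]
    diseases = [s for s in spans if s.tag == "disease"]
    return [Relation(g, d, label) for g in genes for d in diseases]


SENTENCE = "We report a positive association between autism and two HRAS markers."
LEXICON = {"HRAS": "gene", "autism": "disease"}  # toy lexicon, assumed for the example

spans = tag_mentions(SENTENCE, LEXICON)
relations = pair_relations(spans, "association")

for s in spans:
    print(s.tag, SENTENCE[s.start:s.end])
for r in relations:
    print(SENTENCE[r.gene.start:r.gene.end], r.label,
          SENTENCE[r.disease.start:r.disease.end])
```

Storing character offsets rather than copies of the text keeps the annotation separate from the source sentence, so several annotators can tag the same snippet and have their spans compared directly.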

Publications:

Automated identification of diagnosis and co-morbidity in clinical records. Cano C, Blanco A, Peshkin L. Methods Inf Med. 2009;48(6):546-51. Epub 2009 Aug 20. PMID:19696949

Collaborative text-annotation resource for disease-centered relation extraction from biomedical text. Cano C, Monaghan T, Blanco A, Wall DP, Peshkin L. J Biomed Inform. 2009 Oct;42(5):967-77. Epub 2009 Feb 14. PMID:19232400
