
Title

A text mining approach to detect mentions of protein glycosylation in biomedical text

 

Authors

Daksha Shukla1 & Valadi K Jayaraman2*

 

Affiliation

1Bioinformatics Centre, University of Pune, India; 2Centre for Development of Advanced Computing, University of Pune, India.

 

Email

jayaramanv@cdac.in; *Corresponding author

 

Article Type

Hypothesis

 

Date

Received July 18, 2012; Accepted August 03, 2012; Published August 24, 2012

 

Abstract

Protein glycosylation is an important post-translational event that plays a pivotal role in protein folding and protein trafficking. We describe a dictionary-based and a rule-based approach to mine ‘mentions’ of protein glycosylation in text. The dictionary-based approach relies on a set of manually curated dictionaries constructed specifically for this task. Abstracts are screened for ‘mentions’ of words from these dictionaries; the mentions are then scored, and each abstract is classified on the basis of a threshold. The rule-based approach also relies on the words in the dictionaries to derive the features used for classification. The performance of the system using both approaches has been evaluated on a manually curated corpus of 3133 abstracts. The evaluation suggests that the rule-based approach outperforms the dictionary-based approach.
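
The dictionary-based screening described above (match dictionary words, score the matches, classify against a threshold) can be illustrated with a minimal sketch. The terms, weights, threshold, and function names below are hypothetical placeholders chosen for illustration; they are not the curated dictionaries or the scoring scheme used by the authors.

```python
import re

# Hypothetical dictionary mapping terms to weights (an assumption for
# illustration; the paper's curated dictionaries are not reproduced here).
GLYCO_TERMS = {
    "glycosylation": 2.0,
    "glycosylated": 2.0,
    "n-linked": 1.5,
    "o-linked": 1.5,
    "glycan": 1.0,
    "glycoprotein": 1.0,
}


def score_abstract(text, dictionary=GLYCO_TERMS):
    """Sum the weights of all dictionary terms found in the text."""
    # Lowercase and keep hyphenated tokens such as "n-linked" intact.
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return sum(dictionary.get(token, 0.0) for token in tokens)


def mentions_glycosylation(text, threshold=2.0):
    """Classify an abstract as positive if its score reaches the threshold."""
    # The threshold value is an assumed placeholder, not a tuned parameter.
    return score_abstract(text) >= threshold
```

Under these assumed weights, a sentence such as "The protein is N-linked glycosylated at Asn297." scores 3.5 and is classified as a mention, while a sentence with no dictionary terms scores 0 and is not.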

 

Keywords

Text mining, Glycosylation, Rule-based approach, Dictionary-based approach

 

Citation

Shukla & Jayaraman, Bioinformation 8(16): 758-762 (2012)
 

Edited by

P Kangueane

 

ISSN

0973-2063

 

Publisher

Biomedical Informatics

 

License

This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.