Title

Clustering of PubMed abstracts using nearer terms of the domain

 

Authors

Mary Rajathei David & Selvaraj Samuel*

 

Affiliation

Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli-620024, India

 

Email

selvarajsamuel@gmail.com; *Corresponding author

 

Article Type

Hypothesis

 

Date

Received December 26, 2011; Accepted December 28, 2011; Published January 06, 2012

 

Abstract

Literature search is a process in which external developers provide alternative representations for efficient data mining of biomedical literature, such as ranked search results, summarized semantic knowledge, and search results clustered into topics. For clustering search results, prominent vocabularies such as GO (Gene Ontology) and MeSH (Medical Subject Headings), as well as frequent terms extracted from the retrieved PubMed abstracts, have been used as topics for grouping. In this study, we propose the FNeTD (Frequent Nearer Terms of the Domain) method for clustering PubMed abstracts. The method is a two-step process: i) identifying frequent words or phrases in the abstracts using a frequent multi-word extraction algorithm, and ii) identifying nearer terms of the domain among the extracted frequent phrases using nearest neighbors search. The efficiency of clustering PubMed abstracts using nearer terms of the domain was measured using the F-score. The present study suggests that nearer terms of the domain can be used for clustering search results.
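
As a rough illustration of the first step and of the evaluation measure, the sketch below counts word n-grams that recur across abstracts and computes a class-weighted best-match F-score for a clustering. It is a minimal sketch under stated assumptions, not the FNeTD implementation: the function names, the whitespace tokenization, the `max_n` and `min_docs` thresholds, and the particular F-score variant are all hypothetical choices, since the abstract does not specify them. The nearest neighbors step is omitted because the abstract does not define the distance measure.

```python
from collections import Counter


def frequent_phrases(abstracts, max_n=3, min_docs=2):
    """Return word n-grams (1..max_n words) found in at least min_docs abstracts.

    Assumption: simple lowercase whitespace tokenization; the paper's
    frequent multi-word extraction algorithm may differ.
    """
    doc_freq = Counter()
    for text in abstracts:
        words = text.lower().split()
        seen = set()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                seen.add(" ".join(words[i:i + n]))
        doc_freq.update(seen)  # each phrase counted once per abstract
    return {p: c for p, c in doc_freq.items() if c >= min_docs}


def clustering_f_score(clusters, classes):
    """Class-size-weighted best-match F-score of clusters against reference classes.

    Assumption: clusters and classes are lists of sets of document ids;
    this is one common clustering F-measure, not necessarily the paper's.
    """
    total = sum(len(c) for c in classes)
    score = 0.0
    for cls in classes:
        best = 0.0
        for clu in clusters:
            overlap = len(cls & clu)
            if overlap == 0:
                continue
            precision = overlap / len(clu)
            recall = overlap / len(cls)
            best = max(best, 2 * precision * recall / (precision + recall))
        score += len(cls) / total * best
    return score


if __name__ == "__main__":
    docs = [
        "protein folding kinetics",
        "protein folding pathway",
        "gene expression",
    ]
    print(frequent_phrases(docs))
    print(clustering_f_score([{0, 1}, {2}], [{0, 1}, {2}]))
```

In this toy input, only phrases shared by the first two abstracts pass the `min_docs=2` threshold, and a clustering that matches the reference classes exactly scores 1.0.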

 

Keywords

Domain knowledge, Nearer term, Clustering, Nearest neighbors search, PubMed abstracts

 

Citation

David & Samuel, Bioinformation 8(1): 020-025 (2012)
 

Edited by

P Kangueane

 

ISSN

0973-2063

 

Publisher

Biomedical Informatics

 

License

This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.