Title

A graph-based clustering method applied to protein sequences

 

Authors

Pooja Mishra1* & Paras Nath Pandey2

 

Affiliation

1Center of Bioinformatics, University of Allahabad, Allahabad, India, 2Department of Mathematics, University of Allahabad, Allahabad, India

 

Email

pooja.mishra0806@gmail.com; *Corresponding author

 

Phone

+91-9452377426

 

Article Type

Hypothesis

 

Date

Received July 09, 2011; Accepted July 12, 2011; Published August 02, 2011

 

Abstract

The number of amino acid sequences in protein databases such as Swiss-Prot, UniProt, PIR and others is increasing very rapidly, but structures for only some of these sequences are available in the Protein Data Bank. Thus, an important problem in genomics is the automatic clustering of homologous protein sequences when only sequence information is available. Here, we use graph-theoretic techniques for clustering amino acid sequences. A similarity graph is defined, and clusters in that graph correspond to connected subgraphs. Cluster analysis seeks to group amino acid sequences into subsets based on a distance or similarity score between pairs of sequences. Our goal is to find disjoint subsets, called clusters, such that two criteria are satisfied: homogeneity (sequences in the same cluster are highly similar to each other) and separation (sequences in different clusters have low similarity to each other). We tested our method on several subsets of the SCOP (Structural Classification of Proteins) database, a gold standard for protein structure classification. The results show that, for a given set of proteins, the number of clusters we obtained is close to the number of superfamilies in that set; there are fewer singletons; and the method correctly groups most remote homologs.
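
As a rough illustration of the idea that clusters correspond to connected subgraphs of a similarity graph, the sketch below takes the simplest reading: two sequences are joined by an edge when their pairwise similarity reaches a cutoff, and each connected component is reported as a cluster. This is an assumption for illustration only; the abstract does not give the graph definition, the scoring scheme, or the cutoff, and the method used in the paper may differ. The function name `cluster_by_similarity`, the `threshold` parameter, and the toy scores are all hypothetical.

```python
from collections import defaultdict


def cluster_by_similarity(ids, scores, threshold):
    """Return connected components of a thresholded similarity graph.

    ids       -- list of sequence identifiers (graph vertices)
    scores    -- dict mapping (id_a, id_b) to a similarity score
    threshold -- hypothetical cutoff: pairs scoring >= threshold get an edge
    """
    # Union-find: every sequence starts in its own set.
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the set representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the sets of any two sequences joined by an edge.
    for (a, b), score in scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect members by representative; each group is one cluster.
    groups = defaultdict(set)
    for i in parent:
        groups[find(i)].add(i)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy data: made-up identifiers and scores, not taken from the paper.
ids = ["p1", "p2", "p3", "p4", "p5"]
scores = {
    ("p1", "p2"): 0.9,
    ("p2", "p3"): 0.8,
    ("p3", "p4"): 0.1,
    ("p4", "p5"): 0.7,
    ("p1", "p5"): 0.2,
}
clusters = cluster_by_similarity(ids, scores, threshold=0.5)
print(clusters)
```

In this reading, a higher cutoff favors homogeneity but yields more singletons, while a lower cutoff merges more remote homologs at the risk of weaker separation.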

 

Keywords

Clustering, protein sequences, graph-theoretic approach.

 

Citation

Mishra & Pandey. Bioinformation 6(10): 372-374 (2011)
 

Edited by

P Kangueane

 

ISSN

0973-2063

 

Publisher

Biomedical Informatics

 

License

This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.