Title

 

 

 

 

 

A data-mining approach for multiple structural alignment of proteins

 

Authors

 

Wing-Yan Siu1, Nikos Mamoulis1, Siu-Ming Yiu1,*, Ho-Leung Chan1

Affiliation

 

1Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong, China

 

Email

 

smyiu@cs.hku.hk

 

Article Type

 

Hypothesis

Date

 

Received May 25, 2009; Revised December 31, 2009; Accepted February 24, 2010; Published February 28, 2010

 

Abstract

Comparing the 3D structures of proteins is an important but computationally hard problem in bioinformatics. In this paper, we study the problem under far fewer assumptions and with much less information available. We model the structural alignment of proteins as a combinatorial problem in which each protein is simply a set of points in 3D space, without sequence order information, and the objective is to discover all sufficiently large alignments for any subset of the input. We propose a data-mining approach to this problem. We first perform geometric hashing of the structures so that points with similar locations in 3D space are hashed into the same bin of the hash table. The novelty is that we treat each bin as a coincidence group and mine for frequent patterns, a well-studied technique in data mining. We observe that these frequent patterns are already potentially large alignments. A simple heuristic is then used to extend the alignments where possible. We implemented the algorithm and tested it on real protein structures, and compared the results with those of existing tools. The results showed that the algorithm is capable of finding conserved substructures that do not preserve sequence order, especially those found in protein interfaces. The algorithm can also identify conserved substructures of functionally similar structures within a mixture that includes dissimilar ones. The running time of the program was shorter than or comparable to that of the existing tools.
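
The hashing and mining steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that all structures are already expressed in a common coordinate frame (full geometric hashing would repeat this over many reference frames), and the names `hash_points`, `frequent_groups`, `bin_size`, `group_size`, and `min_support`, as well as the toy coordinates, are hypothetical choices made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def hash_points(structures, bin_size=2.0):
    """Map each grid cell to the set of structure ids with a point in it.

    Assumption: every structure is given in the same coordinate frame,
    so nearby points from different structures fall into the same cell.
    """
    bins = defaultdict(set)
    for sid, points in structures.items():
        for x, y, z in points:
            key = (int(x // bin_size), int(y // bin_size), int(z // bin_size))
            bins[key].add(sid)
    return bins


def frequent_groups(bins, group_size=2, min_support=2):
    """Treat each bin as a coincidence group and count co-occurrences.

    Returns the groups of `group_size` structures that share at least
    `min_support` bins, together with that bin count.
    """
    support = defaultdict(int)
    for members in bins.values():
        for group in combinations(sorted(members), group_size):
            support[group] += 1
    return {g: s for g, s in support.items() if s >= min_support}


# Toy input: three hypothetical "structures" with three points each.
example = {
    "A": [(0.1, 0.2, 0.3), (2.5, 0.1, 0.0), (4.2, 4.1, 0.3)],
    "B": [(0.4, 0.3, 0.1), (2.9, 0.5, 0.2), (9.0, 9.0, 9.0)],
    "C": [(0.2, 0.1, 0.5), (7.0, 7.0, 7.0), (4.9, 4.3, 0.1)],
}
candidates = frequent_groups(hash_points(example))
```

Each entry of `candidates` is a group of structures that share enough bins to serve as a seed alignment; the extension heuristic mentioned in the abstract would start from such seeds and is not shown here.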

 

Keywords

structural comparisons; proteins; multiple alignment

Citation

 

Siu et al., Bioinformation 4(8): 366-370 (2010)

Edited by

 

P. Kangueane

 

ISSN

 

0973-2063

 

Publisher

 

Biomedical Informatics

License

 

 

This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.