
Title

Protein sequence redundancy reduction: comparison of various methods

Authors

Kresimir Sikic1,2,*, Oliviero Carugo1,3

Affiliation

1Department of Structural and Computational Biology, Max F. Perutz Laboratories, Vienna University, 1030 Vienna, Austria; 2Department of Electronic Systems and Information Processing, Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia; 3Department of General Chemistry, University of Pavia, I-27100 Pavia, Italy

Email

kresimir.sikic@univie.ac.at

Phone

+431427752208

Article Type

Hypothesis

 

Date

Received October 20, 2010; Accepted November 11, 2010; Published November 27, 2010
 

Abstract

Non-redundant protein datasets are of utmost importance in bioinformatics. Constructing such a dataset means removing protein sequences whose similarity to other sequences exceeds a given threshold. Several programs are available for this purpose, including Decrease redundancy, cd-hit, Pisces, BlastClust and SkipRedundant. The question we address here is to what extent the non-redundant datasets produced by different programs resemble each other. We performed a systematic comparison of the features and outputs of these programs using subsets of the UniProt database. The results show a high level of overlap between non-redundant datasets obtained with the same program from the same initial dataset at different percentage-of-identity thresholds, and moderate levels of similarity between datasets obtained with different programs from the same initial dataset at the same percentage-of-identity threshold. Users should therefore be aware that differences may arise, and the use of more than one program is advisable.
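To make the two notions used in the abstract concrete (threshold-based redundancy removal and overlap between the resulting datasets), the following is a minimal sketch. It is not the algorithm of any of the programs named above. The function names, the toy sequences, the greedy longest-first strategy, the use of Python's `difflib.SequenceMatcher` as a crude stand-in for alignment-based percentage of identity, and the Jaccard index as the overlap measure are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def percent_identity(a, b):
    """Crude similarity proxy in percent (assumption: NOT a real sequence alignment)."""
    return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()


def reduce_redundancy(seqs, threshold):
    """Greedy filter: visit sequences longest first and keep a sequence only if
    its similarity to every sequence kept so far does not exceed the threshold."""
    kept = []
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        if all(percent_identity(seqs[name], seqs[k]) <= threshold for k in kept):
            kept.append(name)
    return set(kept)


def overlap(a, b):
    """Jaccard index of two sets of sequence names (assumed overlap measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical toy input: p1 and p2 differ only in their last residue.
seqs = {
    "p1": "MKTAYIAKQRQISFVKSHFSRQ",
    "p2": "MKTAYIAKQRQISFVKSHFSRA",
    "p3": "GSHMLEDPVAGTWWQ",
}

at_90 = reduce_redundancy(seqs, 90)  # p2 is dropped as redundant with p1
at_99 = reduce_redundancy(seqs, 99)  # all three sequences are kept
print(sorted(at_90), sorted(at_99), round(overlap(at_90, at_99), 2))
```

Even in this toy setting the outcome depends on choices that real programs make differently, such as the order in which sequences are visited and how similarity is measured, which is one reason why different tools can return different non-redundant sets from the same input.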

Keywords

protein sequence, removing redundancy, sequence alignment.

Citation

Sikic & Carugo, Bioinformation, 5(6): 234-239, 2010

Edited by

Martin Gollery

 

ISSN

0973-2063

 

Publisher

Biomedical Informatics

 

License

This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.