Title

optCluster: An R Package for Determining the Optimal Clustering Algorithm

 

Authors

Michael Sekula1, Somnath Datta2, Susmita Datta2,*

 

Affiliation

1Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky, 40202, USA; 2Department of Biostatistics, University of Florida, Gainesville, Florida, 32611, USA;

 

Email

susmita.datta@ufl.edu

 

Article Type

Software

 

Date

Received February 27, 2017; Revised March 10, 2017; Accepted March 11, 2017; Published March 31, 2017

 

Abstract

There exist numerous programs and packages that perform validation for a given clustering solution; however, clustering algorithms fare differently as judged by different validation measures. If more than one performance measure is used to evaluate multiple clustering partitions, an optimal result is often difficult to determine by visual inspection alone. This paper introduces optCluster, an R package that uses a single function to simultaneously compare numerous clustering partitions (created by different algorithms and/or numbers of clusters) and obtain a “best” option for a given dataset. The method of weighted rank aggregation is utilized by this package to objectively aggregate various performance measure scores, thereby taking away the guesswork that often follows a visual inspection of cluster results. The optCluster package contains biological validation measures as well as clustering algorithms developed specifically for RNA sequencing data, making it a useful tool for clustering genomic data.
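
To make the idea of rank aggregation concrete, the following is a minimal Python sketch, not the package's R implementation. It assumes each validation measure has already produced an ordered list of candidate partitions (best first) and searches exhaustively for the single ordering that minimizes a weighted sum of Spearman footrule distances to those lists. The partition labels, the function names, the per-list weights, and the brute-force search are all illustrative assumptions; the abstract does not specify the distance or the search procedure that optCluster uses.

```python
from itertools import permutations


def footrule_distance(candidate, ranked_list):
    """Spearman footrule: total displacement of each item between two orderings."""
    position = {item: i for i, item in enumerate(ranked_list)}
    return sum(abs(i - position[item]) for i, item in enumerate(candidate))


def aggregate_ranks(ranked_lists, weights=None):
    """Return the ordering closest (in weighted footrule distance) to all lists.

    Exhaustive search over permutations, so only practical for a handful of
    candidates; a real tool would need a stochastic or heuristic search.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    items = sorted(ranked_lists[0])
    best_order, best_cost = None, float("inf")
    for candidate in permutations(items):
        cost = sum(
            w * footrule_distance(candidate, ranked)
            for w, ranked in zip(weights, ranked_lists)
        )
        if cost < best_cost:
            best_order, best_cost = list(candidate), cost
    return best_order, best_cost


# Hypothetical rankings of four partitions ("algorithm-k") under three
# validation measures, each listed from best to worst.
rankings = [
    ["hierarchical-2", "kmeans-3", "pam-3", "kmeans-2"],
    ["hierarchical-2", "pam-3", "kmeans-3", "kmeans-2"],
    ["kmeans-3", "hierarchical-2", "kmeans-2", "pam-3"],
]

best_order, cost = aggregate_ranks(rankings)
print(best_order[0], cost)
```

The first element of the aggregated ordering plays the role of the “best” partition. Passing unequal weights lets one measure count for more than the others, which is one simple way to read "weighted" in this setting.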

 

Keywords

Clustering; Validation; Gene Expression; RNA-Seq

 

Citation

Sekula et al. Bioinformation 13(3): 100-103 (2017)

 

Edited by

P Kangueane

 

ISSN

0973-2063

 

Publisher

Biomedical Informatics

 

License

This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.