Comparison of methods for identifying differentially expressed genes across multiple conditions from microarray data



Yuande Tan1 & Yin Liu2*



1School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, United States of America; 2Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, Texas, United States of America


*Corresponding author






Selected publications from Asia Pacific Bioinformatics Network (APBioNet) 10th International Conference on Bioinformatics (InCoB 2011), Malaysia, November 30 to December 02, 2011



Identification of genes differentially expressed across multiple conditions has become an important statistical problem in analyzing large-scale microarray data. Many statistical methods have been developed to address this challenging problem, so an extensive comparison among them is important for helping experimental scientists choose a valid method for their data analysis. In this study, we conducted simulation studies to compare six statistical methods for identifying differentially expressed genes: the Bonferroni (B-) procedure, the Benjamini and Hochberg (BH-) procedure, the local false discovery rate (Localfdr) method, the Optimal Discovery Procedure (ODP), the Ranking Analysis of F-statistics (RAF), and the Significance Analysis of Microarrays (SAM). We demonstrate that the strength of the treatment effect, the sample size, the proportion of differentially expressed genes, and the variance of gene expression significantly affect the performance of the different methods. The simulation results show that ODP exhibits extremely high power in identifying differentially expressed genes but significantly underestimates the false discovery rate (FDR) in all data scenarios. SAM performs poorly when the sample size is small but is among the best-performing methods when the sample size is large. The B-procedure is stringent and thus has low power in all data scenarios. Localfdr and RAF show statistical behavior comparable to that of the BH-procedure, with favorable power and conservative FDR estimation. RAF performs best when the proportion of differentially expressed genes is small and the treatment effect is weak, whereas Localfdr outperforms RAF when the proportion of differentially expressed genes is large.
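To illustrate why the B-procedure is more stringent than the BH-procedure, the following minimal sketch applies the standard textbook form of both procedures to the same set of p-values. It is not the simulation code used in this study. The function names, the significance level of 0.05, and the eight p-values are hypothetical and chosen only for illustration.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0 when p <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up BH procedure (controls the false discovery rate)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for eight genes (illustrative only, not study data).
pvals = [0.001, 0.008, 0.012, 0.020, 0.030, 0.200, 0.450, 0.900]
n_bonf = sum(bonferroni(pvals))
n_bh = sum(benjamini_hochberg(pvals))
print(n_bonf, n_bh)  # prints "1 5"
```

With these illustrative values, the Bonferroni threshold of 0.05/8 = 0.00625 admits only one gene, whereas the rank-dependent BH thresholds admit five. This matches the qualitative pattern described above: the B-procedure has lower power on the same data.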



Tan & Liu. Bioinformation 7(8): 400-404 (2011)

Edited by

TW Tan






Biomedical Informatics



This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.