Biomarker Identification from RNA-Seq Data using a Robust Statistical Approach

Title	Biomarker Identification from RNA-Seq Data using a Robust Statistical Approach
Authors	Zobaer Akond¹,²,⁴,*, Munirul Alam², Md. Nurul Haque Mollah³
Affiliation	¹Agricultural Statistics and Information & Communication Technology (ASICT) Division, Bangladesh Agricultural Research Institute (BARI), Joydebpur, Gazipur-1701, Bangladesh; ²Institute of Environmental Science, University of Rajshahi-6205, Bangladesh; ³Emerging Infections, Infectious Diseases Division, International Centre for Diarrheal Disease Research, Bangladesh (icddr,b); ⁴Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh;
Email	akond25@yahoo.com
Article Type	Hypothesis
Date	Received March 5, 2018; Revised April 2, 2018; Accepted April 5, 2018; Published April 30, 2018
Abstract	Identifying biomarkers as differentially expressed genes (DEGs) from RNA-sequencing data is an important step in characterizing transcriptomic data, and it has become feasible with advances in next-generation sequencing (NGS) technology. Several statistical methods, such as edgeR, SAMSeq, and voom-limma, identify DEGs from high-dimensional RNA-seq count data across groups or conditions. However, these methods produce many false positives and lose accuracy in the presence of outliers. We describe a robust t-statistic method that overcomes these drawbacks and evaluate it on both simulated and real RNA-seq datasets. At 20% outliers, the reported model performance is 61.2% sensitivity, 35.2% specificity, 21.6% MER, 6.9% FDR, 74.5% AUC, 78.4% ACC, 93.1% PPV, and 35.2% NPV. Using the robust t-test on a real HIV viremic vs aviremic dataset, we identified 409 DE genes with p-values < 0.05. Of these, 28 genes are up-regulated and 381 are down-regulated according to the log2 fold change (FC) at a threshold value of 1.5. The up-regulated genes form three clusters, and 11 genes are found to be highly associated with HIV1/AIDS. Protein-protein interaction (PPI) analysis of the up-regulated genes using the STRING database found 21 genes with strong associations among themselves. Thus, we demonstrate the identification of potential biomarkers from RNA-seq data using a robust t-statistic model.
Keywords	RNA-seq data, differentially expressed genes, robust t-statistic, gene-disease network, protein-protein interaction.
Citation	Akond et al. Bioinformation 14(4): 153-163 (2018)
Edited by	P Kangueane
ISSN	0973-2063
Publisher	Biomedical Informatics
License	This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.
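
The abstract splits DE genes into up- and down-regulated sets by log2 fold change at a threshold of 1.5. A minimal sketch of that kind of split is given below. It rests on several assumptions not stated in the abstract: that the threshold applies to the absolute log2 FC, that fold change is the ratio of group mean counts, and that a pseudocount of 1 is added to avoid taking the log of zero. The function names and the gene counts are hypothetical and serve only as illustration; this is not the authors' implementation.

```python
import math


def log2_fold_change(mean_a, mean_b, pseudocount=1.0):
    """Return log2 of the ratio of two group means.

    The pseudocount is an assumption added here to avoid log(0);
    the abstract does not say how zero counts are handled.
    """
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def classify(log2fc, threshold=1.5):
    """Label a gene by its log2 FC, assuming a symmetric threshold."""
    if log2fc >= threshold:
        return "up"
    if log2fc <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical mean counts per gene: (condition A, condition B).
genes = {
    "geneA": (120.0, 30.0),
    "geneB": (10.0, 95.0),
    "geneC": (50.0, 48.0),
}

calls = {
    name: classify(log2_fold_change(a, b))
    for name, (a, b) in genes.items()
}

for name, call in calls.items():
    print(name, call)
```

In practice the fold-change filter would be applied only to genes that already pass the significance test (p-values < 0.05 in the abstract), so the two criteria act together.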