Title |
Biomarker Identification from RNA-Seq Data using a Robust Statistical Approach
|
Authors |
Zobaer Akond1,2,4,*, Munirul Alam2, Md. Nurul Haque Mollah3
|
Affiliation |
1Agricultural Statistics and Information & Communication Technology (ASICT) Division, Bangladesh Agricultural Research Institute (BARI), Joydebpur, Gazipur-1701, Bangladesh; 2Institute of Environmental Science, University of Rajshahi-6205, Bangladesh; 3Emerging Infections, Infectious Diseases Division, International Centre for Diarrheal Disease Research, Bangladesh (icddr,b); 4Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh
|
Article Type |
Hypothesis
|
Date |
Received March 5, 2018; Revised April 2, 2018; Accepted April 5, 2018; Published April 30, 2018
|
Abstract |
Identifying biomarkers from differentially expressed genes (DEGs) in RNA-sequencing data is an important task in characterizing transcriptomic data, made possible by advances in next-generation sequencing (NGS) technology. Several statistical methods, such as edgeR, SAMSeq, and voom-limma, identify DEGs from high-dimensional RNA-seq count data across groups or conditions. However, these methods produce many false positives and have low accuracy in the presence of outliers. We describe a robust t-statistic method that overcomes these drawbacks and evaluate it on both simulated and real RNA-seq datasets. At 20% outliers, the reported sensitivity, specificity, MER, FDR, AUC, ACC, PPV, and NPV are 61.2%, 35.2%, 21.6%, 6.9%, 74.5%, 78.4%, 93.1%, and 35.2%, respectively. In a real dataset comparing HIV viremic and aviremic states, the robust t-test identified 409 DE genes with p-values < 0.05. Of these, 28 are up-regulated and 381 are down-regulated according to the log2 fold change (FC) at a threshold of 1.5. The up-regulated genes form three clusters, and 11 genes are found to be highly associated with HIV-1/AIDS. Protein-protein interaction (PPI) analysis of the up-regulated genes using the STRING database found 21 genes with strong associations among themselves. We thus demonstrate the identification of potential biomarkers from RNA-seq data using a robust t-statistic model.
|
Keywords |
RNA-seq data, differentially expressed genes, robust t-statistic, gene-disease network, protein-protein interaction.
|
Citation |
Akond et al. Bioinformation 14(4): 153-163 (2018)
|
Edited by |
P Kangueane
|
ISSN |
0973-2063
|
Publisher |
|
License |
This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
|