Title

Evaluation of machine learning classifiers for predicting essential genes in Mycobacterium tuberculosis strains

 

Authors

Monish Mukul Das¹ & Keka Sarkar²*

 

Affiliation

¹Department of Computer Science & Engineering, University of Kalyani, Kalyani, Nadia - 741235; ²Department of Microbiology, University of Kalyani, Kalyani, Nadia - 741235. Phone: +91-8334936391. *Corresponding author

 

Email

Monish Mukul Das - Email: monishmicro22@klyuniv.ac.in

Keka Sarkar - Email: keka@klyuniv.ac.in

 

Article Type

Research Article

 

Date

Received November 1, 2022; Revised December 20, 2022; Accepted December 31, 2022; Published December 31, 2022

 

Abstract

Accurate investigation and prediction of essential genes in bacterial genomes is important because such genes can be explored as effective targets for antimicrobial drugs and can help in understanding the biological mechanisms of a cell. A subset of key features was obtained from 14 genome sequence-based features of 20 strains of Mycobacterium tuberculosis, whose essential gene information was downloaded from the ePath and NCBI databases. A genome extraction program was developed for mapping and matching essential genes between the two databases. The key features were selected using a Genetic Algorithm. For each of the three classifiers, a deep neural network (DNN, proposed), a decision tree (DT) and a support vector machine (SVM), 80%, 10% and 10% of the key-feature subset were used for training, validation and testing, respectively. Experimental results with 10-fold cross-validation (10-f-cv) showed that the proposed DNN, DT and SVM achieved AUC values of 0.98, 0.88 and 0.82, respectively, so the proposed DNN outperformed DT and SVM. The higher prediction accuracy of the classifiers is attributed to the use of key features only, which also supports the better generalizability of the classifiers and the relevance of the key features to gene essentiality. The proposed DNN also showed the best prediction performance when compared with predictors used in previous studies.
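The abstract does not include code, so the following is only a minimal sketch of two of the steps it names: an 80%/10%/10% split and the AUC metric. The function names `split_80_10_10` and `auc`, the shuffling seed, and the toy labels and scores are assumptions made for illustration; they are not taken from the study and do not reproduce its results.

```python
import random


def split_80_10_10(rows, seed=0):
    """Shuffle rows and split them into 80% train, 10% validation, 10% test.

    Hypothetical helper: the paper does not say how its split was drawn.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_train = n * 8 // 10   # integer arithmetic avoids float rounding
    n_val = n // 10
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0);
    ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data only: 100 made-up gene indices, not the study's dataset.
    train, val, test = split_80_10_10(range(100))
    print(len(train), len(val), len(test))

    # Made-up labels (1 = essential) and classifier scores.
    print(auc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1]))
```

The pairwise form of AUC is quadratic in the number of examples, which is fine for a small illustration; a rank-based formulation or a library routine would be the usual choice on full genome-scale data.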

 

Keywords

Essential gene, Mycobacterium tuberculosis, Genome extraction program, Deep Neural Networks (DNN), Support Vector Machine (SVM), Decision Tree (DT), Genetic Algorithm, Area under the receiver operating characteristic curve (AUC).

 

Citation

Das & Sarkar, Bioinformation 18(12): 1126-1130 (2022)

 

Edited by

P Kangueane

 

ISSN

0973-2063

 

Publisher

Biomedical Informatics

 

License

This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.