

Title

Benchmarking of 16S rRNA gene databases using known strain sequences

Authors

Kunal Dixit1, Dimple Davray1, Diptaraj Chaudhari2, Pratik Kadam2, Rudresh Kshirsagar2, Yogesh Shouche2, Dhiraj Dhotre3,*, Sunil D. Saroj1,*

 

Affiliation

1Symbiosis School of Biological Sciences (SSBS), Symbiosis International (Deemed University), Pune, India; 2National Center for Microbial Resource (NCMR), National Center for Cell Science (NCCS), Pune, India; 3Reliance Life Sciences Pvt Ltd, Rabale, Mumbai, India; Corresponding author*

 

Email

Dhiraj Dhotre - Dhiraj.Dhotre@relbio.com, Sunil D. Saroj - sunil.saroj@ssbs.edu.in

 

Article Type

Research Article

 

Date

Received January 27, 2021; Revised March 10, 2021; Accepted March 10, 2021; Published March 31, 2021

 

Abstract

16S rRNA gene analysis is the most convenient and robust method for microbiome studies. Inaccurate taxonomic assignment of bacterial strains can have deleterious effects, because all downstream analyses rely heavily on the accurate assessment of microbial taxonomy. The use of mock communities to check the reliability of results has been suggested. However, the mock communities used in most studies represent only a small fraction of taxa and serve mainly to validate a sequencing run and estimate sequencing artifacts. Moreover, the large number of databases and tools available for classification and taxonomic assignment of the 16S rRNA gene makes it challenging to select the best-suited method for a particular dataset. In the present study, we used authentic and validly published 16S rRNA gene type strain sequences (full-length and V3-V4 region) and analyzed them using the widely used QIIME pipeline with different OTU clustering parameters and QIIME-compatible databases. Data Analysis Measures (DAMs) revealed high discrepancy in taxonomic assignment at different taxonomic levels. Beta diversity analysis showed clear segregation of the different DAMs. Only limited differences were observed between the analyses of the reference data set using partial (V3-V4) and full-length 16S rRNA gene sequences, which supports the reliability of partial 16S rRNA gene sequences in microbiome studies. Our analysis also highlights common discrepancies observed at various taxonomic levels across the methods and databases tested.

 

Keywords

16S rRNA gene; Genomic Databases; Taxonomic Discrepancy; QIIME.

 

Citation

Dixit et al. Bioinformation 17(3): 377-391 (2021)

 

Edited by

P Kangueane

 

ISSN

0973-2063

 

Publisher

Biomedical Informatics

 

License

This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.