Title

Optimizing k-mer size using a variant grid search to enhance de novo genome assembly

 

Authors

Soyeon Cha & David McK Bird*

 

Affiliation

Bioinformatics Research Center and Department of Plant Pathology, NC State University, Raleigh, NC, USA;

 

Email

David McK Bird - Email: bird@ncsu.edu; *Corresponding author

Article Type

Prediction Model

Date

Received March 18, 2016; Revised April 6, 2016; Accepted April 6, 2016; Published April 10, 2016

 

Abstract

Largely driven by huge reductions in per-base costs, sequencing nucleic acids has become a near-ubiquitous technique in laboratories performing biological and biomedical research. Most of the effort goes to re-sequencing, but assembly of de novo-generated, raw sequence reads into contigs that span as much of the genome as possible is central to many projects. Although truly complete coverage is not realistically attainable, maximizing the amount of sequence that can be correctly assembled into contigs contributes to coverage. Here we compare three commonly used assembly algorithms (ABySS, Velvet and SOAPdenovo2), and show that empirical optimization of k-mer values has a disproportionate influence on de novo assembly of a eukaryotic genome, that of the nematode parasite Meloidogyne chitwoodi. Each assembler was challenged with ~40 million Illumina II paired-end reads, and assemblies were performed under a range of k-mer sizes. In each instance, the optimal k-mer was 127, although based on N50 values, ABySS was more efficient than the others. That the assembly was not spurious was established using the “Core Eukaryotic Gene Mapping Approach”, which indicated that 98.79% of the M. chitwoodi genome was accounted for by the assembly. Subsequent gene finding and annotation are consistent with this and suggest that k-mer optimization contributes to the robustness of assembly.
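
As an illustration of how assemblies produced at different k-mer sizes might be ranked by N50, the following minimal Python sketch computes N50 from a list of contig lengths and selects the k with the highest value. It is not the authors' pipeline: the function names (n50, best_k_by_n50), the tie-breaking rule and the toy contig lengths are all assumptions made for this example, and the numbers are invented rather than taken from the study.

```python
from typing import Dict, List


def n50(contig_lengths: List[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L together contain at least half of the assembled bases."""
    if not contig_lengths:
        return 0
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def best_k_by_n50(assemblies: Dict[int, List[int]]) -> int:
    """Pick the k whose assembly has the highest N50.

    Ties are broken in favour of the larger k (an arbitrary choice
    made for this sketch).
    """
    return max(assemblies, key=lambda k: (n50(assemblies[k]), k))


# Hypothetical contig lengths (in bases) for three k-mer sizes.
# These values are invented purely to exercise the functions.
toy_assemblies = {
    63: [100, 200, 300, 400],
    95: [150, 250, 600],
    127: [500, 900, 100],
}

for k in sorted(toy_assemblies):
    print(k, n50(toy_assemblies[k]))
print("best k:", best_k_by_n50(toy_assemblies))
```

In practice the contig lengths would be read from each assembler's output FASTA for every k in the grid, and N50 would normally be considered alongside a completeness check such as the one described above, since a high N50 alone does not establish that an assembly is correct.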

 

Keywords

grid search, assembly, genome, model

Citation

Cha & Bird, Bioinformation 12(2): 36-40 (2016)
 

Edited by

P Kangueane

 

ISSN

0973-2063

 

Publisher

Biomedical Informatics

 

License

This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.