Use of a neural network to predict normalized signal strengths from a DNA-sequencing microarray


Title	Use of a neural network to predict normalized signal strengths from a DNA-sequencing microarray
Authors	Charles Chilaka^{1, 5}, Steven Carr^{2, 3, *}, Nabil Shalaby^{3, 4}, Wolfgang Banzhaf^{3, 6}
Affiliation	¹Program in Scientific Computing; ²Department of Biology; ³Department of Computer Science; ⁴Department of Mathematics and Statistics, Memorial University of Newfoundland, St. John’s, Newfoundland, Canada A1C 5S7; ⁵Department of Mathematics, FUT, Owerri, Nigeria; ⁶Present address: Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824.
Email	scarr@mun.ca
Article Type	Hypothesis
Date	Received July 13, 2017; Accepted July 18, 2017; Published September 30, 2017
Abstract	A microarray DNA sequencing experiment for a molecule of N bases produces a 4 × N data matrix, in which the quartet at each of the N positions comprises the signal strengths for binding of the experimental DNA to reference oligonucleotides affixed to the microarray, one for each of the four possible bases (A, C, G, or T). The strongest signal in each quartet should result from a perfect complementary match between the experimental and reference DNA sequences, and should therefore indicate the correct base call at that position. The linear series of calls should constitute the DNA sequence. Variation in absolute and relative signal strengths over the N quartets, due to variable base composition and other factors, can interfere with the accuracy and (or) confidence of base calls in ways that are not fully understood. We used a feed-forward back-propagation neural network model to predict the normalized signal intensities of a microarray-derived DNA sequence of N = 15,453 bases. The DNA sequence was encoded as n-gram neural input vectors, where n = 1, 2, and their composite. The data were divided into training, validation, and testing sets. Regression values were >99% overall, and improved with an increased number of neurons in the hidden layer and with the composite n-grams. We also observed a very low mean square error overall, which translates to a high performance value.
Keywords
Citation	Chilaka et al. Bioinformation 13(9): 313-317 (2017)
Edited by	P Kangueane
ISSN	0973-2063
Publisher	Biomedical Informatics
License	This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.
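
The base-calling rule and the n-gram input encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`call_bases`, `one_hot_ngrams`, `composite_encoding`), the A, C, G, T column order, the one-hot representation, and the choice to pair each base with the 2-gram that starts at the same position are all assumptions made for illustration; the abstract does not specify them.

```python
from itertools import product

BASES = "ACGT"  # assumed column order of each signal quartet


def call_bases(quartets):
    """Call one base per position: the base with the strongest signal wins."""
    return "".join(BASES[row.index(max(row))] for row in quartets)


def one_hot_ngrams(seq, n):
    """Encode every overlapping n-gram of seq as a one-hot vector of length 4**n."""
    vocab = ["".join(p) for p in product(BASES, repeat=n)]
    index = {gram: i for i, gram in enumerate(vocab)}
    vectors = []
    for start in range(len(seq) - n + 1):
        vec = [0] * len(vocab)
        vec[index[seq[start:start + n]]] = 1
        vectors.append(vec)
    return vectors


def composite_encoding(seq):
    """Concatenate the 1-gram and 2-gram vectors position by position.

    Assumption: the final base has no 2-gram starting at it, so it is dropped.
    """
    unigrams = one_hot_ngrams(seq, 1)
    bigrams = one_hot_ngrams(seq, 2)
    return [u + b for u, b in zip(unigrams, bigrams)]


if __name__ == "__main__":
    # Hypothetical signal strengths for three positions (rows) and four bases (columns).
    signals = [
        [0.9, 0.1, 0.2, 0.1],
        [0.1, 0.2, 0.8, 0.3],
        [0.1, 0.1, 0.2, 0.7],
    ]
    sequence = call_bases(signals)
    print(sequence)
    print(composite_encoding(sequence))
```

Under these assumptions, a 1-gram input has 4 components, a 2-gram input has 16, and the composite input has 20. In a setup like the one the abstract describes, such vectors would be the network inputs and the normalized signal intensities would be the regression targets.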
