Computational identification of putative miRNAs and their target genes in pathogenic amoeba Naegleria fowleri

Naegleria fowleri is a parasitic, unicellular, free-living eukaryotic amoeba. The parasite spreads through contaminated water and causes primary amoebic meningoencephalitis (PAM). It is therefore of interest to understand its molecular pathogenesis. We analyzed the parasite genome for miRNAs (microRNAs), which are non-coding, single-stranded RNA molecules. Using computational methods, we identified 245 miRNAs in N. fowleri, five of which are conserved. Targets of the predicted miRNAs were identified with the miRanda software, and their functions were then annotated with AmiGO (a gene ontology web tool).


Identification of conserved miRNA
To identify conserved miRNA candidates in N. fowleri, we retrieved all mature miRNA sequences from miRBase Release 21 (http://www.mirbase.org/) [30]. A BlastN search of all miRBase mature miRNA sequences was performed against the 245 N. fowleri precursor miRNAs, using default parameters and an e-value cutoff of 0.001. The BlastN results were then filtered with two conditions: (1) more than 90% identity between the sequences and (2) no more than three mismatches between the sequences (Table 1).
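The two filtering conditions can be sketched as a small post-processing step. This is a minimal illustration, assuming the BlastN hits were saved in the standard BLAST+ tabular format (`-outfmt 6`, default twelve columns) with the mature miRNA as query and the precursor as subject; the function name `conserved_hits` is hypothetical, and the original search was run through BioEdit rather than this script.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def conserved_hits(tabular_text, min_identity=90.0, max_mismatches=3,
                   max_evalue=1e-3):
    """Keep hits with >90% identity, <=3 mismatches and e-value <= 0.001."""
    hits = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        if (float(hit["pident"]) > min_identity
                and int(hit["mismatch"]) <= max_mismatches
                and float(hit["evalue"]) <= max_evalue):
            hits.append(hit)
    return hits
```

The e-value check is redundant when the search itself was run with a 0.001 cutoff, but it keeps the filter correct if the raw output was produced with a looser threshold.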

Prediction of miRNA targets
We used miRanda version 3.3a (released in 2010) to identify target genes in N. fowleri, with the following parameters: a minimum alignment score threshold of 120; a minimum miRNA:mRNA duplex folding free energy threshold of -20 kcal/mol; a gap opening penalty of -8; a gap extension penalty of -2; and a scaling parameter of 4 for the complementary nucleotide match score. The output was then filtered with the following conditions: (1) nucleotide positions are counted from the 5' end of the miRNA; (2) positions 2 to 8 must be complementary to the mRNA (miRNA:mRNA), with no more than two mismatches allowed; (3) no mismatch is allowed at positions 2 to 4 (5' end of the miRNA); (4) no more than one gap is allowed in the alignment; (5) no more than one G:U pair is allowed in the seed region; and (6) the complementary miRNA:mRNA alignment must be at least 18 nucleotides long.
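The six post-filtering conditions can be expressed as one check per duplex. This is a sketch under stated assumptions: the duplex is given as two equal-length aligned strings, the miRNA written 5' to 3' and the target site written 3' to 5' so that each column is a pair, with `-` for gaps. Parsing miRanda's own output into this form is not shown, and `passes_target_filters` is a hypothetical name. Gaps in the seed are counted as mismatches, and G:U pairs are not counted as mismatches (they are limited separately by condition 5); both are interpretive choices.

```python
# Pairs written as (miRNA base, target base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def passes_target_filters(mirna_aln, target_aln):
    """Apply conditions (1)-(6) to one aligned miRNA:mRNA duplex."""
    if len(mirna_aln) != len(target_aln):
        raise ValueError("aligned strings must have equal length")
    mirna_aln = mirna_aln.upper().replace("T", "U")
    target_aln = target_aln.upper().replace("T", "U")

    position = 0         # (1) counted from the miRNA 5' end
    seed_mismatches = 0  # mismatches or gaps at positions 2-8
    seed_wobbles = 0     # G:U pairs at positions 2-8
    gaps = 0
    paired_columns = 0   # columns with a base on both strands
    for m, t in zip(mirna_aln, target_aln):
        if m != "-":
            position += 1  # a gap in the miRNA keeps the previous position
        in_seed = 2 <= position <= 8
        if m == "-" or t == "-":
            gaps += 1
            kind = "gap"
        else:
            paired_columns += 1
            if (m, t) in WATSON_CRICK:
                kind = "match"
            elif (m, t) in WOBBLE:
                kind = "wobble"
            else:
                kind = "mismatch"
        if in_seed and kind in ("mismatch", "gap"):
            if position <= 4:
                return False  # (3) no mismatch at positions 2-4
            seed_mismatches += 1
        elif in_seed and kind == "wobble":
            seed_wobbles += 1
    return (seed_mismatches <= 2        # (2)
            and gaps <= 1               # (4)
            and seed_wobbles <= 1       # (5)
            and paired_columns >= 18)   # (6)
```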

Gene ontology of target genes
To better understand the functions and metabolic roles of the target genes in N. fowleri, the target gene sequences were searched with BLAST in AmiGO version 1.8 (http://amigo1.geneontology.org/cgi-bin/amigo/go.cgi), which annotates them by sequence similarity against the NCBI and SwissProt databases. The best hit was selected for each target gene. Gene ontology terms were classified in AmiGO into biological process, molecular function and cellular component categories.
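Selecting one best hit per target gene can be illustrated as follows. This is a sketch only, assuming hits are available as dictionaries with BLAST-style fields (`qseqid`, `sseqid`, `evalue`, `bitscore`) and assuming "best" means highest bit score with the lowest e-value as tie-breaker; the text does not state the ranking AmiGO used, and `best_hits` is a hypothetical helper.

```python
def best_hits(hits):
    """Return {query_id: best hit}, ranked by bit score, then by lower e-value."""
    best = {}
    for hit in hits:
        query = hit["qseqid"]
        rank = (float(hit["bitscore"]), -float(hit["evalue"]))
        if query not in best or rank > best[query][0]:
            best[query] = (rank, hit)
    return {query: hit for query, (_, hit) in best.items()}
```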

Results & Discussion:
Computational prediction of miRNA in N. fowleri
Different computational approaches have been used to identify miRNAs in both animals and plants [16, 17, 25]. In this study, we used an existing methodology with slight modifications [21]. DNA contigs of N. fowleri collected from NCBI were scanned with einverted from the EMBOSS package. Einverted detects inverted repeats, that is, sequences whose reverse complement lies nearby so that together they can fold into a hairpin-like stem-loop, which may contain mismatches and bulges in the stem. Einverted returned 54896 hits, including multiloop structures and hairpins containing gaps. To minimize gaps and multiloop structures, we wrote a Java script to retain duplexes of at least (≥) 15 base pairs separated by at most (≤) 40 nucleotides, which narrowed the result to 12821. Since miRNA length varies [31], we then kept sequences 60-120 nucleotides long, leaving 8603. The minimal folding free energy (MFE) plays an important role in determining RNA secondary structure. We predicted the secondary structure and MFE of each candidate with RNAfold, part of the Vienna RNA package [32]. miRNA sequences have lower folding free energies than shuffled sequences, a characteristic that allows them to form stable secondary structures [33].
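The two structural filters applied to the einverted hits (stem and loop length, then total hairpin length) can be sketched as below. The record type `InvertedRepeat` is hypothetical: einverted's output would first have to be parsed into arm coordinates (1-based, inclusive), and the shorter arm is used here as an assumed proxy for duplex length. This is not the authors' Java script, only an illustration of the same thresholds.

```python
from dataclasses import dataclass


@dataclass
class InvertedRepeat:
    """Hypothetical hairpin record with 1-based inclusive arm coordinates."""
    contig: str
    left_start: int
    left_end: int
    right_start: int
    right_end: int

    @property
    def stem_length(self):
        # Assumed proxy for duplex length: the shorter of the two arms.
        return min(self.left_end - self.left_start + 1,
                   self.right_end - self.right_start + 1)

    @property
    def loop_length(self):
        return self.right_start - self.left_end - 1

    @property
    def hairpin_length(self):
        return self.right_end - self.left_start + 1


def filter_hairpins(repeats, min_stem=15, max_loop=40, min_len=60, max_len=120):
    """Return (after stem/loop filter, after 60-120 nt length filter)."""
    stem_loop_ok = [r for r in repeats
                    if r.stem_length >= min_stem and r.loop_length <= max_loop]
    length_ok = [r for r in stem_loop_ok
                 if min_len <= r.hairpin_length <= max_len]
    return stem_loop_ok, length_ok
```

Returning both stages mirrors the two reported counts (12821 after the stem/loop filter and 8603 after the length filter).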
However, MFE values also depend on RNA length. The free energy range considered in this study was -20 to -40 kcal/mol. Precursor miRNA candidates had to satisfy the following requirements: (1) the RNA sequence folds into an appropriate hairpin-like secondary structure; (2) the mature miRNA sequence lies within one arm of the hairpin; (3) there are no more than five mismatches between the predicted mature miRNA and the opposite miRNA* sequence in the hairpin; (4) there is no break or loop within the mature miRNA and miRNA* sequences; and (5) the stem-loop structure contains at least 16 base pairs. Each candidate was inspected manually, and only those satisfying all of these criteria were retained.
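The criteria were applied manually, but most of them can be approximated from a dot-bracket structure such as RNAfold produces. The sketch below makes several assumptions: a single hairpin means every `(` precedes every `)`; unpaired positions inside the mature region stand in for mismatches against miRNA*; and mature coordinates are 0-based and end-exclusive. The function names are hypothetical.

```python
def pair_table(structure):
    """Map each paired position (0-based) to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_candidate_precursor(structure, mature_start, mature_end,
                           max_mismatches=5, min_pairs=16):
    """Approximate criteria (1)-(5) for one folded precursor."""
    pairs = pair_table(structure)
    last_open = structure.rfind("(")
    first_close = structure.find(")")
    # (1) a single hairpin: all '(' come before all ')', so no multiloop.
    if first_close == -1 or last_open > first_close:
        return False
    # (2)/(4) the mature sequence sits entirely on one arm, outside the loop.
    on_5p_arm = mature_end <= last_open + 1
    on_3p_arm = mature_start >= first_close
    if not (on_5p_arm or on_3p_arm):
        return False
    # (3) unpaired positions in the mature region as a proxy for mismatches.
    unpaired = sum(1 for i in range(mature_start, mature_end) if i not in pairs)
    if unpaired > max_mismatches:
        return False
    # (5) at least 16 base pairs in the stem-loop.
    return len(pairs) // 2 >= min_pairs
```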
We further narrowed down the miRNA candidates by GC content. The overall GC content of the N. fowleri genome is 37%; we retained candidates with 30-60% GC content, which reduced the dataset to 2201. We then ran BlastX 2.2.30 to remove protein-coding sequences, which narrowed the dataset to 1468. Next, we used miPred to classify the pre-miRNA candidates as real or pseudo pre-miRNAs, retaining those predicted to be real with a confidence of at least 70%; only 288 of the 1468 candidates passed this step. Some candidates were repeated in the N. fowleri genome, with two to four copies on the same contig ID, and these copies were eliminated. The remaining candidates were screened for repetitive elements with RepeatMasker, which aligns input genomic sequences against Repbase using cross_match as the search engine [28]. This reduced the dataset to 246 miRNA candidates. Finally, the MatureBayes tool (http://mirna.imbb.forth.gr/MatureBayes.html) was used to identify the mature miRNA sequence within each precursor [29]. No mature sequence could be predicted for one of the 246 candidates, leaving 245 novel miRNAs.
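The GC-content filter and the removal of repeated copies are simple enough to sketch directly. Assumptions: candidates are plain sequence strings; repeated copies are identical sequences on the same contig, and one copy of each is kept (the text does not say whether every copy or only the redundant ones were removed). Function names are hypothetical.

```python
def gc_content(seq):
    """Percentage of G + C among A/C/G/U (T treated as U) bases."""
    seq = seq.upper().replace("T", "U")
    bases = [b for b in seq if b in "ACGU"]
    if not bases:
        return 0.0
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def filter_by_gc(candidates, low=30.0, high=60.0):
    """Keep {name: sequence} entries whose GC content is within [low, high] %."""
    return {name: seq for name, seq in candidates.items()
            if low <= gc_content(seq) <= high}


def drop_repeated_copies(candidates):
    """Keep one candidate per (contig, sequence); input is (contig, id, seq) tuples."""
    seen, kept = set(), []
    for contig, cid, seq in candidates:
        key = (contig, seq.upper())
        if key not in seen:
            seen.add(key)
            kept.append((contig, cid, seq))
    return kept
```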

Characterization of N. fowleri miRNA
Previous reports have shown that certain features of pre-miRNAs help to identify conserved and non-conserved miRNAs, and that miRNAs are conserved during evolution from worms to humans. We collected all published mature miRNAs from miRBase [34] and performed a BlastN search with BioEdit [35] using a low e-value cutoff (0.001). Five of the 245 miRNAs were conserved. Gaps in the alignments are due to insertions or deletions of nucleotides. Alignments were performed using the mature miRNA/miRNA* sequences from both the 5' and 3' stems of all putative novel miRNAs. Hits were collected when the alignment had fewer than three mismatches between the sequences. Alignments of the mature sequences on the 5' and 3' stems were examined separately (Figure 2). In our study, we determined several features of the precursor miRNAs: (a) MFE, ranging from -20 to -40.90 kcal/mol (Table 2, available from the authors); (b) GC content; (c) minimal free energy of the thermodynamic ensemble; (d) adjusted minimal free energy (AMFE), calculated as AMFE = (MFE / length of the RNA sequence) × 100; and (e) minimal free energy index (MFEI), calculated as MFEI = AMFE / (G+C)%. All of these features were calculated for each miRNA candidate. MFEI is an important criterion for distinguishing miRNAs from other types of RNA (coding and non-coding): most precursor miRNAs have an MFEI above 0.85, with an average of 0.97, compared with 0.64 for tRNA, 0.59 for rRNA and 0.65 for mRNA [36]. We also calculated the percentages of the individual nucleotides A, G, C and U in the precursor miRNA candidates (Table 2, available from the authors). In miRNAs, the percentages of G and C are lower than those of A and U [36]. Previous studies show that U is the predominant nucleotide at the first 5' position of most mature miRNAs [26], and the same holds for N. fowleri.
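The AMFE and MFEI formulas, together with the nucleotide composition, can be computed as below. This sketch assumes the magnitude of the (negative) MFE is used so that MFEI comes out positive and comparable with the 0.85 threshold quoted above; the function names are illustrative only. For a hypothetical 90-nt precursor with an MFE of -30 kcal/mol and 40% GC, AMFE is about 33.3 and MFEI about 0.83, just below 0.85.

```python
def amfe(mfe, length):
    """Adjusted MFE: |MFE| per 100 nt (absolute value is an assumption)."""
    return abs(mfe) / length * 100.0


def mfei(mfe, length, gc_percent):
    """Minimal free energy index: AMFE divided by (G+C)%."""
    return amfe(mfe, length) / gc_percent


def composition(seq):
    """Percentage of A, G, C and U in a sequence (T treated as U)."""
    seq = seq.upper().replace("T", "U")
    return {b: 100.0 * seq.count(b) / len(seq) for b in "AGCU"}
```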
Of the 245 novel miRNAs, 44.25% have U at the first position, whereas only 33.2%, 14.6% and 7.75% have A, C and G, respectively. This suggests that the nucleotide at this position may play an important role in identifying mature sequences and in binding target sites in mRNA. To identify targets in N. fowleri, we used miRanda version 3.3a with the parameters described in the Methodology. After screening the output with the conditions described there, only 30 target genes were retained. The identified target genes are involved in various biological activities, especially in mitochondria: as components of the respiratory chain, in oxidation-reduction processes, in the electron transport chain and in apoptosis (Table 1 in supplementary material). All of these activities play a key role in cellular growth and development.
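The first-position statistics above amount to a frequency count over the 5'-most nucleotide of each mature sequence. A minimal sketch, assuming the mature sequences are available as strings (the function name is hypothetical):

```python
from collections import Counter


def first_nucleotide_percentages(mature_seqs):
    """Percentage of mature miRNAs starting with each nucleotide at 5' position 1."""
    firsts = Counter(s.upper().replace("T", "U")[0] for s in mature_seqs if s)
    total = sum(firsts.values())
    return {b: 100.0 * firsts[b] / total for b in "UACG"}
```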

Target Gene annotation
To understand the roles of the miRNA target genes in N. fowleri, individual target gene sequences were analyzed with AmiGO version 1.8 (http://amigo1.geneontology.org/cgi-bin/amigo/go.cgi). Most of the target genes were predicted to be involved in oxidation-reduction processes, dehydrogenase activity and the mitochondrial electron transport chain, suggesting that they mainly participate in mitochondrial biological processes (Table 3, see supplementary material).
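Summarizing which GO terms dominate the target set can be sketched as a per-aspect tally. The input format is an assumption: a mapping from target gene ID to (GO term name, aspect) pairs, with aspect codes P, F and C as used in GO annotation (GAF) files. Gene IDs in any usage are hypothetical, and each term is counted once per gene.

```python
from collections import Counter, defaultdict

ASPECTS = {"P": "biological_process", "F": "molecular_function",
           "C": "cellular_component"}


def tally_go_terms(annotations):
    """Count target genes per GO term, grouped by GO aspect."""
    counts = defaultdict(Counter)
    for gene, terms in annotations.items():
        for term, aspect in set(terms):  # each term counted once per gene
            counts[ASPECTS[aspect]][term] += 1
    return counts
```

`counts["biological_process"].most_common(3)` would then list the most frequent process terms across the targets.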

Conclusion:
N. fowleri is a free-living amoeba that acts as a human pathogen, causing PAM. It is therefore of interest to understand its molecular pathogenesis through its miRNAs. Here, we report 245 predicted miRNAs from N. fowleri, five of which show high homology with mature sequences in miRBase. The predicted miRNAs show that U is the predominant nucleotide at the 5' end of the precursor and mature sequences, which is a characteristic feature of miRNAs. Gene annotation shows that the target genes are mainly involved in mitochondrial functions. These data provide insights for designing experiments to understand the underlying regulatory mechanisms.