| |

Virtifier: A deep learning-based identifier for viral sequences from metagenomes.

Researchers

Journal

Modalities

Models

Abstract

Viruses, the most abundant biological entities on earth, are important components of microbial communities, and as major human pathogens, they are responsible for human mortality and morbidity. The identification of viral sequences from metagenomes is critical for viral analysis. As massive quantities of short sequences are generated by next-generation sequencing (NGS), most methods utilize discrete and sparse one-hot vectors to encode nucleotide sequences, which are usually ineffective in viral identification.In this paper, Virtifier, a deep learning-based viral identifier for sequences from metagenomic data, is proposed. It includes a meaningful nucleotide sequence encoding method named Seq2Vec and a variant viral sequence predictor with an attention-based Long Short-Term Memory (LSTM) network. By utilizing a fully trained embedding matrix to encode codons, Seq2Vec can efficiently extract the relationships among those codons in a nucleotide sequence. Combined with an attention layer, the LSTM neural network can further analyze the codon relationships and sift the parts that contribute to the final features. Experimental results of three datasets have shown that Virtifier can accurately identify short viral sequences (< 500 bp) from metagenomes, surpassing three widely used methods, VirFinder, DeepVirFinder and PPR-Meta. Meanwhile, a comparable performance was achieved by Virtifier at longer lengths (> 5,000bp).A Python implementation of Virtifier and the Python code developed for this study have been provided on Github https://github.com/crazyinter/Seq2Vec.Supplementary data are available at Bioinformatics online.© The Author(s) (2021). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *