KEGG orthology prediction of bacterial proteins using natural language processing.

April 10, 2024 Bioinformatics, Computational Biology

Researchers

Haoyu Wu, Jing Chen, Ning Wang

Journal

BMC Bioinformatics

Models

Deep Learning, Natural Language Processing

Abstract

The advent of high-throughput technologies has led to an exponential increase in uncharacterized bacterial protein sequences, surpassing the capacity of manual curation. A large number of bacterial protein sequences remain unannotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology, making automated annotation tools necessary. These tools are now indispensable in biological research, bridging the gap between the vast number of unannotated sequences and meaningful biological insights. In this work, we propose a novel pipeline for KEGG orthology annotation of bacterial protein sequences that uses natural language processing and deep learning. To assess the effectiveness of our pipeline, we conducted evaluations using the genomes of two randomly selected species from the KEGG database. In our evaluation, we obtain competitive results on precision, recall, and F1 score, with values of 0.948, 0.947, and 0.947, respectively. Our experimental results suggest that the pipeline performs comparably to traditional methods and excels at identifying distant relatives with low sequence identity. This demonstrates the potential of our pipeline to significantly improve the accuracy and comprehensiveness of KEGG orthology annotation, thereby advancing our understanding of functional relationships within biological systems. © 2024. The Author(s).
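The abstract reports precision, recall, and F1 score for predicted KEGG orthology (KO) labels but does not say how the per-class scores were aggregated. As a minimal sketch, assuming each protein receives exactly one predicted KO and that scores are averaged across KOs weighted by class support (both are assumptions, not details taken from the paper), the three metrics could be computed as follows. The function name `weighted_prf` and the KO identifiers in the usage note are illustrative only.

```python
from collections import Counter


def weighted_prf(true_kos, predicted_kos):
    """Support-weighted precision, recall, and F1 over KO labels.

    Assumes one true KO and one predicted KO per protein; the
    weighting scheme is an assumption, not the paper's stated method.
    """
    if len(true_kos) != len(predicted_kos):
        raise ValueError("label lists must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_kos, predicted_kos):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1  # the true KO was missed
            fp[pred] += 1   # the predicted KO was wrong

    support = Counter(true_kos)  # number of proteins per true KO
    total = len(true_kos)
    precision = recall = f1 = 0.0
    for ko, n in support.items():
        predicted_as_ko = tp[ko] + fp[ko]
        p = tp[ko] / predicted_as_ko if predicted_as_ko else 0.0
        r = tp[ko] / n
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1
```

For example, with true labels `["K00001", "K00001", "K00002", "K00003"]` and predictions `["K00001", "K00002", "K00002", "K00003"]`, the function returns a precision of 0.875 and a recall and F1 of 0.75. A different averaging scheme (macro or micro) would give different numbers on the same predictions, so reported values are only comparable when the scheme is known.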
