|

Deep learning to predict the biosynthetic gene clusters in bacterial genomes.

Researchers

Journal

Modalities

Models

Abstract

Biosynthetic gene clusters (BGCs) in bacterial genomes code for important small molecules and secondary metabolites. Based on the validated BGCs and the corresponding sequences of protein family domains (Pfams), Pfam functions and clan information, we develop a deep learning method e-DeepBGC, that extends DeepBGC, for detecting the BGCs and their biosynthetic class in bacterial genomes. We show that e-DeepBGC leads to reduced false positive rates in BGC identification and an increased sensitivity in identifying BGCs compared to DeepBGC. We apply e-DeepBGC to 5,666 Ref Seq bacterial genomes and detect a total of 170, 685 BGCs with an average of 30.1 BGCs in each genome. We summarize all the predicted BGCs, their functional classes and the distributions of the BGCs in different bacterial phyla.Copyright © 2022 Elsevier Ltd. All rights reserved.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *