A novel sequence alignment algorithm based on deep learning of the protein folding code.

September 22, 2020

Researchers

Jeffrey Skolnick, Mu Gao

Journal

Bioinformatics (Oxford, England)

Models

deep learning

Abstract

From evolutionary inference and function annotation to structural prediction, protein sequence comparison has provided crucial biological insights. While many sequence alignment algorithms have been developed, existing approaches often cannot detect hidden structural relationships in the “twilight zone” of low sequence identity. To address this critical problem, we introduce a computational algorithm that performs protein Sequence Alignments from deep-Learning of Structural Alignments (SAdLSA, silent “d”). The key idea is to implicitly learn the protein folding code from many thousands of structural alignments using experimentally determined protein structures.
To demonstrate that the folding code was learned, we first show that SAdLSA trained on pure α-helical proteins successfully recognizes pairs of structurally related pure β-sheet protein domains. Subsequent training and benchmarking on larger, highly challenging data sets show significant improvement over established approaches. For challenging cases, SAdLSA is ∼150% better than HHsearch for generating pairwise alignments and ∼50% better for identifying the proteins with the best alignments in a sequence library. The time complexity of SAdLSA is O(N) thanks to GPU acceleration.
Data sets and source code of SAdLSA are available free of charge for academic users at http://pwp.gatech.edu/cssb/sadlsa/.
Supplementary data are available at Bioinformatics online.
© The Author(s) (2020). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].
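
The “twilight zone” in the abstract refers to protein pairs whose sequence identity is too low for conventional sequence comparison to be reliable. As a minimal sketch of how that quantity can be measured, the Python snippet below computes the fraction of identical residues over the aligned (non-gap) columns of a pairwise alignment. The function names, the gap character, and the 30% cutoff are illustrative assumptions, not values taken from the SAdLSA paper; other definitions of identity use a different denominator, such as the length of the shorter sequence.

```python
def sequence_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Fraction of aligned (non-gap) columns whose residues are identical.

    Both inputs are rows of the same pairwise alignment, so they must
    have equal length. Columns containing a gap in either row are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    aligned_columns = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gap column: not counted as aligned
        aligned_columns += 1
        if res_a == res_b:
            matches += 1

    # An alignment with no aligned columns has no measurable identity.
    return matches / aligned_columns if aligned_columns else 0.0


# Assumed cutoff for illustration only; the abstract gives no number.
TWILIGHT_THRESHOLD = 0.30


def in_twilight_zone(aligned_a: str, aligned_b: str,
                     threshold: float = TWILIGHT_THRESHOLD) -> bool:
    """True if the pair's identity falls below the assumed threshold."""
    return sequence_identity(aligned_a, aligned_b) < threshold
```

A pair flagged by `in_twilight_zone` may still share a fold, which is the kind of relationship the abstract says structure-derived alignments are meant to recover.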
