ProteinBERT: A universal deep-learning model of protein sequence and function.

Abstract

Self-supervised deep language modeling has shown unprecedented success across natural language tasks, and has recently been repurposed to biological sequences. However, existing models and pretraining methods are designed and optimized for text analysis. We introduce ProteinBERT, a deep language model specifically designed for proteins. Our pretraining scheme combines language modeling with a novel task of Gene Ontology (GO) annotation prediction. We introduce novel architectural elements that make the model highly efficient and flexible to long sequences. The architecture of ProteinBERT consists of both local and global representations, allowing end-to-end processing of these types of inputs and outputs. ProteinBERT obtains near state-of-the-art performance, and sometimes exceeds it, on multiple benchmarks covering diverse protein properties (including protein structure, post translational modifications and biophysical attributes), despite using a far smaller and faster model than competing deep-learning methods. Overall, ProteinBERT provides an efficient framework for rapidly training protein predictors, even with limited labeled data. Code and pretrained model weights are available at https://github.com/nadavbra/protein_bert.© The Author(s) (2022). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].

Show Full Text

ProteinBERT: A universal deep-learning model of protein sequence and function.

Researchers

Journal

Modalities

Models

Abstract

Multiscale Rotated Bounding Box-Based Deep Learning Method for Detecting Ship Targets in Remote Sensing Images.

REDfold: accurate RNA secondary structure prediction using residual encoder-decoder network.

Delineating yeast cleavage and polyadenylation signals using deep learning.

Machine Learning-Based Automated Detection and Quantification of Geographic Atrophy and Hypertransmission Defects Using Spectral Domain Optical Coherence Tomography.

Overcoming Interpretability in Deep Learning Cancer Classification.

deepHPI: a comprehensive deep learning platform for accurate prediction and visualization of host-pathogen protein-protein interactions.

Leave a Reply Cancel reply

Researchers

Journal

Modalities

Models

Abstract

Similar Posts

Leave a Reply Cancel reply