Comprehensive Research on Druggable Proteins: From PSSM to Pre-Trained Language Models.

Researchers

Journal

International Journal of Molecular Sciences

Modalities

Models

ESM-2 embeddings, GPT-2, PLM, PSSM features

Abstract

Identifying druggable proteins can greatly reduce the cost of discovering new drug candidates. Traditional experimental approaches to finding these proteins are costly, slow, and labor-intensive, which makes them impractical for large-scale research. In response, recent decades have seen a rise in computational methods that support drug discovery through predictive models. In this study, we propose a fast and precise classifier for identifying druggable proteins that uses a protein language model (PLM) with fine-tuned Evolutionary Scale Modeling 2 (ESM-2) embeddings and achieves 95.11% accuracy on the benchmark dataset. We also compare the predictive abilities of ESM-2 embeddings and position-specific scoring matrix (PSSM) features by training the same classifiers on each. The results suggest that ESM-2 embeddings outperform PSSM features in both accuracy and efficiency. Recognizing the potential of language models, we further developed an end-to-end model based on a modified Generative Pre-trained Transformer 2 (GPT-2). To our knowledge, this is the first time the large language model (LLM) GPT-2 has been applied to the recognition of druggable proteins. Finally, a more recent dataset, Pharos, was used to further validate the performance of the proposed model.
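The comparison described in the abstract (two feature types, same classifier) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the authors' pipeline: the abstract does not say how per-residue features are reduced to a fixed length, so mean pooling is used here; the nearest-centroid classifier stands in for whatever classifiers the study used; the data are synthetic; and the dimensions (20 for a PSSM-like matrix, 64 for an embedding-like matrix) and all function names are hypothetical.

```python
import random
from statistics import fmean


def mean_pool(per_residue):
    """Average an L x d per-residue matrix into one d-dimensional vector."""
    return [fmean(col) for col in zip(*per_residue)]


def fit_centroids(vectors, labels):
    """Nearest-centroid 'training': compute one mean vector per class label."""
    by_label = {}
    for v, y in zip(vectors, labels):
        by_label.setdefault(y, []).append(v)
    return {y: mean_pool(vs) for y, vs in by_label.items()}


def predict(centroids, v):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(v, c))
    return min(centroids, key=lambda y: dist(centroids[y]))


def accuracy(centroids, vectors, labels):
    """Fraction of vectors assigned to their true label."""
    hits = sum(predict(centroids, v) == y for v, y in zip(vectors, labels))
    return hits / len(labels)


def toy_protein(rng, length, dim, shift):
    """Synthetic L x dim matrix; `shift` separates the two toy classes."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(length)]


def make_split(rng, n, dim):
    """Build n toy proteins per class, with varying sequence lengths."""
    X, y = [], []
    for label, shift in ((0, -1.0), (1, 1.0)):
        for _ in range(n):
            length = rng.randint(30, 60)
            X.append(mean_pool(toy_protein(rng, length, dim, shift)))
            y.append(label)
    return X, y


# Same classifier and protocol, two hypothetical feature types.
rng = random.Random(0)
results = {}
for name, dim in (("pssm_like", 20), ("embedding_like", 64)):
    X_train, y_train = make_split(rng, 20, dim)
    X_test, y_test = make_split(rng, 10, dim)
    model = fit_centroids(X_train, y_train)
    results[name] = accuracy(model, X_test, y_test)
print(results)
```

The point of the sketch is the protocol rather than the numbers: both feature types are reduced to fixed-length vectors and scored by an identical classifier, so any accuracy difference can be attributed to the features. On real data the per-residue matrices would come from a PSSM tool and from an ESM-2 model rather than from `toy_protein`.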
