INTransformer: Data augmentation-based contrastive learning by injecting noise into transformer for molecular property prediction.

Abstract

Molecular property prediction plays an essential role in drug discovery by identifying candidate molecules with target properties. Deep learning models usually require sufficient labeled data to train good prediction models; however, labeled datasets for molecular property prediction are typically small, which poses great challenges for deep learning-based methods. Furthermore, the global information of molecules is critical for predicting molecular properties. We therefore propose INTransformer, a data augmentation method based on contrastive learning that alleviates the scarcity of labeled molecular data while enhancing the ability to capture global information. Specifically, INTransformer consists of two identical Transformer sub-encoders that extract molecular representations from the original SMILES and a noise-injected SMILES, respectively, thereby achieving data augmentation. To reduce the influence of the noise, contrastive learning is used to keep the encoding of the noisy SMILES consistent with that of the original input, so that INTransformer can better extract molecular representation information. Experiments on various benchmark datasets show that INTransformer achieves competitive performance on molecular property prediction tasks compared with baseline and state-of-the-art methods.

Copyright © 2024 Elsevier Inc. All rights reserved.
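The abstract does not specify the exact contrastive objective used to align the two sub-encoders' outputs, so the following is only a minimal sketch of one common choice, an NT-Xent-style loss: each original-SMILES embedding is pulled toward its noisy counterpart (the positive pair on the diagonal) and pushed away from the other molecules in the batch (in-batch negatives). The function name and temperature value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def contrastive_loss(z_orig, z_noisy, temperature=0.5):
    """NT-Xent-style loss (hypothetical stand-in for the paper's
    contrastive objective): row i of z_orig and row i of z_noisy
    are treated as a positive pair; all other rows are negatives."""
    # L2-normalise so the dot product is cosine similarity
    z1 = z_orig / np.linalg.norm(z_orig, axis=1, keepdims=True)
    z2 = z_noisy / np.linalg.norm(z_noisy, axis=1, keepdims=True)
    sim = z1 @ z2.T / temperature                   # (B, B) similarities
    # log-softmax over each row, computed stably
    logits = sim - sim.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positive pairs sit on the diagonal
    return -np.mean(np.diag(log_prob))

# toy check: embeddings aligned with their own "noisy view" should
# incur a lower loss than embeddings paired with unrelated vectors
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
loss_aligned = contrastive_loss(z, z)
loss_random = contrastive_loss(z, rng.normal(size=(4, 8)))
```

Minimising such a loss encourages the noisy-SMILES encoding to stay consistent with the original encoding, which is the consistency goal the abstract describes.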
