
FP2VEC: a new molecular featurizer for learning molecular properties.

Abstract

Quantitative structure-activity relationship (QSAR) modeling is one of the most successful methods for predicting the properties of chemical compounds. The prediction accuracy of QSAR models has recently been greatly improved by deep learning. In particular, newly developed molecular featurizers based on graph convolution operations on molecular graphs significantly outperform the conventional extended connectivity fingerprint (ECFP) features in both classification and regression tasks, indicating that developing more effective featurizers is critical to fully realizing the power of deep learning. Motivated by the clear analogy between chemical compounds and natural languages, this work develops a new molecular featurizer, FP2VEC, which represents a chemical compound as a set of trainable embedding vectors.
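The idea of representing a compound as a set of embedding vectors can be illustrated with a minimal sketch. It assumes that the positions of the set bits in a binary fingerprint play the role of word tokens and that each position is looked up in an embedding table. The fingerprint length, embedding size, and all function names here are illustrative assumptions and are not taken from the FP2VEC repository.

```python
import random


def on_bit_indices(fingerprint):
    """Return the positions of the set bits; these act like word tokens."""
    return [i for i, bit in enumerate(fingerprint) if bit]


def make_embedding_table(n_bits, dim, seed=0):
    """Randomly initialised lookup table (assumption: one row per bit position).

    In a real model these rows would be trainable parameters.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_bits)]


def featurize(fingerprint, table):
    """Map a binary fingerprint to one embedding vector per set bit."""
    return [table[i] for i in on_bit_indices(fingerprint)]


# Toy 8-bit fingerprint (hypothetical; real fingerprints are much longer).
fp = [0, 1, 0, 0, 1, 0, 1, 0]
table = make_embedding_table(n_bits=len(fp), dim=4)
vectors = featurize(fp, table)
print(on_bit_indices(fp))             # [1, 4, 6]
print(len(vectors), len(vectors[0]))  # 3 4
```

Because the output is a variable-length sequence of vectors, it has the same shape as an embedded sentence, which is what allows a text-style model to consume it.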
To implement and test our new featurizer, we build a QSAR model using a simple convolutional neural network (CNN) architecture that has been successfully used for natural language processing tasks such as sentence classification. By testing our new method on several benchmark datasets, we demonstrate that the combination of FP2VEC and the CNN model can achieve competitive results in many QSAR tasks, especially classification tasks. We also demonstrate that the FP2VEC model is especially effective for multi-task learning.
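A sentence-classification-style CNN over a sequence of embedding vectors can be sketched in plain Python as follows. The abstract does not specify the layers, so this follows the common pattern of a convolution filter slid along the sequence, a ReLU, max-over-time pooling, and a logistic output unit. The filter width, number of filters, random weights, and function names are all assumptions for illustration, not the authors' architecture.

```python
import math
import random


def conv_max_pool(vectors, kernel, bias=0.0):
    """Slide one filter over the sequence, apply ReLU, keep the maximum.

    `kernel` has one weight row per position in the window. Returns 0.0
    if the sequence is shorter than the filter width.
    """
    width = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(vectors) - width + 1):
        total = bias
        for offset in range(width):
            row, weights = vectors[start + offset], kernel[offset]
            total += sum(x * w for x, w in zip(row, weights))
        best = max(best, total)
    return best


def predict(vectors, kernels, out_weights, out_bias=0.0):
    """One pooled feature per filter, then a logistic output unit."""
    features = [conv_max_pool(vectors, k) for k in kernels]
    logit = out_bias + sum(f * w for f, w in zip(features, out_weights))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical sizes and untrained random weights, for shape checking only.
rng = random.Random(0)
dim, width, n_filters = 4, 2, 3
seq = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
kernels = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
           for _ in range(n_filters)]
out_weights = [rng.uniform(-1, 1) for _ in range(n_filters)]
prob = predict(seq, kernels, out_weights)
print(prob)  # a probability strictly between 0 and 1
```

Max-over-time pooling yields a fixed-size feature vector regardless of how many embedding vectors the compound produces, so molecules with different numbers of set bits can share one output layer. For multi-task learning, one plausible extension is to add one output unit per task on top of the shared pooled features.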
FP2VEC is available from https://github.com/wsjeon92/FP2VEC.
Supplementary data are available at Bioinformatics online.
© The Author(s) (2019). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].
