Deep Learning-Based Imbalanced Data Classification for Drug Discovery.

June 23, 2020 Machine Learning, Pharmacology

Researchers

Selçuk Korkmaz

Journal

Journal of Chemical Information and Modeling

Modalities

Models

Deep Neural Networks (DNN)

Abstract

Drug discovery studies have become increasingly expensive and time-consuming. In the early phase of drug discovery, an extensive search is performed to find drug-like compounds, which can then be optimized over time to become a marketed drug. One conventional way of detecting active compounds is to perform a high-throughput screening (HTS) experiment. As of July 2019, the PubChem repository contained 1.3 million bioassays generated through HTS experiments. This makes PubChem a valuable resource for applying machine learning algorithms to develop classification models that detect active compounds for drug discovery studies. However, datasets obtained from PubChem are highly imbalanced, and this imbalance has a negative impact on the classification performance of machine learning algorithms. Here, we explored the classification performance of deep neural networks (DNN) on imbalanced compound datasets after applying various data balancing methods. We used five confirmatory HTS bioassays from the PubChem repository and applied one under-sampling and three over-sampling methods as data balancing methods. We used a fully connected, two-hidden-layer DNN model for the classification of active and inactive molecules. To evaluate the performance of the network, we calculated six performance metrics: balanced accuracy, precision, recall, F1 score, Matthews correlation coefficient, and area under the ROC curve. The study results showed that the effect of imbalanced data on network performance could be mitigated to a degree by applying the data balancing methods. The degree of imbalance, however, has a negative effect on the performance of the network.
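To make the balancing and evaluation steps named in the abstract more concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say which under-sampling method was used, so random under-sampling is assumed here, and the function names, the toy labels, and the toy scores are all hypothetical. The six metrics follow their standard textbook definitions.

```python
import math
import random


def random_undersample(features, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Assumption: the source does not name its under-sampling method;
    plain random under-sampling is used here for illustration.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [features[i] for i in kept], [labels[i] for i in kept]


def classification_metrics(y_true, y_pred):
    """Five threshold-based metrics from the binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


def roc_auc(y_true, scores):
    """AUC as the chance that a random active outscores a random inactive.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 active (1) and 8 inactive (0) compounds.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3, 0.05]

metrics = classification_metrics(y_true, y_pred)
metrics["roc_auc"] = roc_auc(y_true, scores)

# Balance the toy set: keeps both actives and 2 randomly chosen inactives.
compound_ids = list(range(len(y_true)))
balanced_ids, balanced_labels = random_undersample(compound_ids, y_true)
```

On this toy set plain accuracy would be 0.8, the same value a classifier that labels every compound inactive would reach, which is why imbalance-aware metrics such as balanced accuracy and the Matthews correlation coefficient are reported instead.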


