Prediction of KRAS inhibitors using conjoint fingerprint and machine learning-based QSAR models.

Abstract

Kirsten rat sarcoma virus G12C (KRASG12C) is the major protein mutation associated with non-small cell lung cancer (NSCLC) severity. Inhibiting KRASG12C is therefore one of the key therapeutic strategies for NSCLC patients. In this paper, a cost-effective, data-driven drug design approach employing machine learning-based quantitative structure-activity relationship (QSAR) analysis was built for predicting ligand affinities against the KRASG12C protein. A curated and non-redundant dataset of 1033 compounds with KRASG12C inhibitory activity (pIC50) was used to build and test the models. The PubChem fingerprint, the Substructure fingerprint, the Substructure fingerprint count, and the conjoint fingerprint (a combination of the PubChem fingerprint and the Substructure fingerprint count) were used to train the models. Using comprehensive validation methods and various machine learning algorithms, the results showed that XGBoost regression (XGBoost) achieved the highest performance in terms of goodness of fit, predictivity, generalizability and model robustness (R2 = 0.81, Q2CV = 0.60, Q2Ext = 0.62, R2 - Q2Ext = 0.19, R2Y-Random = 0.31 ± 0.03, Q2Y-Random = -0.09 ± 0.04). The top 13 molecular fingerprints that correlated with the predicted pIC50 values were SubFPC274 (aromatic atoms), SubFPC307 (number of chiral centers), PubChemFP37 (≥1 chlorine), SubFPC18 (number of alkylaryl ethers), SubFPC1 (number of primary carbons), SubFPC300 (number of 1,3-tautomerizables), PubChemFP621 (N-C:C:C:N structure), PubChemFP23 (≥1 fluorine), SubFPC2 (number of secondary carbons), SubFPC295 (number of C-ONS bonds), PubChemFP199 (≥4 6-membered rings), PubChemFP180 (≥1 nitrogen-containing 6-membered ring), and SubFPC180 (number of tertiary amines). These molecular fingerprints were virtualized and validated using molecular docking experiments.
In conclusion, this conjoint fingerprint and XGBoost-QSAR model proved useful as a high-throughput screening tool for KRASG12C inhibitor identification and drug design. Copyright © 2023 Elsevier Inc. All rights reserved.
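
To make two ideas from the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the conjoint fingerprint is a plain concatenation of the binary PubChem bits with the Substructure counts, and that R2 is the usual coefficient of determination; definitions of Q2Ext vary in the QSAR literature, so the same formula applied to held-out predictions is only one possible convention. The function names and the toy vectors are hypothetical, and no model is trained here.

```python
from statistics import mean


def conjoint_fingerprint(pubchem_bits, substructure_counts):
    """Concatenate a binary PubChem fingerprint with Substructure counts.

    Assumption: "combination" in the abstract means simple concatenation.
    """
    return list(pubchem_bits) + list(substructure_counts)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy inputs, for illustration only (not data from the paper).
features = conjoint_fingerprint([1, 0, 1], [3, 0])
toy_r2 = r_squared([6.0, 7.0, 8.0, 9.0], [6.5, 6.5, 8.5, 8.5])

# Gap between fit and external predictivity, using the values reported above.
fit_minus_external = round(0.81 - 0.62, 2)
```

The last line reproduces the reported R2 - Q2Ext gap of 0.19 from the two reported values. In practice the feature vectors would come from a fingerprint calculator and the predictions from a trained regressor such as XGBoost.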
