
EMCBOW-GPCR: A method for identifying G-protein coupled receptors based on word embedding and wordbooks.

Abstract

G protein-coupled receptors (GPCRs) form one of the largest families of membrane receptor proteins in humans and are important targets for many drugs. It is therefore of great significance to determine whether a given protein is a GPCR. However, identifying GPCRs experimentally is expensive and time-consuming. As more and more GPCR primary sequences accumulate, it becomes feasible to develop computational models that predict GPCRs quickly and accurately. In this paper, a novel method called EMCBOW-GPCR is proposed to improve the accuracy of GPCR identification using natural language processing (NLP). To represent GPCRs, three word-embedding models and a bag-of-words model are used to extract the original features. These features are then passed to a deep-learning algorithm that extracts further features and reduces the dimensionality. Finally, the resulting features are fed into Extreme Gradient Boosting (XGBoost). A comparison of results shows that the overall prediction metrics of EMCBOW-GPCR are higher than those of state-of-the-art methods. To make EMCBOW-GPCR convenient for other researchers, the method and source code are openly available on GitHub at https://github.com/454170054/EMCBOW-GPCR, and a user-friendly web server has been established at http://www.jci-bioinfo.cn/emcbowgpcr.

© 2021 The Author(s).
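The abstract does not say how a protein sequence is split into "words". The sketch below assumes overlapping k-mers, a common convention when treating protein sequences as text, and shows how a bag-of-words vector and an averaged word-embedding vector could be built from them. The functions `kmer_words`, `bag_of_words` and `mean_embedding`, the choice of k, and the embedding table are all hypothetical illustrations, not code from the EMCBOW-GPCR repository; the deep-learning and XGBoost stages are omitted.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (assumed vocabulary alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_words(seq, k=3):
    """Split a protein sequence into overlapping k-mer 'words' (assumed tokenization)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bag_of_words(seq, k=2):
    """Normalized k-mer frequency vector over the fixed 20**k vocabulary.

    k-mers containing non-standard residues (e.g. 'X') are counted in the
    total but have no vocabulary slot, so the vector may sum to less than 1.
    """
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    words = kmer_words(seq, k)
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero for very short sequences
    return [counts[w] / total for w in vocab]


def mean_embedding(seq, table, k=3):
    """Average the vectors of k-mers found in `table`.

    `table` is a hypothetical pretrained mapping from k-mer to vector
    (for example, the output of a word2vec-style model).
    Returns a zero vector when no k-mer of the sequence is in the table.
    """
    dim = len(next(iter(table.values())))
    vecs = [table[w] for w in kmer_words(seq, k) if w in table]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


if __name__ == "__main__":
    seq = "MKTAYIAKQR"
    bow = bag_of_words(seq, k=2)
    emb = mean_embedding(seq, {"MKT": [1.0, 2.0], "KTA": [3.0, 4.0]}, k=3)
    features = bow + emb  # concatenated "original features" for a downstream model
    print(len(bow), emb, len(features))
```

Concatenating the vectors mirrors the abstract's description of combining several feature extractors before dimensionality reduction; in a full pipeline the concatenated vector would be passed to a neural network and then to an XGBoost classifier.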
