|

O-glycosylation site prediction for by combining properties and sequence features with support vector machine.

Researchers

Journal

Modalities

Models

Abstract

O-glycosylation is a protein posttranslational modification important in regulating almost all cells. It is related to a large number of physiological and pathological phenomena. Recognizing O-glycosylation sites is the key to further investigating the molecular mechanism of protein posttranslational modification. This study aimed to collect a reliable dataset on Homo sapiens and develop an O-glycosylation predictor for Homo sapiens, named Captor, through multiple features. A random undersampling method and a synthetic minority oversampling technique were employed to deal with imbalanced data. In addition, the Kruskal-Wallis (K-W) test was adopted to optimize feature vectors and improve the performance of the model. A support vector machine, due to its optimal performance, was used to train and optimize the final prediction model after a comprehensive comparison of various classifiers in traditional machine learning methods and deep learning. On the independent test set, Captor outperformed the existing O-glycosylation tool, suggesting that Captor could provide more instructive guidance for further experimental research on O-glycosylation. The source code and datasets are available at https://github.com/YanZhu06/Captor/.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *