Learning protein subcellular localization multi-view patterns from heterogeneous data of imaging, sequence and networks.

Abstract

Location proteomics seeks to provide automated high-resolution descriptions of protein location patterns within cells. Many efforts have been undertaken in location proteomics over the past decades, producing many automated predictors for protein subcellular localization. However, most of these predictors are trained solely on either high-throughput microscopic images or protein amino acid sequences. The integration of heterogeneous protein data sources has yet to be exploited. In this paper, we present a pipeline called sequence, image, network-based protein subcellular locator (SIN-Locator) that constructs a multi-view description of proteins by integrating multiple data types, including images of protein expression in cells or tissues, amino acid sequences and protein-protein interaction networks, to classify the patterns of protein subcellular locations. Proteins were encoded by both handcrafted features and deep learning features, and multiple combining methods were implemented. Our experimental results indicated that optimal integrations can considerably enhance the classification accuracy, and the utility of SIN-Locator has been demonstrated by applying it to newly released proteins in the Human Protein Atlas. Furthermore, we also investigate the contribution of different data sources and the influence of partial absence of data. This work is anticipated to provide clues for the reconciliation and combination of multi-source data for protein location analysis.

© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected].
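The abstract mentions that features from several views were combined with "multiple combining methods" without naming them. As a rough illustration only, the sketch below shows two generic options: feature-level fusion (standardize each view, then concatenate) and decision-level fusion (weighted average of per-view class probabilities, skipping a view that is missing). The function names, the z-score scaling and the weighting scheme are all assumptions made for this example; they are not taken from SIN-Locator.

```python
import numpy as np


def early_fusion(views):
    """Feature-level fusion (hypothetical helper).

    Each view is an (n_proteins, n_features) array. Every view is
    z-scored per feature and the views are concatenated column-wise.
    """
    scaled = []
    for x in views:
        x = np.asarray(x, dtype=float)
        std = x.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant features
        scaled.append((x - x.mean(axis=0)) / std)
    return np.concatenate(scaled, axis=1)


def late_fusion(prob_by_view, weights=None):
    """Decision-level fusion (hypothetical helper).

    prob_by_view maps a view name to an (n_proteins, n_classes) array of
    class probabilities, or to None when that view is unavailable.
    Returns the weighted average over the available views.
    """
    available = {
        name: np.asarray(p, dtype=float)
        for name, p in prob_by_view.items()
        if p is not None
    }
    if not available:
        raise ValueError("at least one view is required")
    if weights is None:
        weights = {name: 1.0 for name in available}
    total = sum(weights[name] for name in available)
    return sum(weights[name] * p for name, p in available.items()) / total


if __name__ == "__main__":
    # Toy data: 3 proteins, an image view with 2 features and a
    # sequence view with 4 features (values are made up).
    image_feats = np.array([[0.1, 2.0], [0.4, 1.0], [0.7, 3.0]])
    seq_feats = np.arange(12).reshape(3, 4)
    print(early_fusion([image_feats, seq_feats]).shape)

    # Per-view probabilities for one protein; the network view is absent.
    probs = {
        "image": [[0.8, 0.2]],
        "sequence": [[0.4, 0.6]],
        "network": None,
    }
    print(late_fusion(probs))
```

Skipping views set to `None` is one simple way to mimic the "partial absence of data" setting; a real pipeline might instead impute missing features or train view-specific fallbacks.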


