Extracting chemical-protein interactions from literature using sentence structure analysis and feature engineering.

February 8, 2019 Bioinformatics, Biomedical Informatics

Researchers

Disa Yu, Jinfeng Zhang, Pei-Yau Lung, Tingting Zhao, Zhe He

Journal

Database: the journal of biological databases and curation

Modalities

Models

machine learning

Abstract

Information about the interactions between chemical compounds and proteins is indispensable for understanding the regulation of biological processes and for developing therapeutic drugs. Manually extracting such information from the biomedical literature is time-consuming and resource-intensive. In this study, we propose a computational method that automatically extracts chemical-protein interactions (CPIs) from a given text. Our method extracts CPI pairs and CPI triplets from sentences: a CPI pair consists of a chemical compound name and a protein name, and a CPI triplet consists of a CPI pair together with an interaction word describing their relationship. We extracted a diverse set of features from sentences and used them to build multiple machine learning models. Our models use both simple features, which can be computed directly from sentences, and more sophisticated features derived using sentence structure analysis techniques. For example, one set of features was extracted from the shortest paths between the members of a CPI pair, or among the members of a CPI triplet, in the dependency graphs obtained from sentence parsing. We designed a three-stage approach to predict the multiple categories of CPIs. Our method performed best among the systems that do not use deep learning and outperformed several deep-learning-based systems in Track 5 of the BioCreative VI challenge. The features we designed in this study are informative and can be applied to other machine learning methods, including deep learning.
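
The idea of pairs, triplets and shortest-path features can be illustrated with a minimal Python sketch. It is not the authors' implementation. The sentence, the entity and interaction-word annotations, the dependency edges and the feature names (`path_tokens`, `path_length`, `token_distance`) are all hypothetical and written by hand for illustration; in practice the dependency graph would come from a sentence parser.

```python
from collections import deque
from itertools import product

# Hypothetical toy sentence; all annotations below are hand-made assumptions.
tokens = ["Aspirin", "inhibits", "COX-1", "in", "platelets"]
chemicals = [0]          # token indices tagged as chemical compounds
proteins = [2]           # token indices tagged as proteins
interaction_words = [1]  # token indices tagged as interaction words

# Hand-written dependency edges as (head index, dependent index).
# A real pipeline would obtain these from a dependency parser.
dependency_edges = [(1, 0), (1, 2), (1, 4), (4, 3)]


def build_graph(edges):
    """Turn dependency edges into an undirected adjacency map."""
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns token indices from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


graph = build_graph(dependency_edges)

# A CPI pair is (chemical, protein); a CPI triplet adds an interaction word.
pairs = list(product(chemicals, proteins))
triplets = [(c, p, w) for (c, p) in pairs for w in interaction_words]


def pair_features(chem, prot):
    """Illustrative features: one simple, two derived from the dependency graph."""
    path = shortest_path(graph, chem, prot)
    return {
        "path_tokens": [tokens[i] for i in path] if path else [],
        "path_length": len(path) - 1 if path else -1,  # number of edges
        "token_distance": abs(chem - prot),            # simple surface feature
    }


features = {pair: pair_features(*pair) for pair in pairs}
print(features)
```

Here the shortest dependency path between the chemical and the protein passes through the interaction word, which is the kind of structural signal that a plain token distance does not capture. Dictionaries like these could be passed to any feature-based classifier after vectorization.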
