|

Prediction of breast cancer distant recurrence using natural language processing and knowledge-guided convolutional neural network.

Researchers

Journal

Modalities

Models

Abstract

Distant recurrence of breast cancer results in high lifetime risks and low 5-year survival rates. Early prediction of distant recurrent breast cancer could facilitate intervention and improve patients’ life quality. In this study, we designed an EHR-based predictive model to estimate the distant recurrent probability of breast cancer patients. We studied the pathology reports and progress notes of 6,447 patients who were diagnosed with breast cancer at Northwestern Memorial Hospital between 2001 and 2015. Clinical notes were mapped to Concept unified identifiers (CUI) using natural language processing tools. Bag-of-words and pre-trained embedding were employed to vectorize words and CUI sequences. These features integrated with clinical features from structured data were downstreamed to conventional machine learning classifiers and Knowledge-guided Convolutional Neural Network (K-CNN). The best configuration of our model yielded an AUC of 0.888 and an F1-score of 0.5. Our work provides an automated method to predict breast cancer distant recurrence using natural language processing and deep learning approaches. We expect that through advanced feature engineering, better predictive performance could be achieved.
Copyright © 2020 Elsevier B.V. All rights reserved.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *