One clinician is all you need: Data-Efficient NLP Measurement Extraction from Cardiac MRI Reports.

Abstract

Introduction: Cardiac MRI (CMR) is a powerful imaging modality that provides detailed quantitative assessment of cardiac anatomy and function. CMR measurements are typically stored as unstructured text in clinical reports within electronic health record (EHR) systems, and automated extraction of these measurements would facilitate their use in research. Existing machine learning approaches either rely on large quantities of expert annotation or require engineered rules that are time-consuming to develop and specific to the setting in which they were developed. We hypothesize that pre-trained transformer-based language models may enable label-efficient numerical extraction from clinical text without the need for heuristics or large quantities of expert annotations. Here we fine-tune pre-trained transformer-based language models on a small quantity of CMR annotations to extract 21 CMR measurements. We assessed whether clinical pre-training reduced labeling needs and explored alternative representations of numerical inputs to improve performance.

Methods: Our study sample comprised 99,252 patients who received longitudinal cardiology care in a multi-institutional healthcare system. There were 12,720 available CMR reports from 9,280 patients. We adapted PRAnCER, an annotation tool for clinical text, to collect annotations from a study clinician on 370 reports. We experimented with five different representations of numerical quantities and several model weight initializations. We evaluated extraction performance using macro-averaged F1 scores across the measurements of interest. We applied the best performing model to extract measurements from the remaining CMR reports in the study sample, and evaluated established associations between selected extracted measures and clinical outcomes to demonstrate validity.

Results: All combinations of weight initializations and numerical representations obtained excellent performance on the gold-standard test set, suggesting that transformer models fine-tuned on a small set of annotations can effectively extract numerical quantities. Our results further indicate that custom numerical representations did not appear to have a significant impact on extraction performance. The best performing model achieved a macro-averaged F1 score of 0.957 across the evaluated CMR measurements, ranging from 0.92 for the lowest performing measure (left atrial anterior-posterior dimension) to 1.0 for the highest performing measures (left ventricular end systolic volume index and left ventricular end systolic diameter). Application of the best performing model to the study cohort yielded 136,407 measurements from all available reports in the study sample. We observed expected associations between extracted left ventricular mass index, left ventricular ejection fraction, and right ventricular ejection fraction and clinical outcomes such as atrial fibrillation, heart failure, and mortality.

Conclusions: This study demonstrated that a domain-agnostic pre-trained transformer model can effectively extract quantitative clinical measurements from diagnostic reports with a relatively small number of gold-standard annotations. The proposed workflow may serve as a roadmap for other quantitative entity extraction.

Keywords: natural language processing; transformer models; machine learning; cardiac MRI; clinical outcomes; deep learning.
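To make the evaluation metric concrete, the sketch below shows one way a macro-averaged F1 score across measurements could be computed: an F1 score is calculated separately for each measurement type and the per-measurement scores are then averaged with equal weight. This is an illustration only, not the study's evaluation code. The tuple layout `(report_id, measurement_name, value)`, the exact-match criterion, the function names, and the example values are all assumptions made for this sketch.

```python
def per_measurement_f1(gold, predicted):
    """Return {measurement_name: F1} for two sets of extractions.

    Each extraction is a hypothetical (report_id, measurement_name, value)
    tuple; a prediction counts as correct only if all three fields match
    a gold annotation exactly (an assumption of this sketch).
    """
    names = {name for _, name, _ in gold} | {name for _, name, _ in predicted}
    scores = {}
    for name in names:
        gold_items = {item for item in gold if item[1] == name}
        pred_items = {item for item in predicted if item[1] == name}
        true_positives = len(gold_items & pred_items)
        precision = true_positives / len(pred_items) if pred_items else 0.0
        recall = true_positives / len(gold_items) if gold_items else 0.0
        denominator = precision + recall
        scores[name] = 2 * precision * recall / denominator if denominator else 0.0
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of the per-measurement F1 scores."""
    scores = per_measurement_f1(gold, predicted)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Made-up example data: one LVEF value is extracted incorrectly.
example_gold = {
    ("report-1", "LVEF", 55.0),
    ("report-2", "LVEF", 60.0),
    ("report-1", "LVMI", 70.0),
}
example_predicted = {
    ("report-1", "LVEF", 55.0),
    ("report-2", "LVEF", 62.0),  # wrong value: counts as a miss and a false alarm
    ("report-1", "LVMI", 70.0),
}

if __name__ == "__main__":
    print(per_measurement_f1(example_gold, example_predicted))
    print(macro_f1(example_gold, example_predicted))
```

Because every measurement contributes equally to the average, a rarely reported measurement with poor extraction quality lowers the macro-averaged score as much as a frequently reported one would.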
