Preliminary Evaluation of Fine-Tuning the OpenDeID Deidentification Pipeline Across Multi-Center Corpora.

Researchers

Jiaxing Liu, Jitendra Jonnagaddala, Shalini Gupta, Zoie Shiu-Yee Wong

Journal

Studies in health technology and informatics

Abstract

Automatic deidentification of Electronic Health Records (EHR) is a crucial step before their secondary use in biomedical research. This study evaluates a hybrid deidentification strategy intended to enhance patient privacy in the secondary use of EHR. Specifically, it assesses automatic deidentification with the OpenDeID pipeline across multiple corpora as a means of safeguarding sensitive information within EHR datasets. Three distinct corpora were used: the OpenDeID v2 corpus of pathology reports from Australian hospitals, the 2014 i2b2/UTHealth deidentification corpus of clinical narratives from the USA, and the 2016 CEGS N-GRID deidentification corpus of psychiatric notes. The OpenDeID pipeline employs a hybrid approach that combines deep learning with contextual rules. Pre-processing involved harmonizing the corpora and resolving encoding and format issues. Performance was assessed with precision, recall, and F-measure. Among the models evaluated, the Discharge Summary BioBERT model performed best. Trained on the three corpora, totaling 4,038 reports, this model showed robust deidentification performance on EHR, achieving micro-averaged F1-scores of 0.9248 under strict matching and 0.9692 under relaxed matching. These results offer insight into the model’s efficacy and its potential role in safeguarding patient privacy in the secondary use of EHR.
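
The abstract reports micro-averaged F1-scores under strict and relaxed matching. As a minimal sketch of how such span-level scores are commonly computed, and not the OpenDeID evaluation code, the Python below pools true positives, false positives, and false negatives across all documents and PHI categories before computing precision, recall, and F1. The `Span` class, the function names, and the toy annotations are hypothetical, and the matching rules are assumptions: strict is taken to require identical offsets and PHI label, and relaxed to require any character overlap with the same label. The i2b2 and CEGS N-GRID scorers may define relaxed matching differently, and the greedy one-to-one matching used here can undercount in rare configurations of overlapping spans.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Span:
    """A PHI annotation: character offsets [start, end) plus a PHI category."""
    start: int
    end: int
    label: str


def spans_match(gold: Span, pred: Span, mode: str) -> bool:
    """Decide whether a predicted span matches a gold span (label must agree)."""
    if gold.label != pred.label:
        return False
    if mode == "strict":
        # Strict (assumed definition): identical start and end offsets.
        return gold.start == pred.start and gold.end == pred.end
    if mode == "relaxed":
        # Relaxed (assumed definition): any character overlap.
        return gold.start < pred.end and pred.start < gold.end
    raise ValueError(f"unknown mode: {mode!r}")


def count_doc(gold: List[Span], pred: List[Span], mode: str) -> Tuple[int, int, int]:
    """Return (tp, fp, fn) for one document via greedy one-to-one matching."""
    unmatched = list(pred)
    tp = 0
    for g in gold:
        for i, p in enumerate(unmatched):
            if spans_match(g, p, mode):
                tp += 1
                del unmatched[i]  # each prediction may match at most one gold span
                break
    return tp, len(pred) - tp, len(gold) - tp


def micro_prf(corpus: Dict[str, Tuple[List[Span], List[Span]]],
              mode: str) -> Tuple[float, float, float]:
    """Micro-average: pool counts over all documents and PHI types, then score."""
    tp = fp = fn = 0
    for gold, pred in corpus.values():
        d_tp, d_fp, d_fn = count_doc(gold, pred, mode)
        tp += d_tp
        fp += d_fp
        fn += d_fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy annotations (not data from the study): doc_id -> (gold, predicted).
toy_corpus = {
    "doc1": ([Span(0, 10, "NAME"), Span(20, 30, "DATE")],
             [Span(0, 10, "NAME"), Span(21, 30, "DATE")]),
    "doc2": ([Span(5, 15, "HOSPITAL")],
             [Span(5, 15, "HOSPITAL"), Span(40, 45, "NAME")]),
}

for mode in ("strict", "relaxed"):
    print(mode, tuple(round(x, 4) for x in micro_prf(toy_corpus, mode)))
# strict (0.5, 0.6667, 0.5714)
# relaxed (0.75, 1.0, 0.8571)
```

In this toy example the DATE prediction that starts one character late counts as a miss under strict matching but as a hit under relaxed matching, which illustrates why relaxed scores are generally at least as high as strict ones.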

Similar Posts

Multiview Deep Learning-based Efficient Medical Data Management for Survival Time Forecasting.

Interrelated feature selection from health surveys using domain knowledge graph.

Machine learning approaches for electronic health records phenotyping: a methodical review.

Modeling and analyzing single-cell multimodal data with deep parametric inference.

An empirical evaluation of deep learning for ICD-9 code assignment using MIMIC-III clinical notes.

Modern Clinical Text Mining: A Guide and Review.
