Addressing label noise for electronic health records: insights from computer vision for tabular data.

Researchers

Andrew A S Soltan, David A Clifton, Hagen Triendl, Jenny Yang, Mangal Prakash

Journal

BMC Medical Informatics and Decision Making

Abstract

The analysis of large electronic health record (EHR) datasets often calls for automated solutions, with machine learning (ML) techniques, including deep learning (DL), taking a leading role. One common task is classifying EHR data into predefined categories. However, EHRs are vulnerable to noise and errors arising from the data collection process, as well as to human labeling errors, and this poses a significant risk. The risk is especially acute when training DL models, because overfitting to noisy labels can have serious consequences in healthcare. Although label noise in EHR data is well documented, few studies have tackled it within the EHR domain. Our work addresses this gap by adapting computer vision (CV) algorithms to mitigate the impact of label noise in DL models trained on EHR data. Notably, because the two domains differ substantially, it is not obvious that CV methods will prove effective when applied to EHR data. We present empirical evidence that these methods, used individually or in combination, can substantially improve model performance on EHR data, especially in the presence of noisy or incorrect labels. We validate our methods and demonstrate their practical utility on real-world EHR data, specifically in the context of COVID-19 diagnosis. Our study highlights the effectiveness of CV methods in the EHR domain and contributes to the advancement of healthcare analytics and research. © 2024. The Author(s).
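The abstract does not say how noisy labels were produced for evaluation. As a rough illustration only, a common convention for studying label noise is to flip a chosen fraction of training labels uniformly to a different class ("symmetric" noise) and then evaluate on clean held-out labels. The sketch below assumes that convention; the helper name `inject_symmetric_noise` and its parameters are hypothetical and not taken from the paper.

```python
import numpy as np


def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip a fraction `noise_rate` of labels to a uniformly chosen *different* class.

    Hypothetical helper for simulating label noise; assumed convention, not the
    paper's documented procedure. Requires num_classes >= 2.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()
    n_flip = int(round(noise_rate * len(labels)))
    # Pick which samples to corrupt, without replacement.
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    # A random offset in [1, num_classes - 1], taken modulo num_classes,
    # guarantees every corrupted label differs from its original value.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[idx] = (labels[idx] + offsets) % num_classes
    return noisy, idx


# Example: binary labels (e.g. a hypothetical COVID-19 positive/negative target)
# with 20% of training labels corrupted.
labels = np.array([0, 1] * 50)
noisy_labels, flipped_idx = inject_symmetric_noise(labels, noise_rate=0.2, num_classes=2)
```

For a binary task the offset is always 1, so this reduces to flipping the selected labels; a model trained on `noisy_labels` would then be scored against the untouched labels of a separate test split.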

Similar Posts

Digital imaging-in-flow (FlowCAM) and probabilistic machine learning to assess the sonolytic disinfection of cyanobacteria in sewage wastewater.

Effective Methods Based on Distinct Learning Principles for the Analysis of Hyperspectral Images to Detect Black Sigatoka Disease.

Domain adaptation for EEG-based, cross-subject epileptic seizure prediction.

Transfer learning for the efficient detection of COVID-19 from smartphone audio data.

Applications of machine learning in metabolomics: Disease modeling and classification.

Using weak supervision and deep learning to classify clinical notes for identification of current suicidal ideation.
