Assessing and Mitigating Bias in Medical Artificial Intelligence: The Effects of Race and Ethnicity on a Deep Learning Model for ECG Analysis.

Abstract

Background – Deep learning algorithms derived in homogeneous populations may be poorly generalizable and have the potential to reflect, perpetuate, and even exacerbate racial/ethnic disparities in health and healthcare. In this study we aimed to (1) assess whether the performance of a deep learning algorithm designed to detect low left ventricular ejection fraction (LVEF) using the 12-lead electrocardiogram (ECG) varies by race/ethnicity, and (2) determine whether its performance depends on the derivation population or on racial variation in the ECG.

Methods – We performed a retrospective cohort analysis that included 97,829 patients with paired ECGs and echocardiograms. We tested, by race/ethnicity, the performance of a convolutional neural network (CNN) designed to identify patients with an LVEF ≤35% from the 12-lead ECG.

Results – The CNN, which was previously derived in a homogeneous population (derivation cohort N=44,959; 96.2% non-Hispanic White), demonstrated consistent performance in detecting low LVEF across a range of racial/ethnic subgroups in a separate testing cohort (N=52,870): non-Hispanic White (N=44,524, AUC 0.931), Asian (N=557, AUC 0.961), Black/African American (N=651, AUC 0.937), Hispanic/Latino (N=331, AUC 0.937), and American Indian/Native Alaskan (N=223, AUC 0.938). In secondary analyses, a separate neural network was able to discern racial subgroup category (Black/African American [AUC 0.84] and non-Hispanic White [AUC 0.76] in a five-class classifier), and a network trained only on non-Hispanic White patients from the original derivation cohort performed similarly well in the testing cohort, with an AUC of at least 0.930 in all racial/ethnic subgroups.

Conclusions – Our study demonstrates that while ECG characteristics vary by race, this variation did not affect the ability of a CNN to predict low LVEF from the ECG. We recommend reporting performance across diverse ethnic, racial, age, and gender groups for all new AI tools to ensure responsible use of AI in medicine.
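The subgroup reporting recommended above can be illustrated with a minimal Python sketch. It is not the study's code: the function names `roc_auc` and `auc_by_group`, the `(group, label, score)` record layout, and the toy records are all assumptions made for illustration, and the toy values are invented rather than taken from the paper. The AUC is computed as the fraction of positive/negative pairs that the model ranks correctly, which is equivalent to the area under the ROC curve.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = 0.0
    # Pairwise comparison is O(n_pos * n_neg): fine for a sketch,
    # a rank-based formula would be preferable for large cohorts.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """Compute one AUC per subgroup.

    records: iterable of (group, label, score) tuples, where label is
    1 for low LVEF and 0 otherwise, and score is the model output.
    (Hypothetical layout, assumed for this example.)
    """
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {g: roc_auc(ys, ss) for g, (ys, ss) in grouped.items()}


# Invented toy data, NOT results from the study.
toy_records = [
    ("group_a", 1, 0.9), ("group_a", 0, 0.1),
    ("group_a", 1, 0.4), ("group_a", 0, 0.6),
    ("group_b", 1, 0.8), ("group_b", 0, 0.8),
]
print(auc_by_group(toy_records))  # {'group_a': 0.75, 'group_b': 0.5}
```

With small subgroups, such as those of a few hundred patients in the testing cohort, a point estimate alone can be misleading, so a report of this kind would normally also carry a confidence interval per subgroup (for example from bootstrap resampling), which this sketch omits.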
