Revisiting the Trustworthiness of Saliency Methods in Radiology AI.

Abstract

Purpose: To determine whether saliency maps in radiology artificial intelligence (AI) are vulnerable to subtle perturbations of the input that could lead to misleading interpretations, and to evaluate the sensitivity and robustness of saliency methods using the prediction-saliency correlation (PSC).

Materials and Methods: In this retrospective study, locally trained deep learning models and a research prototype provided by a commercial vendor were systematically evaluated on 191 229 chest radiographs from the CheXpert dataset and 7022 MR images from a human brain tumor classification dataset. Two radiologists performed a reader study on 270 chest radiograph pairs. A model-agnostic approach for computing the PSC coefficient was used to evaluate the sensitivity and robustness of seven commonly used saliency methods.

Results: Using the parameters of the locally trained models, the saliency methods showed low sensitivity (maximum PSC, 0.25; 95% CI: 0.12, 0.38) and weak robustness (maximum PSC, 0.12; 95% CI: 0.0, 0.25) on the CheXpert dataset. Further evaluation showed that the saliency maps generated by the commercial prototype could be irrelevant to the model output, even without knowledge of the model specifics: perturbations decreased the area under the receiver operating characteristic curve by 8.6% without affecting the saliency map. The reader study confirmed that experts have difficulty identifying the perturbed images; they answered correctly in less than 44.8% of cases.

Conclusion: Popular saliency methods scored low PSC values on perturbed images from the two datasets, indicating weak sensitivity and robustness. The proposed PSC metric provides a valuable tool for quantifying and validating the trustworthiness of medical AI explainability.

Keywords: Saliency Maps, AI Trustworthiness, Dynamic Consistency, Sensitivity, Robustness

Supplemental material is available for this article.

© RSNA, 2023

See also the commentary by Yanagawa and Sato in this issue.
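The abstract does not define the PSC coefficient. The following is only a rough illustration of the idea. It assumes that PSC is the Pearson correlation, computed across a set of original/perturbed image pairs, between how much the model's prediction changes and how much its saliency map changes. The distance measures are hypothetical choices, not the authors' method: L1 distance on class probabilities, and L1 distance on sum-normalized saliency maps. The function names `pearson`, `psc`, `l1_distance` and `normalize` are also hypothetical.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # Convention chosen for this sketch: an undefined correlation
        # (one side never changes) is reported as 0.
        return 0.0
    return cov / math.sqrt(vx * vy)


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute element-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def normalize(saliency: Sequence[float]) -> List[float]:
    """Scale a flattened, non-negative saliency map to sum to 1."""
    total = sum(saliency)
    return [v / total for v in saliency] if total else list(saliency)


def psc(pred_orig, pred_pert, sal_orig, sal_pert) -> float:
    """Hypothetical PSC: correlation between per-image prediction change
    and per-image saliency change over original/perturbed pairs."""
    d_pred = [l1_distance(p, q) for p, q in zip(pred_orig, pred_pert)]
    d_sal = [l1_distance(normalize(s), normalize(t))
             for s, t in zip(sal_orig, sal_pert)]
    return pearson(d_pred, d_sal)


if __name__ == "__main__":
    # Toy data: 3 images, 2 classes, 4-pixel saliency maps.
    preds = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
    preds_perturbed = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
    maps = [[1, 0, 0, 0]] * 3
    maps_perturbed = [[1, 0, 0, 0], [0.5, 0.5, 0, 0], [0, 1, 0, 0]]
    print("saliency tracks prediction:", psc(preds, preds_perturbed, maps, maps_perturbed))
    print("saliency unchanged:", psc(preds, preds_perturbed, maps, maps))
```

Under this reading, saliency maps that shift in step with the prediction give a PSC near 1. Maps that stay fixed while the prediction changes give a PSC near 0 in this sketch. That second case mirrors the failure mode reported for the commercial prototype.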

Similar Posts

Identification of kidney stones in KUB X-ray images using VGG16 empowered with explainable artificial intelligence.

Categorized contrast enhanced mammography dataset for diagnostic and artificial intelligence research.

Artificial Intelligence-Assisted Bone Age Assessment to Improve the Accuracy and Consistency of Physicians With Different Levels of Experience.

A Novel Computer-Aided Detection/Diagnosis System for Detection and Classification of Polyps in Colonoscopy.

A Transfer Learning-Based Active Learning Framework for Brain Tumor Classification.

Automatic measurement of vascular calcifications in patients with aorto-iliac occlusive disease to predict the risk of re-intervention after endovascular repair.
