Clinical Explainability Failure (CEF) & Explainability Failure Ratio (EFR) - Changing the Way We Validate Classification Algorithms.

Abstract

Adoption of Artificial Intelligence (AI) algorithms in clinical practice will depend on their trustworthiness, which is built not only by robust validation studies but also by the explainability and interpretability of the algorithms. Most validation studies for medical imaging AI report performance on study-level labels and place little emphasis on measuring the accuracy of the explanations these algorithms generate, in the form of heat maps or bounding boxes, especially in true positive cases. We propose a new metric, the Explainability Failure Ratio (EFR), derived from Clinical Explainability Failure (CEF), to address this gap in AI evaluation. We define an Explainability Failure as a case where the classification generated by an AI algorithm matches the study-level ground truth but the explanation generated by the algorithm is inadequate to explain that output. To determine the applicability of the metric, we measured EFR for two algorithms that automatically detect consolidation on chest X-rays. We observed a lower EFR for the model that had lower sensitivity for identifying consolidation, implying that the trustworthiness of a model should be determined not only by routine statistical metrics but also by novel ‘clinically oriented’ metrics. © 2022. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
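The abstract does not spell out the formula, so the following is only a minimal sketch of how an EFR might be computed. It assumes that EFR is the number of explainability failures divided by the number of true positive cases, and that a reviewer has already marked each explanation as adequate or not. The `Case` class, its field names, and the toy data are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Case:
    """One study; all field names are illustrative assumptions."""
    truth: bool                 # study-level ground truth (finding present?)
    predicted: bool             # algorithm's study-level classification
    explanation_adequate: bool  # reviewer's judgement of the heat map / box


def explainability_failure_ratio(cases: List[Case]) -> float:
    """Assumed definition: explainability failures / true positives."""
    true_positives = [c for c in cases if c.truth and c.predicted]
    if not true_positives:
        raise ValueError("EFR is undefined when there are no true positives")
    # An explainability failure is a true positive whose explanation
    # was judged inadequate to explain the algorithm's output.
    failures = sum(1 for c in true_positives if not c.explanation_adequate)
    return failures / len(true_positives)


# Toy data for illustration only, not results from the study.
toy_cases = [
    Case(truth=True, predicted=True, explanation_adequate=True),    # TP, explained
    Case(truth=True, predicted=True, explanation_adequate=False),   # TP, failure
    Case(truth=True, predicted=False, explanation_adequate=False),  # FN, ignored
    Case(truth=False, predicted=False, explanation_adequate=True),  # TN, ignored
]

efr = explainability_failure_ratio(toy_cases)
print(efr)  # 0.5
```

Under this assumed definition, only true positive cases enter the ratio, so a model's EFR can move independently of its sensitivity, which is the kind of divergence the abstract reports.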
