Validation of a Deep Learning Chest X-ray Interpretation Model: Integrating Large-Scale AI and Large Language Models for Comparative Analysis with ChatGPT.

Researchers

Journal

Modalities

Models

Abstract

This study evaluates the diagnostic accuracy and clinical utility of two artificial intelligence (AI) techniques: Kakao Brain Artificial Neural Network for Chest X-ray Reading (KARA-CXR), an assistive technology developed using large-scale AI and large language models (LLMs), and ChatGPT, a well-known LLM. The study was conducted to validate the performance of the two technologies in chest X-ray reading and explore their potential applications in the medical imaging diagnosis domain. The study methodology consisted of randomly selecting 2000 chest X-ray images from a single institution’s patient database, and two radiologists evaluated the readings provided by KARA-CXR and ChatGPT. The study used five qualitative factors to evaluate the readings generated by each model: accuracy, false findings, location inaccuracies, count inaccuracies, and hallucinations. Statistical analysis showed that KARA-CXR achieved significantly higher diagnostic accuracy compared to ChatGPT. In the ‘Acceptable’ accuracy category, KARA-CXR was rated at 70.50% and 68.00% by two observers, while ChatGPT achieved 40.50% and 47.00%. Interobserver agreement was moderate for both systems, with KARA at 0.74 and GPT4 at 0.73. For ‘False Findings’, KARA-CXR scored 68.00% and 68.50%, while ChatGPT scored 37.00% for both observers, with high interobserver agreements of 0.96 for KARA and 0.97 for GPT4. In ‘Location Inaccuracy’ and ‘Hallucinations’, KARA-CXR outperformed ChatGPT with significant margins. KARA-CXR demonstrated a non-hallucination rate of 75%, which is significantly higher than ChatGPT’s 38%. The interobserver agreement was high for KARA (0.91) and moderate to high for GPT4 (0.85) in the hallucination category. In conclusion, this study demonstrates the potential of AI and large-scale language models in medical imaging and diagnostics. It also shows that in the chest X-ray domain, KARA-CXR has relatively higher accuracy than ChatGPT.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *