ChatGPT Again Shows Potential for Guiding Revascularization Decisions
The latest version of the LLM had high concordance with a multidisciplinary heart team in choosing between PCI and CABG.
More evidence hints at the potential for large language models (LLMs) like ChatGPT (OpenAI) to assist with heart team decisions in patients slated for coronary revascularization.
The latest findings, from a retrospective analysis published last week in Circulation: Cardiovascular Interventions, show good agreement between ChatGPT version 4 and a multidisciplinary heart team at a single institution on PCI-versus-CABG decisions, with version 3.5 faring less well. The results are in line with a similar study of simulated cases published in JACC: Cardiovascular Interventions in August.
“This study opens up a new era,” senior author Israel Moshe Barbash, MD (Sheba Medical Center, Ramat Gan, Israel), told TCTMD. “We see it not only in cardiology, but in other areas as well, that the LLM can . . . facilitate the decision-making process.”
“We are in one of the most exciting times in medicine since the invention of antibiotics,” Edward Itelman, MD (Rabin Medical Center, Petah-Tikva, Israel), who led the study published in August, told TCTMD. “We learn, test, and develop the future of medicine, and I am very excited about what comes next.”
High Accuracy
For the analysis, Karin Sudri, MA (Sheba Medical Center), Barbash, and colleagues focused on 86 consecutive patients (median age 63 years; 75.6% male) who underwent coronary angiography and were referred for revascularization between March and July 2023, feeding the case data into both ChatGPT v3.5 and v4. All patients were eligible for either PCI or CABG. The researchers gave the LLMs information on patient demographics, medical background, a detailed description of angiographic findings, and SYNTAX score.
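For readers curious about the mechanics, below is a minimal sketch of how structured case data like this might be passed to a general-purpose LLM programmatically. The prompt wording, field names, and model identifier are illustrative assumptions, not the protocol the study authors used.

```python
# Hypothetical sketch: querying a general-purpose LLM with structured case data.
# The prompt wording, field names, and model name are assumptions for illustration;
# they are not the study authors' actual protocol.
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

case = {
    "age": 63,
    "sex": "male",
    "history": "diabetes, hypertension, prior MI",
    "angiography": "three-vessel disease with proximal LAD involvement",
    "syntax_score": 29,
}

prompt = (
    "You are assisting a multidisciplinary heart team. "
    "Based on the case below, recommend PCI or CABG and briefly justify.\n"
    + "\n".join(f"{key}: {value}" for key, value in case.items())
)

response = client.chat.completions.create(
    model="gpt-4",  # the study compared ChatGPT v3.5 and v4
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```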
With ChatGPT v4, they found high concordance with the heart team's decisions (accuracy 0.82, sensitivity 0.80, specificity 0.83, and kappa 0.59), along with high reliability and repeatability, whereas ChatGPT v3.5 performed less well (accuracy 0.67, sensitivity 0.27, specificity 0.84, and kappa 0.12).
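As a rough illustration of what those agreement metrics capture, the sketch below computes accuracy, sensitivity, specificity, and Cohen's kappa from a set of paired PCI/CABG decisions. The labels are fabricated for the example and do not reflect the study data.

```python
# Illustrative only: computing agreement metrics between an LLM and a heart team.
# The decision labels below are made up for demonstration, not study data.
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# 1 = CABG recommended, 0 = PCI recommended
heart_team = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
llm        = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(heart_team, llm).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # share of heart-team CABG calls the LLM matched
specificity = tn / (tn + fp)   # share of heart-team PCI calls the LLM matched
kappa = cohen_kappa_score(heart_team, llm)  # agreement corrected for chance

print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}, "
      f"specificity={specificity:.2f}, kappa={kappa:.2f}")
```

Kappa is the key figure here: unlike raw accuracy, it discounts the agreement that two decision-makers would reach by chance alone, which is why v3.5's kappa of 0.12 signals only slight agreement despite an accuracy of 0.67.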
Cases presented with the most detail to ChatGPT v4 were associated with the most agreement with the heart team. It performed especially well for patients with left main disease, three-vessel disease, and diabetes.
“The results surprised us a lot because we didn't anticipate that the general LLM platform would be able to achieve high accuracy as compared to clinicians who do this on a daily basis,” Barbash said, adding that they included cases most representative of real-life practice, “because we wanted to make sure that the LLM actually is able to delineate for all subsets of patients in this matter.”
Itelman said this was one of the biggest strengths of the study. “They did this in real patients while we used simulated cases,” he said. “This is an important next step that they took.”
Limitations and Next Steps
Barbash noted that a major limitation of using this technology as a decision-making aid is that it can only use the information provided. “ChatGPT or any other LLM doesn't actually see the patient,” he said. “When you have a patient in front of you, in many cases, they're borderline, and you have to trust your experience . . . and the LLM does not have this ability.”
Rather, Barbash predicted that ChatGPT could be “an add-on tool to help you guide how to move forward with a situation,” especially for clinicians operating in centers without surgical care.
Going forward, he’d like to see outcomes studies comparing decisions guided by an LLM like ChatGPT with those made by physicians alone. “This would be key in implementing such technologies in clinical practice,” Barbash concluded.
Itelman, too, said “future research should involve double-blinded physician versus machine trials in non-life-threatening decision-making testing real-life patient outcomes. This is the natural next step.”
Yael L. Maxwell is Senior Medical Journalist for TCTMD and Section Editor of TCTMD's Fellows Forum.
Sources
Sudri K, Motro-Feingold I, Ramon-Gonen R, et al. Enhancing coronary revascularization decisions: the promising role of large language models as a decision-support tool for multidisciplinary heart team. Circ Cardiovasc Interv. 2024;17:e014201.
Disclosures
- Sudri, Barbash, and Itelman report no relevant conflicts of interest.