ChatGPT-Generated Echo Reports Help Explain Findings to Patients

Using AI in this fashion could not only save time, but also alleviate the stresses brought on by encountering medical jargon.

Using artificial intelligence (AI) to produce patient-friendly echocardiogram reports might help save time and alleviate confusion around the findings before patients have had a chance to speak with their clinician, according to new data.

Ever since the 21st Century Cures Act was implemented in the United States in 2021, patients have been entitled to access their medical test findings as soon as they are available, often before their physicians have had a chance to review the results.

“Imaging reports, particularly [things] like echos, MRIs, procedure reports, tend to be very technical, very rich in jargon, and very rich in information that's not intuitive to a layperson,” senior author Lior Jankelson, MD, PhD (NYU Grossman School of Medicine, New York, NY), told TCTMD. He cited examples where anxious patients have called him, rattled by language in an imaging report. In these instances, said Jankelson, “you take the time to reassure them that this is just the language we use and this is a trivial finding that you can see in many people, and it's normal, and everything's going to be okay.”

But all of this back-and-forth takes time that many physicians don’t have to spare, and sometimes patients are left on edge.

Using a large language model like ChatGPT to review the original report and generate a patient-friendly interim version could “provide some sort of an intermediate layer where [patients] see this before they converse with the provider, but at least they have some good understanding of what's going on,” Jankelson explained, adding that a physician would still likely need to review the rewrite before it’s released.

Shaan Khurshid, MD, MPH (Massachusetts General Hospital, Boston, MA), who was not involved in the study, agreed that using large language models in this way could save time.

“It's true that patients get results right away, which is a good thing,” he said. “But frequently it's very technical and it takes time for the physician to get those results, process them, summarize them in their own way and deliver those results to the patient. That is a place where chatbots can be helpful in summarizing the data and giving the patient a more immediate summary so that they're not waiting for the result and the physician. Potentially, that's less work for the physician to do.”

Khurshid said he frequently hears from confused patients calling the office for an explanation of their results. “They don't fully understand whether it's a bad result or a good result,” he said. “And it can be frustrating for them when they, even after sending a message, don't get an immediate response.”

Agreement and Accuracy Results

For the study, published as a research letter online this week in JACC: Cardiovascular Imaging, Jacob A. Martin, MD (NYU Grossman School of Medicine), Jankelson, and colleagues used a HIPAA-compliant ChatGPT (OpenAI) model to review and rewrite reports for 100 transthoracic echocardiograms performed at their institution. Median patient age was 66 years, and 23% of patients had LV systolic dysfunction.
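The research letter does not spell out the authors’ exact prompt or software pipeline, but for readers curious about the mechanics, a minimal sketch of this kind of report-rewriting step might look like the following. The model name, prompt wording, and function are illustrative assumptions, not the study’s actual implementation:

```python
# Illustrative sketch only: the research letter does not publish the authors'
# prompt or pipeline. The model name, prompt wording, and call below are
# assumptions, not the study's actual implementation.
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment; a HIPAA-compliant setup
# would use an enterprise or locally hosted deployment instead.
client = OpenAI()

def rewrite_echo_report(report_text: str) -> str:
    """Ask the model to rewrite a technical echo report in plain language."""
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "Rewrite the following echocardiogram report in plain, "
                    "reassuring language a patient can understand. Preserve "
                    "all quantitative findings exactly and add nothing new."
                ),
            },
            {"role": "user", "content": report_text},
        ],
        temperature=0.2,  # low temperature to discourage embellishment
    )
    return response.choices[0].message.content

# Per the authors, a physician would still review the draft before release.
```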

The AI-generated rewrites were generally shorter than the full echocardiogram reports (median 1,216 vs 2,150 characters). Five cardiologists rated whether each rewrite could be accepted without edits, strongly agreeing for 29%, agreeing for 44%, remaining neutral for 13%, and disagreeing for 14%. They also judged the rewrites for accuracy, marking 84% as “all true” and the remaining 16% as “mostly correct.”

Of the 16 rewrites with incorrect statements, eight were labeled as “potentially dangerous,” four needed correction but didn’t contain dangerous information, and four contained errors but were unlikely to need correction.

As for the relevance of each rewrite, the cardiologists said 76% contained “all of the important information,” 15% contained “most” of it, 7% had “about half,” and 2% had “less than half.” None of the rewrites with missing information was rated “potentially dangerous” by the cardiologists: nine required correction but posed no danger, one had an “indeterminate need for correction,” six were deemed unlikely to need correction, and 12 were “insignificant.”

Overall, when asked whether the rewrites appropriately represented quantitative information, the cardiologists strongly agreed for 54%, agreed for 36%, were neutral for 2%, and disagreed for 1%.

Lastly, the researchers had 12 nonclinical individuals evaluate the rewrites for understandability. Compared with the original reports, these reviewers said 70% of the AI rewrites were “much more” understandable, 27% were “a little more” understandable, and 3% were “equally” understandable. While they said 15% of the rewrites would “strongly” and 35% would “slightly” reduce their worry compared with the original reports, for the remaining half they were either neutral or felt more anxious with the patient-oriented version. The vast majority (85%) of nonclinical reviewers said they would strongly prefer to have the AI-generated rewrites in addition to the original reports.

Notably, inter-rater reliability was low among echocardiographers in this analysis and only fair among the nonclinical reviewers.

Acknowledging that this application of technology is still early, Jankelson quelled worry over its potential to encroach on physician jobs. If anything, he said, “doctors that will use AI will just replace those who don't use AI. It's not that the AI is going to replace the doctors.”

Ongoing concerns over hallucinations and privacy in tools like ChatGPT are valid, Jankelson said, but these will likely be addressed with time through fixes such as human oversight and locally hosted systems.

His team plans to continue to test this model on more patients and “extend this platform to include other imaging and other text communications, procedure reports, and surgical reports.”

Patient Validation Needed

Commenting to TCTMD in an email, Rohan Khera, MBBS (Yale School of Medicine, New Haven, CT), said the potential for ChatGPT to be used in this fashion is not surprising, as such tools are becoming “very good at summarizing and paraphrasing technical texts. It is good to see that it maintains technical accuracy in this process.”

However, “I believe it is a proof-of-concept study, [and] certainly needs more real-world evaluation by patients,” he added.

Khurshid, too, said more validation is needed. “The next step for a study like this would be to actually implement it, which I think they've justified here, in a study and see [if] patients actually find this satisfying,” he said.

Additionally, Khurshid noted that because about one-third of the AI rewrites required some editing, the approach faces a barrier to becoming a routine clinical tool. “For this to really work, I don't think that it would be really that useful for me to have to review every single AI-generated result or summary,” he said.

The question of cost also remains unanswered. “ChatGPT is owned by OpenAI,” Khurshid said. “Are they going to pay for every application of this if you're going to scale it to millions of patients? Who's going to pay for that? And is there even the computational power or support to make it happen?”

The low inter-rater reliability seen in the study suggests that not all physicians style their echo reports in the same way. That’s important, he said, because it might be beneficial one day to have a way “to incorporate one's practice patterns or preferences in their own AI-generated results.”

Privacy concerns will likely be addressed by individual institutions storing the data much as they would other EHR data, Khurshid said, but transparency will be necessary, with patients informed that they are receiving AI-generated reports. It’s possible, too, that AI-generated reports might even create a barrier if patients feel the doctor doesn’t have enough time for them, he said.

Disclosures
  • Martin, Jankelson, Khera, and Khurshid report no relevant conflicts of interest.
