AI in healthcare meant to save money requires a lot of expensive people – The Mercury News

By Darius Tahir, KFF Health News

Preparing cancer patients for difficult decisions is an oncologist's job. They don't always remember to do it, however. At the University of Pennsylvania Health System, doctors are nudged to discuss a patient's treatment and end-of-life preferences by an artificially intelligent algorithm that predicts the likelihood of death.

There were likely real-life consequences. Ravi Parikh, an Emory University oncologist and the study's lead author, told KFF Health News that the tool failed hundreds of times to prompt doctors to initiate this important discussion – potentially heading off unnecessary chemotherapy – with patients who needed it.

He believes several algorithms designed to enhance medical care weakened during the pandemic, not just Penn Medicine's. “Many institutions do not routinely monitor the performance of their products,” Parikh said.

Algorithm glitches are one facet of a dilemma that computer scientists and doctors have long acknowledged but that is increasingly puzzling hospital executives and researchers: Artificial intelligence systems require consistent monitoring and staffing to put in place and to keep working well.

Essentially: You need people, and more machines, to make sure the new tools don't break down.

“Everyone thinks AI is going to help us with our access and our capacity and improve care, etc.,” said Nigam Shah, chief data scientist at Stanford Health Care. “That’s all well and good, but if it increases the cost of care by 20%, is that possible?”

Government officials worry that hospitals lack the resources to put these technologies through their paces. “I have looked far and wide,” FDA Commissioner Robert Califf said recently at an agency panel on AI. “I don’t believe there is a single healthcare system in the United States that has the ability to validate an AI algorithm used in a clinical care system.”

AI is already widespread in healthcare. Algorithms are used to predict patients' risk of death or deterioration, to suggest diagnoses or triage patients, to record and summarize visits to save doctors work, and to approve insurance claims.

If technology evangelists are right, the technology will become ubiquitous – and profitable. The investment firm Bessemer Venture Partners has identified about 20 health-focused AI startups on track to generate $10 million each in revenue in a year. The FDA has approved nearly a thousand artificially intelligent products.

Assessing whether these products work is difficult. Even harder is assessing whether they continue to work – or have developed the software equivalent of a blown gasket or leaky engine.

Take a recent Yale Medicine study evaluating six “early warning systems,” which alert doctors when a patient’s condition is likely to deteriorate rapidly. A supercomputer ran the data for several days, said Dana Edelson, a physician at the University of Chicago and co-founder of a company that provided one of the algorithms for the study. The process was fruitful, revealing large differences in performance among the six products.

It isn’t easy for hospitals and providers to choose the best algorithms for their needs. The average doctor doesn’t have a supercomputer sitting around, and there is no Consumer Reports for AI.

“We don’t have standards,” said Jesse Ehrenfeld, former president of the American Medical Association. “There is nothing I can show you today that sets a standard for how you evaluate, monitor and look at the performance of a model of an algorithm, whether AI capable or not, when it is deployed.”

Perhaps the most widely used AI product in medical practices is ambient documentation, a technology-enabled assistant that listens to and summarizes patient visits. Last year, Rock Health reported that investors poured $353 million into these documentation companies. But, Ehrenfeld said, “There is currently no standard for comparing the results of these tools.”

And that's a problem, because even small mistakes can have devastating consequences. A team at Stanford University tried using large language models, the technology that powers popular AI tools like ChatGPT, to summarize patients' medical histories. They compared the results with what a physician would write.

“Even in the best case scenario, the models had a 35% error rate,” said Stanford’s Shah. In medicine, “if you write a summary and forget a word, like 'fever' – that's a problem, right?”

Sometimes the reasons algorithms fail are fairly logical. For example, changes to the underlying data can erode their effectiveness, as when hospitals switch laboratory providers.

Sometimes, however, the pitfalls appear for no apparent reason.

Sandy Aronson, a technical lead for Mass General Brigham's personalized medicine program in Boston, said that when his team tested an application designed to help genetic counselors find relevant literature about DNA variants, the product suffered from “non-determinism” – meaning that when asked the same question several times within a short period, it gave different results.

Aronson is excited about the potential of large language models to aggregate knowledge for overburdened genetic counselors, but “the technology needs to improve.”

What should institutions do when metrics and standards are sparse and errors can occur for strange reasons? Invest a lot of resources. At Stanford, Shah said, it took eight to 10 months and 115 hours of work just to audit two models for fairness and reliability.

Experts interviewed by KFF Health News floated the idea of artificial intelligence monitoring artificial intelligence, with some (human) data experts monitoring both. All acknowledged that this would require organizations to spend even more money – a difficult ask given the realities of hospital budgets and the limited supply of AI technology specialists.

“It's great to have a vision where we're melting icebergs for a model to monitor their model,” Shah said. “But is that really what I wanted? How many more people will we need?”

©2025 KFF Health News. Distributed by Tribune Content Agency, LLC.
