Her concerned owners ran her symptoms and test results by GPT-4, the much-improved new version of the hugely popular ChatGPT, and it suggested a type of autoimmune anemia. They consulted a second vet, who tested Sassy, confirmed the diagnosis, and treated her. She is now almost fully recovered.
“In my opinion, the moral of the story is not ‘AI will replace doctors’, but ‘AI could be a useful tool in the hands of doctors,’” says Sassy’s owner, who asked to be identified only as Cooper. (He provided copies of the lab results to show that the unexpected viral story was true.)
This vision of new AI as a potentially revolutionary tool in healthcare is spreading rapidly. Last week, the august New England Journal of Medicine launched an “AI in Medicine” series and announced that it will publish a brand new journal, NEJM AI, next year.
“To my colleagues in the medical establishment, let’s not leave it to others to guide the implementation of this groundbreaking technology,” says the new journal’s editor-in-chief, Dr. Isaac Kohane, chair of the Department of Biomedical Informatics at Harvard Medical School. “Let’s make sure it’s safe and helps our patients.”
I’ve spent the past few months working on a book on AI and medicine with Kohane and Peter Lee, who leads research at Microsoft, which partners with GPT-4 creator OpenAI. I was embedded with them and other researchers when they had early access to GPT-4, one of several new “large language models” that also include Google’s Bard. My main question: what can patients and healthcare providers expect from the new AI?
After all, health care is emerging as a sphere where the potential benefits could be among the greatest, but also where the risks could be high, especially given the lies that chatbots sometimes convincingly peddle as fact. And medicine has been burned by AI hype in the past, notably when IBM’s Watson failed to deliver on promises to transform cancer care. Approved uses for the technology – analyzing scans, predicting seizures – have remained relatively narrow.
But even with these caveats in mind, it is hard to avoid the conclusion that this time, with these models, we are on the cusp of a major change in AI-based medicine.
The team I followed achieved amazing results.
The model is surprisingly good at helping to identify the right diagnoses and treatments, better than many doctors, according to Kohane, who holds a doctorate in computer science and is also a physician specializing in pediatric endocrinology. In one experiment, GPT-4 correctly diagnosed a past case involving a disease that affects only one in 100,000 babies. Kohane had also diagnosed the case correctly at the time, but it took him many more steps.
Kohane was initially so stunned by these abilities that he felt like a sci-fi character who had just encountered a seemingly benevolent alien. He says he “couldn’t decide whether to give it the keys to our planet or seal it in a bunker until we figured it out.”
GPT-4 can deftly distill a 5,000-word medical study into a few concise sentences, Kohane reports. Ultimately, this could speed up clinical research by suggesting potential new treatments and by using medical records to identify eligible trial participants. GPT-4 could also draw on medical records to help determine the best treatment for each patient by looking at outcomes for similar patients.
For healthcare workers burdened with Sisyphean paperwork, GPT-4 could take on much of this widely despised drudgery, Lee predicts. For example, he says, GPT-4 can write prior-authorization requests to get insurers to cover necessary treatments. (I think I hear a chorus of long-suffering psychiatrists shouting “We’ll take it!”) It can also automatically summarize doctor-patient encounters as notes for the medical record.
This is no small thing at a time of burnout and understaffing. Studies show that cumbersome bureaucracy has contributed to alienation and attrition, so much so that the US Surgeon General last year called for a 75% reduction in the “documentation burden.”
One of the biggest surprises from early experiments with GPT-4 was its ability to mimic good bedside manner.
During one interaction, it answered a medical board exam question, diagnosing a 12-year-old girl with swollen legs and blood in her urine as having post-streptococcal syndrome. Next, the AI was asked what the doctor should tell the girl. Its response included:
“Sarah, it sounds like you have a condition called acute post-streptococcal glomerulonephritis. This is a condition that can occur after a throat infection caused by a specific type of bacteria. This condition has caused your kidneys to become inflamed, which is why you have swollen legs and blood in your urine. You didn’t do anything wrong and it’s not contagious. We will take care of you and help you get better.”
GPT-4 has shown other benefits for patients. It can actually explain those “explanation of benefits” forms from insurance companies that none of us really understand. And it can help people shop for care by comparing outcomes across providers.
Several researchers believe that the new AI could also increase fairness. For example, it could generate post-care instructions for patients at the right literacy level and in the right languages, says Jorge Rodriguez, a Harvard physician-scientist who practices at Brigham and Women’s Hospital and studies digital health equity.
Ideally, he says, as chatbot use grows, the guiding principle would be to ask, “Who needs healthcare help the most?” and to say, “This time, we’re going to prioritize marginalized communities.”
Do no harm
Of course, chatbots are also highly fallible and capable of going off the rails. They make things up and get things wrong – not trends we’d like to see in the tools our medical providers use. As Peter Lee says, GPT-4 “is both smarter and dumber than anyone you’ve ever met.”
In one case Lee documented, the transcript of a visit with a patient with anorexia did not include her weight, so GPT-4 simply made one up. In another, it got basic math wrong. Curiously, it solved Sudoku puzzles incorrectly, then, even more bizarrely, attributed its mistakes to “typos.” It has been known to invent references to imaginary research articles in fictional journals. A recent Stanford study found that when asked for “curbside consultation” (information needed in the course of clinical care), GPT-4’s responses could be considered safe for patients 93% of the time. The rest included “hallucinated citations.”
Cautionary anecdotes abound. An ER doctor, Joshua Tamayo-Sarver, reported in Fast Company that when he tested it, ChatGPT missed several diagnoses among his recent patients, including an ectopic pregnancy that could have proved fatal. “ChatGPT worked pretty well as a diagnostic tool when I gave it perfect information and the patient had a classic presentation,” he wrote, but a key part of medicine is knowing what to ask.
At this point, GPT-4 is too new to have been adopted by any healthcare facility, and unless and until its medical accuracy can be systematically tested and proven, there must always be a human in the loop.
For patients who decide to use chatbots on their own, this means it’s essential to always, always double-check any medical advice from GPT-4, Google’s Bard, or other new AI tools.
But will this always be the case? I am not sure. Technology is advancing at breakneck speed. Already, Lee points out, researchers have used one GPT-4 system to verify the work of another. The system’s responses are highly context-dependent, he says, and the context is different when it is asked to verify a response rather than to generate one.
Given the potential benefits, it seems reasonable to expect that chatbots, with time, caution and regulation, will be incorporated into healthcare – and improve it, especially for those who currently do not have access to decent care.
Chatbots in medicine “can help us do our jobs better,” write Jeffrey Drazen and Charlotte Haug of the New England Journal, “but if not used correctly, they can cause harm.” Ultimately, “the medical community will learn to use them, but we have to learn.”
My primary care doctor recently sent me a kind reminder that I was overdue for a mammogram, and, in the throes of book writing, I replied that this might be the last time she would have to write that email herself.
She was skeptical. We will see.
Carey Goldberg is a longtime health and science journalist, including for The Boston Globe and WBUR, and has also served as Boston bureau chief for The New York Times and Bloomberg News. Her forthcoming book with Peter Lee and Isaac Kohane is “The AI Revolution in Medicine: GPT-4 and Beyond.”