Study: Chatbots can handle medical questions better than doctors


In the battle between chatbots and human doctors, AI appears to already be winning – at least when it comes to answering patient questions. Researchers have now found chatbot responses to everyday health queries to be of a ‘significantly higher quality’ overall. — dpa

SAN DIEGO, California: What are my odds of dying after swallowing a toothpick?

Do I need to see a doctor after hitting my head on a metal bar while running?

Am I likely to go blind after getting bleach splashed in my eye?

A new study led by researchers at UC San Diego explores how artificial intelligence compares to human expertise in the workaday task of dashing off quick responses to routine medical questions.

Published Friday in the medical journal JAMA Internal Medicine, the paper finds that ChatGPT, the world-upending chatbot with a seemingly infinite breadth of training, was able to more than hold its own when its responses were judged by a panel of experts against those made by flesh-and-blood physicians.

Evaluators “preferred the chatbot responses to the physician responses” in 78% of evaluations. What’s more, chatbot responses were found to be of a “significantly higher quality” than those from humans. And, in terms of empathy, an area where people would intuitively seem to have an edge, silicon again excelled.

“Chatbot responses were rated significantly more empathetic than physician responses,” the paper states.

Despite the lopsided results, this paper’s authors say doctors should be excited by what they show.

John W. Ayers, the UCSD computational epidemiologist who led the data collection and analysis process, said that he believes artificial intelligence will be a game changer for medicine in its ability to lighten workloads while simultaneously improving quality for patients.

“So many more patients who are now getting no response or a bad response will be able to get answers from an AI-equipped physician who will be able to serve far more patients,” Ayers said.

This paper’s results, however, test a very specific set of circumstances pertaining to text communications between doctors and patients and do not generalise to clinical settings.

Researchers pulled 195 randomly-selected questions from the Ask a Doctor subsection of Reddit.com, the popular news aggregation and discussion site. The group, which has nearly 500,000 members, allows anyone to publicly ask any question they want of doctors whose qualifications are verified by Reddit.

Since questions and answers are all made in public for anyone on the Internet to read, feeding them to ChatGPT required no particular data wizardry.

“Honestly, it's just plug-and-play,” Ayers said in an email. “All we did was cut and paste the questions into ChatGPT and save the response.”

No additional refinement was made, he said, after the chatbot delivered an answer.

Chatbot answers tended to be much more verbose and friendly-sounding, while those from doctors were clearly dashed off by a chronically busy person relying on shorthand to be as efficient as possible.

In answering the swallowed toothpick question, for example, the doctor’s response starts “If you’ve surpassed 2-6 h, chances are they’ve passed into your intestines. Which means it can’t be retrieved easily.”

ChatGPT starts out less clinically with: “It’s natural to be concerned if you have ingested a foreign object, but in this case, it’s highly unlikely that the toothpick you swallowed will cause you any serious harm.”

It’s a smooth response, especially for someone who exists on a server somewhere.

The head injury question about hitting a metal bar on a run shows that chatbots simply have time to be more complete.

The physician response dutifully bangs out eight symptoms that should cause the person to see a doctor, including nausea or vomiting, dizziness, severe or worsening headache, loss of consciousness, confusion, neck stiffness, problems with vision and limb weakness, concluding: “If you develop any of these in the next 24 h, rush to the emergency room.”

The chatbot provides a more-complete set of symptoms, telling the patient to be wary of loss of consciousness “even if it’s just for a few seconds”, and includes slurred speech, difficulty with balance or coordination, seizures, changes in behaviour or personality and clear fluid draining from the nose or ears.

And, here again, the chatbot is able to throw in a little additional care that the doctor was presumably too busy to type out.

“While it’s possible that you may be fine, it’s important to be evaluated by a medical professional to rule out any serious injuries,” the chatbot response says. “It is possible that you may have suffered a concussion or other head injury, even if you didn’t lose consciousness.”

Dr David “Davey” Smith, chief of infectious disease research at UCSD and one of the docs tasked to evaluate each pair of question responses, said he found the chatbot’s facility at answering medical questions to be shocking, even knowing that ChatGPT has already successfully passed medical licensing exams.

“It seemed like it could read in the message from the patient that they were anxious or sad or, you know, had emotions attached to these questions,” Smith said. “Not only was it more accurate, because it has all of the information at its fingertips, right, but it was also empathetic, which was pretty cool.”

But does this doctor, who sees patients every day, fear eventual replacement?

No, not at all. AI, he said, is looking like a salve rather than an irritant.

“I get patient emails every day and they’re asking questions almost exactly like this,” Smith said. “And I spend about an hour a day – others spend more – going through emails and answering them as quickly as possible, you know, making an appointment, here’s your prescription, that’s just a hangnail or you need to go to the emergency room.

“I don’t have time for empathy either, I’m just trying to get through it, but what if we had a way where this program could make it easier for us? What if it would draft something out ahead of time and I just review it?”

If the computer has the time to refer back to the actual literature and churn out more-complete answers, and also the time to show a little more evidence of concern for a patient’s anxiety, that, he said, could be revolutionary.

But, he added, no AI is giving his patients advice on its own. At the end of the day, it’s his medical license on the line if the AI gets something wrong.

“The bot can help at the beginning, but it’s on me to sign off,” Smith said. – The San Diego Union-Tribune/dpa
