Research Reveals Limitations of ChatGPT and Similar Models
Artificial intelligence tools like ChatGPT are not reliable for medical diagnosis and perform no better than a simple online search, according to a new study published in Nature Medicine. The research, involving 1,300 participants in the UK, tested several AI models, including ChatGPT, Meta’s Llama, and Cohere’s Command R+.
Only One-Third of Diagnoses Were Correct
In the study, participants were given ten different sets of symptoms with established medical diagnoses. The AI models correctly identified the conditions only about one-third of the time—a rate equivalent to that achieved by a control group using standard internet searches.
“There is a lot of hype around AI, but they are simply not ready to replace a doctor,” said Rebecca Payne, a researcher at the University of Oxford and co-author of the study, in a statement.
The Gap Between Exams and Real-World Application
While previous studies have shown that AI models can pass medical exam questions, such as the multiple-choice tests designed for students, the new findings highlight a significant shortfall when these models interact with symptom descriptions from real people.
The study underscores that, despite advancements, human medical professionals remain essential for accurate diagnosis and patient care.