Human doctors largely bested an AI model in a new study that involved a head-to-head patient diagnosis challenge. Where do docs still shine? Showing their work.

How so? Researchers at the National Institutes of Health and Weill Cornell Medicine tasked doctors and an AI model with completing 207 challenge quizzes in which they looked at clinical images and a short description of a patient’s symptoms, then selected a diagnosis from multiple-choice answers.

The nine doctors in the study had different medical specialties, including dermatology, gastroenterology and infectious diseases. The doctors answered questions in their specialty both open-book style, where they could use outside material, including online sources, and closed-book style, in which they had no help.

Researchers asked the AI model to provide a written explanation for each diagnosis, including a description of the image, a summary of relevant medical context and step-by-step reasoning for the answer it chose.

The study was published this week in the journal npj Digital Medicine.

How’d it go?

— Both the AI model and the doctors scored high on selecting the correct diagnosis.
— The AI model beat doctors in the closed-book test, selecting more correct diagnoses.
— Doctors won the open-book test, particularly on the most difficult questions.

The bottom line: The AI model floundered when explaining how it arrived at a diagnosis, becoming confused and making mistakes. For example, it didn’t correctly identify that two lesions on a patient’s arm were linked to the same diagnosis when photos of the lesions were taken from different angles.

Why it matters: While the study was small, the researchers say it’s important because it highlights the risks of doctors relying on AI in a clinical setting.
“Integration of AI into health care holds great promise as a tool to help medical professionals diagnose patients faster, allowing them to start treatment sooner,” Stephen Sherry, acting director of the NIH’s National Library of Medicine, said in a statement. But that promise comes with a key caveat. “AI is not advanced enough yet to replace human experience, which is crucial for accurate diagnosis,” Sherry said.