A study presented at the Royal College of General Practitioners (RCGP) Annual Conference 2023 found that ChatGPT, an artificial intelligence (AI) model, failed to pass the UK’s National Primary Care examinations. According to a Medscape article published on 20 October 2023, the findings highlight the difficulty AI has with the intricacies of medical knowledge and its inability to match human perceptions of medical complexity.
Shathar Mahmood, a fifth-year medical student at the University of Cambridge School of Clinical Medicine, presented the study’s results, emphasising that ChatGPT often provided novel explanations that were inaccurate, effectively “hallucinating” information. These inaccuracies were presented as factual, demonstrating a significant gap between AI-generated responses and expert medical knowledge.
Can AI Replace Physicians?
The lead author of the study, Arun James Thirunavukarasu, from the University of Oxford and Oxford University Hospitals NHS Foundation Trust, highlighted that the performance of AI on medical school examinations has fuelled discussions about the potential for AI to replace human clinicians. However, he suggested that these discussions might not consider the real-world complexities of clinical practice.
The study assessed ChatGPT’s capabilities in primary care using the Membership of the Royal College of General Practitioners Applied Knowledge Test, a computer-based, multiple-choice assessment that is part of the UK’s specialty training for general practitioners (GPs).
Researchers put 674 questions to ChatGPT in two separate runs to evaluate its accuracy. The AI’s overall performance was decent, with scores of 59.94% and 60.39%, but 17% of its answers did not match between the two runs, a statistically significant difference. Furthermore, ChatGPT’s performance was 10% lower than the average RCGP pass mark, indicating its limitations in expert-level recall and decision-making.
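To put those percentages in terms of questions, the short Python sketch below converts them back into approximate counts out of 674. The counts are not reported in the article; they are inferred by rounding, and the 17% figure is read here as the share of answers that differed between the two runs. The function name `implied_count` is purely illustrative.

```python
TOTAL_QUESTIONS = 674  # number of questions reported in the article


def implied_count(percent: float, total: int = TOTAL_QUESTIONS) -> int:
    """Nearest whole number of questions matching a reported percentage."""
    return round(percent / 100 * total)


# Assumption: these counts are inferred from the rounded percentages.
run_1_correct = implied_count(59.94)
run_2_correct = implied_count(60.39)
mismatched = implied_count(17)

print(f"Run 1 correct: about {run_1_correct} of {TOTAL_QUESTIONS}")
print(f"Run 2 correct: about {run_2_correct} of {TOTAL_QUESTIONS}")
print(f"Answers that differed: about {mismatched}")
print(f"Net change in score: {run_2_correct - run_1_correct} questions")
```

On this reading, the two overall scores differ by only a handful of questions while the number of individual answers that changed is far larger, which is why similar headline scores can still conceal inconsistent behaviour.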
AI Not Ready to Replace Healthcare Professionals
Another concerning finding was that in some cases, ChatGPT generated uncertain answers or failed to provide any response at all in 1.48% and 2.25% of the questions, respectively.
The study also uncovered ChatGPT’s tendency to produce inaccurate or “hallucinated” responses when presented with certain questions. These responses, when cross-referenced with correct answers, revealed a lack of correlation, making it challenging for non-experts to distinguish between accurate and inaccurate information.
Shathar Mahmood made it clear that, as things currently stand, AI systems like ChatGPT are not ready to replace healthcare professionals, particularly in primary care. She emphasised the need for larger and more medically specific datasets to improve the accuracy and reliability of AI applications in the field of medicine.
Sandip Pramanik, a GP from Watford, Hertfordshire, UK, commented on the study, noting that ChatGPT struggled to handle the complexity of exam questions based on the primary care system. He pointed out that general practice involves dealing with nuances and uncertainties that AI models like ChatGPT may oversimplify, highlighting the importance of human factors in decision-making.
This study underscores the challenges AI faces in replicating the nuanced and complex decision-making required in primary care. While AI has made significant advancements in various fields, the medical community still requires the expertise and emotional intelligence of human healthcare professionals to provide the best possible patient care. The findings emphasise the need for continued research and development to harness the potential of AI in healthcare while recognising its limitations in understanding the intricacies of medical practice.