Two recent studies have found that artificial intelligence (AI), specifically OpenAI’s ChatGPT-4, outperformed human doctors both in diagnosing illnesses and in reasoning through complex medical cases. These findings mark a potential turning point in how medical professionals use technology in their practices, though challenges remain in fully integrating AI into healthcare.
Study 1: ChatGPT-4’s Clinical Reasoning Scores
In a study published in JAMA Internal Medicine, ChatGPT-4 was tested on 20 simulated clinical cases and compared to internal medicine residents and attending physicians. The researchers, led by Dr. Adam Rodman of Beth Israel Deaconess Medical Center in Boston, used the R-IDEA scoring system to evaluate clinical reasoning. The median R-IDEA scores were:
- ChatGPT-4: 10
- Attending physicians: 9
- Residents: 8
The chatbot demonstrated the highest probability (99%) of achieving high clinical reasoning scores, significantly outperforming both residents (56%) and attending physicians (76%).
Additionally, ChatGPT-4 matched physicians in including critical “cannot-miss” diagnoses in its differential (67% for both), and its overall performance was more consistent. However, it produced incorrect clinical reasoning more often than residents did.
Study 2: Diagnosis Accuracy with Real Cases
A separate study published in JAMA Network Open evaluated ChatGPT’s diagnostic capabilities using real patient case histories. Fifty doctors (a mix of residents and attendings) were divided into three groups:
- Doctors without ChatGPT
- Doctors with ChatGPT as an aid
- ChatGPT alone
The chatbot achieved an impressive 90% accuracy rate in diagnosing illnesses. In comparison, doctors using ChatGPT scored 76%, while those without it scored 74%.
How ChatGPT Outperformed Doctors
The results surprised the researchers; Dr. Rodman admitted he was shocked by the findings. ChatGPT’s performance highlighted two key strengths:
- Freedom from Anchoring: Unlike human doctors, ChatGPT does not rely on intuition or ingrained habit that can cloud judgment. It can reevaluate a case without attachment to an initial diagnosis, a common limitation among human physicians.
- Comprehensive Analysis: ChatGPT analyzes medical cases systematically, offering thorough reasoning and full differential diagnoses. In the studies, some doctors never saw its most detailed and accurate responses because they treated the bot as a search engine, asking narrow, directed questions instead of presenting the whole case.
Why Doctors Struggle with AI Integration
Despite ChatGPT’s clear strengths, its integration into medical practice has encountered challenges:
- Overconfidence in Initial Diagnoses: Doctors often disregarded ChatGPT’s suggestions when they conflicted with their own conclusions. This “anchoring bias” prevented them from fully benefiting from the AI’s insights.
- Underuse of AI’s Capabilities: Many doctors did not use ChatGPT to its full potential. Instead of providing the entire case history for comprehensive analysis, they asked simple, isolated questions; only a minority exploited the chatbot’s ability to process a complex case in full (the sketch after this list contrasts the two styles).
- Errors in Clinical Reasoning: ChatGPT still made errors in some scenarios, such as incorrect reasoning pathways, which highlight the importance of human oversight in medical decision-making.
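To make the contrast concrete, here is a minimal sketch of the two usage styles, written against OpenAI’s Python client. Everything specific in it, the model name, the prompts, and the case text, is an illustrative assumption rather than a detail drawn from the studies.

```python
# A minimal sketch contrasting the two usage styles described above.
# Assumptions: the OpenAI Python client (pip install openai), an API key in
# the OPENAI_API_KEY environment variable, and a "gpt-4" model name; the
# prompts and the case text are invented for illustration, not taken from
# the studies.
from openai import OpenAI

client = OpenAI()

# Style 1: the "search engine" pattern many doctors fell into:
# a narrow, directed question that withholds the rest of the case.
narrow = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "What causes joint pain with a fever?"}],
)

# Style 2: handing over an entire (hypothetical) case history so the
# model can reason through a full differential.
case_history = (
    "62-year-old man, 3 days of fever and right knee pain, "
    "recent dental procedure, history of aortic valve replacement, "
    "temperature 38.6 C, right knee swollen and warm, labs pending."
)

comprehensive = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Here is a full case history:\n" + case_history
                          + "\nList a ranked differential diagnosis, flag any "
                            "cannot-miss conditions, and explain your reasoning."}],
)

print(comprehensive.choices[0].message.content)
```

The first call invites a generic list of causes; the second gives the model enough context to rank a differential and flag cannot-miss conditions, which is closer to how the chatbot was queried when it performed best.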
A Historical Context of AI in Medicine
The use of computers for medical diagnosis dates back nearly 70 years. Early attempts, such as the INTERNIST-1 program developed in the 1970s, showed promising diagnostic accuracy but never gained widespread adoption: they demanded laborious data entry, and doctors did not trust them. Modern large language models like ChatGPT sidestep those obstacles with natural language interfaces that let clinicians describe a case in ordinary prose.
Implications for the Future of Medicine
Experts agree that while AI is unlikely to replace doctors, it has the potential to serve as a valuable “doctor extender.” By augmenting physicians’ reasoning capabilities and providing second opinions, AI could help reduce diagnostic errors and improve patient outcomes.
Dr. Rodman noted that AI tools could be particularly beneficial for experienced doctors who use them to counteract cognitive biases and consider overlooked possibilities. However, he emphasized that proper training is essential for doctors to use AI effectively.
The researchers recommended further studies to explore how AI can complement human reasoning in clinical practice. They also emphasized the need for multifaceted evaluations of AI capabilities before integrating tools like ChatGPT into workflows.
Limitations of the Studies
Both studies acknowledged limitations. The first study used simulated cases from the NEJM Healer educational tool, which may not reflect real-world complexities. The second study’s case histories were unpublished and unfamiliar to the chatbot, but the scenarios tested were intentionally challenging yet not extremely rare.
This article is based on the following sources:
https://www.nytimes.com/2024/11/17/health/chatgpt-ai-doctors-diagnosis.html
https://www.medpagetoday.com/special-reports/features/109444
Background Information
What is Artificial Intelligence?
Artificial intelligence, or AI, refers to computer systems designed to perform tasks that usually require human intelligence. These tasks include understanding language, solving problems, and making decisions. In healthcare, AI tools analyze data, identify patterns, and make recommendations to assist doctors in diagnosing and treating patients.
ChatGPT, developed by OpenAI, is a specific type of AI called a “large language model” (LLM). Trained on vast amounts of text, it can understand and generate human-like language. In medicine, ChatGPT can take patient information, such as symptoms and test results, and suggest possible diagnoses along with the reasoning behind them.
How Doctors Diagnose Illnesses
Doctors use a process called clinical reasoning to diagnose illnesses. This involves:
- Gathering Information: Doctors start by asking patients about their symptoms, medical history, and lifestyle.
- Performing Tests: They may conduct physical exams, lab tests, or imaging studies (like X-rays).
- Creating a Differential Diagnosis: Doctors list possible conditions that could explain the symptoms and rank them by likelihood.
- Reaching a Final Diagnosis: By analyzing test results and patient information, they narrow down the possibilities to one or more likely conditions.
While this process works well, it is not perfect. Doctors sometimes anchor on their first idea (a form of cognitive bias) or overlook less obvious conditions.
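To make steps 3 and 4 concrete, here is a toy Python sketch of how a single test result re-ranks a differential diagnosis using Bayes’ rule. The conditions, prior probabilities, and test likelihoods are all invented for illustration, not real clinical data.

```python
# Toy illustration of re-ranking a differential diagnosis with Bayes' rule.
# All conditions, priors, and likelihoods below are invented for
# demonstration; they are not real clinical data.

# Step 3: an initial differential, ranked by prior probability.
priors = {"influenza": 0.50, "strep throat": 0.30, "mononucleosis": 0.20}

# Hypothetical probability of a positive rapid strep test under each condition.
p_positive_test_given_condition = {
    "influenza": 0.05,
    "strep throat": 0.90,
    "mononucleosis": 0.10,
}

def bayes_update(priors, likelihoods):
    """Posterior is proportional to prior times likelihood; normalize to sum to 1."""
    unnormalized = {d: priors[d] * likelihoods[d] for d in priors}
    total = sum(unnormalized.values())
    return {d: p / total for d, p in unnormalized.items()}

# Step 4: a positive test result reshuffles the ranking.
posterior = bayes_update(priors, p_positive_test_given_condition)
for diagnosis, prob in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{diagnosis}: {prob:.2f}")
# strep throat: 0.86, influenza: 0.08, mononucleosis: 0.06
```

Real clinical reasoning is far messier, but the mechanism is the same: each new finding reweights the list of candidate diagnoses.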
The Role of AI in Medicine
AI in medicine is not new. Early efforts, like INTERNIST-1 in the 1970s, tried to use computers to help doctors make diagnoses. These programs had some success but were slow and difficult to use, so they never became widely adopted. Recent advances in AI, like ChatGPT, have made these tools much faster and easier to use.
AI offers several potential benefits:
- Improved Accuracy: AI can analyze more data than a human doctor and consider rare conditions that might be overlooked.
- Second Opinions: AI can double-check a doctor’s diagnosis or suggest alternatives, helping to catch mistakes.
- Reduced Anchoring: AI does not grow attached to a first impression the way humans do, so it can weigh alternatives more evenly, although it can still reflect biases present in its training data.
Why These Studies Matter
The studies mentioned in the article are important because they show that AI tools like ChatGPT can sometimes outperform even experienced doctors. This raises questions about how AI can be used to improve healthcare. For example:
- Can AI help reduce medical errors? Studies suggest that diagnostic mistakes contribute to many deaths each year. AI could help prevent some of these errors by flagging diagnoses a doctor might otherwise miss.
- How should doctors and AI work together? Some doctors struggle to use AI effectively. Learning how to integrate AI into medical practice is key to making it useful.
- What are AI’s limits? AI isn’t perfect and still makes mistakes. Doctors must understand when to trust AI and when to rely on their own expertise.
Challenges in Using AI
While AI has promise, there are challenges to overcome:
- Trust: Doctors and patients need to trust that AI is reliable.
- Training: Doctors must learn how to use AI properly, which may require new skills.
- Ethics and Privacy: AI uses large amounts of patient data, so it’s important to ensure privacy and fairness.
- Error Handling: Doctors must know how to recognize and respond when AI gives incorrect information.
The Bigger Picture
AI is being used in many fields beyond medicine. For example:
- Transportation: Self-driving cars use AI to navigate roads.
- Education: AI tools like ChatGPT help students with homework or learning new topics.
- Entertainment: Streaming platforms use AI to recommend movies or shows you might like.
Debate/Essay Questions
- Will AI replace doctors in diagnosing illnesses in the future?
- Are the potential risks of AI in medicine, such as privacy concerns and occasional errors, worth its benefits?