Updated: May 2026
ChatGPT-style tools can sound medically confident. That is both their strength and their danger.
Short answer: Large language models can help doctors summarize, brainstorm differentials, explain concepts, and draft patient instructions. They should not be used as an unsupervised diagnostic authority, especially for emergencies, children, drug doses, pregnancy, or complex illness.
Why ChatGPT feels so impressive in medicine
Large language models are trained to generate language. Medicine is full of language: histories, notes, discharge summaries, guidelines, exam questions, counselling scripts, and research abstracts. So it is not surprising that these models can produce answers that sound clinically polished.
But fluency is not the same as truth. A wrong answer written beautifully is still wrong.
What the randomized trial found
A JAMA Network Open randomized clinical trial tested whether giving physicians access to GPT-4 in addition to conventional resources improved their diagnostic reasoning compared with conventional resources alone. The study recruited 50 US-licensed physicians and included 244 completed cases.
| Finding | Result |
|---|---|
| Participants | 50 physicians |
| Cases completed | 244 cases |
| Median diagnostic reasoning score | 76% with LLM vs 74% with conventional resources |
| Difference | 2 percentage points; not statistically significant |
| Median time per case | 519 seconds with LLM vs 565 seconds with conventional resources; not statistically significant |
The lesson is not that LLMs are useless. The lesson is that simply giving doctors a chatbot does not automatically improve clinical reasoning. Integration, training, prompt quality, and clinical context matter.
Where doctors can use LLMs safely
- Generate a differential diagnosis checklist.
- Ask for red flags not to miss.
- Rewrite discharge instructions in simple language.
- Summarize a long referral note.
- Convert a guideline into a bedside checklist.
- Create patient counselling scripts.
- Generate exam/viva practice cases for medical students.
Where LLMs are dangerous
They are dangerous when used as final decision-makers. They can hallucinate citations, miss a life-threatening diagnosis, anchor on the wrong detail, recommend outdated treatment, or suggest a drug without knowing the patient’s weight, renal function, allergies, pregnancy status, or local drug availability.
In pediatrics, this matters even more. A small dosing error can be serious. A missed danger sign in a neonate can be fatal.
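One way to picture the dosing risk is a simple range check: before acting on any dose an LLM suggests, divide it by the child's weight and compare it with a range you have verified yourself. The sketch below is only an illustration. The function name `dose_within_range`, the weight, and the mg/kg range are all hypothetical and do not describe any real drug or recommendation.

```python
def dose_within_range(suggested_mg: float, weight_kg: float,
                      min_mg_per_kg: float, max_mg_per_kg: float) -> bool:
    """Return True if a suggested single dose falls inside a verified mg/kg range."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    per_kg = suggested_mg / weight_kg  # convert the absolute dose to mg per kg
    return min_mg_per_kg <= per_kg <= max_mg_per_kg


# Hypothetical numbers for illustration only, not a real drug or recommendation.
WEIGHT_KG = 3.5               # a neonate
VERIFIED_RANGE = (10.0, 15.0)  # mg/kg per dose, assumed to come from a checked formulary

print(dose_within_range(40.0, WEIGHT_KG, *VERIFIED_RANGE))   # 40 / 3.5 is about 11.4 mg/kg
print(dose_within_range(400.0, WEIGHT_KG, *VERIFIED_RANGE))  # a tenfold slip, about 114 mg/kg
```

The second call shows how a single misplaced decimal pushes the dose far outside the range. A check like this catches arithmetic slips only; it does not replace a formulary or clinical judgment.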
Safe prompts for doctors
Instead of asking, “What is the diagnosis?”, ask:
- “List the dangerous diagnoses I should not miss.”
- “What red flags would change this plan?”
- “What information is missing before deciding?”
- “Give me a differential diagnosis grouped by common, dangerous, and rare causes.”
- “Rewrite this plan for parents in simple language without changing the medical meaning.”
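If you use these prompts often, a small helper can attach them to every case so the model is always asked for dangers and gaps rather than a single answer. This is a minimal sketch, not a validated tool. The names `SAFE_QUESTIONS` and `build_safe_prompt` and the exact wording of the wrapper text are my own assumptions.

```python
# The first four prompts from the list above, reused for every case.
SAFE_QUESTIONS = (
    "List the dangerous diagnoses I should not miss.",
    "What red flags would change this plan?",
    "What information is missing before deciding?",
    "Give me a differential diagnosis grouped by common, dangerous, and rare causes.",
)


def build_safe_prompt(case_summary: str, questions=SAFE_QUESTIONS) -> str:
    """Combine a case summary with the safer questions into one prompt string."""
    summary = case_summary.strip()
    if not summary:
        raise ValueError("case_summary must not be empty")
    lines = ["Case summary:", summary, "", "Please answer each question separately:"]
    # Number the questions so each answer can be checked against its question.
    lines += [f"{i}. {q}" for i, q in enumerate(questions, start=1)]
    return "\n".join(lines)


print(build_safe_prompt("2-year-old with fever for 3 days and reduced oral intake"))
```

The parent-friendly rewrite prompt is left out on purpose because it is a separate task: it works on a plan you have already decided, not on the open diagnostic question.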
For patients: use AI to prepare, not to replace care
Patients can use AI to understand terms, prepare questions, or learn what symptoms to watch for. But they should not use it to delay care when red flags are present.
Do not rely on AI alone for:
- Chest pain or stroke symptoms.
- Seizure, unconsciousness, severe headache, or neck stiffness.
- A child with fast breathing, poor feeding, lethargy, bluish lips, or dehydration.
- Bleeding in pregnancy or severe abdominal pain.
- Poisoning, overdose, or self-harm risk.
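For anyone building a patient-facing chatbot, the list above can act as a first gate: look for red-flag terms in the message and point the person to urgent care before any AI answer is shown. The sketch below is a rough illustration under my own assumptions. `RED_FLAG_TERMS` and `find_red_flags` are hypothetical names, and plain keyword matching will miss paraphrases and misspellings, so an empty result never means the situation is safe.

```python
# Terms adapted from the red-flag list above; deliberately simple and incomplete.
RED_FLAG_TERMS = (
    "chest pain", "stroke", "seizure", "unconscious", "severe headache",
    "neck stiffness", "fast breathing", "poor feeding", "lethargy",
    "bluish lips", "dehydration", "bleeding", "severe abdominal pain",
    "poisoning", "overdose", "self-harm",
)


def find_red_flags(message: str) -> list[str]:
    """Return the red-flag terms found in a message (case-insensitive substring match)."""
    text = message.lower()
    return [term for term in RED_FLAG_TERMS if term in text]


hits = find_red_flags("My baby has fast breathing and bluish lips since morning")
if hits:
    print("Seek urgent in-person care. Matched:", ", ".join(hits))
```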
My take
ChatGPT can be a useful thinking partner. It is not a doctor. For clinicians, the best use is to widen thinking and improve communication. The worst use is to outsource judgment.
Use it like a fast intern who reads a lot, writes well, and sometimes lies without knowing it.
Sources checked
- JAMA Network Open: Large language model influence on diagnostic reasoning
- Nature Medicine: Generative artificial intelligence in medicine
- WHO: Ethics and governance of AI for health – large multimodal models
- Nature Medicine: Reliability of LLMs as medical assistants for the general public