Objectives
Artificial intelligence (AI)-driven chatbots have been rapidly adopted across research, education, business, marketing and medicine. Most interactions, however, come from non-experts who use chatbots much like search engines, including for everyday health and medical queries.
Design
We conducted an original study to audit chatbot responses in health and medical fields prone to misinformation.
Methods
Five popular chatbots were assessed: Gemini (Google), DeepSeek (High-Flyer), Meta AI (Meta), ChatGPT (OpenAI) and Grok (xAI).
In February 2025, each chatbot was prompted with 10 questions from five categories: cancer, vaccines, stem cells, nutrition and athletic performance. We deployed an adversarial-like framework, using open- and closed-ended prompts designed to strain models toward misinformation or contraindicated advice.
Two experts from each category rated responses as ‘non-problematic’, ‘somewhat problematic’ or ‘highly problematic’ using a coding matrix based on objective, predefined criteria. Citations were scored for accuracy and completeness, and each response was given a Flesch Reading Ease score.
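The Flesch Reading Ease score is a fixed formula: 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), where higher scores indicate easier text. The sketch below computes it in Python. The syllable counter is a crude vowel-group heuristic written for illustration (the helper names `count_syllables` and `flesch_reading_ease` are hypothetical, and this is not the tool the study used), so its scores will drift somewhat from those of dictionary-based readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (hypothetical): count runs of vowels, treating 'y' as a
    vowel, and drop a silent trailing 'e' (but not '-le'). Minimum of one."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease using the standard coefficients."""
    # Sentences: split on terminal punctuation and discard empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: letter runs, optionally joined by internal apostrophes.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat on the mat."), 3))
```

Short sentences of one-syllable words score high; long, polysyllabic clinical prose scores low, which is why the score is useful for judging whether chatbot answers suit a lay audience.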
Results
Nearly half (49.6%) of responses were problematic: 30% somewhat problematic and 19.6% highly problematic.
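The reported shares can be reproduced with a simple tally. This sketch assumes each response ends up with a single agreed rating (how the two experts' ratings were reconciled is not stated here) and uses hypothetical counts totalling 250 responses, chosen only because 49.6%, 30% and 19.6% correspond exactly to 124, 75 and 49 of 250; the study's actual counts should be taken from the original article. The function name `rating_breakdown` is hypothetical.

```python
from collections import Counter

# Rating labels from the study's coding scheme.
CATEGORIES = ("non-problematic", "somewhat problematic", "highly problematic")


def rating_breakdown(ratings):
    """Percentage of responses per rating, plus the combined problematic share.

    Assumes one consensus rating per response (hypothetical simplification).
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    counts = Counter(ratings)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected ratings: {sorted(unknown)}")
    total = len(ratings)
    pct = {c: 100 * counts[c] / total for c in CATEGORIES}
    # "Problematic" in the abstract combines the two non-benign categories.
    pct["problematic (any)"] = pct["somewhat problematic"] + pct["highly problematic"]
    return pct


if __name__ == "__main__":
    # Hypothetical counts (not the study's raw data) that sum to 250.
    ratings = (["non-problematic"] * 126
               + ["somewhat problematic"] * 75
               + ["highly problematic"] * 49)
    for label, share in rating_breakdown(ratings).items():
        print(f"{label}: {share:.1f}%")
```

With these counts the script prints 50.4%, 30.0%, 19.6% and 49.6% for the four lines, matching the abstract's breakdown.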
BMJ Open published a clinical update in Research Highlights on 14 Apr 2026.
The item covers the article ‘Generative artificial intelligence-driven chatbots and medical misinformation: an accuracy, referencing and readability audit’.