The Impact of Large Language Models on Diagnostic Reasoning Among LLM-Trained Medical Doctors

July 14, 2025 updated by: Ihsan Ayyub Qazi, PhD, Lahore University of Management Sciences

Diagnostic Reasoning With and Without AI Support: A Randomized Controlled Trial of LLM-Trained Medical Doctors

This study aims to evaluate whether large language model-trained medical doctors demonstrate enhanced diagnostic reasoning performance when utilizing ChatGPT-4o alongside conventional resources compared to using conventional resources alone.

Study Overview

Status

Completed

Conditions

Intervention / Treatment

Detailed Description

Diagnostic errors are a major source of preventable patient harm. Recent advances in large language models (LLMs), particularly ChatGPT-4o, have shown promise for enhancing medical decision-making. However, little is known about their impact on the diagnostic reasoning of medical doctors (e.g., physicians and surgeons).

Diagnostic accuracy relies on complex clinical reasoning and careful evaluation of patient data. While AI assistance could potentially reduce errors and improve efficiency, ChatGPT-4o has not been medically validated and could introduce new risks by generating incorrect information (known as hallucinations). To mitigate these risks, doctors need adequate training in ChatGPT-4o's capabilities, limitations, and proper use. Given these uncertainties and the importance of proper AI training, systematic evaluation is essential before clinical implementation.

This randomized study will assess whether ChatGPT-4o access improves LLM-trained medical doctors' diagnostic performance compared to conventional resources (e.g., textbooks, online medical databases) alone. All participating doctors will have completed at least a 10-hour training program covering ChatGPT-4o usage, prompt engineering techniques, and output evaluation strategies. Participants will provide differential diagnoses with supporting evidence and recommended next steps for clinical cases, with responses evaluated by blinded reviewers.

Study Type

Interventional

Enrollment (Actual)

60

Phase

  • Not Applicable

Contacts and Locations

Study Locations

    • Punjab
      • Lahore, Punjab, Pakistan, 54792
        • Lahore University of Management Sciences

Participation Criteria

Eligibility Criteria

Ages Eligible for Study

  • Child
  • Adult
  • Older Adult

Accepts Healthy Volunteers

Yes

Description

Inclusion Criteria:

  • Medical practitioners with full or provisional registration with the Pakistan Medical and Dental Council (PMDC).
  • Completed the Bachelor of Medicine, Bachelor of Surgery (MBBS) examination. The equivalent degree in the US and Canada is the Doctor of Medicine (MD).
  • Completed a structured training program on the use of ChatGPT (or a comparable large language model) totaling at least 10 hours of instruction. The program must include hands-on practice in prompt engineering and in evaluating LLM-generated content.

Exclusion Criteria:

  • Any other practitioners registered (fully or provisionally) with the PMDC, e.g., professionals holding a Bachelor of Dental Surgery (BDS).

Study Plan

How is the study designed?

Design Details

  • Primary Purpose: Diagnostic
  • Allocation: Randomized
  • Interventional Model: Parallel Assignment
  • Masking: None (Open Label)
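The design details above call for randomized, parallel-group allocation. The record does not say how participants were randomized or in what ratio, so the following is only a minimal sketch assuming simple 1:1 allocation by shuffling the participant list once; `randomize_parallel`, the participant IDs, and the seed are hypothetical.

```python
import random


def randomize_parallel(participant_ids, seed=None):
    """Shuffle participants and split them 1:1 into the two study arms.

    Assumes equal allocation; the registry record does not state the
    allocation ratio or the randomization procedure.
    """
    rng = random.Random(seed)  # seeded generator so the allocation is reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2  # with an odd count, the extra participant goes to the second arm
    return {
        "ChatGPT-4o": ids[:half],
        "Conventional resources": ids[half:],
    }


if __name__ == "__main__":
    # Hypothetical IDs for the 60 enrolled participants.
    arms = randomize_parallel([f"P{i:02d}" for i in range(1, 61)], seed=2025)
    for arm, members in arms.items():
        print(arm, len(members))
```

Under this assumption, the 60 enrolled participants would split into two arms of 30. Trials typically generate a concealed allocation sequence in advance (often in blocks or strata) rather than shuffling ad hoc; this sketch shows only the basic idea.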

Arms and Interventions

Active Comparator: ChatGPT-4o
  • Description: Group will be given access to ChatGPT-4o.
  • Intervention / Treatment: OpenAI's ChatGPT-4o large language model with chat interface.

No Intervention: Conventional resources
  • Description: Group will not be given access to ChatGPT-4o but will be encouraged to use any resources they wish other than large language models (e.g., PubMed, Google without AI Overviews).

What is the study measuring?

Primary Outcome Measures

Outcome Measure: Diagnostic reasoning

Measure Description: The primary outcome will be the percent correct for each case (range: 0 to 100). For each case, participants will be asked for their three top diagnoses, the findings from the case that support each diagnosis, and the findings from the case that oppose it. Participants will receive 1 point for each plausible diagnosis. Supporting and opposing findings will also be graded for correctness, with 1 point for a partially correct and 2 points for a completely correct response. Participants will then be asked to name their top diagnosis, earning 1 point for a reasonable response and 2 points for the most correct response. Finally, participants will be asked to name up to three next steps to further evaluate the patient, with 1 point awarded for a partially correct response and 2 points for a completely correct response. The primary outcome will be compared at the case level between the randomized groups.

Time Frame: Assessed at a single time point for each case, during the scheduled diagnostic reasoning evaluation session, which takes place 0 to 4 days after participant enrollment.
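The scoring rule for the primary outcome can be sketched as a small calculator. This is a minimal sketch, not the investigators' grading code: it assumes supporting and opposing findings are graded once per listed diagnosis, and that percent correct is points earned divided by the maximum attainable points (23 under these assumptions). The record does not state the per-case denominator. `CaseScore`, `percent_correct`, and the constant names are hypothetical.

```python
from dataclasses import dataclass, field

# Per-item maxima read off the rubric description; the per-case
# denominator (MAX_POINTS) is an assumption, not stated in the record.
N_DIAGNOSES = 3      # three top diagnoses
N_NEXT_STEPS = 3     # up to three next steps
PLAUSIBLE_MAX = 1    # 1 point per plausible diagnosis
FINDINGS_MAX = 2     # 1 = partially correct, 2 = completely correct
TOP_DX_MAX = 2       # 1 = reasonable, 2 = most correct
NEXT_STEP_MAX = 2    # 1 = partially correct, 2 = completely correct

MAX_POINTS = (
    N_DIAGNOSES * (PLAUSIBLE_MAX + 2 * FINDINGS_MAX)  # plausibility + supporting + opposing
    + TOP_DX_MAX
    + N_NEXT_STEPS * NEXT_STEP_MAX
)  # = 23 under these assumptions


def _check(name: str, values: list[int], limit: int, max_len: int) -> None:
    """Reject scores outside 0..limit or more items than the rubric allows."""
    if len(values) > max_len or any(not 0 <= v <= limit for v in values):
        raise ValueError(f"invalid {name} scores: {values}")


@dataclass
class CaseScore:
    """Points awarded by a blinded reviewer to one participant on one case."""
    plausible: list[int]   # per diagnosis: 0 or 1
    supporting: list[int]  # per diagnosis: 0, 1 or 2
    opposing: list[int]    # per diagnosis: 0, 1 or 2
    top_diagnosis: int     # 0, 1 or 2
    next_steps: list[int] = field(default_factory=list)  # per step: 0, 1 or 2

    def percent_correct(self) -> float:
        """Validate the item scores, then return earned / MAX_POINTS as a percentage."""
        _check("plausible", self.plausible, PLAUSIBLE_MAX, N_DIAGNOSES)
        _check("supporting", self.supporting, FINDINGS_MAX, N_DIAGNOSES)
        _check("opposing", self.opposing, FINDINGS_MAX, N_DIAGNOSES)
        _check("top diagnosis", [self.top_diagnosis], TOP_DX_MAX, 1)
        _check("next steps", self.next_steps, NEXT_STEP_MAX, N_NEXT_STEPS)
        earned = (sum(self.plausible) + sum(self.supporting) + sum(self.opposing)
                  + self.top_diagnosis + sum(self.next_steps))
        return 100.0 * earned / MAX_POINTS


if __name__ == "__main__":
    # Illustrative reviewer scores for one hypothetical case response.
    score = CaseScore(plausible=[1, 1, 0], supporting=[2, 1, 0],
                      opposing=[2, 0, 0], top_diagnosis=1, next_steps=[2, 2])
    print(f"{score.percent_correct():.1f}")  # 12 of 23 points -> 52.2
```

Fixing the denominator at the rubric maximum keeps scores comparable across participants who list fewer than three next steps; if the actual rubric weighted items differently per case, only the constants would need to change.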

Secondary Outcome Measures

Outcome Measure: Time Spent on Diagnosis

Measure Description: We will compare how much time (in seconds) participants spend per case between the two study arms.

Time Frame: Assessed at a single time point for each case, during the scheduled diagnostic reasoning evaluation session, which takes place 0 to 4 days after participant enrollment.
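The record specifies only that time per case is measured in seconds and compared between arms. As a minimal sketch, assuming each participant-case has a logged start and submission timestamp in ISO 8601 format (a hypothetical log format), elapsed time and a descriptive per-arm summary could be computed as below. `seconds_on_case` and `per_arm_summary` are hypothetical helpers, and the sketch does not reproduce whatever statistical test the investigators used.

```python
from datetime import datetime
from statistics import mean, median


def seconds_on_case(started: str, submitted: str) -> float:
    """Elapsed seconds between two ISO 8601 timestamps for one participant-case."""
    elapsed = (datetime.fromisoformat(submitted)
               - datetime.fromisoformat(started)).total_seconds()
    if elapsed < 0:
        raise ValueError("submission time precedes start time")
    return elapsed


def per_arm_summary(records):
    """Group (arm, started, submitted) tuples by arm and summarize seconds per case."""
    by_arm: dict[str, list[float]] = {}
    for arm, started, submitted in records:
        by_arm.setdefault(arm, []).append(seconds_on_case(started, submitted))
    return {
        arm: {"n": len(times), "median_s": median(times), "mean_s": mean(times)}
        for arm, times in by_arm.items()
    }
```

Reporting the median alongside the mean is a common choice for time-on-task data, which tend to be right-skewed.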

Collaborators and Investigators

Investigators

  • Principal Investigator: Ihsan Ayyub Qazi, PhD, Lahore University of Management Sciences
  • Principal Investigator: Muhammad Asadullah Khawaja, MBBS, King Edward Medical University
  • Principal Investigator: Ayesha Ali, PhD, Lahore University of Management Sciences

Study record dates

Study Major Dates

Study Start (Actual)

January 10, 2025

Primary Completion (Actual)

May 17, 2025

Study Completion (Actual)

May 17, 2025

Study Registration Dates

First Submitted

January 4, 2025

First Submitted That Met QC Criteria

January 8, 2025

First Posted (Actual)

January 14, 2025

Study Record Updates

Last Update Posted (Actual)

July 17, 2025

Last Update Submitted That Met QC Criteria

July 14, 2025

Last Verified

July 1, 2025

More Information

Terms related to this study

Additional Relevant MeSH Terms

Other Study ID Numbers

  • IRB-0342

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

No

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

Product manufactured in and exported from the U.S.

No

This information was retrieved directly from the website clinicaltrials.gov without any changes.
