The Impact of Large Language Models on Diagnostic Reasoning Among LLM-Trained Medical Doctors

July 14, 2025 updated by: Ihsan Ayyub Qazi, PhD, Lahore University of Management Sciences

Diagnostic Reasoning With and Without AI Support: A Randomized Controlled Trial of LLM-Trained Medical Doctors

This study aims to evaluate whether medical doctors trained in the use of large language models (LLMs) show better diagnostic reasoning when using ChatGPT-4o alongside conventional resources than when using conventional resources alone.

Study Overview

Status

Completed

Conditions

Diagnosis

Intervention / Treatment

Other: ChatGPT-4o

Detailed Description

Diagnostic errors are a major source of preventable patient harm. Recent large language models (LLMs), such as ChatGPT-4o, have shown promise in supporting medical decision-making. However, little is known about their impact on the diagnostic reasoning of medical doctors (e.g., physicians and surgeons).

Diagnostic accuracy relies on complex clinical reasoning and careful evaluation of patient data. While AI assistance could reduce errors and improve efficiency, ChatGPT-4o has not been validated for medical use and could introduce new risks by generating incorrect information (so-called hallucinations). To mitigate these risks, doctors need adequate training in ChatGPT-4o's capabilities, limitations, and proper use. Given these uncertainties and the importance of proper AI training, systematic evaluation is essential before clinical implementation.

This randomized study will assess whether access to ChatGPT-4o improves the diagnostic performance of LLM-trained medical doctors compared with conventional resources (e.g., textbooks, online medical databases) alone. All participating doctors will have completed a training program of at least 10 hours covering ChatGPT-4o usage, prompt engineering techniques, and strategies for evaluating model output. For each clinical case, participants will provide differential diagnoses with supporting evidence and recommended next steps, and their responses will be evaluated by blinded reviewers.

Study Type

Interventional

Enrollment (Actual)

Phase

Not Applicable

Contacts and Locations

Study Locations

Pakistan
- Punjab
  - Lahore, Punjab, Pakistan, 54792
    - Lahore University of Management Sciences

Participation Criteria

Eligibility Criteria

Ages Eligible for Study

Child
Adult
Older Adult

Accepts Healthy Volunteers

Yes

Description

Inclusion Criteria:

Full or provisional registration as a medical practitioner with the Pakistan Medical and Dental Council (PMDC).
Completion of the Bachelor of Medicine, Bachelor of Surgery (MBBS) examination. The equivalent degree in the US and Canada is the Doctor of Medicine (MD).
Participants must have completed a structured training program on the use of ChatGPT (or a comparable large language model) totaling at least 10 hours of instruction. The program must include hands-on practice in prompt engineering and in evaluating LLM-generated content.

Exclusion Criteria:

Any other practitioners registered (fully or provisionally) with the PMDC, e.g., holders of a Bachelor of Dental Surgery (BDS) degree.

Study Plan

How is the study designed?

Design Details

Primary Purpose: Diagnostic
Allocation: Randomized
Interventional Model: Parallel Assignment
Masking: None (Open Label)

Number of Arms

2

Arms and Interventions

Active Comparator: ChatGPT-4o
Participants will be given access to ChatGPT-4o.
Intervention: Other: ChatGPT-4o (OpenAI's ChatGPT-4o large language model with a chat interface).

No Intervention: Conventional resources
Participants will not be given access to ChatGPT-4o but will be encouraged to use any resources they wish other than large language models (e.g., PubMed, Google without AI Overviews).

What is the study measuring?

Primary Outcome Measures

Outcome Measure: Diagnostic reasoning
Measure Description: The primary outcome will be the percent correct for each case (range: 0 to 100). For each case, participants will be asked for their top three diagnoses, the findings from the case that support each diagnosis, and the findings that oppose it. Participants will receive 1 point for each plausible diagnosis. Supporting and opposing findings will also be graded for correctness, with 1 point for a partially correct and 2 points for a completely correct response. Participants will then be asked to name their top diagnosis, earning 1 point for a reasonable response and 2 points for the most correct response. Finally, participants will be asked to name up to three next steps to further evaluate the patient, with 1 point awarded for a partially correct and 2 points for a completely correct response. The primary outcome will be compared at the case level between the randomized groups.
Time Frame: Assessed at a single time point for each case, during the scheduled diagnostic reasoning evaluation session, which takes place 0 to 4 days after participant enrollment.
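The rubric above can be tallied roughly as follows. This is a minimal Python sketch, not the investigators' scoring procedure: it assumes one supporting-findings score and one opposing-findings score per listed diagnosis, and it normalizes by the resulting maximum of 23 points (3 × 5 for the diagnoses, 2 for the top diagnosis, 3 × 2 for the next steps). The class and function names (`DiagnosisScore`, `CaseResponseScore`, `percent_correct`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed maxima (hypothetical): one supporting and one opposing score per diagnosis.
DIAGNOSES = 3
MAX_PER_DIAGNOSIS = 1 + 2 + 2  # plausible (1) + supporting (2) + opposing (2)
MAX_TOP_DIAGNOSIS = 2          # reasonable (1) or most correct (2)
MAX_NEXT_STEPS = 3
MAX_PER_STEP = 2               # partially (1) or completely (2) correct
MAX_POINTS = (DIAGNOSES * MAX_PER_DIAGNOSIS
              + MAX_TOP_DIAGNOSIS
              + MAX_NEXT_STEPS * MAX_PER_STEP)  # 23 under these assumptions


@dataclass
class DiagnosisScore:
    plausible: bool  # 1 point if the diagnosis is plausible
    supporting: int  # 0, 1 (partially correct) or 2 (completely correct)
    opposing: int    # 0, 1 (partially correct) or 2 (completely correct)


@dataclass
class CaseResponseScore:
    diagnoses: list[DiagnosisScore]
    top_diagnosis: int                                   # 0, 1 or 2
    next_steps: list[int] = field(default_factory=list)  # up to 3 entries, each 0-2


def _check(value: int, name: str) -> int:
    """Reject grades outside the 0/1/2 scale used by the rubric."""
    if value not in (0, 1, 2):
        raise ValueError(f"{name} must be 0, 1 or 2, got {value}")
    return value


def percent_correct(r: CaseResponseScore) -> float:
    """Return the case score as a percentage of MAX_POINTS (0 to 100)."""
    if len(r.diagnoses) > DIAGNOSES or len(r.next_steps) > MAX_NEXT_STEPS:
        raise ValueError("too many diagnoses or next steps")
    points = 0
    for d in r.diagnoses:
        points += int(d.plausible)
        points += _check(d.supporting, "supporting") + _check(d.opposing, "opposing")
    points += _check(r.top_diagnosis, "top_diagnosis")
    points += sum(_check(s, "next step") for s in r.next_steps)
    return 100.0 * points / MAX_POINTS
```

As an arithmetic illustration: three plausible diagnoses, each with completely correct supporting findings (2) and partially correct opposing findings (1), give 12 points; adding a most-correct top diagnosis (2) and two next steps graded 2 and 1 gives 17 of 23 points, or about 73.9%.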

Secondary Outcome Measures

Outcome Measure: Time spent on diagnosis
Measure Description: We will compare how much time (in seconds) participants spend per case between the two study arms.
Time Frame: Assessed at a single time point for each case, during the scheduled diagnostic reasoning evaluation session, which takes place 0 to 4 days after participant enrollment.
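Per-case time could be derived and summarized by arm roughly as below. This is a hedged sketch: the record format (arm, case ID, ISO-8601 start and end timestamps) and the helper names `seconds_per_case` and `median_seconds_by_arm` are hypothetical, and the median is used only as a descriptive summary; the registry entry does not specify how timing was captured or which statistical test compares the arms.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime
from statistics import median


def seconds_per_case(start: str, end: str) -> float:
    """Elapsed seconds between two ISO-8601 timestamps (hypothetical format)."""
    elapsed = (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()
    if elapsed < 0:
        raise ValueError("end timestamp precedes start timestamp")
    return elapsed


def median_seconds_by_arm(records: list[tuple[str, str, str, str]]) -> dict[str, float]:
    """Group (arm, case_id, start, end) records by arm and return the median seconds per case."""
    by_arm: dict[str, list[float]] = defaultdict(list)
    for arm, _case_id, start, end in records:
        by_arm[arm].append(seconds_per_case(start, end))
    return {arm: median(times) for arm, times in by_arm.items()}
```

The median is chosen in this sketch because task times are often right-skewed, so a few very slow cases would pull a mean upward.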

Collaborators and Investigators

Sponsor

Lahore University of Management Sciences

Collaborators

King Edward Medical University

Investigators

Principal Investigator: Ihsan Ayyub Qazi, PhD, Lahore University of Management Sciences
Principal Investigator: Muhammad Asadullah Khawaja, MBBS, King Edward Medical University
Principal Investigator: Ayesha Ali, PhD, Lahore University of Management Sciences

Study record dates

Study Major Dates

Study Start (Actual)

January 10, 2025

Primary Completion (Actual)

May 17, 2025

Study Completion (Actual)

May 17, 2025

Study Registration Dates

First Submitted

January 4, 2025

First Submitted That Met QC Criteria

January 8, 2025

First Posted (Actual)

January 14, 2025

Study Record Updates

Last Update Posted (Actual)

July 17, 2025

Last Update Submitted That Met QC Criteria

July 14, 2025

Last Verified

July 1, 2025

More Information

Terms related to this study

Keywords

Additional Relevant MeSH Terms

Other Study ID Numbers

IRB-0342

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

Studies a U.S. FDA-regulated device product

Product Manufactured in and Exported from the U.S.

This information was retrieved directly from ClinicalTrials.gov.
