Diagnostic Reasoning With Customized GPT-4 Model

Last updated March 27, 2025 by Jonathan Chen, Stanford University

Evaluating the Performance of LLMs and Clinicians in Complex Diagnostic Cases: A Randomized Controlled Trial

This study will assess the impact of immediate access to a customized version of GPT-4, a large language model, on performance in case-based diagnostic reasoning tasks. Specifically, it will compare this approach to a two-step process in which participants first reason through each case with traditional diagnostic decision support tools and only then gain access to the customized GPT-4 model.

Study Overview

Status

Completed

Conditions

Intervention / Treatment

Detailed Description

Artificial intelligence (AI) technologies, particularly advanced large language models like OpenAI's ChatGPT, have the potential to enhance medical decision-making. While ChatGPT-4 was not specifically designed for medical applications, it has demonstrated promise in various healthcare contexts, including medical note-writing, addressing patient inquiries, and facilitating medical consultations. However, its impact on clinicians' diagnostic reasoning remains largely unknown.

Clinical reasoning is a complex process that involves pattern recognition, knowledge application, and probabilistic reasoning. Integrating AI tools like ChatGPT-4 into physician workflows could help reduce clinician workload and decrease the likelihood of missed diagnoses. However, ChatGPT-4 was neither developed nor validated for diagnostic reasoning, and it may produce misleading information, including plausible but incorrect conclusions that could misguide clinicians. If not used appropriately, it may fail to improve, and could even hinder, clinical decision-making. Therefore, it is essential to study how clinicians use large language models to support clinical reasoning before integrating them into routine patient care.

This study will examine how immediate access to a customized version of ChatGPT-4 impacts performance on case-based diagnostic reasoning tasks, compared to a stepwise approach. In the stepwise approach, participants will first use traditional diagnostic decision support tools to support their case reasoning before interacting with a customized ChatGPT-4 model, at which point they will have the opportunity to revise their initial answers.

Participants will be randomized into different study arms and will respond to diagnostic cases by providing three differential diagnoses, along with supporting and opposing findings for each. They will also identify their top diagnosis and propose next diagnostic steps. Independent reviewers, blinded to treatment assignment, will evaluate their responses.

Study Type

Interventional

Enrollment (Actual)

Phase

Not Applicable

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

United States
- California
  - Palo Alto, California, United States, 94305
    - Stanford University

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

Child
Adult
Older Adult

Accepts Healthy Volunteers

Yes

Description

Inclusion Criteria:

Participants must be licensed physicians who have completed at least post-graduate year 1 (PGY1) of medical training.
Participants must have trained in internal medicine, family medicine, or emergency medicine.

Exclusion Criteria:

Not currently practicing clinically.
Participated in one of our previous studies that used the same six diagnostic cases.

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Primary Purpose: Diagnostic
Allocation: Randomized
Interventional Model: Parallel Assignment
Masking: Single

Number of Arms

2

Arms and Interventions

Participant Group / Arm	Intervention / Treatment
Active Comparator: Immediate access to customized version of GPT-4. Group will be encouraged to immediately use a customized version of GPT-4.	Other: Immediate access to customized version of GPT-4. Group is given immediate access to a customized version of GPT-4 to support their diagnostic reasoning for each case.
Active Comparator: Conventional resources first, then granted access to customized version of GPT-4. Group will be encouraged to first use any resources they wish besides large language models (UpToDate, PubMed, Google, etc.) and will then be granted access to a customized version of GPT-4.	Other: Access to customized version of GPT-4 following use of conventional resources. Group is first encouraged to reason through diagnostic cases with the support of conventional resources. After submitting their answers for a case, they are given access to a customized version of GPT-4 and have the opportunity to change their initial answers.

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Diagnostic reasoning	The primary outcome will be the percentage of correct responses per case (range: 0 to 100). For each case, participants will be asked to provide their top three differential diagnoses, along with supporting and opposing findings for each. They will receive 1 point for each plausible diagnosis. Supporting and opposing findings will be graded based on correctness, with 1 point for a partially correct response and 2 points for a completely correct response. Participants will then select their top diagnosis, earning 1 point for a reasonable choice and 2 points for the most accurate diagnosis. Finally, they will list up to three next steps for further patient evaluation, with 1 point awarded for a partially correct response and 2 points for a completely correct response. The primary outcome will be analyzed at the case level, comparing performance between the randomized study groups.	Through study completion, an average of 6 months
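
The scoring rubric above can be sketched in code. This is only an illustration under stated assumptions, not the investigators' grading tool: the record does not state the total points available, so the sketch assumes that supporting and opposing findings are each graded on a 0 to 2 scale for every one of the three diagnoses, and that the next steps are graded once per case on a 0 to 2 scale. The names `DifferentialGrade`, `CaseGrade`, `max_points`, and `case_percentage` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

MAX_DIFFERENTIALS = 3  # the rubric asks for three differential diagnoses


@dataclass
class DifferentialGrade:
    plausible: bool   # 1 point if the diagnosis is plausible
    supporting: int   # 0 = incorrect, 1 = partially correct, 2 = completely correct
    opposing: int     # same 0-2 scale as the supporting findings


@dataclass
class CaseGrade:
    differentials: List[DifferentialGrade]
    top_diagnosis: int  # 0 = not reasonable, 1 = reasonable, 2 = most accurate
    next_steps: int     # ASSUMPTION: graded once per case on a 0-2 scale


def _check_scale(name: str, value: int) -> None:
    """Reject grades outside the 0/1/2 scale described in the rubric."""
    if value not in (0, 1, 2):
        raise ValueError(f"{name} must be 0, 1, or 2, got {value!r}")


def max_points() -> int:
    """Maximum score under the assumptions stated above."""
    # 3 diagnoses x (1 plausibility + 2 supporting + 2 opposing)
    # + 2 for the top diagnosis + 2 for the next steps
    return MAX_DIFFERENTIALS * 5 + 2 + 2


def case_percentage(grade: CaseGrade) -> float:
    """Return the case score as a percentage in the range 0 to 100."""
    if len(grade.differentials) > MAX_DIFFERENTIALS:
        raise ValueError("at most three differential diagnoses are graded")
    _check_scale("top_diagnosis", grade.top_diagnosis)
    _check_scale("next_steps", grade.next_steps)
    points = grade.top_diagnosis + grade.next_steps
    for d in grade.differentials:
        _check_scale("supporting", d.supporting)
        _check_scale("opposing", d.opposing)
        points += int(d.plausible) + d.supporting + d.opposing
    return 100.0 * points / max_points()
```

Under these assumptions a case with three plausible diagnoses, fully correct findings, the most accurate top diagnosis, and fully correct next steps scores 100.0. If the actual rubric grades each next step separately, only `max_points` and the `next_steps` field would need to change.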

Secondary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Time Spent Per Case	The investigators will compare the average time (in minutes) participants spend on each case across the two study arms.	Through study completion, an average of 6 months
Prompt frequency	The investigators will compare the frequency of participant prompts to the customized GPT-4 model between the two study groups.	Through study completion, an average of 6 months
Sentiment	The investigators will compare the tone and sentiment of participant prompts to the customized GPT-4 model across the two study groups. The investigators will create a qualitative coding system to categorize the nature of the participants' prompts.	Through study completion, an average of 6 months
Participant Perceptions of AI in Clinical Reasoning	This outcome will be assessed in both study arms and will encompass changes in attitudes, confidence, and willingness to use AI diagnostic tools before and after exposure to the customized tool. The investigators will assess the number of participants who were open to using AI to help with complex clinical reasoning (pre- and post-quiz), whether they enjoyed working with the AI diagnostic tool, whether they felt the tool provided a valuable collaborative experience for clinical reasoning, whether seeing the AI diagnostic tool's recommendations increased their confidence in their differential diagnoses, and whether they would use an AI diagnostic tool like the one in this study in their daily work. Responses will be evaluated on a Likert scale ranging from strongly disagree to strongly agree.	Through study completion, an average of 6 months
Customized GPT-4's diagnostic reasoning	The customized GPT-4's 'independent' diagnoses will be assessed for accuracy. The outcome will be the percentage of correct responses per case (range: 0 to 100). For each case, the meta-prompt directs the customized GPT-4 to provide its top three differential diagnoses, along with supporting and opposing findings for each, a final diagnosis, and next steps. The customized GPT-4 will receive 1 point for each plausible diagnosis. Supporting and opposing findings will be graded based on correctness, with 1 point for a partially correct response and 2 points for a completely correct response. Its top diagnosis will earn 1 point for a reasonable choice and 2 points for the most accurate diagnosis. Finally, it will list up to three next steps for further patient evaluation, with 1 point awarded for a partially correct response and 2 points for a completely correct response. The outcome will be analyzed at the case level, comparing performance with the randomized study groups' scores.	Through study completion, an average of 6 months

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Sponsor

Stanford University

Collaborators

Beth Israel Deaconess Medical Center

Investigators

Principal Investigator: Jonathan H Chen, MD, PhD, Stanford University

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

December 16, 2024

Primary Completion (Actual)

January 24, 2025

Study Completion (Actual)

January 24, 2025

Study Registration Dates

First Submitted

February 11, 2025

First Submitted That Met QC Criteria

March 27, 2025

First Posted (Actual)

April 4, 2025

Study Record Updates

Last Update Posted (Actual)

April 4, 2025

Last Update Submitted That Met QC Criteria

March 27, 2025

Last Verified

March 1, 2025

More Information

Terms related to this study

Keywords

Additional Relevant MeSH Terms

Pathologic Processes

Other Study ID Numbers

71319c

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

Studies a U.S. FDA-regulated device product

product manufactured in and exported from the U.S.

This information was retrieved directly from the website clinicaltrials.gov without any changes.
