Physician Reasoning on Diagnostic Cases With Large Language Models

Updated February 15, 2024 by Jonathan Chen, Stanford University

Diagnostic Reasoning With Large Language Model Chat Bots

This study will evaluate how access to GPT-4, a large language model, affects performance on case-based diagnostic reasoning tasks, compared with access to traditional diagnostic decision support tools.

Study Overview

Status

Completed

Conditions

  • Diagnosis

Intervention / Treatment

  • GPT-4

Detailed Description

Artificial intelligence (AI) technologies, specifically advanced large language models such as OpenAI's ChatGPT, have the potential to improve medical decision-making. Although ChatGPT-4 was not developed for medical applications, it has shown promise in various healthcare contexts, including medical note-writing, answering patient inquiries, and facilitating medical consultation. However, little is known about how ChatGPT augments the clinical reasoning abilities of clinicians.

Clinical reasoning is a complex process involving pattern recognition, knowledge application, and probabilistic reasoning. Integrating AI tools such as ChatGPT-4 into physician workflows could help reduce clinician workload and decrease the likelihood of missed diagnoses. However, ChatGPT-4 was neither developed nor validated for clinical reasoning. Further, it can generate misinformation, including convincing confabulations that may mislead clinicians. If clinicians misuse this tool, it may fail to improve diagnostic reasoning and could even cause harm. It is therefore important to study how clinicians use large language models to augment clinical reasoning before these tools are routinely incorporated into patient care.

In this study, we will randomize participants to answer diagnostic cases with or without access to ChatGPT-4. Participants will be asked to give three differential diagnoses for each case, with supporting and opposing findings for each diagnosis. Additionally, they will be asked to provide their top diagnosis along with next diagnostic steps. Answers will be graded by independent reviewers blinded to treatment assignment.

Study Type

Interventional

Enrollment (Actual)

50

Phase

  • Not Applicable

Contacts and Locations

Study Locations

    • California
      • Palo Alto, California, United States, 94304
        • Stanford University

Participation Criteria

Eligibility Criteria

Ages Eligible for Study

  • Child
  • Adult
  • Older Adult

Accepts Healthy Volunteers

Yes

Description

Inclusion Criteria:

  • Participants must be licensed physicians and have completed at least post-graduate year 2 (PGY2) of medical training.
  • Training in internal medicine, family medicine, or emergency medicine.

Exclusion Criteria:

  • Not currently practicing clinically.

Study Plan

How is the study designed?

Design Details

  • Primary Purpose: Diagnostic
  • Allocation: Randomized
  • Interventional Model: Parallel Assignment
  • Masking: Single

Arms and Interventions

Active Comparator: GPT-4
Group will be given access to GPT-4.
Intervention / Treatment: OpenAI's GPT-4 large language model with chat interface.

No Intervention: Usual resources
Group will not be given access to GPT-4 but will be encouraged to use any resources they wish other than large language models (e.g., UpToDate, DynaMed, Google).

What is the study measuring?

Primary Outcome Measures

Outcome Measure: Diagnostic reasoning
Measure Description: The primary outcome will be the percent correct (range: 0 to 100) for each case. For each case, participants will be asked for their three top diagnoses, together with the findings from the case that support and oppose each diagnosis. Participants will receive 1 point for each plausible diagnosis. The supporting and opposing findings will also be graded for correctness, with 1 point for a partially correct response and 2 points for a completely correct response. Participants will then be asked to name their top diagnosis, earning 1 point for a reasonable response and 2 points for the most correct response. Finally, participants will be asked to name up to 3 next steps to further evaluate the patient, with 1 point awarded for a partially correct response and 2 points for a completely correct response. The primary outcome will be compared between the randomized groups at the case level.
Time Frame: During evaluation
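The rubric above can be sketched as a small scoring function that turns raw points into the 0 to 100 percent-correct outcome. This is a minimal illustration, not the investigators' grading tool. It assumes that each element is graded independently, that unanswered items score 0, and that percent correct is earned points divided by a fixed maximum of 23 (3 × (1 + 2 + 2) for the three diagnoses, 2 for the top diagnosis, and 3 × 2 for the next steps). The record does not state that maximum. The names `DiagnosisScore`, `CaseScore`, `max_points`, and `percent_correct` are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical point values read off the outcome description; the registry
# record does not publish an official maximum score per case.
N_DIAGNOSES = 3
N_NEXT_STEPS = 3
MAX_PLAUSIBLE = 1   # per diagnosis: 1 point if plausible
MAX_FINDINGS = 2    # per findings list: 1 partially, 2 completely correct
MAX_TOP_DX = 2      # 1 reasonable, 2 most correct
MAX_NEXT_STEP = 2   # per next step: 1 partially, 2 completely correct


@dataclass
class DiagnosisScore:
    plausible: int    # 0 or 1
    supporting: int   # 0, 1, or 2
    opposing: int     # 0, 1, or 2


@dataclass
class CaseScore:
    diagnoses: list[DiagnosisScore]
    top_diagnosis: int                                   # 0, 1, or 2
    next_steps: list[int] = field(default_factory=list)  # up to 3 entries, each 0 to 2


def max_points() -> int:
    """Assumed maximum raw score per case: 3 * (1 + 2 + 2) + 2 + 3 * 2 = 23."""
    per_diagnosis = MAX_PLAUSIBLE + 2 * MAX_FINDINGS
    return N_DIAGNOSES * per_diagnosis + MAX_TOP_DX + N_NEXT_STEPS * MAX_NEXT_STEP


def _check(value: int, upper: int, name: str) -> None:
    # Reject sub-scores outside the assumed 0..upper range.
    if not 0 <= value <= upper:
        raise ValueError(f"{name}={value} outside 0..{upper}")


def percent_correct(case: CaseScore) -> float:
    """Convert a case's raw rubric points to percent correct (0 to 100).

    Missing diagnoses or next steps simply contribute 0 points, because the
    denominator is the fixed assumed maximum.
    """
    if len(case.diagnoses) > N_DIAGNOSES or len(case.next_steps) > N_NEXT_STEPS:
        raise ValueError("too many diagnoses or next steps")
    earned = 0
    for dx in case.diagnoses:
        _check(dx.plausible, MAX_PLAUSIBLE, "plausible")
        _check(dx.supporting, MAX_FINDINGS, "supporting")
        _check(dx.opposing, MAX_FINDINGS, "opposing")
        earned += dx.plausible + dx.supporting + dx.opposing
    _check(case.top_diagnosis, MAX_TOP_DX, "top_diagnosis")
    earned += case.top_diagnosis
    for step in case.next_steps:
        _check(step, MAX_NEXT_STEP, "next_step")
        earned += step
    return 100.0 * earned / max_points()
```

For example, `percent_correct(CaseScore([DiagnosisScore(1, 2, 2)] * 3, 2, [2, 2, 2]))` returns 100.0. A fixed denominator means that naming fewer than three next steps lowers the score. That is only one reading of "up to 3 next steps", and a grader could instead normalize by the number of items requested.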

Secondary Outcome Measures

Outcome Measure: Time Spent on Diagnosis
Measure Description: We will compare how much time (in minutes) participants spend per case between the two study arms.
Time Frame: During evaluation

Collaborators and Investigators

Investigators

  • Principal Investigator: Jonathan H Chen, MD, PhD, Stanford University
  • Principal Investigator: Adam Rodman, MD, Beth Israel Deaconess Medical Center
  • Principal Investigator: Andrew Olson, MD, University of Minnesota

Study record dates

Study Major Dates

Study Start (Actual)

November 29, 2023

Primary Completion (Actual)

December 30, 2023

Study Completion (Actual)

December 30, 2023

Study Registration Dates

First Submitted

November 27, 2023

First Submitted That Met QC Criteria

November 27, 2023

First Posted (Actual)

December 6, 2023

Study Record Updates

Last Update Posted (Actual)

February 20, 2024

Last Update Submitted That Met QC Criteria

February 15, 2024

Last Verified

February 1, 2024

More Information

Other Study ID Numbers

  • 71319

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

NO

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved directly from the website clinicaltrials.gov without any changes.