Clinical Trial NCT06911645
Diagnostic Reasoning With Customized GPT-4 Model
Evaluating the Performance of LLMs and Clinicians in Complex Diagnostic Cases: A Randomized Controlled Trial
Study Overview
Status
Conditions
Detailed Description
Artificial intelligence (AI) technologies, particularly advanced large language models like OpenAI's ChatGPT, have the potential to enhance medical decision-making. While ChatGPT-4 was not specifically designed for medical applications, it has demonstrated promise in various healthcare contexts, including medical note-writing, addressing patient inquiries, and facilitating medical consultations. However, its impact on clinicians' diagnostic reasoning remains largely unknown.
Clinical reasoning is a complex process that involves pattern recognition, knowledge application, and probabilistic reasoning. Integrating AI tools like ChatGPT-4 into physician workflows could help reduce clinician workload and decrease the likelihood of missed diagnoses. However, ChatGPT-4 was neither developed nor validated for diagnostic reasoning, and it may produce misleading information, including plausible but incorrect conclusions that could misguide clinicians. If not used appropriately, it may fail to improve clinical decision-making and could even hinder it. Therefore, it is essential to study how clinicians use large language models to support clinical reasoning before integrating them into routine patient care.
This study will examine how immediate access to a customized version of ChatGPT-4 impacts performance on case-based diagnostic reasoning tasks, compared to a stepwise approach. In the stepwise approach, participants will first use traditional diagnostic decision support tools to support their case reasoning before interacting with a customized ChatGPT-4 model, at which point they will have the opportunity to revise their initial answers.
Participants will be randomized into different study arms and will respond to diagnostic cases by providing three differential diagnoses, along with supporting and opposing findings for each. They will also identify their top diagnosis and propose next diagnostic steps. Independent reviewers, blinded to treatment assignment, will evaluate their responses.
Study Type
Enrollment (Actual)
Phase
- Not Applicable
Contacts and Locations
Study Locations
- Stanford University, Palo Alto, California, United States, 94305
Participation Criteria
Eligibility Criteria
Ages Eligible for Study
- Child
- Adult
- Older Adult
Accepts Healthy Volunteers
Description
Inclusion Criteria:
- Participants must be licensed physicians and have completed at least post-graduate year 1 (PGY1) of medical training.
- Training in internal medicine, family medicine, or emergency medicine.
Exclusion Criteria:
- Not currently practicing clinically.
- Participated in one of our previous studies that used the same six diagnostic cases.
Study Plan
How is the study designed?
Design Details
- Primary Purpose: Diagnostic
- Allocation: Randomized
- Interventional Model: Parallel Assignment
- Masking: Single
Arms and Interventions
| Participant Group / Arm | Intervention / Treatment |
|---|---|
| Active Comparator: Immediate access to customized version of GPT-4. Group will be encouraged to immediately use a customized version of GPT-4. | Group is given immediate access to a customized version of GPT-4 to support their diagnostic reasoning for each case. |
| Active Comparator: Conventional resources first, then granted access to customized version of GPT-4. Group will be encouraged to first use any resources they wish besides large language models (UpToDate, PubMed, Google, etc.) and then will be granted access to a customized version of GPT-4. | Group is first encouraged to reason through diagnostic cases with the support of conventional resources. After they submit a case's answers, they are given access to a customized version of GPT-4 and have the opportunity to change their initial answers. |
What is the study measuring?
Primary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Diagnostic reasoning | The primary outcome will be the percentage of correct responses per case (range: 0 to 100). For each case, participants will be asked to provide their top three differential diagnoses, along with supporting and opposing findings for each. They will receive 1 point for each plausible diagnosis. Supporting and opposing findings will be graded based on correctness, with 1 point for a partially correct response and 2 points for a completely correct response. Participants will then select their top diagnosis, earning 1 point for a reasonable choice and 2 points for the most accurate diagnosis. Finally, they will list up to three next steps for further patient evaluation, with 1 point awarded for a partially correct response and 2 points for a completely correct response. The primary outcome will be analyzed at the case level, comparing performance between the randomized study groups. | Through study completion, an average of 6 months |
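The scoring rubric for the primary outcome can be illustrated with a minimal sketch. The record does not state the maximum score per case, so the code below makes two labeled assumptions: supporting and opposing findings are each graded 0 to 2 per diagnosis, and the next steps are graded once as a whole (0 to 2). Under those assumptions the maximum is 19 points. The names `DifferentialItem`, `CaseResponse`, and `score_case` are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import List

MAX_DIAGNOSES = 3  # participants provide three differential diagnoses


@dataclass
class DifferentialItem:
    plausible: bool   # 1 point if reviewers judge the diagnosis plausible
    supporting: int   # 0, 1 (partially correct) or 2 (completely correct)
    opposing: int     # 0, 1 (partially correct) or 2 (completely correct)


@dataclass
class CaseResponse:
    differential: List[DifferentialItem]
    top_diagnosis: int  # 0, 1 (reasonable) or 2 (most accurate)
    next_steps: int     # 0, 1 or 2; ASSUMPTION: graded once, not per step


def _check(points: int, label: str) -> int:
    """Reject grades outside the 0-2 range used by the rubric."""
    if points not in (0, 1, 2):
        raise ValueError(f"{label} must be 0, 1 or 2, got {points}")
    return points


def max_points() -> int:
    # ASSUMPTION: 3 diagnoses x (1 + 2 + 2) + 2 (top diagnosis) + 2 (next steps)
    return MAX_DIAGNOSES * (1 + 2 + 2) + 2 + 2


def score_case(response: CaseResponse) -> float:
    """Return the case score as a percentage between 0 and 100."""
    if len(response.differential) > MAX_DIAGNOSES:
        raise ValueError("at most three differential diagnoses are scored")
    total = 0
    for item in response.differential:
        total += 1 if item.plausible else 0
        total += _check(item.supporting, "supporting")
        total += _check(item.opposing, "opposing")
    total += _check(response.top_diagnosis, "top_diagnosis")
    total += _check(response.next_steps, "next_steps")
    return 100.0 * total / max_points()
```

If the reviewers instead grade each of the three next steps separately, only `max_points` and the `next_steps` field would need to change; the percentage calculation stays the same.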
Secondary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Time Spent Per Case | The investigators will compare the average time (in minutes) participants spend on each case across the two study arms. | Through study completion, an average of 6 months |
| Prompt frequency | The investigators will compare the frequency of participant prompts to the customized GPT-4 model between the two study groups. | Through study completion, an average of 6 months |
| Sentiment | The investigators will compare the tone and sentiment of participant prompts to the customized GPT-4 model across the two study groups. The investigators will create a qualitative coding system to categorize the nature of the participants' prompts. | Through study completion, an average of 6 months |
| Participant Perceptions of AI in Clinical Reasoning | This outcome will be assessed in both study arms and encompasses changes in attitudes, confidence, and willingness to use AI diagnostic tools before and after exposure to the customized tool. The investigators will assess the number of participants who were open to using AI to help with complex clinical reasoning (pre- and post-quiz), whether they enjoyed working with the AI diagnostic tool, whether they felt the tool provided a valuable collaborative experience for clinical reasoning, whether seeing the AI diagnostic tool's recommendations increased their confidence in their differential diagnoses, and whether they would use an AI diagnostic tool like the one in this study in their daily job. These will be evaluated on a Likert scale ranging from strongly disagree to strongly agree. | Through study completion, an average of 6 months |
| Customized GPT-4's diagnostic reasoning | The customized GPT-4's 'independent' diagnoses will be assessed for accuracy. The outcome will be the percentage of correct responses per case (range: 0 to 100). For each case, the meta-prompt directs the customized GPT-4 to provide its top three differential diagnoses, along with supporting and opposing findings for each, a final diagnosis, and next steps. The customized GPT-4 will receive 1 point for each plausible diagnosis. Supporting and opposing findings will be graded based on correctness, with 1 point for a partially correct response and 2 points for a completely correct response. Its top diagnosis will earn 1 point for a reasonable choice and 2 points for the most accurate diagnosis. Finally, it will list up to three next steps for further patient evaluation, with 1 point awarded for a partially correct response and 2 points for a completely correct response. The outcome will be analyzed at the case level, comparing performance with the randomized study groups' scores. | Through study completion, an average of 6 months |
Collaborators and Investigators
Sponsor
Collaborators
Investigators
- Principal Investigator: Jonathan H Chen, MD, PhD, Stanford University
Study record dates
Study Major Dates
Study Start (Actual)
Primary Completion (Actual)
Study Completion (Actual)
Study Registration Dates
First Submitted
First Submitted That Met QC Criteria
First Posted (Actual)
Study Record Updates
Last Update Posted (Actual)
Last Update Submitted That Met QC Criteria
Last Verified
More Information
Terms related to this study
Additional Relevant MeSH Terms
Other Study ID Numbers
- 71319c
Plan for Individual participant data (IPD)
Plan to Share Individual Participant Data (IPD)?
Drug and device information, study documents
Studies a U.S. FDA-regulated drug product
Studies a U.S. FDA-regulated device product
product manufactured in and exported from the U.S.
This information was retrieved directly from the website clinicaltrials.gov without any changes.