Evaluating Decision-making Using ChatGPT-4 Among Trainees in Surgery (EDuCATe)

April 8, 2025 updated by: Manuela Mastronardi, Ospedali Riuniti Trieste

Evaluating ChatGPT-4 as a Decision-Making Support Tool for Surgical Trainees

This study aims to assess whether ChatGPT-4 can support surgical trainees in clinical decision-making. By comparing the performance of ChatGPT-4 with that of junior residents, senior residents, and attending surgeons on standardized clinical scenarios, the study seeks to understand the potential role of large language models in surgical education. The ultimate goal is to evaluate whether ChatGPT-4 can be safely integrated as a supplementary educational tool to help junior residents develop critical thinking and surgical judgment.

Study Overview

Status

Not yet recruiting

Conditions

Intervention / Treatment

Other: Clinical Cases

Detailed Description

Background:

Artificial Intelligence (AI) is rapidly transforming the medical landscape, offering new possibilities in education, diagnostics, and decision support. In surgery, clinical decision-making is a core competency developed progressively through training. ChatGPT-4, a state-of-the-art large language model developed by OpenAI, has demonstrated competence in handling medical queries and clinical reasoning tasks. However, how its performance in complex surgical decision-making compares with that of human trainees remains largely unexplored.

Objective:

The EDuCATe study aims to evaluate the accuracy and reliability of ChatGPT-4's responses to clinical scenarios involving general surgery cases. Specifically, the study compares the model's performance to that of junior residents, senior residents, and attending surgeons to understand if ChatGPT-4 can serve as a safe and effective educational tool for surgical trainees.

Methods:

Seven clinical scenarios will be constructed using real anonymized patient data representing common general surgery conditions. Each case will be presented step-by-step, mimicking the clinical decision-making process. Participants will answer a question related to treatment choice.

Participants will include junior residents (PGY1-2), senior residents (PGY3+), and attending surgeons from a single surgical department. ChatGPT-4 will be prompted with the same scenarios. All participants will be instructed to complete the cases without using external resources such as AI tools or internet searches, relying solely on their clinical knowledge.

Statistical analysis will compare performance across groups using non-parametric tests (e.g., Wilcoxon rank sum).
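
As an illustration of the kind of non-parametric comparison named above, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test in plain Python. It uses midranks for ties and a normal approximation with tie correction, without a continuity correction. The function names and the example scores are hypothetical and are not part of the study record; with groups this small, an exact test from a statistics package may be preferable.

```python
import math
from typing import Sequence


def midranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> tuple[float, float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (W, z, p), where W is the sum of the ranks of the first sample.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    w = sum(ranks[:n1])
    mean_w = n1 * (n + 1) / 2

    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts: dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    if var_w == 0:  # every observation identical: no evidence of a difference
        return w, 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p


if __name__ == "__main__":
    # Hypothetical numbers of correct answers out of seven cases per participant.
    junior_scores = [3, 4, 4, 5, 3]
    senior_scores = [5, 6, 5, 7, 6]
    w, z, p = rank_sum_test(junior_scores, senior_scores)
    print(f"W = {w}, z = {z:.3f}, two-sided p = {p:.4f}")
```

Because each participant can score only between 0 and 7, ties are expected to be common, which is why the sketch applies the tie correction to the variance.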

Expected Outcomes:

The study hypothesizes that ChatGPT-4 will perform at a level comparable to senior residents or attending surgeons and outperform junior residents in decision-making. If confirmed, these results could support the safe use of ChatGPT-4 as a training aid for junior surgical residents, potentially improving educational outcomes and clinical reasoning skills.

Significance:

This study will provide novel insight into the role of AI in surgical education. By rigorously comparing ChatGPT-4's decision-making capabilities to those of human surgeons at various levels, the study hopes to define its utility, limitations, and appropriate use in residency training programs.

Study Type

Observational

Enrollment (Estimated)

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Contact

Name: Manuela Mastronardi
Phone Number: 0403994152
Email: manuela.mastronardi@gmail.com

Study Locations

Italy
- Trieste, Italy
    - University of Trieste
    - Contact: Manuela Mastronardi
      Phone Number: 0403994152
      Email: manuela.mastronardi@gmail.com
    - Principal Investigator: Silvia Palmisano
    - Principal Investigator: Manuela Mastronardi
    - Sub-Investigator: Paola Germani
    - Sub-Investigator: Margherita Sandano

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

Child
Adult
Older Adult

Accepts Healthy Volunteers

Sampling Method

Non-Probability Sample

Study Population

The study population will consist of general surgery trainees and faculty members from a single academic surgical department. Participants will be stratified into three groups based on their level of training and experience:
- Junior Residents: Postgraduate Year (PGY) 1-2
- Senior Residents: PGY 3 and above
- Attending Surgeons: Board-certified general surgeons with independent clinical practice

Description

Inclusion Criteria:

Actively enrolled or employed in the general surgery residency or department at the participating institution
Willingness to participate and complete all clinical case scenarios
Consent to participate in the study

Exclusion Criteria:

Incomplete responses
Self-reported use of external assistance (e.g., internet search, AI tools) when answering scenarios, contrary to the study instructions

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Number of groups / cohorts

Cohorts and Interventions

Group / Cohort	Intervention / Treatment
Residents: Junior and Senior Residents in General Surgery	Other: Clinical Cases. Seven clinical cases to be analysed
Consultants: Senior General Surgeons	Other: Clinical Cases. Seven clinical cases to be analysed
ChatGPT 4: Artificial Intelligence System	Other: Clinical Cases. Seven clinical cases to be analysed

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Proportion of correct responses	Binary outcome (correct vs. incorrect decision)	Baseline
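
The primary outcome can be illustrated with a short sketch that scores one respondent's answers as correct or incorrect and reports the proportion correct. The case identifiers, answer options, and function names are hypothetical. Raising an error on a missing answer is an assumption made here to mirror the exclusion of incomplete responses; the record does not say how such responses are handled in the analysis.

```python
def score_responses(answers: dict[str, str], answer_key: dict[str, str]) -> dict[str, bool]:
    """Mark each case as correct (True) or incorrect (False) against the key."""
    missing = [case for case in answer_key if case not in answers]
    if missing:
        # Assumption: an incomplete response set is rejected rather than scored.
        raise ValueError(f"incomplete responses, missing: {missing}")
    return {case: answers[case] == correct for case, correct in answer_key.items()}


def proportion_correct(scored: dict[str, bool]) -> float:
    """Return the share of cases answered correctly, between 0.0 and 1.0."""
    if not scored:
        raise ValueError("no scored cases")
    return sum(scored.values()) / len(scored)


if __name__ == "__main__":
    # Hypothetical key and answers for seven cases with options "A" and "B".
    key = {f"case{i}": option for i, option in enumerate("ABABBAA", start=1)}
    answers = dict(key, case2="A", case5="A")  # two deliberate errors
    scored = score_responses(answers, key)
    print(f"proportion correct: {proportion_correct(scored):.2f}")
```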

Secondary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Comparison of accuracy across experience levels	Proportion of correct responses by group: Junior residents, Senior residents, Attending surgeons, ChatGPT-4	Baseline
Confidence level	Participants and ChatGPT are asked to rate how confident they feel in their answer (1-5 Likert scale, where 1 means not confident and 5 means very confident)	Baseline
Percentage of use of AI for clinical case evaluation	Participants are asked whether they use ChatGPT in their clinical activity	Baseline
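
The three secondary measures lend themselves to a simple per-group summary. The sketch below is one possible way to compute pooled accuracy, median confidence, and the percentage reporting ChatGPT use. The `Record` fields, group labels, and data are hypothetical, and the choices to pool answers within a group and to summarise Likert ratings by the median are assumptions, not the study's stated method.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Record:
    group: str                     # e.g. "junior", "senior", "attending", "chatgpt"
    n_correct: int                 # correct decisions among the cases answered
    n_cases: int                   # number of cases answered
    confidence: list[int]          # 1-5 Likert rating per answer
    uses_chatgpt: Optional[bool]   # None where the question does not apply


def summarise(records: list[Record]) -> dict[str, dict[str, Optional[float]]]:
    """Summarise accuracy, confidence, and reported ChatGPT use per group."""
    by_group: dict[str, list[Record]] = {}
    for rec in records:
        by_group.setdefault(rec.group, []).append(rec)

    summary: dict[str, dict[str, Optional[float]]] = {}
    for group, recs in by_group.items():
        ratings = [r for rec in recs for r in rec.confidence]
        usage = [rec.uses_chatgpt for rec in recs if rec.uses_chatgpt is not None]
        summary[group] = {
            # Pooled proportion: all correct answers over all answers in the group.
            "accuracy": sum(r.n_correct for r in recs) / sum(r.n_cases for r in recs),
            "median_confidence": median(ratings) if ratings else None,
            # Percentage of respondents in the group who report using ChatGPT.
            "pct_using_chatgpt": 100 * sum(usage) / len(usage) if usage else None,
        }
    return summary


if __name__ == "__main__":
    # Hypothetical records, two ratings each to keep the example short.
    data = [
        Record("junior", 5, 7, [3, 4], True),
        Record("junior", 4, 7, [2, 3], False),
        Record("chatgpt", 6, 7, [5, 5], None),
    ]
    for group, stats in summarise(data).items():
        print(group, stats)
```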

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Sponsor

Ospedali Riuniti Trieste

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Estimated)

April 10, 2025

Primary Completion (Estimated)

April 25, 2025

Study Completion (Estimated)

April 30, 2025

Study Registration Dates

First Submitted

April 2, 2025

First Submitted That Met QC Criteria

April 8, 2025

First Posted (Actual)

April 10, 2025

Study Record Updates

Last Update Posted (Actual)

April 10, 2025

Last Update Submitted That Met QC Criteria

April 8, 2025

Last Verified

April 1, 2025

More Information

Terms related to this study

Keywords

Other Study ID Numbers

AI-Surgical training

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

Studies a U.S. FDA-regulated device product

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.

