- Clinical Trial NCT06098950
Human Algorithm Interactions for Acute Respiratory Failure Diagnosis
Measuring the Impact of AI in the Diagnosis of Hospitalized Patients: A Randomized Survey Vignette Multicenter Study
Artificial intelligence (AI) shows promise in identifying abnormalities in clinical images. However, systematically biased AI models, where a model makes inaccurate predictions for entire subpopulations, can lead to errors and potential harms. Clinician diagnostic accuracy can be harmed when clinicians are shown incorrect predictions from an AI model. This study aims to evaluate the effectiveness of showing clinicians image-based AI model explanations alongside AI model predictions, to help clinicians better understand the logic behind an AI model's prediction. It will evaluate whether providing clinicians with AI model explanations can improve diagnostic accuracy and help clinicians catch when models are making incorrect decisions. As a test case, the study will focus on the diagnosis of acute respiratory failure, because determining the underlying causes of acute respiratory failure is critically important for guiding treatment decisions but can be clinically challenging.
To determine if providing AI explanations can improve clinician diagnostic accuracy and alleviate the potential impact of showing clinicians a systematically biased AI model, a randomized clinical vignette survey study will be conducted. During the survey, study participants will be shown clinical vignettes of patients hospitalized with acute respiratory failure, including the patient's presenting symptoms, physical exam, laboratory results, and chest X-ray. Study participants will then be asked to assess the likelihood that heart failure, pneumonia, and/or chronic obstructive pulmonary disease (COPD) is the underlying diagnosis. During specific vignettes in the survey, participants will also be shown standard or systematically biased AI models that provide an estimate of the likelihood that heart failure, pneumonia, and/or COPD is the underlying diagnosis. Clinicians will be randomized to see either AI predictions alone or AI predictions with explanations when shown AI models. This survey design will allow for testing the hypothesis that systematically biased models would harm clinician diagnostic accuracy, but that commonly used image-based explanations would help clinicians partially recover their performance.
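The allocation described above can be illustrated with a minimal Python sketch. It assumes simple (unstratified) randomization with equal probability across the six arms listed under Arms and Interventions; the record does not state the actual randomization scheme, and the names `ARMS`, `assign_arm`, the participant IDs, and the seed are hypothetical.

```python
import random

# Bias targets and explanation conditions named in the record's arm list.
BIAS_TARGETS = ("heart failure", "pneumonia", "COPD")
EXPLANATION_SHOWN = (False, True)

# Six arms: every bias target crossed with explanation off/on.
ARMS = [(target, shown) for shown in EXPLANATION_SHOWN for target in BIAS_TARGETS]


def assign_arm(rng: random.Random) -> tuple[str, bool]:
    """Pick one arm uniformly at random (simple randomization is an assumption)."""
    return rng.choice(ARMS)


if __name__ == "__main__":
    rng = random.Random(2023)  # hypothetical seed, used only for reproducibility
    for participant_id in ("P001", "P002", "P003"):  # hypothetical IDs
        target, shown = assign_arm(rng)
        label = "with explanation" if shown else "no explanation"
        print(f"{participant_id}: model biased for {target}, {label}")
```

A real trial would more likely use a block or stratified scheme to keep arm sizes balanced; the uniform draw here is only the simplest stand-in.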
Study Overview
Status
Conditions
Study Type
Enrollment (Actual)
Phase
- Not Applicable
Contacts and Locations
Study Locations
- University of Michigan, Ann Arbor, Michigan, United States, 48103
Participation Criteria
Eligibility Criteria
Ages Eligible for Study
- Adult
- Older Adult
Accepts Healthy Volunteers
Description
Inclusion Criteria:
- Physicians, nurse practitioners, and physician assistants who care for patients with acute respiratory failure as part of their clinical practice
Exclusion Criteria:
- Physicians, nurse practitioners, and physician assistants who provide patient care only in outpatient settings
Study Plan
How is the study designed?
Design Details
- Primary Purpose: Other
- Allocation: Randomized
- Interventional Model: Parallel Assignment
- Masking: Single
Arms and Interventions
| Participant Group / Arm | Intervention / Treatment |
|---|---|
| Experimental: AI model biased for heart failure, no AI explanation. Participants in this arm will be shown standard AI model predictions during 3 patient clinical vignettes within the survey and systematically biased AI model predictions during 3 clinical vignettes. When shown systematically biased AI model predictions, the model will be biased against heart failure, always predicting that heart failure is present with high likelihood in patients with a body mass index (BMI) at or above 30. Standard predictions will be shown for the other 2 diagnoses. Participants in this arm will not be shown an AI explanation when shown AI model predictions. | During 6 clinical vignettes, participants will see AI model predictions without a corresponding AI explanation. The AI model will provide a score for each diagnosis (heart failure, pneumonia, COPD) on a scale of 0-100 estimating how likely the patient's presentation was due to each of these diagnoses. In 3 of the clinical vignettes, participants will be shown standard AI model predictions, and in 3 vignettes they will be shown systematically biased AI model predictions, with the model specifically biased against one of the three diagnoses. In 3 clinical vignettes, participants will be shown systematically biased AI model predictions with the model specifically biased against heart failure, always predicting that heart failure is present with high likelihood in survey vignette patients with a body mass index (BMI) at or above 30. Standard predictions will be shown for the other 2 diagnoses (pneumonia, COPD). |
| Experimental: AI model biased for pneumonia, no AI explanation. Participants in this arm will be shown standard AI model predictions during 3 patient clinical vignettes within the survey and systematically biased AI model predictions during 3 clinical vignettes. When shown systematically biased AI model predictions, the model will be biased against pneumonia, always predicting that pneumonia is present with high likelihood in patients 80 years or older. Standard predictions will be shown for the other 2 diagnoses. Participants in this arm will not be shown an AI explanation when shown AI model predictions. | During 6 clinical vignettes, participants will see AI model predictions without a corresponding AI explanation. The AI model will provide a score for each diagnosis (heart failure, pneumonia, COPD) on a scale of 0-100 estimating how likely the patient's presentation was due to each of these diagnoses. In 3 of the clinical vignettes, participants will be shown standard AI model predictions, and in 3 vignettes they will be shown systematically biased AI model predictions, with the model specifically biased against one of the three diagnoses. In 3 clinical vignettes, participants will be shown systematically biased AI model predictions with the model specifically biased against pneumonia, always predicting that pneumonia is present with high likelihood in survey vignette patients 80 years or older. Standard predictions will be shown for the other 2 diagnoses (heart failure, COPD). |
| Experimental: AI model biased for COPD, no AI explanation. Participants in this arm will be shown standard AI model predictions during 3 patient clinical vignettes within the survey and systematically biased AI model predictions during 3 clinical vignettes. When shown systematically biased AI model predictions, the model will be biased against COPD, always predicting that COPD is present with high likelihood when a pre-processing filter was applied to the patient's X-ray. Standard predictions will be shown for the other 2 diagnoses. Participants in this arm will not be shown an AI explanation when shown AI model predictions. | During 6 clinical vignettes, participants will see AI model predictions without a corresponding AI explanation. The AI model will provide a score for each diagnosis (heart failure, pneumonia, COPD) on a scale of 0-100 estimating how likely the patient's presentation was due to each of these diagnoses. In 3 of the clinical vignettes, participants will be shown standard AI model predictions, and in 3 vignettes they will be shown systematically biased AI model predictions, with the model specifically biased against one of the three diagnoses. In 3 clinical vignettes, participants will be shown systematically biased AI model predictions with the model specifically biased against COPD, always predicting that COPD is present with high likelihood in survey vignette patients where a pre-processing filter was applied to the patient's X-ray. Standard predictions will be shown for the other 2 diagnoses (heart failure, pneumonia). |
| Experimental: AI model biased for heart failure, image-based AI explanation presented. Participants in this arm will be shown standard AI model predictions during 3 patient clinical vignettes within the survey and systematically biased AI model predictions during 3 clinical vignettes. When shown systematically biased AI model predictions, the model will be biased against heart failure, always predicting that heart failure is present with high likelihood in patients with a body mass index (BMI) at or above 30. Standard predictions will be shown for the other 2 diagnoses. Participants in this arm will also be shown an AI explanation when shown AI model predictions. | In 3 clinical vignettes, participants will be shown systematically biased AI model predictions with the model specifically biased against heart failure, always predicting that heart failure is present with high likelihood in survey vignette patients with a body mass index (BMI) at or above 30. Standard predictions will be shown for the other 2 diagnoses (pneumonia, COPD). During 6 clinical vignettes, participants will see AI model predictions with an explanation. The AI model will provide a score for each diagnosis on a scale of 0-100. In 3 clinical vignettes, participants will be shown standard AI model predictions, and in 3 vignettes they will be shown systematically biased AI model predictions, with the model specifically biased against one of the three diagnoses. If the AI model provides a score above 50, an AI model explanation will be shown as gradient-weighted class activation mapping (Grad-CAM) heatmaps overlaid on the chest X-ray that highlight which regions of the image most affect the AI model's prediction. |
| Experimental: AI model biased for pneumonia, image-based AI explanation presented. Participants in this arm will be shown standard AI model predictions during 3 patient clinical vignettes within the survey and systematically biased AI model predictions during 3 clinical vignettes. When shown systematically biased AI model predictions, the model will be biased against pneumonia, always predicting that pneumonia is present with high likelihood in patients 80 years or older. Standard predictions will be shown for the other 2 diagnoses. Participants in this arm will also be shown an AI explanation when shown AI model predictions. | In 3 clinical vignettes, participants will be shown systematically biased AI model predictions with the model specifically biased against pneumonia, always predicting that pneumonia is present with high likelihood in survey vignette patients 80 years or older. Standard predictions will be shown for the other 2 diagnoses (heart failure, COPD). During 6 clinical vignettes, participants will see AI model predictions with an explanation. The AI model will provide a score for each diagnosis on a scale of 0-100. In 3 clinical vignettes, participants will be shown standard AI model predictions, and in 3 vignettes they will be shown systematically biased AI model predictions, with the model specifically biased against one of the three diagnoses. If the AI model provides a score above 50, an AI model explanation will be shown as gradient-weighted class activation mapping (Grad-CAM) heatmaps overlaid on the chest X-ray that highlight which regions of the image most affect the AI model's prediction. |
| Experimental: AI model biased for COPD, image-based AI explanation presented. Participants in this arm will be shown standard AI model predictions during 3 patient clinical vignettes within the survey and systematically biased AI model predictions during 3 clinical vignettes. When shown systematically biased AI model predictions, the model will be biased against COPD, always predicting that COPD is present with high likelihood when a pre-processing filter was applied to the patient's X-ray. Standard predictions will be shown for the other 2 diagnoses. Participants in this arm will also be shown an AI explanation when shown AI model predictions. | In 3 clinical vignettes, participants will be shown systematically biased AI model predictions with the model specifically biased against COPD, always predicting that COPD is present with high likelihood in survey vignette patients where a pre-processing filter was applied to the patient's X-ray. Standard predictions will be shown for the other 2 diagnoses (heart failure, pneumonia). During 6 clinical vignettes, participants will see AI model predictions with an explanation. The AI model will provide a score for each diagnosis on a scale of 0-100. In 3 clinical vignettes, participants will be shown standard AI model predictions, and in 3 vignettes they will be shown systematically biased AI model predictions, with the model specifically biased against one of the three diagnoses. If the AI model provides a score above 50, an AI model explanation will be shown as gradient-weighted class activation mapping (Grad-CAM) heatmaps overlaid on the chest X-ray that highlight which regions of the image most affect the AI model's prediction. |
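The bias and explanation rules in the arms above can be summarized in a short Python sketch. It is illustrative only, not the study's software: the record says only that the biased model predicts the targeted diagnosis "with high likelihood", so the value `HIGH_SCORE = 90` is a hypothetical placeholder, and the `Vignette` fields and function names are assumptions. The BMI, age, and X-ray filter triggers and the "score above 50" explanation rule come from the arm descriptions.

```python
from dataclasses import dataclass

DIAGNOSES = ("heart failure", "pneumonia", "COPD")
HIGH_SCORE = 90             # hypothetical: the record only says "high likelihood"
EXPLANATION_THRESHOLD = 50  # from the record: explanation shown for scores above 50


@dataclass(frozen=True)
class Vignette:
    bmi: float
    age: int
    xray_filtered: bool  # True if a pre-processing filter was applied to the X-ray


# Patient attribute that triggers the bias for each targeted diagnosis.
BIAS_TRIGGERS = {
    "heart failure": lambda v: v.bmi >= 30,
    "pneumonia": lambda v: v.age >= 80,
    "COPD": lambda v: v.xray_filtered,
}


def shown_scores(standard, vignette, biased_diagnosis):
    """Return the 0-100 scores displayed when the biased model is shown."""
    scores = dict(standard)  # the other two diagnoses keep standard predictions
    if BIAS_TRIGGERS[biased_diagnosis](vignette):
        scores[biased_diagnosis] = HIGH_SCORE
    return scores


def diagnoses_with_heatmap(scores, explanations_enabled):
    """Diagnoses for which a Grad-CAM heatmap would accompany the prediction."""
    if not explanations_enabled:
        return []
    return [d for d in DIAGNOSES if scores[d] > EXPLANATION_THRESHOLD]
```

For example, for a vignette patient with a BMI of 32 in a heart-failure-biased arm, `shown_scores` overrides only the heart failure score, and `diagnoses_with_heatmap` lists every diagnosis whose displayed score exceeds 50.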
What is the study measuring?
Primary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Participant diagnostic accuracy across clinical vignette settings | Diagnostic accuracy is defined as the number of correct diagnostic assessments over the total number of diagnostic assessments. After reviewing each individual patient clinical vignette within the survey, participants will be asked to make three separate diagnostic assessments, one each for heart failure, pneumonia, and COPD. If the participant's assessment agrees with the reference label for the vignette, the diagnostic assessment is considered correct. Diagnostic assessments will be performed while participants are completing the survey (day 0), immediately after the participant reviews the clinical vignette. Participant diagnostic accuracy will be compared across vignette settings (no AI model, standard AI model, standard AI model with explanation, biased AI model, biased AI model with explanation). | Day 0 |
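The primary outcome definition (correct assessments over total assessments) can be written out as a small Python sketch. It assumes each assessment has already been reduced to a present/absent call that can be compared directly with the reference label; the record does not say how a likelihood rating is converted to such a call, and the function name and data layout are hypothetical.

```python
DIAGNOSES = ("heart failure", "pneumonia", "COPD")


def diagnostic_accuracy(assessments, reference_labels):
    """Fraction of diagnostic assessments that agree with the reference label.

    Both arguments map vignette ID -> {diagnosis: bool}; each vignette
    contributes three assessments, one per diagnosis.
    """
    correct = 0
    total = 0
    for vignette_id, answers in assessments.items():
        for diagnosis in DIAGNOSES:
            total += 1
            if answers[diagnosis] == reference_labels[vignette_id][diagnosis]:
                correct += 1
    if total == 0:
        raise ValueError("no diagnostic assessments to score")
    return correct / total
```

Computing this separately for the vignettes in each setting (no AI model, standard AI model, and so on) gives the per-setting accuracies that the comparison refers to.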
Secondary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Treatment selection accuracy across clinical vignette settings | Treatment selection accuracy is defined as whether the participant chooses the correct treatment for the patient in the clinical vignette; participants can choose any combination of steroids, antibiotics, and intravenous (IV) diuretics, or none of these treatments. Treatment selection assessments will be performed while participants are completing the survey (day 0), immediately after the participant reviews the clinical vignette. Participant treatment selection accuracy will be compared across vignette settings (no AI model, standard AI model, standard AI model with explanation, biased AI model, biased AI model with explanation). | Day 0 |
| Diagnosis-specific diagnostic accuracy across clinical vignette settings | Diagnostic accuracy specific to heart failure, pneumonia, and COPD across vignette settings. | Day 0 |
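One way to score treatment selection per vignette setting is sketched below in Python. It assumes a selection counts as correct only when the chosen set of treatments exactly matches the reference set, with the empty set standing for "none of these treatments"; the record does not spell out the matching rule, and the function names and data layout are hypothetical.

```python
TREATMENTS = frozenset({"steroids", "antibiotics", "IV diuretics"})


def treatment_correct(selected, reference):
    """True if the chosen treatments exactly match the reference set (assumed rule)."""
    selected, reference = frozenset(selected), frozenset(reference)
    unknown = (selected | reference) - TREATMENTS
    if unknown:
        raise ValueError(f"unknown treatments: {sorted(unknown)}")
    return selected == reference


def accuracy_by_setting(responses):
    """Treatment selection accuracy for each vignette setting.

    `responses` is an iterable of (setting, selected, reference) tuples.
    """
    counts = {}
    for setting, selected, reference in responses:
        correct, total = counts.get(setting, (0, 0))
        counts[setting] = (correct + int(treatment_correct(selected, reference)), total + 1)
    return {setting: correct / total for setting, (correct, total) in counts.items()}
```

The same grouping approach applies to the diagnosis-specific measure: restricting the scored assessments to a single diagnosis before grouping by setting yields accuracy for heart failure, pneumonia, or COPD alone.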
Collaborators and Investigators
Sponsor
Collaborators
Investigators
- Principal Investigator: Michael Sjoding, MD, University of Michigan
Study record dates
Study Major Dates
Study Start (Actual)
Primary Completion (Actual)
Study Completion (Actual)
Study Registration Dates
First Submitted
First Submitted That Met QC Criteria
First Posted (Actual)
Study Record Updates
Last Update Posted (Actual)
Last Update Submitted That Met QC Criteria
Last Verified
More Information
Terms related to this study
Additional Relevant MeSH Terms
Other Study ID Numbers
- HUM00180745
- R01HL158626 (U.S. NIH Grant/Contract)
Plan for Individual participant data (IPD)
Plan to Share Individual Participant Data (IPD)?
IPD Plan Description
IPD Sharing Time Frame
IPD Sharing Access Criteria
IPD Sharing Supporting Information Type
- STUDY_PROTOCOL
- SAP
Drug and device information, study documents
Studies a U.S. FDA-regulated drug product
Studies a U.S. FDA-regulated device product
This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.