Project 3 Example: Human-AI Collaboration Tester (HAICT) Exp. 7

July 25, 2023 updated by: Jeremy M Wolfe, PhD, Brigham and Women's Hospital
The study is one part of a "bundle" of experiments that constitute Project Three of a National Eye Institute grant. Project Three includes a series of experiments that investigate how changing the input from a simulated AI can affect the decisions made by human observers in a two-alternative forced-choice (2AFC) task (like the decision to recall a woman for further examination in mammography). HAICT 7, the experiment described here, investigates how changing prevalence affects human performance when the AI is used as a Second Reader.

Study Overview

Detailed Description

This is the text of the pre-registration for the HAICT 7 experiment, as posted on the Open Science Framework: https://osf.io/hngu4/

NOTE: This study is representative of the studies conducted in Project 3 of this grant. Project 3 comprises a bundle of multiple experiments, but it is not possible to register a bundle of studies on ClinicalTrials.gov.

Human-AI Collaboration Tester (HAICT) Exp. 7 (lightly edited from OSF)

  1. Data collection. Have any data been collected for this study already? (Yes/No)

    Yes

  2. Hypothesis. What's the main question being asked or hypothesis being tested in this study?

Background: In a variety of search experiments, both basic and clinical, the data have been consistent with a situation where the variability of the signal (or target) is greater than the variability of the noise (distractors). The classic sign of this is a zROC function with a slope < 1, typically around 0.6; a slope of 1.0 is indicative of an equal-variance 2AFC task. For the HAICT task that we have been testing, we would expect equal variance, but we think it is worth checking, so we will systematically vary prevalence, which will shift the criterion. That will sweep out an ROC curve that we can examine.
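
As a concrete illustration (not part of the pre-registration), the Python sketch below simulates a signal detection observer swept across a range of criteria and recovers the zROC slope; the means, variances, and criterion range are arbitrary assumptions chosen to reproduce a slope near 0.6.

    import numpy as np
    from scipy.stats import norm

    sigma_signal = 1.0 / 0.6              # unequal-variance case; noise SD is 1.0
    criteria = np.linspace(-1.0, 2.0, 7)  # a sweep of criteria (the role varying prevalence plays)

    # Hit rate = P(signal evidence > c); false-alarm rate = P(noise evidence > c)
    p_hit = 1 - norm.cdf(criteria, loc=2.0, scale=sigma_signal)
    p_fa = 1 - norm.cdf(criteria, loc=0.0, scale=1.0)

    # The zROC plots z(hit) against z(FA); its slope is sigma_noise / sigma_signal
    slope = np.polyfit(norm.ppf(p_fa), norm.ppf(p_hit), 1)[0]
    print(f"zROC slope: {slope:.2f}")  # ~0.60 here; 1.00 under equal variance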

We will also test the Second Reader faux-AI in order to determine whether low prevalence makes the Second Reader less effective.

  • (H1): We expect to replicate the finding that human criteria become more conservative as prevalence declines.
  • (H2): We predict that the slope of the resulting zROC will be 1.0.
  • (H3): We hypothesize that low prevalence will make Second Reader AI less effective because the positive predictive value of its comments will be low.

  3. Dependent variable. Describe the key dependent variable(s) specifying how they will be measured.

      The main dependent variables of interest are accuracy (and the signal detection derivatives of accuracy, d' and c), reaction time, and subjective ratings on the survey following each block.

  4. Conditions. How many and which conditions will participants be assigned to?

This series of experiments investigates how changing the input from a simulated AI can affect the decisions made by human observers in a 2AFC task (like the decision to recall a woman for further examination in mammography). We have developed a paradigm called the Human-AI Collaboration Tester (HAICT) that allows for efficient testing of interactions between a human and a simulated AI.

The observers' task in all conditions is to make a 2AFC decision about whether a stimulus is "bad" or "not bad." To use language roughly mimicking a medical diagnosis, each stimulus is referred to as a "case." Each case is an array of colored shapes, and the decision is made based on the predominant color of the case. The number of elements of each color is drawn from one of two normal distributions, one for positive (bad) stimuli and the other for negative (not bad) stimuli.
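
For concreteness, here is a hedged sketch of how such a case might be generated; the distribution means, standard deviation, and array size are our own illustrative assumptions, not the study's actual parameters.

    import numpy as np

    rng = np.random.default_rng()

    def make_case(is_bad, mu_bad=55.0, mu_not_bad=45.0, sd=10.0, n_total=100):
        """Return (count of the 'bad' color, count of the other color) for one case."""
        mu = mu_bad if is_bad else mu_not_bad
        n_bad_color = int(np.clip(rng.normal(mu, sd), 0, n_total))
        return n_bad_color, n_total - n_bad_color

    # Example: ground truth for one 200-trial block at 10% prevalence of "bad" cases
    truth = rng.random(200) < 0.10
    block = [make_case(is_bad) for is_bad in truth]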

The results from previous HAICT experiments (3 and 4) showed that human performance in the Second Reader condition drops off significantly at low prevalence. Performance in the Second Reader condition was better than Baseline when the prevalence of bad cases was 50% but significantly worse than Baseline when prevalence was only 10%. In this experiment, we manipulate the prevalence of "bad" cases in the Second Reader and Baseline conditions. Four prevalence rates will be tested: 10%, 33%, 67%, and 90%. Observers will complete 8 blocks (2 AI rules x 4 prevalence rates), presented in random order.

AI rules to be tested:

  1. Baseline - No AI input. The observer classifies each case as "bad" or "not bad" on their own.
  2. Second Reader - The observer makes an initial decision about every case. The AI silently classifies stimuli using a conservative criterion (c = 0.5). The logic for the conservative criterion is that the Second Reader is being used to cut down on false-positive responses, so it is intended to question positive human responses that might be marginal. If the observer and the AI disagree, then the AI informs the human observer. The observer is then given a chance either to change their response or to stay with their first opinion (a sketch of this logic follows this list).

    As in Experiments 1-5, the AI d-prime is fixed at 2.2. Feedback is known to increase the prevalence effect, so feedback will be given in both the practice and the test trials. Observers will complete 20 practice trials and 200 test trials in each block. Immediately after each block is completed, observers will be shown a summary of their performance. After the Second Reader blocks, they will also be asked to answer three subjective questions about the usefulness of the AI (see "Files" for more details).
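
To make the Second Reader rule concrete, here is a minimal sketch of one trial under the stated d' of 2.2 and criterion of c = 0.5; the equal-variance evidence model and all names are illustrative assumptions, not the experiment's actual code.

    import numpy as np

    rng = np.random.default_rng()
    AI_DPRIME = 2.2     # fixed across Experiments 1-5, per the text above
    AI_CRITERION = 0.5  # conservative: the AI calls "bad" sparingly

    def ai_says_bad(case_is_bad: bool) -> bool:
        """Evidence ~ N(d', 1) for bad cases, N(0, 1) otherwise; respond "bad" above c."""
        evidence = rng.normal(AI_DPRIME if case_is_bad else 0.0, 1.0)
        return evidence > AI_CRITERION

    def second_reader_trial(case_is_bad: bool, human_first_response: bool) -> bool:
        """Return True if the AI disagrees, i.e., the observer is re-queried."""
        return ai_says_bad(case_is_bad) != human_first_response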

  5. Analyses. Specify exactly which analyses you will conduct to examine the main question/hypothesis.

    First, we summarize the number of hits, true negatives, misses, and false alarms in each block. From these counts, we can calculate the accuracy, the positive predictive value, sensitivity (d'), and the criterion for each observer under each of the different conditions. Given measures of performance at 4 levels of prevalence, we can estimate the ROC curve (pHit x pFA) and the zROC function (zHit x zFA). We will test the hypothesis that the slope of the zROC is equal to 1 (the consequence of an equal-variance 2AFC task).
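
A sketch of these computations, assuming the confusion counts have already been tallied per block; the function names and the log-linear rate correction are our choices, not pre-registered details.

    import numpy as np
    from scipy.stats import norm

    def sdt_measures(n_hit, n_miss, n_fa, n_tn):
        """d', criterion c, accuracy, and PPV from one block's confusion counts."""
        # Log-linear correction keeps hit/FA rates away from 0 and 1
        p_hit = (n_hit + 0.5) / (n_hit + n_miss + 1)
        p_fa = (n_fa + 0.5) / (n_fa + n_tn + 1)
        d_prime = norm.ppf(p_hit) - norm.ppf(p_fa)
        c = -0.5 * (norm.ppf(p_hit) + norm.ppf(p_fa))
        accuracy = (n_hit + n_tn) / (n_hit + n_miss + n_fa + n_tn)
        ppv = n_hit / (n_hit + n_fa) if (n_hit + n_fa) > 0 else float("nan")
        return d_prime, c, accuracy, ppv

    def zroc_slope(p_hits, p_fas):
        """Slope of the zROC (zHit vs. zFA) across the four prevalence levels."""
        slope, _ = np.polyfit(norm.ppf(p_fas), norm.ppf(p_hits), 1)
        return slope  # compared against 1.0, the equal-variance prediction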

  6. More analyses. Any secondary analyses?

    We will look to see whether the observers' subjective opinions about the AI are correlated with variables such as the empirical d' or the positive predictive value.
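
A minimal sketch of that correlation, assuming per-observer summary arrays are in hand; the numbers below are placeholders for illustration, not data.

    import numpy as np
    from scipy.stats import pearsonr

    # Placeholder per-observer summaries (12 observers); NOT real data
    subjective_rating = np.array([4, 5, 3, 4, 2, 5, 3, 4, 5, 2, 3, 4])
    empirical_d_prime = np.array([2.1, 2.4, 1.8, 2.0, 1.5, 2.6, 1.9, 2.2, 2.5, 1.4, 1.7, 2.3])

    r, p = pearsonr(subjective_rating, empirical_d_prime)
    print(f"r = {r:.2f}, p = {p:.3f}")  # the same test would be run against PPV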

  7. Sample size. How many observations will be collected or what will determine sample size? No need to justify decision, but be precise about exactly how the number will be determined.

    We will test 12 observers. This is consistent with the sample sizes of previous experiments.

  8. Other. Is there anything else that you would like to pre-register? (e.g., data exclusions, variables collected for exploratory purposes, unusual analyses planned?)

N/A

Study Type

Interventional

Enrollment (Estimated)

15

Phase

  • Not Applicable

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

    • Massachusetts
      • Boston, Massachusetts, United States, 02215
        • Recruiting
        • Visual Attention Lab / Brigham and Women's Hospital

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

18 years and older (Adult, Older Adult)

Accepts Healthy Volunteers

Yes

Description

Inclusion Criteria:

  • All are welcome to enroll online

Exclusion Criteria:

  • Failure to pass the Ishihara color vision screening test
  • Visual acuity worse than 20/25 (with correction)

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

  • Primary Purpose: Basic Science
  • Allocation: N/A
  • Interventional Model: Single Group Assignment
  • Masking: None (Open Label)

Arms and Interventions

Participant Group / Arm:
  • Experimental: Experiment - All participants are tested in all conditions of this experiment.

Intervention / Treatment:
  • In this experiment, in some conditions, the participant makes their decision in the presence of information about a simulated artificial intelligence decision. The frequency with which targets are presented varies from 10% to 90%.
  • Other Names: Base Rate

What is the study measuring?

Primary Outcome Measures

  • D' (d-prime): the signal detection theory measure of the level of performance on a task. Time Frame: up to one week.
  • Criterion: the signal detection theory measure of the bias ("liberal" or "conservative") of observers' decisions. Time Frame: up to one week.

Secondary Outcome Measures

  • Reaction Time: the measure of how long it takes to make a response. Time Frame: up to one week.

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Investigators

  • Principal Investigator: Jeremy M Wolfe, PhD, Brigham and Women's Hospital

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

January 1, 2020

Primary Completion (Estimated)

August 1, 2024

Study Completion (Estimated)

January 1, 2025

Study Registration Dates

First Submitted

February 18, 2022

First Submitted That Met QC Criteria

February 28, 2022

First Posted (Actual)

March 9, 2022

Study Record Updates

Last Update Posted (Actual)

July 27, 2023

Last Update Submitted That Met QC Criteria

July 25, 2023

Last Verified

July 1, 2023

More Information

Other Study ID Numbers

  • 2007P000646-B

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

Yes

IPD Plan Description

De-identified raw data will be posted on the experiment's OSF page and will also be available on request to the PI.

IPD Sharing Time Frame

Materials will be available upon request

IPD Sharing Access Criteria

Essentially unrestricted

IPD Sharing Supporting Information Type

  • STUDY_PROTOCOL
  • SAP
  • ICF

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No
