Scalable Clinical Oversight of Large Language Models Via Uncertainty Triangulation (SCOUT)

Prospective Evaluation of a Model-Agnostic Meta-Verification Framework (SCOUT) for Scalable Clinical Oversight of Large Language Model Outputs in Coronary Heart Disease Diagnosis: A Multi-Reader, Randomized, Crossover Trial

This prospective, multi-reader, randomized crossover trial evaluates SCOUT (Scalable Clinical Oversight via Uncertainty Triangulation), a model-agnostic meta-verification framework that selectively defers unreliable large language model (LLM) predictions to clinicians by triangulating three orthogonal uncertainty signals: model heterogeneity, stochastic inconsistency, and reasoning critique. The trial assesses whether SCOUT-assisted review can reduce physician review time compared with standard manual review of AI-generated diagnoses while maintaining non-inferior diagnostic accuracy in coronary heart disease (CHD) subtyping.

Study Overview

Detailed Description

Background: Large language models are increasingly deployed in clinical workflows, yet requiring clinician review of every AI output negates the efficiency gains that motivate their adoption. SCOUT addresses this efficiency-safety paradox through algorithmic meta-verification.

The SCOUT framework triangulates three orthogonal external signals to determine case-level uncertainty: (1) Model Heterogeneity - whether a structurally different auxiliary LLM agrees with the primary model; (2) Stochastic Inconsistency - whether repeated sampling from the same model yields divergent outputs; (3) Reasoning Critique - whether an external checker model identifies logical flaws in the chain-of-thought reasoning.
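
As an illustration only, the triangulation described above can be sketched in Python. The record does not state how the three signals are combined, so the any-signal (logical OR) rule below is an assumption, as are all names (`Signals`, `compute_signals`, `defer`) and the idea that each signal has already been reduced to a yes/no flag. The return value follows the D(x) notation used in the arm description: 1 means the case is deferred to a clinician, 0 means the AI prediction is auto-accepted.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Signals:
    """Per-case uncertainty flags (hypothetical structure, not from the record)."""
    heterogeneity: bool  # auxiliary LLM disagrees with the primary model
    inconsistency: bool  # repeated samples from the primary model diverge
    critique: bool       # checker model flags a flaw in the reasoning


def compute_signals(primary: str, auxiliary: str,
                    resamples: Sequence[str],
                    critique_flagged: bool) -> Signals:
    """Reduce raw model outputs to three boolean uncertainty flags."""
    heterogeneity = auxiliary != primary
    inconsistency = any(label != primary for label in resamples)
    return Signals(heterogeneity, inconsistency, bool(critique_flagged))


def defer(signals: Signals) -> int:
    """Return D(x): 1 = defer to clinician, 0 = auto-accept.

    ASSUMPTION: a case is deferred if any one signal fires. The actual
    SCOUT combination rule is not given in this record.
    """
    fired = signals.heterogeneity or signals.inconsistency or signals.critique
    return int(fired)


# All three checks agree with the primary prediction: auto-accept.
agree = compute_signals("STEMI", "STEMI", ["STEMI", "STEMI"], False)
# The auxiliary model disagrees: defer to a clinician.
disagree = compute_signals("STEMI", "NSTEMI", ["STEMI", "STEMI"], False)
```

Under this assumed rule the deferral decision is conservative: a single disagreement, divergent sample, or flagged reasoning step is enough to route the case to physician review.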

In this crossover trial, 7 clinicians of varying seniority (2 junior residents, 3 senior residents, 2 attending physicians) each review all 110 cases under both standard manual review and SCOUT-assisted review workflows. The study evaluates workflow efficiency (primary endpoint) and diagnostic accuracy (secondary endpoint).

Study Type

Interventional

Enrollment (Estimated)

7

Phase

  • Not Applicable

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Contact

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

  • Adult
  • Older Adult

Accepts Healthy Volunteers

No

Description

Inclusion Criteria:

  • Board-certified or in-training cardiologists at Fuwai Hospital
  • Spanning three experience strata: junior residents, senior residents, attending physicians

Exclusion Criteria:

  • Clinicians involved in the development or optimization of the SCOUT framework
  • Clinicians involved in the gold-standard adjudication process

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

  • Primary Purpose: Diagnostic
  • Allocation: Randomized
  • Interventional Model: Crossover Assignment
  • Masking: None (Open Label)

Arms and Interventions

Active Comparator: Control (Standard Manual Review)

  • Participant Group / Arm: Physicians manually review all cases in the control set (n=54) with access to AI predictions and reasoning. No selective deferral.
  • Intervention / Treatment: Physicians perform a full manual review of 54 cases using raw medical records, with access to the AI model's predictions and reasoning but without SCOUT uncertainty stratification or selective deferral.

Experimental: Experimental (SCOUT-Assisted Review)

  • Participant Group / Arm: Physicians process the intervention set (n=56) through the SCOUT framework. Low-uncertainty cases are auto-accepted; high-uncertainty cases undergo physician review with a full audit trail.
  • Intervention / Treatment: SCOUT-Assisted Review (Intervention Arm): Physicians review 56 cases processed through the SCOUT framework. For cases classified as low-uncertainty (D(x)=0), the AI prediction is auto-accepted without physician review. For high-uncertainty cases (D(x)=1), the physician reviews the case with access to the main model's chain-of-thought reasoning and the meta-verification audit results. The main model is DeepSeek-V3.1 with chain-of-thought prompting.

What is the study measuring?

Primary Outcome Measures

Mean physician review time per case (minutes)

  • Measure Description: Mean time spent by each clinician reviewing and rendering a diagnostic decision per case under each arm. Measured in minutes.
  • Time Frame: Through study completion, an average of 2 hours.

Secondary Outcome Measures

Diagnostic accuracy (%)

  • Measure Description: Proportion of correct CHD subtype classifications (STEMI, NSTEMI, unstable angina, chronic coronary syndromes) under each arm.
  • Time Frame: Through study completion, an average of 2 hours.

Computational Return on Investment (ROI)

  • Measure Description: Ratio of physician time savings (valued at standardized minute-wages from Sanming healthcare reform benchmarks) to computational cost of SCOUT inference, stratified by clinician seniority level.
  • Time Frame: Through study completion, an average of 2 hours.
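
The Computational ROI measure above is a ratio, which can be sketched as follows. This is a minimal illustration, not the trial's analysis code: the function names are hypothetical, and every number in the usage example is a made-up placeholder rather than a Sanming benchmark wage, a measured time saving, or an actual inference cost. It also assumes a single compute cost shared across seniority levels.

```python
from typing import Dict


def computational_roi(minutes_saved: float,
                      wage_per_minute: float,
                      compute_cost: float) -> float:
    """ROI = (physician minutes saved * minute-wage) / compute cost.

    Wage and cost must be in the same currency unit.
    """
    if compute_cost <= 0:
        raise ValueError("compute_cost must be positive")
    return (minutes_saved * wage_per_minute) / compute_cost


def roi_by_seniority(minutes_saved: Dict[str, float],
                     wage_per_minute: Dict[str, float],
                     compute_cost: float) -> Dict[str, float]:
    """Stratify ROI by clinician seniority level."""
    return {
        level: computational_roi(minutes_saved[level],
                                 wage_per_minute[level],
                                 compute_cost)
        for level in minutes_saved
    }


# Placeholder inputs for illustration only (not study data).
example = roi_by_seniority(
    minutes_saved={"junior resident": 30.0, "attending physician": 20.0},
    wage_per_minute={"junior resident": 1.0, "attending physician": 3.0},
    compute_cost=4.0,
)
```

A ratio above 1 would indicate that the value of the physician time saved exceeds the cost of running SCOUT inference for that seniority stratum.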

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Estimated)

February 19, 2026

Primary Completion (Estimated)

February 28, 2026

Study Completion (Estimated)

February 28, 2026

Study Registration Dates

First Submitted

February 9, 2026

First Submitted That Met QC Criteria

February 14, 2026

First Posted (Actual)

February 17, 2026

Study Record Updates

Last Update Posted (Actual)

February 17, 2026

Last Update Submitted That Met QC Criteria

February 14, 2026

Last Verified

February 1, 2026

More Information

Terms related to this study

Other Study ID Numbers

  • 2025-2702-1

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

YES

IPD Plan Description

De-identified individual participant data underlying the results reported in this study will be made available.

IPD Sharing Time Frame

Beginning 1 month after publication of the primary results and available for up to 60 months.

IPD Sharing Access Criteria

Data are available from the corresponding author upon reasonable request. Requestors will need to provide a methodologically sound research proposal and sign a data use agreement.

IPD Sharing Supporting Information Type

  • STUDY_PROTOCOL
  • SAP
  • ICF
  • ANALYTIC_CODE
  • CSR

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.
