
ANCHOR Validation Trial in High-Risk Multidisciplinary Care

Updated July 6, 2026 by: Sanjay Basu, Waymark

ANCHOR (Auditable Navigation of Clinical Hazards With Oversight and Reasoning) Multicenter Randomized Validation Study: A Pragmatic Three-Arm (1:1:1) Patient-Level Randomized Controlled Trial of a Structural Verification Layer for AI-Assisted High-Risk Multidisciplinary Care Across Three U.S. States

This pre-registered, pragmatic, three-arm (1:1:1) patient-level randomized controlled trial with mixed-effects analysis at the encounter level tests two questions in real high-risk multidisciplinary clinical encounters at the Waymark clinically integrated network across three U.S. states (Ohio, Washington, Virginia): (1) does adding ANCHOR - a clinical AI structural verification layer - to a Gemini 3.1 Pro-assisted supervising-physician workflow reduce the rate of clinically meaningful safety failures, compared with the same Gemini 3.1 Pro-assisted workflow without ANCHOR? (2) does the Gemini 3.1 Pro-assisted workflow itself reduce the same safety endpoint compared with unassisted standard care in which the supervising physician writes their own SOAP assessment/plan from a blank template?

ANCHOR is a single-call structural verification layer combining a Logical Neural Network (Riegel et al. 2020) certificate, six specialist agents, and concept-decomposed output with PMID citation provenance. ANCHOR is physician-facing only and is used by supervising physicians, not by the multidisciplinary clinical team they oversee.

The trial randomizes 240 patients 1:1:1 across the Waymark clinically integrated network over a 12-week active-enrolment window (80 per arm). Eligible patients are adults (age 18+) identified as high-risk by combined claims-based and clinical criteria. Eligible encounters span three integrated Waymark service modalities: high-risk primary care, specialty care coordination, and real-time telemedicine urgent care. The primary endpoint is a per-encounter binary composite: any of (a) failure to mention a do-not-miss diagnosis, (b) under-triage, (c) contraindicated medication recommendation, (d) failure to recommend escalation when clinically warranted; adjudicated by a blinded panel of 3 board-certified physicians with majority-of-three scoring. The primary contrast is Arm 3 (LLM+ANCHOR) versus Arm 2 (LLM with safety prompt), isolating ANCHOR's marginal contribution over a deployment-equivalent LLM safety stack. The pre-specified secondary contrast is Arm 2 versus Arm 1.

The trial is sized to the operational ceiling of the Waymark integrated-network workflow across the three states (240 enrollees over 12 weeks). At realistic effect sizes derived from the retrospective evaluation, the trial is underpowered for definitive efficacy declaration on either pairwise contrast and is reported as an initial deployment-feasibility validation cohort with effect estimates and 95 percent confidence intervals; full power calculations are pre-registered in the Statistical Analysis Plan.

Single-blind outcome adjudication: 3 adjudicators score only the supervising physician's final clinical decision, so all three arms produce adjudication packets in identical format and arm allocation is structurally invisible. Statisticians remain blinded until database lock. A full waiver of informed consent is requested per 45 CFR 46.116(f)(3) with a companion HIPAA waiver of authorization under 45 CFR 164.512(i)(2)(ii). The study is registered on the Open Science Framework prior to first enrollment and reported under CONSORT-AI 2020.

Study overview

Status

Completed

Conditions

Intervention / Treatment

Detailed description

DESIGN. Pragmatic, multicenter, three-arm parallel randomized controlled trial with patient-level randomization and encounter-level analysis. Randomization is 1:1:1 by stratified permuted blocks (block size 6), stratified by site and acuity stratum. Once a patient is randomized at the first eligible encounter, all subsequent encounters for that patient remain in the same arm.
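The allocation scheme described above can be illustrated with a minimal sketch. It assumes one independent block sequence per (site, acuity) stratum and an in-memory record of prior assignments; the class name, the seed handling, and the stratum labels are hypothetical and are not taken from the registry entry.

```python
import random
from collections import defaultdict

ARMS = ("Arm 1", "Arm 2", "Arm 3")
BLOCK_SIZE = 6  # from the registry entry: two slots per arm in each block


class StratifiedBlockRandomizer:
    """Permuted-block allocation with a separate block sequence per stratum."""

    def __init__(self, seed):
        self._rng = random.Random(seed)      # hypothetical seed handling
        self._pending = defaultdict(list)    # (site, acuity) -> unused slots of the current block
        self._assigned = {}                  # patient_id -> arm, fixed after first draw

    def _new_block(self):
        # Each block holds every arm equally often, in shuffled order.
        block = list(ARMS) * (BLOCK_SIZE // len(ARMS))
        self._rng.shuffle(block)
        return block

    def assign(self, patient_id, site, acuity):
        # A patient keeps the arm drawn at the first eligible encounter.
        if patient_id in self._assigned:
            return self._assigned[patient_id]
        stratum = (site, acuity)
        if not self._pending[stratum]:
            self._pending[stratum] = self._new_block()
        arm = self._pending[stratum].pop()
        self._assigned[patient_id] = arm
        return arm
```

Within any stratum, every completed block of six patients contains exactly two patients per arm, which is what keeps the 1:1:1 ratio balanced by site and acuity.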

ARMS.

Arm 1 (Unassisted standard care, n=80): No LLM. No ANCHOR. The supervising physician opens a blank SOAP note template and writes their own assessment and plan from scratch.
Arm 2 (Gemini 3.1 Pro with safety prompt, n=80): Gemini 3.1 Pro generates the recommendation under a clinical-safety system prompt, content filters, and retrieval-augmented generation. The supervising physician reviews the LLM output directly. This stack is operationally equivalent to LLM-assisted clinical-decision-support deployments already in routine use at major U.S. health systems.
Arm 3 (Gemini 3.1 Pro + ANCHOR, n=80): Gemini 3.1 Pro generates the recommendation under the same safety prompt as Arm 2; ANCHOR augments that output with the Logical Neural Network certificate, specialist-agent verification, and concept-decomposed audit trail.

OPERATIONAL SIZING. The Waymark integrated-network workflow across Ohio, Washington, and Virginia captures approximately 20 eligible high-risk multidisciplinary encounters per week. With 12 weeks of active enrolment, total enrolment is 240 encounters (80 per arm at 1:1:1).

POWER CALCULATION. Anchored on the architectural-argument retrospective cohort. For each pairwise contrast at n=80 per arm, the plausible effect range is 5 to 11 percentage points absolute reduction (midpoint 8 percentage points). At the midpoint planning effect (Arm 2 event rate 25 percent versus Arm 3 17 percent), power is approximately 0.50 for the verification-layer-specific contrast (Arm 3 versus Arm 2) at alpha = 0.05. The trial functions as a calibration cohort with effect estimates and 95 percent confidence intervals.
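As a rough illustration of how power for one pairwise contrast can be approximated, the sketch below applies the standard normal approximation for a two-sided two-proportion z-test to the midpoint planning rates (25 percent versus 17 percent, 80 per arm). It assumes one independent observation per patient and ignores clustering, repeated encounters, and covariate adjustment, so its result need not match the registered figure; the pre-registered Statistical Analysis Plan remains the authoritative calculation. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p_control, p_treat, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test, equal arm sizes."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    p_bar = (p_control + p_treat) / 2
    # Standard error under the null (pooled) and under the alternative (unpooled).
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    se_alt = sqrt(p_control * (1 - p_control) / n_per_arm
                  + p_treat * (1 - p_treat) / n_per_arm)
    delta = abs(p_control - p_treat)
    # Probability of rejecting in the direction of the true effect;
    # the negligible opposite tail is ignored.
    return norm.cdf((delta - z_crit * se_null) / se_alt)


# Midpoint planning scenario from the registry entry.
power_midpoint = two_proportion_power(0.25, 0.17, 80)
print(f"approximate power: {power_midpoint:.2f}")
```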

PRIMARY ENDPOINT. Per-encounter binary composite, adjudicated by 3 board-certified physicians blinded to allocation; final scoring by majority of three. Components: (a) Failure to mention a do-not-miss diagnosis; (b) Under-triage; (c) Recommendation of a medication contraindicated by documented allergies/conditions/comorbidities; (d) Failure to recommend escalation when clinically warranted.
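The scoring rule can be sketched as follows. This sketch assumes the majority is taken over each adjudicator's overall composite judgement (any of the four components present); the registry entry does not state whether the majority is applied to the composite or to each component separately, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjudication:
    """One adjudicator's judgement of one encounter (hypothetical field names)."""
    missed_do_not_miss_dx: bool
    under_triage: bool
    contraindicated_medication: bool
    missed_escalation: bool

    def any_failure(self):
        # Composite: any one component is enough.
        return any((self.missed_do_not_miss_dx, self.under_triage,
                    self.contraindicated_medication, self.missed_escalation))


def composite_failure(votes):
    """Majority-of-three scoring over the adjudicators' composite judgements."""
    if len(votes) != 3:
        raise ValueError("expected exactly three adjudicator votes")
    return sum(v.any_failure() for v in votes) >= 2
```

Under this reading, an encounter counts as a safety failure when at least two adjudicators each identify some component, even if they identify different ones.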

PRIMARY CONTRAST: Arm 3 versus Arm 2 (ANCHOR-specific). PRE-SPECIFIED SECONDARY CONTRAST: Arm 2 versus Arm 1 (AI-introduction).

SECONDARY ENDPOINTS (hierarchical fixed sequence, Holm-Bonferroni step-down at family-wise alpha = 0.05): (1) Appropriate triage escalation rate; (2) Time-to-physician-decision; (3) Clinician acceptance rate of ANCHOR flags (Arm 3 only); (4) 30-day emergency-department visit rate (EXPLORATORY); (5) 30-day hospitalization rate (EXPLORATORY); (6) Social-determinants-of-health response composite.
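For readers unfamiliar with the step-down procedure named above, a minimal sketch of Holm-Bonferroni is given here. It shows only the Holm step; how the registry's fixed-sequence hierarchy is combined with it is not specified in this record. The function name and the p-values in the usage lines are purely illustrative, not trial results.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return {endpoint: rejected} under Holm's step-down procedure."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = {name: False for name in p_values}
    for rank, (name, p) in enumerate(ordered):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p <= alpha / (m - rank):
            rejected[name] = True
        else:
            break  # step-down: stop at the first non-rejection
    return rejected


# Illustrative p-values only.
example = holm_bonferroni({"triage": 0.01, "time": 0.04, "acceptance": 0.03})
print(example)
```

In the illustrative call, only the smallest p-value clears its threshold (0.01 against 0.05/3); the next one (0.03 against 0.05/2) does not, so testing stops there.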

STATISTICAL ANALYSIS. Mixed-effects logistic regression with arm (3-level categorical) as primary fixed-effect predictor and random intercept for patient nested within supervising physician; fixed-effect adjustment for site and acuity stratum. Gatekeeping for the two pairwise contrasts. Pre-specified sensitivity analyses: cluster-robust standard errors at patient level; generalized estimating equations with patient cluster and exchangeable correlation; patient-level aggregated analysis; Bayesian logistic regression with informative prior on the ANCHOR-specific contrast from the retrospective evaluation; exact methods. Missing data via multiple imputation by chained equations (m=20). Intention-to-treat primary with treatment-policy estimand (per ICH E9(R1)); per-protocol sensitivity excluding API-failure encounters. No interim efficacy stop. Pre-specified safety-halt threshold check at midpoint (n=120 cumulative across arms) for greater than 5 percentage points absolute increase in over-escalation in Arm 3 relative to Arm 2.
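The midpoint safety-halt check can be sketched as a simple comparison of observed over-escalation rates. This assumes the rule is applied to point estimates of the two arm-level rates; the record does not say whether a confidence bound or a model-based estimate is used instead. The function name and arguments are hypothetical.

```python
def over_escalation_halt(events_arm3, n_arm3, events_arm2, n_arm2, threshold=0.05):
    """True when Arm 3's over-escalation rate exceeds Arm 2's by more than the threshold."""
    if n_arm3 <= 0 or n_arm2 <= 0:
        raise ValueError("both arms need at least one adjudicated encounter")
    # Absolute difference in proportions, Arm 3 minus Arm 2.
    difference = events_arm3 / n_arm3 - events_arm2 / n_arm2
    return difference > threshold
```

For example, 6 over-escalations in 40 Arm 3 encounters against 3 in 40 Arm 2 encounters is a 7.5 percentage point difference and would trip the check under this reading.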

ANCHOR DIAGNOSTIC-TEST CHARACTERIZATION (independent of trial primary endpoint). ANCHOR's certificate (operational state: hazard flagged / no hazard flagged / out of scope) is characterized as a diagnostic test against adjudicated ground truth: sensitivity, specificity, positive predictive value, negative predictive value, area under the receiver-operating-characteristic curve, and calibration metrics (Expected Calibration Error, Brier score, reliability diagram).
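The listed diagnostic and calibration metrics can be computed as in the sketch below. Two assumptions are made that the record does not confirm: "out of scope" encounters are removed before the binary metrics are computed, and a numeric hazard probability is available alongside the categorical certificate for the calibration metrics. All function names are hypothetical, and the ECE uses equal-width bins.

```python
def confusion_metrics(flagged, truth):
    """Sensitivity, specificity, PPV and NPV for binary flags vs adjudicated truth."""
    pairs = [(bool(f), bool(t)) for f, t in zip(flagged, truth)]
    tp = sum(f and t for f, t in pairs)
    fp = sum(f and not t for f, t in pairs)
    fn = sum(not f and t for f, t in pairs)
    tn = sum(not f and not t for f, t in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def brier_score(probs, truth):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for p, t in zip(probs, truth)) / len(probs)


def expected_calibration_error(probs, truth, n_bins=10):
    """Weighted mean gap between observed rate and mean prediction per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, t in zip(probs, truth):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, t))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(t for _, t in members) / len(members)
            ece += len(members) / total * abs(observed - mean_pred)
    return ece
```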

PRE-SPECIFIED ALGORITHMIC FAIRNESS AUDIT. Equalized odds gap, equal opportunity gap, predictive parity gap, and within-subgroup calibration (Brier-difference tolerance 0.05) across race/ethnicity, sex assigned at birth, age band, urban/rural per RUCA 2024, and primary language. Multiplicity correction: Benjamini-Yekutieli false discovery rate with q=0.05. Disparity decision rule (locked pre-unblinding): any pairwise equalized-odds gap exceeding 0.10 with the lower bootstrap 95 percent confidence interval bound greater than 0.05 triggers a Limitations section, mitigation proposal, and a hold on broader deployment until mitigation is validated.
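The disparity decision rule can be sketched under stated assumptions. The equalized-odds gap is taken here as the larger of the absolute differences in true-positive and false-positive rates between two subgroups, which is a common definition but is not spelled out in this record. The bootstrap settings (1000 resamples, seed 20260425, percentile method) follow the outcome-measure description elsewhere in this record. Each subgroup is represented as a list of hypothetical (flag, adjudicated_failure) pairs, and all function names are illustrative.

```python
import random


def _rates(records):
    """(TPR, FPR) for one subgroup, or None if either class is empty."""
    positives = [flag for flag, truth in records if truth]
    negatives = [flag for flag, truth in records if not truth]
    if not positives or not negatives:
        return None
    return sum(positives) / len(positives), sum(negatives) / len(negatives)


def equalized_odds_gap(group_a, group_b):
    rates_a, rates_b = _rates(group_a), _rates(group_b)
    if rates_a is None or rates_b is None:
        return None
    return max(abs(rates_a[0] - rates_b[0]), abs(rates_a[1] - rates_b[1]))


def bootstrap_lower_bound(group_a, group_b, n_resamples=1000, seed=20260425):
    """Approximate 2.5th percentile of the gap over within-group resamples."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_resamples):
        gap = equalized_odds_gap(rng.choices(group_a, k=len(group_a)),
                                 rng.choices(group_b, k=len(group_b)))
        if gap is not None:  # skip resamples where a rate is undefined
            gaps.append(gap)
    if not gaps:
        raise ValueError("no usable bootstrap resamples")
    gaps.sort()
    return gaps[int(0.025 * len(gaps))]


def disparity_triggered(group_a, group_b):
    """Gap above 0.10 with lower 95 percent bootstrap bound above 0.05."""
    gap = equalized_odds_gap(group_a, group_b)
    if gap is None or gap <= 0.10:
        return False
    return bootstrap_lower_bound(group_a, group_b) > 0.05
```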

DATA CAPTURE. The 30-day endpoints use Waymark's integrated electronic-health-record + real-time admission/discharge/transfer feed + claims pipeline. ADT events are detected in near-real time across the full network of acute and ambulatory facilities; claims are reconciled at day 30 to confirm capture completeness. Pre-trial registration locked at the Open Science Framework prior to first enrolment; reported under CONSORT-AI 2020.

Study type

Interventional

Enrollment (Actual)

240

Phase

Not applicable

Contacts and locations

This section provides contact details for those conducting the study and information on where the study is being conducted.

Study sites

United States
- California
  - San Francisco, California, United States, 94115
    - Waymark

Participation criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility criteria

Ages eligible for study

Adult
Older adult

Accepts healthy volunteers

No

Description

INCLUSION CRITERIA:

Age 18 years or older.
Attributed to a participating Waymark provider (academic medical center, community-hospital network, federally qualified health center, or independent physician practice in Ohio, Washington, or Virginia; full TIN-consolidated list deposited at the Open Science Framework).
Meets high-risk multidisciplinary criteria (combined claims-based and clinical: 2 or more emergency-department visits or 1 or more hospitalization in the prior 12 months, 5 or more active medications, 2 or more active specialist relationships, 2 or more chronic conditions, or claims-based equivalents).
Encounter occurs in one of the three Waymark service modalities: high-risk primary care, specialty care coordination, or real-time telemedicine urgent care.
English-language clinical documentation.
Encounter requires clinical reasoning (not administrative-only).

EXCLUSION CRITERIA:

Pediatric (age less than 18 years).
Hospice or palliative-care-exclusive care plan.
Active psychiatric crisis routed to crisis line.
Encounter is administrative only.
Pharmacy-only encounter that does not surface a clinical decision to the supervising physician.
Encounter where the supervising physician is the principal investigator.
Patient enrolled in a competing AI-safety study within the prior 90 days.

Study plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design details

Primary purpose: Health services research
Allocation: Randomized
Interventional model: Parallel assignment
Masking: Single

Number of arms

Arms and interventions

Participant group / Arm	Intervention / Treatment
No intervention: Arm 1 - Unassisted standard care (control) n=80. No LLM. No ANCHOR. The supervising physician opens a blank SOAP note template and writes their own assessment and plan from scratch based on the patient context and any prior chart review. Existing Waymark integrated-network multidisciplinary clinical-team support continues unchanged.
Active comparator: Arm 2 - Gemini 3.1 Pro with safety prompt (active comparator) n=80. Gemini 3.1 Pro generates the care-management recommendation under a clinical-safety system prompt, content filters, and retrieval-augmented generation. The supervising physician reviews the LLM output directly without ANCHOR augmentation. This stack is operationally equivalent to LLM-assisted clinical-decision-support deployments already in routine use at major U.S. health systems. Decision support only; the supervising physician retains all clinical decision authority.	Behavioral: Gemini 3.1 Pro with Safety Prompt. Gemini 3.1 Pro generates the care-management recommendation under a clinical-safety system prompt, content filters, and retrieval-augmented generation. Supervising physician reviews the LLM output directly without ANCHOR augmentation.
Experimental: Arm 3 - Gemini 3.1 Pro + ANCHOR (intervention) n=80. Same Gemini 3.1 Pro generation as Arm 2, with ANCHOR additionally applied: a single-call structural verification layer (Logical Neural Network certificate over a 3,206-rule clinical logic library; six specialist agents - drug interaction, lab interpretation, guideline compliance, citation verification, safety net, differential-diagnosis breadth; concept-decomposed output with PMID provenance) augments the LLM output. Supervising physician reviews the ANCHOR-augmented output. Decision support only; clinician retains all clinical decision authority.	Behavioral: ANCHOR Clinical AI Verification Layer (with Gemini 3.1 Pro). Same Gemini 3.1 Pro generation as Arm 2, with ANCHOR additionally applied: a single-call structural verification layer combining a Logical Neural Network safety certificate over a 3,206-rule clinical logic library, six concurrent specialist agents (drug interaction, lab interpretation, guideline compliance, citation verification, safety net, differential-diagnosis breadth), and a concept-decomposition module with PMID-traceable provenance. Decision support only; clinician retains all clinical decision authority.

What is the study measuring?

Primary outcome measures

Outcome measure	Measure description	Time frame
Per-encounter clinical safety failure (adjudicated binary composite)	Adjudicated binary composite of any of: (a) failure to mention a do-not-miss diagnosis appropriate for the presentation; (b) under-triage (routine/semi-urgent when emergent/urgent appropriate); (c) recommendation of a medication contraindicated by documented allergies/conditions/comorbidities; (d) failure to recommend escalation when clinically warranted. Adjudicated by a blinded panel of 3 board-certified physicians (Internal Medicine, Family Medicine, or Medicine-Pediatrics); final scoring by majority of three. Reported as proportion of encounters with composite safety failure.	At the encounter (encounter-level outcome adjudicated within 4 weeks post-encounter)

Secondary outcome measures

Outcome measure	Measure description	Time frame
Appropriate triage escalation rate	Proportion of encounters in which the assigned triage level was appropriate to the clinical presentation, adjudicated.	At the encounter
Time-to-physician-decision	Continuous duration in seconds from encounter start to supervising-physician final decision; log-transformed for analysis. Excludes API latency.	Within the encounter (real-time)
30-day emergency-department visit rate (exploratory)	Any emergency-department visit within 30 days post-encounter, captured through Waymark's integrated electronic-health-record + real-time admission/discharge/transfer feed + claims pipeline. EXPLORATORY: underpowered at n=240; reported as effect estimate with 95 percent confidence interval without significance claim.	30 days post-randomization
30-day hospitalization rate (exploratory)	Any hospitalization within 30 days post-encounter, captured through the integrated EHR + ADT + claims pipeline. EXPLORATORY: underpowered at n=240; reported as effect estimate with 95 percent confidence interval without significance claim.	30 days post-randomization

Other outcome measures

Outcome measure	Measure description	Time frame
Inappropriate over-escalation rate (safety)	Adjudicated binary indicator of inappropriate over-escalation. Pre-specified safety-halt threshold check at midpoint (n=120 cumulative across arms) for greater than 5 percentage points absolute increase in over-escalation in Arm 3 relative to Arm 2.	At the encounter
ANCHOR diagnostic-test characterization (Arm 3 only)	ANCHOR certificate (operational state: hazard flagged / no hazard flagged / out of scope) characterized as a diagnostic test against adjudicated ground truth: sensitivity, specificity, positive predictive value, negative predictive value, area under the receiver-operating-characteristic curve, and calibration metrics (Expected Calibration Error, Brier score, reliability diagram).	At primary analysis (week 12)
Algorithmic fairness audit	Equalized odds gap, equal opportunity gap, predictive parity gap, within-subgroup calibration (Brier-difference tolerance 0.05) on the primary composite endpoint across race/ethnicity (OMB 1997 categories), sex assigned at birth, age band (18-44 / 45-64 / 65-74 / 75+), urban/rural per RUCA 2024, and primary language. Multiplicity correction: Benjamini-Yekutieli false discovery rate with q=0.05. Bootstrap CI: 1000 resamples, seed 20260425, percentile method. Subgroup metrics reported only when n is at least 30 per cell.	At primary analysis (week 12)
Hallucination certificate audit (Arm 3 only)	Per Kalai et al. 2025. Every ANCHOR LNN response is assigned a formal certificate with operational categorical state (hazard flagged / no hazard flagged / out of scope) and a structured rule-firing trace against the 3,206-rule clinical logic library. Reported descriptively per acuity stratum and demographic subgroup; not powered for inference at n=240.	Cumulative across enrolment
Concept-decomposition validation (Arm 3 only)	Audit of every ANCHOR concept-decomposition output: percent of "known" concepts traceable to specific entities (drug / lab / guideline body / PMID; pre-specified pass threshold 80 percent or higher); presence of residual attribution-gap statement (epsilon disclosure; pre-specified threshold 100 percent); percent "discovered" concepts that are non-trivial (pre-specified threshold 60 percent or higher). Failure to meet any threshold triggers a Limitations paragraph and rule-library audit.	At primary analysis (week 12)
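The three concept-decomposition thresholds in the last row above can be checked with a short sketch. It assumes each audited output is a record holding per-concept boolean flags and one disclosure flag, and that percentages are pooled across all outputs; the record layout, key names, and pooling choice are hypothetical and not taken from the registry entry.

```python
# Pre-specified pass thresholds from the outcome-measure description.
THRESHOLDS = {
    "known_traceable": 0.80,
    "epsilon_disclosed": 1.00,
    "discovered_nontrivial": 0.60,
}


def _share(flags):
    """Fraction of True values, or NaN when there is nothing to audit."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else float("nan")


def audit_concept_decomposition(outputs):
    """Return (observed shares, list of metrics that miss their threshold)."""
    observed = {
        "known_traceable": _share(
            flag for out in outputs for flag in out["known_traceable"]),
        "epsilon_disclosed": _share(
            out["epsilon_disclosed"] for out in outputs),
        "discovered_nontrivial": _share(
            flag for out in outputs for flag in out["discovered_nontrivial"]),
    }
    # "not value >= threshold" also treats NaN (nothing audited) as a miss.
    failed = [name for name, value in observed.items()
              if not value >= THRESHOLDS[name]]
    return observed, failed
```

A non-empty list of failed metrics corresponds to the condition that triggers the Limitations paragraph and rule-library audit.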

Collaborators and investigators

This is where you will find people and organizations involved with this study.

Sponsor

Waymark

Investigators

Principal investigator: Sanjay Basu, MD, PhD, Waymark

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study major dates

Study start (Actual)

May 15, 2026

Primary completion (Actual)

July 2, 2026

Study completion (Actual)

July 6, 2026

Study registration dates

First submitted

May 8, 2026

First submitted that met QC criteria

May 15, 2026

First posted (Actual)

May 19, 2026

Study record updates

Last update posted (Actual)

July 8, 2026

Last update submitted that met QC criteria

July 6, 2026

Last verified

July 1, 2026

More information

Terms related to this study

Keywords

Other study ID numbers

ANCHOR-2026-01

Plan for individual participant data (IPD)

Plan to share individual participant data (IPD)?

NO

IPD plan description

The trial dataset contains protected health information governed by HIPAA; no patient-level data, de-identified or otherwise, will be deposited in a public repository or shared with peer reviewers. Aggregate per-arm summary statistics, primary and secondary endpoint estimates, and adverse-event tables will be posted to ClinicalTrials.gov in accordance with the Final Rule (42 CFR Part 11).

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

Product manufactured in and exported from the U.S.

No

This information was retrieved directly from clinicaltrials.gov without any changes. If you have any requests to change, remove, or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, it will be updated automatically on our website as well.


