
Preliminary Evaluation of a Large Language Model-Based Tool for Complex Surgical Decision Support in Lung Cancer

Updated June 13, 2026 by: XiuYuan Chen, Peking University People's Hospital

This is an exploratory effect-size estimation study with two objectives: ① to estimate the Win Ratio, with its 95% confidence interval, for the experimental group (GAPS-Agent) versus the control group (large language model) in blinded pairwise preference judgments by thoracic surgery expert adjudicators, as a sample-size planning parameter for subsequent multicenter confirmatory clinical trials; ② to preliminarily evaluate the value of GAPS-Agent within clinical workflows.

The study hypothesis is that, compared with a general-purpose large language model without medical enhancement (control group), a structured agentic workflow optimized on the basis of the GAPS evaluation framework (GAPS-Agent, experimental group) helps junior resident physicians generate clinical decision plans for complex lung cancer cases that senior thoracic surgery expert adjudicators prefer more strongly.

Study Overview

Status

Enrolling by invitation

Conditions

Intervention / Treatment

Study Type

Interventional

Enrollment (Estimated)

Phase

Not Applicable

Contacts and Locations

This section provides contact details for those conducting the study and information on where the study is being conducted.

Study Locations

China
- Beijing Municipality
  - Beijing, Beijing Municipality, China, 100044
    - Peking University People's Hospital

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

Adult
Older Adult

Accepts Healthy Volunteers

No

Description

Inclusion Criteria:

Resident Physician Subjects:
1. Holds a valid and legally effective Physician Practice License of the People's Republic of China;
2. Currently holds the rank of resident physician in a thoracic surgery department at a tertiary Class A (3A) hospital;
3. Agrees to complete all assessment tasks of the main study phase in accordance with the study protocol;
4. Can guarantee the time and effort required to complete all assessment tasks of the main study.
Study Cases:
1. The case was discussed at the Thoracic Oncology Multidisciplinary Team (MDT) conference of Peking University People's Hospital between January 2025 and May 2026;
2. The current version of the NCCN guidelines does not provide an explicit recommendation covering the management of the case;
3. Does not overlap with the GAPS evaluation set;
4. The case is presented as plain text in a structured format, with all direct and indirect identifiers removed and complete de-identification performed prior to inclusion;
5. From the pool of eligible cases, 12 cases will be randomly drawn using Python (numpy.random, with a fixed and archived seed) to serve as the main study cases. The cases will cover 6 themes (chest mass of undetermined diagnosis, early-stage lung cancer, locally advanced lung cancer, oligometastatic/oligoprogressive disease, special intraoperative situations, and tumor recurrence), with 2 cases per theme.
Adjudication Expert Panel:
1. Holds a valid and legally effective Physician Practice License of the People's Republic of China;
2. Currently holds the rank of attending physician or above in a thoracic surgery department at a tertiary Class A hospital;
3. Chairs or regularly participates in lung cancer multidisciplinary team (MDT) work in their department.
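
The random draw described under Study Cases (item 5) could be reproduced along the following lines. This is a minimal sketch, not the study's actual script: the seed value, the case IDs, the structure of the eligible pool, and the function name draw_cases are all hypothetical, and drawing 2 cases within each theme is one assumed reading of "12 cases ... with 2 cases per theme".

```python
import numpy as np

# The six themes named in the inclusion criteria.
THEMES = [
    "chest mass of undetermined diagnosis",
    "early-stage lung cancer",
    "locally advanced lung cancer",
    "oligometastatic/oligoprogressive disease",
    "special intraoperative situations",
    "tumor recurrence",
]


def draw_cases(pool, per_theme=2, seed=20260610):
    """Draw `per_theme` cases from each theme with a fixed seed.

    `pool` maps a theme name to the list of eligible case IDs.
    The seed shown here is a placeholder, not the archived study seed.
    """
    rng = np.random.default_rng(seed)
    drawn = {}
    for theme in THEMES:
        # Sort first so the draw does not depend on input ordering.
        candidates = sorted(pool[theme])
        if len(candidates) < per_theme:
            raise ValueError(f"not enough eligible cases for theme: {theme}")
        picks = rng.choice(len(candidates), size=per_theme, replace=False)
        drawn[theme] = [candidates[i] for i in sorted(picks)]
    return drawn


# Hypothetical eligible pool: five made-up case IDs per theme.
pool = {
    theme: [f"T{t}-C{c:02d}" for c in range(1, 6)]
    for t, theme in enumerate(THEMES, start=1)
}
selected = draw_cases(pool)
for theme, cases in selected.items():
    print(theme, cases)
```

Archiving the seed together with the sorted list of eligible case IDs is what makes the draw repeatable; rerunning draw_cases with the same inputs returns the same 12 cases.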

Exclusion Criteria:

Resident Physician Subjects:
1. Has previously participated in the construction of the GAPS evaluation set or the development of GAPS-Agent;
2. Unable to complete the tasks of the study phase.
Study Cases:
1. Key case information is missing, such as text-form data on pathology (including IHC/NGS), imaging, laboratory tests, prior medical history, comorbidities, or PS score;
2. Decision-making for the case is strictly dependent on non-text information.
Adjudication Expert Panel:
1. Participated in the construction of the GAPS evaluation set, the content validity verification, or the development of GAPS-Agent for this study;
2. Has a direct conflict of interest with any product used as a tool in either arm of this study.

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Primary Purpose: Other
Allocation: Randomized
Interventional Model: Parallel Assignment
Masking: Single

Number of Arms

Arms and Interventions

Participant Group / Arm	Intervention / Treatment
Experimental: test arm GAPS-Agent	Other: GAPS-Agent The research group has previously developed the GAPS evaluation framework for complex clinical decision-making in lung cancer. In this framework, G (Grounding) characterizes the cognitive depth of decision-making (ranging from knowledge retrieval to decisions that go beyond clinical guidelines), A (Authority) corresponds to the grading of evidence strength, P (Perturbation) describes the identification and management of real-world clinical confounding factors, and S (Strength) corresponds to the calibration of recommendation strength. Within this framework, the research group has completed the construction of a 100-item complex lung cancer decision-making evaluation set along with its corresponding rubrics, and has invited multiple thoracic oncology experts to complete content validity validation. Based on this, the research group developed GAPS-Agent, which uses an open-source large language model as its foundation and integrates functional modules such as guideline and evidence retri
Active Comparator: control arm LLM	Other: LLM An open-source large language model without specific medical-domain enhancement.

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Overall plan Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired plan comparisons in terms of overall plan quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgements. Calculated up to 3 weeks after the preference judgements.
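
The win ratio and its bootstrap interval could be computed roughly as sketched below. The registry entry does not spell out the resampling scheme, so this sketch assumes a crossed bootstrap: physicians and cases are each resampled with replacement, every judgment in the resampled (physician, case) cells is pooled, and percentiles are taken on the log win ratio before exponentiating. The function names, the record layout, and the toy data are hypothetical, and replicates with zero wins or zero losses are simply skipped, which is one of several possible conventions.

```python
import math
import random
from collections import defaultdict


def win_ratio(outcomes):
    """Win ratio = wins / losses; ties count toward neither."""
    wins = sum(o == "win" for o in outcomes)
    losses = sum(o == "loss" for o in outcomes)
    if losses == 0:
        raise ValueError("win ratio is undefined with zero losses")
    return wins / losses


def cluster_bootstrap_ci(judgments, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile CI on the log scale from a crossed two-level bootstrap.

    `judgments` is a list of (physician, case, outcome) tuples with
    outcome in {"win", "tie", "loss"} (assumed layout).
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for physician, case, outcome in judgments:
        cells[(physician, case)].append(outcome)
    physicians = sorted({p for p, _ in cells})
    cases = sorted({c for _, c in cells})

    log_ratios = []
    for _ in range(n_boot):
        # Resample both cluster levels with replacement.
        boot_physicians = rng.choices(physicians, k=len(physicians))
        boot_cases = rng.choices(cases, k=len(cases))
        wins = losses = 0
        for p in boot_physicians:
            for c in boot_cases:
                for outcome in cells.get((p, c), ()):
                    wins += outcome == "win"
                    losses += outcome == "loss"
        if wins and losses:  # skip replicates where log(WR) is undefined
            log_ratios.append(math.log(wins / losses))
    if not log_ratios:
        raise ValueError("no usable bootstrap replicates")

    log_ratios.sort()
    last = len(log_ratios) - 1
    lower = log_ratios[int(alpha / 2 * last)]
    upper = log_ratios[int((1 - alpha / 2) * last)]
    return math.exp(lower), math.exp(upper)


# Toy data: 4 physicians x 3 cases x 2 judgments, purely illustrative.
pattern = ["win", "win", "tie", "loss"]
toy = []
for p in ["P1", "P2", "P3", "P4"]:
    for c in ["C1", "C2", "C3"]:
        for _ in range(2):
            toy.append((p, c, pattern[len(toy) % 4]))

point = win_ratio([o for _, _, o in toy])
ci_low, ci_high = cluster_bootstrap_ci(toy)
print(point, ci_low, ci_high)
```

Resampling whole physicians and whole cases, rather than individual judgments, keeps correlated judgments together, which is the reason for a cluster bootstrap in the first place.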

Secondary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Inter-rater agreement	For the ternary preference judgment results of 10 expert judges across 192 paired comparisons and 6 evaluation domains, Fleiss' kappa was used to assess inter-rater agreement. The kappa value and its 95% confidence interval are reported for each evaluation domain.	Measured at the time when experts completed their preference judgements. Calculated up to 3 weeks after the preference judgements.
Redundancy Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired plan comparisons in terms of overall plan quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgements. Calculated up to 3 weeks after the preference judgements.
Evidence-based medicine adherence Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired plan comparisons in terms of overall plan quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgements. Calculated up to 3 weeks after the preference judgements.
Actionability Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired plan comparisons in terms of overall plan quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgements. Calculated up to 3 weeks after the preference judgements.
Completeness Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired plan comparisons in terms of overall plan quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgements. Calculated up to 3 weeks after the preference judgements.
Safety Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired plan comparisons in terms of overall plan quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgements. Calculated up to 3 weeks after the preference judgements.
GAPS automated rubric score	A third-party large language model, independent of the two study arms' base models, served as the judge model and automatically scored all 96 plans according to the GAPS rubric.	Generated up to 3 weeks after residents finished their plan generation.
Subject physician's self-confidence score	After submitting each case plan, the participating physicians self-rated their confidence in their own plan using a 1-5 point Likert scale.	Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission.
Tool satisfaction score	After submitting each case plan, the participating physicians rated their satisfaction with the tool using a 1-5 point Likert scale.	Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission.
Tool trustworthiness score	After submitting each case plan, the participating physicians rated the tool's credibility using a 1-5 point Likert scale.	Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission.
Decision-making time	The time taken (in minutes) by each participating physician to complete the production of each case plan was automatically recorded by the evaluation platform. Differences between groups were analyzed using a linear mixed-effects model.	Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission.
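
For the inter-rater agreement measure, Fleiss' kappa can be computed from a table of category counts per rated item. The sketch below uses the standard formula with three categories (Win/Tie/Loss). The function name and the toy table are hypothetical, and the 95% confidence interval mentioned in the measure description is not covered here.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a table where table[i][j] is the number of
    raters who assigned item i to category j.

    Every item must be rated by the same number of raters.
    """
    n_items = len(table)
    n_raters = sum(table[0])
    if any(sum(row) != n_raters for row in table):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(table[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in table) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item, then averaged over items.
    p_item = [
        (sum(count * count for count in row) - n_raters)
        / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy table: 4 paired comparisons, 3 raters, columns = Win, Tie, Loss.
toy_table = [
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [1, 1, 1],
]
kappa = fleiss_kappa(toy_table)
print(round(kappa, 3))  # 11/41, about 0.268
```

In the study setting each row would be one paired comparison within one evaluation domain, with the 10 judges' votes spread across the three columns.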

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Sponsor

Peking University People's Hospital

Study Record Dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

June 10, 2026

Primary Completion (Estimated)

June 21, 2026

Study Completion (Estimated)

June 21, 2026

Study Registration Dates

First Submitted

June 10, 2026

First Submitted That Met QC Criteria

June 13, 2026

First Posted (Actual)

June 17, 2026

Study Record Updates

Last Update Posted (Actual)

June 17, 2026

Last Update Submitted That Met QC Criteria

June 13, 2026

Last Verified

June 1, 2026

More Information

Terms related to this study

Keywords

Additional Relevant MeSH Terms

Other Study ID Numbers

2026PHB458-001

Plan for Individual Participant Data (IPD)

Plan to Share Individual Participant Data (IPD)?

NO

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

