Preliminary Evaluation of a Large Language Model-Based Tool for Complex Surgical Decision Support in Lung Cancer

June 13, 2026 updated by: XiuYuan Chen, Peking University People's Hospital

This study is an exploratory effect-size estimation study with two specific objectives: ① to estimate the point estimate and 95% confidence interval of the Win Ratio of the experimental group (GAPS-Agent) versus the control group (general-purpose large language model) in blinded pairwise preference judgments by expert thoracic surgery adjudicators, to serve as a sample size planning parameter for subsequent multicenter confirmatory clinical trials; and ② to preliminarily evaluate the value of GAPS-Agent within clinical workflows. The study hypothesis is that, compared with a general-purpose large language model without medical enhancement (control group), a structured agentic workflow optimized on the basis of the GAPS evaluation framework (GAPS-Agent, experimental group) helps junior resident physicians generate clinical decision plans for complex lung cancer cases that senior thoracic surgery expert adjudicators prefer more strongly.

Study Overview

Status

Enrolling by invitation

Conditions

Intervention / Treatment

Study Type

Interventional

Enrollment (Estimated)

Phase

Not Applicable

Contacts and Locations

Study Locations

China
- Beijing Municipality
  - Beijing, Beijing Municipality, China, 100044
    - Peking University People's Hospital

Participation Criteria

Eligibility Criteria

Ages Eligible for Study

Adult
Older Adult

Accepts Healthy Volunteers

Description

Inclusion Criteria:

Resident Physician Subjects:
1. Holds a valid and legally effective Physician Practice License of the People's Republic of China;
2. Currently holds the rank of resident physician in a thoracic surgery department at a tertiary Class A (3A) hospital;
3. Agrees to complete all assessment tasks of the main study phase in accordance with the study protocol;
4. Can guarantee the time and effort required to complete all assessment tasks of the main study.
Study Cases:
1. The case was discussed at the Thoracic Oncology Multidisciplinary Team (MDT) conference of Peking University People's Hospital between January 2025 and May 2026;
2. The current version of the NCCN guidelines does not provide an explicit recommendation covering the management of the case;
3. Does not overlap with the GAPS evaluation set;
4. The case is presented as structured plain text, with all direct and indirect identifiers removed (complete de-identification) before inclusion;
5. From the pool of eligible cases, 12 cases will be randomly drawn using Python (numpy.random, with a fixed and archived seed) to serve as the main study cases. The cases will cover 6 themes (chest mass of undetermined diagnosis, early-stage lung cancer, locally advanced lung cancer, oligometastatic/oligoprogressive disease, special intraoperative situations, and tumor recurrence), with 2 cases per theme.
Adjudication Expert Panel:
1. Holds a valid and legally effective Physician Practice License of the People's Republic of China;
2. Currently holds the rank of attending physician or above in a thoracic surgery department at a tertiary Class A hospital;
3. Chairs or regularly participates in lung cancer multidisciplinary team (MDT) work in their department.

Exclusion Criteria:

Resident Physician Subjects:
1. Has previously participated in the construction of the GAPS evaluation set or the development of GAPS-Agent;
2. Is unable to complete the tasks of the study phase.
Study Cases:
1. Key case information is missing in text form, such as pathology (including IHC/NGS), imaging, laboratory tests, prior medical history, comorbidities, or performance status (PS) score;
2. Decision-making for the case is strictly dependent on non-text information.
Adjudication Expert Panel:
1. Participated in the construction of the GAPS evaluation set, the content validity verification, or the development of GAPS-Agent for this study;
2. Has a direct conflict of interest with any specific product used as a tool in either arm of this study.
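The case-selection step in the study-case inclusion criteria (12 cases, 2 per theme, drawn with numpy.random and a fixed, archived seed) could look like the following. This is a minimal sketch, not the study's code: it assumes the draw is stratified by theme (sampling without replacement within each theme's eligible pool) and uses `numpy.random.default_rng` as the generator. The function name `draw_study_cases`, the pool contents, the case IDs and the seed value are all hypothetical.

```python
import numpy as np

# The six case themes listed in the study-case inclusion criteria.
THEMES = [
    "chest mass of undetermined diagnosis",
    "early-stage lung cancer",
    "locally advanced lung cancer",
    "oligometastatic/oligoprogressive disease",
    "special intraoperative situations",
    "tumor recurrence",
]


def draw_study_cases(pool, seed, per_theme=2):
    """Draw `per_theme` cases without replacement from each theme's eligible pool.

    pool: dict mapping theme -> list of eligible, de-identified case IDs (hypothetical).
    seed: fixed integer seed, archived so the draw can be reproduced exactly.
    """
    rng = np.random.default_rng(seed)
    selected = {}
    for theme in THEMES:  # fixed theme order keeps the random stream reproducible
        ids = sorted(pool[theme])  # sort so the result does not depend on input order
        picks = rng.choice(len(ids), size=per_theme, replace=False)
        selected[theme] = [ids[i] for i in picks]
    return selected


# Hypothetical pool: 5 eligible cases per theme, IDs invented for illustration.
pool = {t: [f"T{k}-{n:02d}" for n in range(5)] for k, t in enumerate(THEMES)}
ARCHIVED_SEED = 20260610  # hypothetical seed value
cases = draw_study_cases(pool, ARCHIVED_SEED)
print(sum(len(v) for v in cases.values()))  # 12
```

Sorting each theme's IDs and iterating the themes in a fixed order means that rerunning the draw with the archived seed reproduces the same 12 cases, regardless of how the eligible pool was originally listed.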

Study Plan

How is the study designed?

Design Details

Primary Purpose: Other
Allocation: Randomized
Interventional Model: Parallel Assignment
Masking: Single

Arms and Interventions

Participant Group / Arm	Intervention / Treatment
Experimental: test arm (GAPS-Agent)	Other: GAPS-Agent. The research group previously developed the GAPS evaluation framework for complex clinical decision-making in lung cancer. In this framework, G (Grounding) characterizes the cognitive depth of a decision (ranging from knowledge retrieval to decisions that go beyond clinical guidelines), A (Authority) corresponds to the grading of evidence strength, P (Perturbation) describes the identification and management of real-world clinical confounding factors, and S (Strength) corresponds to the calibration of recommendation strength. Within this framework, the research group has constructed a 100-item evaluation set for complex lung cancer decision-making with corresponding rubrics, and multiple thoracic oncology experts have validated its content validity. On this basis, the research group developed GAPS-Agent, which uses an open-source large language model as its foundation and integrates functional modules such as guideline and evidence retri
Active Comparator: control arm (LLM)	Other: LLM. An open-source large language model that has not been specifically enhanced for the medical domain.

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Overall plan Win Ratio	Ten blinded expert adjudicators make Win/Tie/Loss (ternary) preference judgments on 192 paired plan comparisons with respect to overall plan quality. The Win Ratio is calculated as Wins ÷ Losses, and its 95% confidence interval is estimated by two-level (physician × case) cluster bootstrap resampling (B = 10,000; percentile method on the log scale).	Measured when the experts complete their preference judgments; calculated up to 3 weeks after the judgments.

Secondary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Inter-rater agreement	Fleiss' kappa is used to assess agreement among the 10 expert adjudicators' ternary (Win/Tie/Loss) preference judgments across the 192 paired comparisons, separately for each of the 6 evaluation domains (overall plan quality, redundancy, evidence-based medicine adherence, actionability, completeness, and safety). The kappa value and its 95% confidence interval are reported for each domain.	Measured when the experts complete their preference judgments; calculated up to 3 weeks after the judgments.
Redundancy Win Ratio	Ten blinded expert adjudicators make Win/Tie/Loss (ternary) preference judgments on 192 paired plan comparisons with respect to redundancy. The Win Ratio (Wins ÷ Losses) and its 95% confidence interval are estimated as for the primary outcome (two-level physician × case cluster bootstrap, B = 10,000; percentile method on the log scale).	Measured when the experts complete their preference judgments; calculated up to 3 weeks after the judgments.
Evidence-based medicine adherence Win Ratio	Ten blinded expert adjudicators make Win/Tie/Loss (ternary) preference judgments on 192 paired plan comparisons with respect to evidence-based medicine adherence. The Win Ratio (Wins ÷ Losses) and its 95% confidence interval are estimated as for the primary outcome (two-level physician × case cluster bootstrap, B = 10,000; percentile method on the log scale).	Measured when the experts complete their preference judgments; calculated up to 3 weeks after the judgments.
Actionability Win Ratio	Ten blinded expert adjudicators make Win/Tie/Loss (ternary) preference judgments on 192 paired plan comparisons with respect to actionability. The Win Ratio (Wins ÷ Losses) and its 95% confidence interval are estimated as for the primary outcome (two-level physician × case cluster bootstrap, B = 10,000; percentile method on the log scale).	Measured when the experts complete their preference judgments; calculated up to 3 weeks after the judgments.
Completeness Win Ratio	Ten blinded expert adjudicators make Win/Tie/Loss (ternary) preference judgments on 192 paired plan comparisons with respect to completeness. The Win Ratio (Wins ÷ Losses) and its 95% confidence interval are estimated as for the primary outcome (two-level physician × case cluster bootstrap, B = 10,000; percentile method on the log scale).	Measured when the experts complete their preference judgments; calculated up to 3 weeks after the judgments.
Safety Win Ratio	Ten blinded expert adjudicators make Win/Tie/Loss (ternary) preference judgments on 192 paired plan comparisons with respect to safety. The Win Ratio (Wins ÷ Losses) and its 95% confidence interval are estimated as for the primary outcome (two-level physician × case cluster bootstrap, B = 10,000; percentile method on the log scale).	Measured when the experts complete their preference judgments; calculated up to 3 weeks after the judgments.
GAPS automated rubric score	A third-party large language model, independent of the base models of both study arms, serves as the judge model and automatically scores all 96 plans according to the GAPS rubrics.	Generated up to 3 weeks after the residents finish generating their plans.
Subject physician's self-confidence score	After submitting each case plan, each participating resident rates their confidence in their own plan on a 1-5 Likert scale.	Completed when the residents submit their plans; calculated up to 3 weeks after submission.
Tool satisfaction score	After submitting each case plan, each participating resident rates their satisfaction with the tool on a 1-5 Likert scale.	Completed when the residents submit their plans; calculated up to 3 weeks after submission.
Tool trustworthiness score	After submitting each case plan, each participating resident rates the tool's trustworthiness on a 1-5 Likert scale.	Completed when the residents submit their plans; calculated up to 3 weeks after submission.
Decision-making time	The time (in minutes) each participating resident takes to produce each case plan is recorded automatically by the evaluation platform. Between-group differences are analyzed with a linear mixed-effects model.	Completed when the residents submit their plans; calculated up to 3 weeks after submission.
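For the inter-rater agreement outcome, Fleiss' kappa for a single evaluation domain can be sketched as follows. The registry does not say how the kappa confidence interval is obtained, so this sketch assumes a percentile bootstrap over the 192 paired comparisons (items) with a hypothetical B = 2,000, and it assumes every pair is rated by all 10 judges. The names `fleiss_kappa` and `kappa_bootstrap_ci` and the simulated ratings are illustrative, not study code.

```python
import math
import random

CATEGORIES = ("win", "tie", "loss")


def fleiss_kappa(ratings):
    """Fleiss' kappa for `ratings`, a list with one entry per item (paired comparison);
    each entry lists every judge's verdict for that item.

    Assumes each item is rated by the same number n >= 2 of judges and that the
    verdicts are not all in a single category (otherwise chance agreement is 1).
    """
    N = len(ratings)
    n = len(ratings[0])
    counts = [[r.count(cat) for cat in CATEGORIES] for r in ratings]
    # Observed agreement for each item: share of judge pairs that agree.
    p_items = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_items) / N
    # Expected (chance) agreement from the overall category proportions.
    p_cat = [sum(row[j] for row in counts) / (N * n) for j in range(len(CATEGORIES))]
    p_e = sum(p * p for p in p_cat)
    return (p_bar - p_e) / (1 - p_e)


def kappa_bootstrap_ci(ratings, B=2_000, alpha=0.05, seed=2026):
    """Percentile bootstrap CI, resampling items (paired comparisons) with replacement."""
    rng = random.Random(seed)
    stats = sorted(fleiss_kappa(rng.choices(ratings, k=len(ratings))) for _ in range(B))
    lo = stats[round(alpha / 2 * (B - 1))]
    hi = stats[round((1 - alpha / 2) * (B - 1))]
    return lo, hi


# Hypothetical data for one domain: 192 pairs x 10 judges, with verdicts
# correlated within a pair so that kappa is above zero.
demo_rng = random.Random(11)
ratings = []
for _ in range(192):
    leaning = demo_rng.choice(CATEGORIES)
    ratings.append([leaning if demo_rng.random() < 0.6 else demo_rng.choice(CATEGORIES)
                    for _ in range(10)])
kappa = fleiss_kappa(ratings)
lo, hi = kappa_bootstrap_ci(ratings)
print(f"kappa = {kappa:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Running the same function once per domain (overall plan quality plus the five secondary domains) yields the six per-domain kappa values and intervals the outcome describes.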

Collaborators and Investigators

Sponsor

Peking University People's Hospital

Study record dates

Study Major Dates

Study Start (Actual)

June 10, 2026

Primary Completion (Estimated)

June 21, 2026

Study Completion (Estimated)

June 21, 2026

Study Registration Dates

First Submitted

June 10, 2026

First Submitted That Met QC Criteria

June 13, 2026

First Posted (Actual)

June 17, 2026

Study Record Updates

Last Update Posted (Actual)

June 17, 2026

Last Update Submitted That Met QC Criteria

June 13, 2026

Last Verified

June 1, 2026

More Information

Terms related to this study

Keywords

Additional Relevant MeSH Terms

Other Study ID Numbers

2026PHB458-001

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

Studies a U.S. FDA-regulated device product

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.

