- Clinical Trial NCT07654036
Preliminary Evaluation of a Large Language Model-Based Tool for Complex Surgical Decision Support in Lung Cancer
June 13, 2026 updated by: XiuYuan Chen, Peking University People's Hospital
This is an exploratory effect-size estimation study with two objectives: ① to estimate the Win Ratio (point estimate and 95% confidence interval) for the experimental group (GAPS-Agent) versus the control group (large language model) in blinded pairwise preference judgments by thoracic surgery expert adjudicators, as a sample-size planning parameter for subsequent multicenter confirmatory clinical trials; ② to preliminarily evaluate the value of GAPS-Agent within clinical workflows. The study hypothesis is that, compared with a general-purpose large language model without medical enhancement (control group), a structured agentic workflow optimized on the basis of the GAPS evaluation framework (GAPS-Agent, experimental group) helps junior resident physicians generate clinical decision plans for complex lung cancer cases that senior thoracic surgery expert adjudicators prefer more strongly.
Study Overview
Status
Enrolling by invitation
Conditions
Intervention / Treatment
Study Type
Interventional
Enrollment (Estimated)
12
Phase
- Not Applicable
Contacts and Locations
This section provides the contact details for those conducting the study, and information on where this study is being conducted.
Study Locations
- Peking University People's Hospital, Beijing, Beijing Municipality, China, 100044
Participation Criteria
Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.
Eligibility Criteria
Ages Eligible for Study
- Adult
- Older Adult
Accepts Healthy Volunteers
No
Description
Inclusion Criteria:
Resident Physician Subjects:
- Holds a valid and legally effective Physician Practice License of the People's Republic of China;
- Currently holds the rank of resident physician in a thoracic surgery department at a tertiary Class A (3A) hospital;
- Agrees to complete all assessment tasks of the main study phase in accordance with the study protocol;
- Can guarantee the time and effort required to complete all assessment tasks of the main study.
Study Cases:
- The case was discussed at the Thoracic Oncology Multidisciplinary Team (MDT) conference of Peking University People's Hospital between January 2025 and May 2026;
- The current version of the NCCN guidelines does not provide an explicit recommendation covering the management of the case;
- Does not overlap with the GAPS evaluation set;
- The case is presented as plain text in a structured format, with all direct and indirect identifiers removed and de-identification completed before inclusion;
- From the pool of eligible cases, 12 cases will be randomly drawn using Python (numpy.random, with a fixed and archived seed) to serve as the main study cases. The cases will cover 6 themes (chest mass of undetermined diagnosis, early-stage lung cancer, locally advanced lung cancer, oligometastatic/oligoprogressive disease, special intraoperative situations, and tumor recurrence), with 2 cases per theme.
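The seeded draw described in the last criterion could be sketched as follows. This is a minimal illustration, not the study's actual script: it assumes the draw is stratified by theme (2 cases from each of the 6 themes), and the case IDs, the pool size, the seed value, and the `draw_cases` helper are all hypothetical.

```python
import numpy as np

THEMES = [
    "chest mass of undetermined diagnosis",
    "early-stage lung cancer",
    "locally advanced lung cancer",
    "oligometastatic/oligoprogressive disease",
    "special intraoperative situations",
    "tumor recurrence",
]


def draw_cases(pool, seed, per_theme=2):
    """Draw `per_theme` cases per theme without replacement.

    pool: dict mapping theme -> list of eligible case IDs (hypothetical IDs).
    seed: fixed integer seed, archived so the draw can be reproduced.
    """
    rng = np.random.default_rng(seed)
    selected = {}
    for theme in THEMES:
        # Sort so the result depends only on the seed, not on input order.
        candidates = sorted(pool[theme])
        if len(candidates) < per_theme:
            raise ValueError(f"not enough eligible cases for theme: {theme}")
        picks = rng.choice(len(candidates), size=per_theme, replace=False)
        selected[theme] = [candidates[i] for i in sorted(picks)]
    return selected


# Hypothetical pool: 5 eligible cases per theme.
pool = {
    theme: [f"T{t}-C{c:02d}" for c in range(1, 6)]
    for t, theme in enumerate(THEMES, start=1)
}
SEED = 20260610  # placeholder value, not the study's archived seed

main_cases = draw_cases(pool, SEED)
for theme, cases in main_cases.items():
    print(theme, "->", cases)
```

Re-running `draw_cases` with the same pool and seed returns the same 12 cases, which is the property that archiving the seed is meant to guarantee.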
Adjudication Expert Panel:
- Holds a valid and legally effective Physician Practice License of the People's Republic of China;
- Currently holds the rank of attending physician or above in a thoracic surgery department at a tertiary Class A hospital;
- Chairs or regularly participates in lung cancer multidisciplinary team (MDT) work in their department.
Exclusion Criteria:
Resident Physician Subjects:
- Has previously participated in the construction of the GAPS evaluation set or the development of GAPS-Agent;
- Unable to complete the tasks of the study phase.
Study Cases:
- Key case information is missing, such as text-form data on pathology (including IHC/NGS), imaging, laboratory tests, prior medical history, comorbidities, or PS score;
- Decision-making for the case is strictly dependent on non-text information.
Adjudication Expert Panel:
- Participated in the construction of the GAPS evaluation set, the content validity verification, or the development of GAPS-Agent for this study;
- Has a direct conflict of interest involving any specific product used as a tool in either study arm.
Study Plan
This section provides details of the study plan, including how the study is designed and what the study is measuring.
How is the study designed?
Design Details
- Primary Purpose: Other
- Allocation: Randomized
- Interventional Model: Parallel Assignment
- Masking: Single
Arms and Interventions
| Participant Group / Arm | Intervention / Treatment |
|---|---|
| Experimental: test arm (GAPS-Agent) | The research group has previously developed the GAPS evaluation framework for complex clinical decision-making in lung cancer. In this framework, G (Grounding) characterizes the cognitive depth of decision-making (ranging from knowledge retrieval to decisions that go beyond clinical guidelines), A (Authority) corresponds to the grading of evidence strength, P (Perturbation) describes the identification and management of real-world clinical confounding factors, and S (Strength) corresponds to the calibration of recommendation strength. Within this framework, the research group has built a 100-item complex lung cancer decision-making evaluation set with corresponding rubrics, and has invited multiple thoracic oncology experts to complete content validity validation. Based on this, the research group developed GAPS-Agent, which uses an open-source large language model as its foundation and integrates functional modules such as guideline and evidence retri… |
| Active Comparator: control arm (LLM) | An open-source large language model with no specific enhancement for the medical domain. |
What is the study measuring?
Primary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Overall plan Win Ratio | A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale). | Measured at the time when experts completed their preference judgements. Calculated up to 3 weeks after the preference judgements. |
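The win ratio and its bootstrap interval can be illustrated with a short sketch. It is one possible reading of the "two-level (physician × case) cluster bootstrap", not the study's analysis code: it assumes nested resampling (physicians first, then cases within each sampled physician), codes judgments as +1 (win), 0 (tie), and -1 (loss), and skips replicates with zero wins or zero losses because their log ratio is undefined. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def win_ratio(outcome):
    """Wins divided by losses; ties are ignored. NaN if there are no losses."""
    outcome = np.asarray(outcome)
    wins = np.sum(outcome == 1)
    losses = np.sum(outcome == -1)
    return float(wins / losses) if losses > 0 else float("nan")


def cluster_bootstrap_ci(physician, case, outcome, n_boot=10_000, seed=0, alpha=0.05):
    """Percentile CI for the win ratio, computed on the log scale."""
    physician, case, outcome = map(np.asarray, (physician, case, outcome))
    rng = np.random.default_rng(seed)

    # Pre-compute win and loss counts for every (physician, case) cell.
    cells = []
    for p in np.unique(physician):
        in_p = physician == p
        cases_p = np.unique(case[in_p])
        w = np.array([np.sum(outcome[in_p & (case == c)] == 1) for c in cases_p])
        l = np.array([np.sum(outcome[in_p & (case == c)] == -1) for c in cases_p])
        cells.append((w, l))

    log_wr = []
    for _ in range(n_boot):
        wins = losses = 0
        # Level 1: resample physicians with replacement.
        for i in rng.integers(0, len(cells), size=len(cells)):
            w, l = cells[i]
            # Level 2: resample that physician's cases with replacement.
            j = rng.integers(0, len(w), size=len(w))
            wins += w[j].sum()
            losses += l[j].sum()
        if wins > 0 and losses > 0:  # log ratio undefined otherwise
            log_wr.append(np.log(wins / losses))

    lo, hi = np.quantile(log_wr, [alpha / 2, 1 - alpha / 2])
    return win_ratio(outcome), float(np.exp(lo)), float(np.exp(hi))


# Synthetic data for illustration only: 4 physicians x 3 cases x 5 judges.
demo_rng = np.random.default_rng(1)
physician = np.repeat(np.arange(4), 15)
case = np.tile(np.repeat(np.arange(3), 5), 4)
outcome = demo_rng.choice([1, 0, -1], size=60, p=[0.5, 0.2, 0.3])

# n_boot is reduced here for speed; the record specifies B = 10,000.
point, lo, hi = cluster_bootstrap_ci(physician, case, outcome, n_boot=2000, seed=42)
print(f"win ratio = {point:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Taking quantiles of the log win ratio and exponentiating the bounds keeps the interval positive and treats ratios above and below 1 symmetrically.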
Secondary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Inter-rater agreement | For the ternary preference judgment results of 10 expert judges across 192 paired comparisons and 6 evaluation domains, Fleiss' kappa was used to assess inter-rater agreement. The kappa value and its 95% confidence interval are reported for each evaluation domain. | Measured at the time when experts completed their preference judgements. Calculated up to 3 weeks after the preference judgements. |
| Redundancy Win Ratio | A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale). | Measured at the time when experts completed their preference judgements. Calculated up to 3 weeks after the preference judgements. |
| Evidence-based medicine adherence Win Ratio | A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale). | Measured at the time when experts completed their preference judgements. Calculated up to 3 weeks after the preference judgements. |
| Actionability Win Ratio | A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale). | Measured at the time when experts completed their preference judgements. Calculated up to 3 weeks after the preference judgements. |
| Completeness Win Ratio | A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale). | Measured at the time when experts completed their preference judgements. Calculated up to 3 weeks after the preference judgements. |
| Safety Win Ratio | A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale). | Measured at the time when experts completed their preference judgements. Calculated up to 3 weeks after the preference judgements. |
| GAPS automated rubric score | A third-party large language model, independent of the two study arms' base models, served as the judge model and automatically scored all 96 plans according to the GAPS rubric. | Generated up to 3 weeks after residents finished their plan generation. |
| Subject physician's self-confidence score | After submitting each case plan, the participating physicians self-rated their confidence in their own plan using a 1-5 point Likert scale. | Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission. |
| Tool satisfaction score | After submitting each case plan, the participating physicians rated their satisfaction with the tool using a 1-5 point Likert scale. | Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission. |
| Tool trustworthiness score | After submitting each case plan, the participating physicians rated the tool's credibility using a 1-5 point Likert scale. | Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission. |
| Decision-making time | The time taken (in minutes) by each participating physician to complete the production of each case plan was automatically recorded by the evaluation platform. Differences between groups were analyzed using a linear mixed-effects model. | Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission. |
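For the inter-rater agreement measure, Fleiss' kappa on ternary judgments can be computed as in the sketch below. It is a textbook implementation under the assumption that every comparison is rated by the same number of judges; it returns the point estimate only (the 95% confidence interval mentioned in the record is not covered), and the `fleiss_kappa` function name and the toy ratings are hypothetical.

```python
def fleiss_kappa(ratings, categories=("win", "tie", "loss")):
    """Fleiss' kappa for a list of items, each rated by the same number of judges.

    ratings: list of lists; ratings[i] holds the category label chosen by each
             judge for comparison i.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # counts[i][j] = number of judges who put item i in category j.
    counts = [[row.count(c) for c in categories] for row in ratings]

    # Observed agreement: mean over items of the share of agreeing judge pairs.
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall category proportions.
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_expected = sum(p * p for p in p_cat)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example with 3 judges and 4 comparisons (illustrative data only).
toy = [
    ["win", "win", "win"],
    ["win", "win", "tie"],
    ["loss", "loss", "loss"],
    ["tie", "loss", "loss"],
]
print(round(fleiss_kappa(toy), 3))
```

In the study's setting, the same function would be applied once per evaluation domain, with one row per paired comparison and one label per expert judge.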
Collaborators and Investigators
This is where you will find people and organizations involved with this study.
Study record dates
These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.
Study Major Dates
Study Start (Actual)
June 10, 2026
Primary Completion (Estimated)
June 21, 2026
Study Completion (Estimated)
June 21, 2026
Study Registration Dates
First Submitted
June 10, 2026
First Submitted That Met QC Criteria
June 13, 2026
First Posted (Actual)
June 17, 2026
Study Record Updates
Last Update Posted (Actual)
June 17, 2026
Last Update Submitted That Met QC Criteria
June 13, 2026
Last Verified
June 1, 2026
More Information
Terms related to this study
Additional Relevant MeSH Terms
Other Study ID Numbers
- 2026PHB458-001
Plan for Individual participant data (IPD)
Plan to Share Individual Participant Data (IPD)?
NO
Drug and device information, study documents
Studies a U.S. FDA-regulated drug product
No
Studies a U.S. FDA-regulated device product
No