Preliminary Evaluation of a Large Language Model-Based Tool for Complex Surgical Decision Support in Lung Cancer

Updated June 13, 2026 by: XiuYuan Chen, Peking University People's Hospital

This is an exploratory effect-size estimation study with two objectives: ① to estimate the Win Ratio (point estimate and 95% confidence interval) for the experimental group (GAPS-Agent) versus the control group (large language model) in blinded pairwise preference judgments by thoracic surgery expert adjudicators, for use as a sample size planning parameter in subsequent multicenter confirmatory clinical trials; and ② to preliminarily evaluate the value of GAPS-Agent within clinical workflows.

The study hypothesis is that, compared with a general-purpose large language model without medical enhancement (control group), a structured agentic workflow optimized on the basis of the GAPS evaluation framework (GAPS-Agent, experimental group) helps junior resident physicians generate clinical decision plans for complex lung cancer cases that senior thoracic surgery expert adjudicators prefer more strongly.

Study Overview

Status

Enrolling by invitation

Conditions

Intervention / Treatment

Study Type

Interventional

Enrollment (Estimated)

Phase

Not Applicable

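Contacts and Locations

This section provides the contact details of the people conducting the study and information on where the study is being conducted.

Study Locations

China
- Beijing Municipality
  - Beijing, Beijing Municipality, China, 100044
    - Peking University People's Hospital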

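Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

Adult
Older Adult

Accepts Healthy Volunteers

No

Description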

Inclusion Criteria:

Resident Physician Subjects:
1. Holds a valid and legally effective Physician Practice License of the People's Republic of China;
2. Currently holds the rank of resident physician in a thoracic surgery department at a tertiary Class A (3A) hospital;
3. Agrees to complete all assessment tasks of the main study phase in accordance with the study protocol;
4. Can guarantee the time and effort required to complete all assessment tasks of the main study.
Study Cases:
1. The case was discussed at the Thoracic Oncology Multidisciplinary Team (MDT) conference of Peking University People's Hospital between January 2025 and May 2026;
2. The current version of the NCCN guidelines does not provide an explicit recommendation covering the management of the case;
3. Does not overlap with the GAPS evaluation set;
4. The case is presented in pure text in a structured format, with all direct and indirect identifiers removed and complete de-identification performed prior to inclusion;
5. From the pool of eligible cases, 12 cases will be randomly drawn using Python (numpy.random, with a fixed and archived seed) to serve as the main study cases. The cases will cover 6 themes (chest mass of undetermined diagnosis, early-stage lung cancer, locally advanced lung cancer, oligometastatic/oligoprogressive disease, special intraoperative situations, and tumor recurrence), with 2 cases per theme.
Adjudication Expert Panel:
1. Holds a valid and legally effective Physician Practice License of the People's Republic of China;
2. Currently holds the rank of attending physician or above in a thoracic surgery department at a tertiary Class A hospital;
3. Chairs or regularly participates in lung cancer multidisciplinary team (MDT) work in their department.

Exclusion Criteria:

Resident Physician Subjects:
1. Has previously participated in the construction of the GAPS evaluation set or the development of GAPS-Agent;
2. Unable to complete the tasks of the study phase.
Study Cases:
1. Key case information is missing, such as text-form data on pathology (including IHC/NGS), imaging, laboratory tests, prior medical history, comorbidities, or PS score;
2. Decision-making for the case is strictly dependent on non-text information.
Adjudication Expert Panel:
1. Participated in the construction of the GAPS evaluation set, the content validity verification, or the development of GAPS-Agent for this study;
2. Has a direct conflict of interest with any specific product used as a tool in either arm of this study.
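
Inclusion criterion 5 for study cases describes a seeded random draw of 12 cases, 2 from each of 6 themes, using numpy.random. The following is a minimal sketch of one way such a draw could be made reproducible, assuming the eligible pool is held as a mapping from theme to case IDs. The pool contents, the seed value, and the function name `draw_cases` are hypothetical; the study's actual script may differ.

```python
import numpy as np

# The six themes named in the inclusion criteria.
THEMES = [
    "chest mass of undetermined diagnosis",
    "early-stage lung cancer",
    "locally advanced lung cancer",
    "oligometastatic/oligoprogressive disease",
    "special intraoperative situations",
    "tumor recurrence",
]


def draw_cases(pool, seed, per_theme=2):
    """Draw `per_theme` case IDs per theme without replacement.

    `pool` maps each theme to a list of eligible case IDs (hypothetical layout).
    The same pool and seed always give the same selection.
    """
    rng = np.random.default_rng(seed)
    drawn = {}
    for theme in THEMES:  # fixed iteration order keeps the draw reproducible
        candidates = sorted(pool[theme])  # sort so input order does not matter
        if len(candidates) < per_theme:
            raise ValueError(f"not enough eligible cases for theme: {theme}")
        picks = rng.choice(len(candidates), size=per_theme, replace=False)
        drawn[theme] = [candidates[i] for i in sorted(picks)]
    return drawn


# Hypothetical pool: five made-up case IDs per theme.
pool = {
    theme: [f"T{t}-C{c:02d}" for c in range(1, 6)]
    for t, theme in enumerate(THEMES, start=1)
}
SEED = 20260610  # hypothetical seed; the archived seed is not given in the record
selected = draw_cases(pool, seed=SEED)
for theme, cases in selected.items():
    print(theme, cases)
```

Sorting the candidates and iterating themes in a fixed order means the result depends only on the pool contents and the archived seed, not on how the pool happened to be loaded.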

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Primary Purpose: Other
Allocation: Randomized
Interventional Model: Parallel Assignment
Masking: Single

Number of Arms

Arms and Interventions

Participant Group / Arm	Intervention / Treatment
Experimental: test arm GAPS-Agent	Other: GAPS-Agent The research group has previously developed the GAPS evaluation framework for complex clinical decision-making in lung cancer. In this framework, G (Grounding) characterizes the cognitive depth of decision-making (ranging from knowledge retrieval to decisions that go beyond clinical guidelines), A (Authority) corresponds to the grading of evidence strength, P (Perturbation) describes the identification and management of real-world clinical confounding factors, and S (Strength) corresponds to the calibration of recommendation strength. Within this framework, the research group has constructed a 100-item evaluation set for complex lung cancer decision-making, along with its corresponding rubrics, and has invited multiple thoracic oncology experts to complete content validation. Based on this, the research group developed GAPS-Agent, which uses an open-source large language model as its foundation and integrates functional modules such as guideline and evidence retri
Active Comparator: control arm LLM	Other: LLM An open-source large language model without specific enhancement for the medical field.

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Overall plan Win Ratio	A total of 10 blinded expert judges make Win/Tie/Loss ternary preference judgments on 192 paired plan comparisons in terms of overall plan quality. The win ratio is calculated as Wins ÷ Losses, and the 95% confidence interval is estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured when the experts complete their preference judgments; calculated up to 3 weeks after the preference judgments.
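
The win ratio and its bootstrap interval can be illustrated with a short sketch. It assumes the judgments have been tallied into two count matrices (wins and losses for GAPS-Agent) indexed by physician cluster and case, and that the two-level bootstrap resamples physicians and cases independently with replacement. That reading of "physician × case", the toy counts, the function name, and the choice to skip replicates with zero wins or zero losses are all assumptions, not details given in the record.

```python
import numpy as np


def cluster_bootstrap_ci(win_counts, loss_counts, n_boot=10_000, alpha=0.05, seed=0):
    """Win ratio (wins / losses) with a percentile CI computed on the log scale.

    win_counts[p, c] and loss_counts[p, c] hold the number of Win and Loss
    judgments for physician cluster p and case c. Ties do not enter the ratio.
    """
    win_counts = np.asarray(win_counts, dtype=float)
    loss_counts = np.asarray(loss_counts, dtype=float)
    n_phys, n_cases = win_counts.shape
    rng = np.random.default_rng(seed)

    log_ratios = []
    for _ in range(n_boot):
        # Level 1: resample physician clusters; level 2: resample cases.
        p = rng.integers(0, n_phys, size=n_phys)
        c = rng.integers(0, n_cases, size=n_cases)
        wins = win_counts[np.ix_(p, c)].sum()
        losses = loss_counts[np.ix_(p, c)].sum()
        if wins > 0 and losses > 0:  # assumption: skip undefined log ratios
            log_ratios.append(np.log(wins / losses))

    lo, hi = np.quantile(log_ratios, [alpha / 2, 1 - alpha / 2])
    point = win_counts.sum() / loss_counts.sum()
    return float(point), float(np.exp(lo)), float(np.exp(hi))


# Toy tallies for 4 physician clusters x 3 cases (made-up numbers).
wins = [[3, 2, 4], [1, 3, 2], [2, 2, 3], [4, 1, 2]]
losses = [[1, 2, 0], [2, 1, 1], [1, 1, 2], [0, 2, 1]]

point, lower, upper = cluster_bootstrap_ci(wins, losses, n_boot=2_000, seed=1)
print(f"win ratio = {point:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Resampling whole physicians and whole cases, rather than individual judgments, keeps correlated judgments together in each replicate, which is the usual reason for a cluster bootstrap.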

Secondary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Inter-rater agreement	For the ternary preference judgments of the 10 expert judges across 192 paired comparisons and 6 evaluation domains, Fleiss' kappa is used to assess inter-rater agreement. The kappa value and its 95% confidence interval are reported for each evaluation domain.	Measured when the experts complete their preference judgments; calculated up to 3 weeks after the preference judgments.
Redundancy Win Ratio	A total of 10 blinded expert judges make Win/Tie/Loss ternary preference judgments on 192 paired plan comparisons in terms of redundancy. The win ratio is calculated as Wins ÷ Losses, and the 95% confidence interval is estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured when the experts complete their preference judgments; calculated up to 3 weeks after the preference judgments.
Evidence-based medicine adherence Win Ratio	A total of 10 blinded expert judges make Win/Tie/Loss ternary preference judgments on 192 paired plan comparisons in terms of adherence to evidence-based medicine. The win ratio is calculated as Wins ÷ Losses, and the 95% confidence interval is estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured when the experts complete their preference judgments; calculated up to 3 weeks after the preference judgments.
Actionability Win Ratio	A total of 10 blinded expert judges make Win/Tie/Loss ternary preference judgments on 192 paired plan comparisons in terms of actionability. The win ratio is calculated as Wins ÷ Losses, and the 95% confidence interval is estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured when the experts complete their preference judgments; calculated up to 3 weeks after the preference judgments.
Completeness Win Ratio	A total of 10 blinded expert judges make Win/Tie/Loss ternary preference judgments on 192 paired plan comparisons in terms of completeness. The win ratio is calculated as Wins ÷ Losses, and the 95% confidence interval is estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured when the experts complete their preference judgments; calculated up to 3 weeks after the preference judgments.
Safety Win Ratio	A total of 10 blinded expert judges make Win/Tie/Loss ternary preference judgments on 192 paired plan comparisons in terms of safety. The win ratio is calculated as Wins ÷ Losses, and the 95% confidence interval is estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured when the experts complete their preference judgments; calculated up to 3 weeks after the preference judgments.
GAPS automated rubric score	A third-party large language model, independent of the base models of the two study arms, serves as the judge model and automatically scores all 96 plans according to the GAPS rubric.	Generated up to 3 weeks after the residents finish generating their plans.
Subject physician's self-confidence score	After submitting each case plan, the participating physicians rate their confidence in their own plan on a 5-point Likert scale (1 to 5).	Completed when the residents submit their plans; calculated up to 3 weeks after submission.
Tool satisfaction score	After submitting each case plan, the participating physicians rate their satisfaction with the tool on a 5-point Likert scale (1 to 5).	Completed when the residents submit their plans; calculated up to 3 weeks after submission.
Tool trustworthiness score	After submitting each case plan, the participating physicians rate the tool's credibility on a 5-point Likert scale (1 to 5).	Completed when the residents submit their plans; calculated up to 3 weeks after submission.
Decision-making time	The time (in minutes) each participating physician takes to produce each case plan is recorded automatically by the evaluation platform. Differences between groups are analyzed using a linear mixed-effects model.	Completed when the residents submit their plans; calculated up to 3 weeks after submission.
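
The inter-rater agreement measure relies on Fleiss' kappa over Win/Tie/Loss labels. The sketch below shows the standard Fleiss' kappa formula applied to a table of per-comparison category counts, assuming every comparison is rated by the same number of judges. The label coding ("W", "T", "L"), the helper names, and the toy data are hypothetical, and the confidence interval mentioned in the record is not covered here.

```python
import numpy as np

CATEGORIES = ("W", "T", "L")  # assumed coding for Win / Tie / Loss


def to_counts(labels):
    """Convert an (items x raters) array of labels into (items x categories) counts."""
    labels = np.asarray(labels)
    return np.stack([(labels == cat).sum(axis=1) for cat in CATEGORIES], axis=1)


def fleiss_kappa(counts):
    """Fleiss' kappa for an (items x categories) count table.

    Every row must sum to the same number of raters (at least 2).
    The result is undefined (nan) if all ratings fall in one category.
    """
    counts = np.asarray(counts, dtype=float)
    n_items = counts.shape[0]
    n_raters = counts[0].sum()
    if not np.all(counts.sum(axis=1) == n_raters):
        raise ValueError("every item must be rated by the same number of raters")

    # Overall share of ratings in each category.
    p_cat = counts.sum(axis=0) / (n_items * n_raters)
    # Observed agreement per item: share of rater pairs that agree.
    p_item = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))

    p_observed = p_item.mean()
    p_expected = np.square(p_cat).sum()
    return float((p_observed - p_expected) / (1 - p_expected))


# Toy example: 4 paired comparisons, each rated by 5 judges (made-up labels).
labels = [
    ["W", "W", "W", "T", "W"],
    ["L", "L", "T", "L", "L"],
    ["W", "T", "T", "T", "W"],
    ["W", "W", "W", "W", "W"],
]
print(f"Fleiss' kappa = {fleiss_kappa(to_counts(labels)):.3f}")
```

In the study design this calculation would be repeated once per evaluation domain, each time on the count table for that domain.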

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Sponsor

Peking University People's Hospital

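Study Record Dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

June 10, 2026

Primary Completion (Estimated)

June 21, 2026

Study Completion (Estimated)

June 21, 2026

Study Registration Dates

First Submitted

June 10, 2026

First Submitted That Met QC Criteria

June 13, 2026

First Posted (Actual)

June 17, 2026

Study Record Updates

Last Update Posted (Actual)

June 17, 2026

Last Update Submitted That Met QC Criteria

June 13, 2026

Last Verified

June 1, 2026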

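More Information

Terms related to this study

Keywords

Additional Relevant MeSH Terms

Other Study ID Numbers

2026PHB458-001

Plan for Individual Participant Data (IPD)

Plan to Share Individual Participant Data (IPD)?

No

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved directly from the clinicaltrials.gov website without any changes. To request that your study details be changed, removed, or updated, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, it will be updated automatically on our website as well.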
