
Preliminary Evaluation of a Large Language Model-Based Tool for Complex Surgical Decision Support in Lung Cancer

Updated June 13, 2026 by: XiuYuan Chen, Peking University People's Hospital

This is an exploratory effect-size estimation study with two objectives: ① to estimate the Win Ratio (point estimate and 95% confidence interval) for the experimental group (GAPS-Agent) versus the control group (large language model) in blinded pairwise preference judgments by thoracic surgery expert adjudicators, for use as a sample size planning parameter in subsequent multicenter confirmatory clinical trials; ② to preliminarily evaluate the value of GAPS-Agent within clinical workflows. The study hypothesis is that, compared with a general-purpose large language model without medical enhancement (control group), a structured agentic workflow optimized on the basis of the GAPS evaluation framework (GAPS-Agent, experimental group) can help junior resident physicians generate clinical decision plans for complex lung cancer cases that senior thoracic surgery expert adjudicators prefer more strongly.

Study Overview

Status

Enrolling by invitation

Conditions

Intervention / Treatment

Study Type

Interventional

Enrollment (Estimated)

Phase

Not Applicable

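Contacts and Locations

This section provides the contact details of those conducting the study and information on where the study is being conducted.

Study Locations

China
- Beijing Municipality
  - Beijing, Beijing Municipality, China, 100044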
    - Peking University People's Hospital

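Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

Adult
Older Adult

Accepts Healthy Volunteers

No

Description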

Inclusion Criteria:

Resident Physician Subjects:
1. Holds a valid and legally effective Physician Practice License of the People's Republic of China;
2. Currently holds the rank of resident physician in a thoracic surgery department at a tertiary Class A (3A) hospital;
3. Agrees to complete all assessment tasks of the main study phase in accordance with the study protocol;
4. Can guarantee the time and effort required to complete all assessment tasks of the main study.
Study Cases:
1. The case was discussed at the Thoracic Oncology Multidisciplinary Team (MDT) conference of Peking University People's Hospital between January 2025 and May 2026;
2. The current version of the NCCN guidelines does not provide an explicit recommendation covering the management of the case;
3. Does not overlap with the GAPS evaluation set;
4. The case is presented in pure text in a structured format, with all direct and indirect identifiers removed and complete de-identification performed prior to inclusion;
5. From the pool of eligible cases, 12 cases will be randomly drawn using Python (numpy.random, with a fixed and archived seed) to serve as the main study cases. The cases will cover 6 themes (chest mass of undetermined diagnosis, early-stage lung cancer, locally advanced lung cancer, oligometastatic/oligoprogressive disease, special intraoperative situations, and tumor recurrence), with 2 cases per theme.
Adjudication Expert Panel:
1. Holds a valid and legally effective Physician Practice License of the People's Republic of China;
2. Currently holds the rank of attending physician or above in a thoracic surgery department at a tertiary Class A hospital;
3. Chairs or regularly participates in lung cancer multidisciplinary team (MDT) work in their department.
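
Item 5 of the Study Cases criteria describes a seeded random draw of 12 cases, 2 from each of 6 themes. The following is a minimal sketch of one way such a draw could be made reproducible. It assumes the draw is stratified by theme, which the record implies but does not state outright. The record names numpy.random; this sketch swaps in the standard library's `random.Random` to stay dependency-free. The seed value, the case IDs, and the `draw_cases` helper are all hypothetical.

```python
import random

# The six themes listed in the Study Cases inclusion criteria.
THEMES = [
    "chest mass of undetermined diagnosis",
    "early-stage lung cancer",
    "locally advanced lung cancer",
    "oligometastatic/oligoprogressive disease",
    "special intraoperative situations",
    "tumor recurrence",
]


def draw_cases(pool, per_theme=2, seed=20260610):
    """Draw `per_theme` case IDs from each theme with a fixed seed.

    `pool` maps each theme to a list of eligible case IDs.
    The seed value here is a placeholder, not the study's archived seed.
    """
    rng = random.Random(seed)
    drawn = {}
    for theme in THEMES:
        # Sort first so the result does not depend on input ordering.
        candidates = sorted(pool[theme])
        if len(candidates) < per_theme:
            raise ValueError(f"not enough eligible cases for theme: {theme}")
        drawn[theme] = sorted(rng.sample(candidates, per_theme))
    return drawn


# Hypothetical pool: five eligible de-identified case IDs per theme.
pool = {
    theme: [f"T{i}-C{j:02d}" for j in range(1, 6)]
    for i, theme in enumerate(THEMES, start=1)
}
selected = draw_cases(pool)
for theme, case_ids in selected.items():
    print(theme, case_ids)
```

Archiving the seed together with the sorted pool is what makes the draw auditable: rerunning `draw_cases` on the same pool returns the same 12 cases.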

Exclusion Criteria:

Resident Physician Subjects:
1. Has previously participated in the construction of the GAPS evaluation set or the development of GAPS-Agent;
2. Unable to complete the tasks of the study phase.
Study Cases:
1. Key case information is missing, such as text-form data on pathology (including IHC/NGS), imaging, laboratory tests, prior medical history, comorbidities, or PS score;
2. Decision-making for the case is strictly dependent on non-text information.
Adjudication Expert Panel:
1. Participated in the construction of the GAPS evaluation set, the content validity verification, or the development of GAPS-Agent for this study;
2. Has a direct conflict of interest with any product used as a tool in either arm of this study.

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Primary Purpose: Other
Allocation: Randomized
Interventional Model: Parallel Assignment
Masking: Single

Number of Arms

Arms and Interventions

Participant Group / Arm	Intervention / Treatment
Experimental: test arm GAPS-Agent	Other: GAPS-Agent The research group has previously developed the GAPS evaluation framework for complex clinical decision-making in lung cancer. In this framework, G (Grounding) characterizes the cognitive depth of decision-making (ranging from knowledge retrieval to decisions that go beyond clinical guidelines), A (Authority) corresponds to the grading of evidence strength, P (Perturbation) describes the identification and management of real-world clinical confounding factors, and S (Strength) corresponds to the calibration of recommendation strength. Within this framework, the research group has completed the construction of a 100-item complex lung cancer decision-making evaluation set along with its corresponding rubrics, and has invited multiple thoracic oncology experts to complete content validity verification. Based on this, the research group developed GAPS-Agent, which uses an open-source large language model as its foundation and integrates functional modules such as guideline and evidence retri
Active Comparator: control arm LLM	Other: LLM An open-source large language model without specific medical-domain enhancement.

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Overall plan Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgments. Calculated up to 3 weeks after the preference judgments.
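
The Win Ratio and its bootstrap interval described above can be sketched as follows. This is an illustrative reading of the measure description, not the study's analysis code. It assumes that each judgment carries one physician label and one case label, that physicians and cases are resampled independently with replacement, that ties count toward neither wins nor losses, and that replicates with zero wins or zero losses are skipped. The record format, the `bootstrap_win_ratio_ci` function, and the toy data are hypothetical.

```python
import math
import random
from collections import defaultdict


def bootstrap_win_ratio_ci(records, n_boot=10_000, seed=0, alpha=0.05):
    """Return (win_ratio, ci_low, ci_high) for Win/Tie/Loss judgments.

    `records` is an iterable of (physician_id, case_id, outcome) tuples,
    with outcome in {"win", "tie", "loss"} from the experimental arm's view.
    """
    # Tally wins and losses per (physician, case) cell; ties add nothing.
    cells = defaultdict(lambda: [0, 0])
    for physician, case, outcome in records:
        cell = cells[(physician, case)]
        if outcome == "win":
            cell[0] += 1
        elif outcome == "loss":
            cell[1] += 1
    physicians = sorted({p for p, _ in cells})
    cases = sorted({c for _, c in cells})

    total_wins = sum(w for w, _ in cells.values())
    total_losses = sum(l for _, l in cells.values())
    if total_wins == 0 or total_losses == 0:
        raise ValueError("win ratio needs at least one win and one loss")
    point = total_wins / total_losses

    rng = random.Random(seed)
    log_ratios = []
    for _ in range(n_boot):
        # Two-level resampling: physicians and cases, each with replacement.
        boot_physicians = rng.choices(physicians, k=len(physicians))
        boot_cases = rng.choices(cases, k=len(cases))
        wins = losses = 0
        for p in boot_physicians:
            for c in boot_cases:
                w, l = cells.get((p, c), (0, 0))
                wins += w
                losses += l
        if wins > 0 and losses > 0:  # log ratio undefined otherwise
            log_ratios.append(math.log(wins / losses))
    if not log_ratios:
        raise ValueError("no usable bootstrap replicates")

    # Nearest-rank percentiles on the log scale, then back-transform.
    log_ratios.sort()
    last = len(log_ratios) - 1
    ci_low = math.exp(log_ratios[round(alpha / 2 * last)])
    ci_high = math.exp(log_ratios[round((1 - alpha / 2) * last)])
    return point, ci_low, ci_high


# Toy data: 4 physicians x 3 cases, one judgment per cell (7 wins, 3 losses).
records = [
    ("A", 1, "win"), ("A", 2, "win"), ("A", 3, "tie"),
    ("B", 1, "win"), ("B", 2, "loss"), ("B", 3, "win"),
    ("C", 1, "loss"), ("C", 2, "win"), ("C", 3, "win"),
    ("D", 1, "win"), ("D", 2, "tie"), ("D", 3, "loss"),
]
point, lo, hi = bootstrap_win_ratio_ci(records, n_boot=2000, seed=0)
print(f"win ratio {point:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Resampling whole physicians and whole cases, rather than individual judgments, keeps correlated judgments together, which is the reason a cluster bootstrap is used for data of this shape.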

Secondary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Inter-rater agreement	For the ternary preference judgment results of 10 expert judges across 192 paired comparisons and 6 evaluation domains, Fleiss' kappa was used to assess inter-rater agreement. The kappa value and its 95% confidence interval are reported for each evaluation domain.	Measured at the time when experts completed their preference judgments. Calculated up to 3 weeks after the preference judgments.
Redundancy Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgments. Calculated up to 3 weeks after the preference judgments.
Evidence-based medicine adherence Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgments. Calculated up to 3 weeks after the preference judgments.
Actionability Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgments. Calculated up to 3 weeks after the preference judgments.
Completeness Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgments. Calculated up to 3 weeks after the preference judgments.
Safety Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgments. Calculated up to 3 weeks after the preference judgments.
GAPS automated rubric score	A third-party large language model, independent of the two study arms' base models, served as the judge model and automatically scored all 96 plans according to the GAPS rubric.	Generated up to 3 weeks after residents finished their plan generation.
Subject physician's self-confidence score	After submitting each case plan, the participating physicians self-rated their confidence in their own plan using a 1-5 point Likert scale.	Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission.
Tool satisfaction score	After submitting each case plan, the participating physicians rated their satisfaction with the tool using a 1-5 point Likert scale.	Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission.
Tool trustworthiness score	After submitting each case plan, the participating physicians rated the tool's credibility using a 1-5 point Likert scale.	Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission.
Decision-making time	The time taken (in minutes) by each participating physician to complete the production of each case plan was automatically recorded by the evaluation platform. Differences between groups were analyzed using a linear mixed-effects model.	Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission.
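
The inter-rater agreement measure in the table above relies on Fleiss' kappa. As a point of reference, the standard formula can be sketched in a few lines. The sketch covers only the point estimate, not the 95% confidence interval the record also mentions. The input layout (one row per paired comparison, one column per category in the order Win, Tie, Loss) and the toy counts are assumptions for illustration.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    `counts[i][j]` is the number of raters who assigned item i to
    category j. Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: proportion of agreeing rater pairs.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example: 4 paired comparisons, 10 judges, columns = Win / Tie / Loss.
toy_counts = [
    [10, 0, 0],
    [7, 2, 1],
    [3, 3, 4],
    [0, 1, 9],
]
kappa = fleiss_kappa(toy_counts)
print(f"Fleiss' kappa: {kappa:.3f}")
```

For the design described in the record, one such table would be built per evaluation domain, with 192 rows and 10 ratings per row.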

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Sponsor

Peking University People's Hospital

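Study Record Dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

June 10, 2026

Primary Completion (Estimated)

June 21, 2026

Study Completion (Estimated)

June 21, 2026

Study Registration Dates

First Submitted

June 10, 2026

First Submitted That Met QC Criteria

June 13, 2026

First Posted (Actual)

June 17, 2026

Study Record Updates

Last Update Posted (Actual)

June 17, 2026

Last Update Submitted That Met QC Criteria

June 13, 2026

Last Verified

June 1, 2026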

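More Information

Terms Related to This Study

Keywords

Additional Relevant MeSH Terms

Other Study ID Numbers

2026PHB458-001

Plan for Individual Participant Data (IPD)

Plan to Share Individual Participant Data (IPD)?

No

Drug and Device Information, Study Documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No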

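This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove, or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, it will be updated automatically on this website as well.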
