Preliminary Evaluation of a Large Language Model-Based Tool for Complex Surgical Decision Support in Lung Cancer

Updated June 13, 2026 by: XiuYuan Chen, Peking University People's Hospital

This is an exploratory effect-size estimation study with two objectives: ① to obtain the point estimate and 95% confidence interval of the Win Ratio for the experimental group (GAPS-Agent) versus the control group (large language model) in blinded pairwise preference judgments by thoracic surgery expert adjudicators, to serve as a sample size planning parameter for subsequent multicenter confirmatory clinical trials; ② to preliminarily evaluate the value of GAPS-Agent within clinical workflows. The hypothesis is that, compared with a general-purpose large language model without medical enhancement (control group), a structured agentic workflow optimized on the basis of the GAPS evaluation framework (GAPS-Agent, experimental group) can help junior resident physicians generate clinical decision plans for complex lung cancer cases that are more strongly preferred by senior thoracic surgery expert adjudicators.

Study Overview

Status

Enrolling by invitation

Conditions

Intervention / Treatment

Study Type

Interventional

Enrollment (Estimated)

Phase

Not Applicable

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

China
- Beijing Municipality
  - Beijing, Beijing Municipality, China, 100044
    - Peking University People's Hospital

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

Adult
Older Adult

Accepts Healthy Volunteers

Description

Inclusion Criteria:

Resident Physician Subjects:
1. Holds a valid and legally effective Physician Practice License of the People's Republic of China;
2. Currently holds the rank of resident physician in a thoracic surgery department at a tertiary Class A (3A) hospital;
3. Agrees to complete all assessment tasks of the main study phase in accordance with the study protocol;
4. Can guarantee the time and effort required to complete all assessment tasks of the main study.
Study Cases:
1. The case was discussed at the Thoracic Oncology Multidisciplinary Team (MDT) conference of Peking University People's Hospital between January 2025 and May 2026;
2. The current version of the NCCN guidelines does not provide an explicit recommendation covering the management of the case;
3. Does not overlap with the GAPS evaluation set;
4. The case is presented as plain text in a structured format, with all direct and indirect identifiers removed and complete de-identification performed prior to inclusion;
5. From the pool of eligible cases, 12 cases will be randomly drawn using Python (numpy.random, with a fixed and archived seed) to serve as the main study cases. The cases will cover 6 themes (chest mass of undetermined diagnosis, early-stage lung cancer, locally advanced lung cancer, oligometastatic/oligoprogressive disease, special intraoperative situations, and tumor recurrence), with 2 cases per theme.
Adjudication Expert Panel:
1. Holds a valid and legally effective Physician Practice License of the People's Republic of China;
2. Currently holds the rank of attending physician or above in a thoracic surgery department at a tertiary Class A hospital;
3. Chairs or regularly participates in lung cancer multidisciplinary team (MDT) work in their department.
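
The random draw of the 12 main study cases described under Study Cases above could look roughly like the sketch below. It is only an illustration under stated assumptions: it reads "2 cases per theme" as a draw stratified by theme, and the case identifiers, the pool size, the seed value, and the function name `draw_study_cases` are hypothetical and do not come from the study protocol.

```python
import numpy as np

# The six case themes listed in the inclusion criteria.
THEMES = [
    "chest mass of undetermined diagnosis",
    "early-stage lung cancer",
    "locally advanced lung cancer",
    "oligometastatic/oligoprogressive disease",
    "special intraoperative situations",
    "tumor recurrence",
]


def draw_study_cases(eligible, seed, per_theme=2):
    """Draw `per_theme` cases without replacement from each theme.

    eligible: dict mapping theme -> list of eligible case IDs.
    seed: fixed integer seed, archived so the draw can be reproduced.
    Returns a dict mapping theme -> sorted list of drawn case IDs.
    """
    rng = np.random.default_rng(seed)
    drawn = {}
    for theme in THEMES:
        # Sort the pool so the result does not depend on input order.
        pool = sorted(eligible[theme])
        if len(pool) < per_theme:
            raise ValueError(f"not enough eligible cases for theme: {theme}")
        picks = rng.choice(len(pool), size=per_theme, replace=False)
        drawn[theme] = sorted(pool[i] for i in picks)
    return drawn


# Hypothetical pool: five de-identified case IDs per theme.
eligible_pool = {
    theme: [f"T{t}-C{i:02d}" for i in range(1, 6)]
    for t, theme in enumerate(THEMES, start=1)
}
SEED = 20260610  # hypothetical seed value, not the archived study seed

selected = draw_study_cases(eligible_pool, SEED)
for theme, cases in selected.items():
    print(theme, cases)
```

Rerunning the function with the same pool and the same archived seed returns the same 12 cases, which is the property a fixed seed is meant to provide.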

Exclusion Criteria:

Resident Physician Subjects:
1. Has previously participated in the construction of the GAPS evaluation set or the development of GAPS-Agent;
2. Unable to complete the tasks of the study phase.
Study Cases:
1. Key case information is missing, such as text-form data on pathology (including IHC/NGS), imaging, laboratory tests, prior medical history, comorbidities, or PS score;
2. Decision-making for the case is strictly dependent on non-text information.
Adjudication Expert Panel:
1. Participated in the construction of the GAPS evaluation set, the content validity verification, or the development of GAPS-Agent for this study;
2. Has a direct conflict of interest with any specific product used as a tool in either of the two study arms.

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Primary Purpose: Other
Allocation: Randomized
Interventional Model: Parallel Assignment
Masking: Single

Number of Arms

Arms and Interventions

Participant Group / Arm	Intervention / Treatment
Experimental: test arm GAPS-Agent	Other: GAPS-Agent The research group has previously developed the GAPS evaluation framework for complex clinical decision-making in lung cancer. In this framework, G (Grounding) characterizes the cognitive depth of decision-making (ranging from knowledge retrieval to decisions that go beyond clinical guidelines), A (Authority) corresponds to the grading of evidence strength, P (Perturbation) describes the identification and management of real-world clinical confounding factors, and S (Strength) corresponds to the calibration of recommendation strength. Within this framework, the research group has completed the construction of a 100-item complex lung cancer decision-making evaluation set along with its corresponding rubrics, and has invited multiple thoracic oncology experts to complete content validity validation. Based on this, the research group developed GAPS-Agent, which uses an open-source large language model as its foundation and integrates functional modules such as guideline and evidence retri
Active Comparator: control arm LLM	Other: LLM An open-source large language model that is not specifically enhanced for the medical field.

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Overall plan Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgments. Calculated up to 3 weeks after the preference judgments.
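
The win ratio and the bootstrap interval described for the primary outcome can be sketched as follows. This is a minimal illustration, not the study's analysis code. The array layout (one row per physician cluster, one column per case, each cell holding win and loss counts pooled over judges), the function names, the seeds, and the demo counts are all assumptions. Skipping replicates that contain zero wins or zero losses is also an assumption, since the registry entry does not say how such replicates are handled.

```python
import numpy as np


def win_ratio(wins, losses):
    """Win ratio = Wins / Losses; ties are excluded from both counts."""
    if losses == 0:
        raise ZeroDivisionError("win ratio is undefined when there are no losses")
    return wins / losses


def cluster_bootstrap_ci(wins, losses, n_boot=10_000, seed=0, alpha=0.05):
    """Two-level (physician x case) cluster bootstrap CI for the win ratio.

    wins, losses: integer arrays of shape (n_physicians, n_cases).
    Physicians and cases are each resampled with replacement, the win
    ratio is recomputed on the resampled grid, and the interval is read
    from the quantiles of log(win ratio) before back-transforming.
    """
    wins = np.asarray(wins)
    losses = np.asarray(losses)
    rng = np.random.default_rng(seed)
    n_phys, n_cases = wins.shape
    log_wr = []
    for _ in range(n_boot):
        p_idx = rng.integers(0, n_phys, size=n_phys)    # resample physicians
        c_idx = rng.integers(0, n_cases, size=n_cases)  # resample cases
        grid = np.ix_(p_idx, c_idx)
        w = wins[grid].sum()
        l = losses[grid].sum()
        if w > 0 and l > 0:  # assumption: skip replicates with undefined log ratio
            log_wr.append(np.log(w / l))
    if not log_wr:
        raise ValueError("no usable bootstrap replicates")
    lo, hi = np.quantile(log_wr, [alpha / 2, 1 - alpha / 2])
    return float(np.exp(lo)), float(np.exp(hi))


# Hypothetical counts for 4 physician clusters and 12 cases.
demo_rng = np.random.default_rng(1)
demo_wins = demo_rng.integers(5, 30, size=(4, 12))
demo_losses = demo_rng.integers(3, 20, size=(4, 12))

point = win_ratio(demo_wins.sum(), demo_losses.sum())
ci_low, ci_high = cluster_bootstrap_ci(demo_wins, demo_losses, n_boot=2000, seed=2026)
print(f"win ratio {point:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

The demo uses 2,000 replicates only to keep the run short; the registry entry specifies B = 10,000.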

Secondary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Inter-rater agreement	For the ternary preference judgment results of 10 expert judges across 192 paired comparisons and 6 evaluation domains, Fleiss' kappa was used to assess inter-rater agreement. The kappa value and its 95% confidence interval are reported for each evaluation domain.	Measured at the time when experts completed their preference judgments. Calculated up to 3 weeks after the preference judgments.
Redundancy Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgments. Calculated up to 3 weeks after the preference judgments.
Evidence-based medicine adherence Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgments. Calculated up to 3 weeks after the preference judgments.
Actionability Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgments. Calculated up to 3 weeks after the preference judgments.
Completeness Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgments. Calculated up to 3 weeks after the preference judgments.
Safety Win Ratio	A total of 10 blinded expert judges made Win/Tie/Loss ternary preference judgments on 192 paired scheme comparisons in terms of overall scheme quality. The win ratio was calculated as Wins ÷ Losses, and the 95% confidence interval was estimated using a two-level (physician × case) cluster bootstrap resampling method (B = 10,000, quantile method on the log scale).	Measured at the time when experts completed their preference judgments. Calculated up to 3 weeks after the preference judgments.
GAPS automated rubric score	A third-party large language model, independent of the two study arms' base models, served as the judge model and automatically scored all 96 plans according to the GAPS rubric.	Generated up to 3 weeks after residents finished their plan generation.
Subject physician's self-confidence score	After submitting each case plan, the participating physicians self-rated their confidence in their own plan using a 5-point Likert scale (1 to 5).	Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission.
Tool satisfaction score	After submitting each case plan, the participating physicians rated their satisfaction with the tool using a 5-point Likert scale (1 to 5).	Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission.
Tool trustworthiness score	After submitting each case plan, the participating physicians rated the tool's credibility using a 5-point Likert scale (1 to 5).	Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission.
Decision-making time	The time taken (in minutes) by each participating physician to produce each case plan was automatically recorded by the evaluation platform. Differences between groups were analyzed using a linear mixed-effects model.	Completed at the time when residents submitted their plans. Calculated up to 3 weeks after the submission.
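
For the inter-rater agreement measure, Fleiss' kappa on Win/Tie/Loss judgments can be computed as in the sketch below. It assumes each paired comparison is summarized as a row of counts giving how many judges chose each category, with the same number of judges on every row. The function name and the demo counts are hypothetical, and the confidence interval mentioned in the measure description is not covered here.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table of category counts.

    ratings: list of rows, one per rated item; row[j] is the number of
    raters who assigned the item to category j. Every row must sum to
    the same number of raters (at least 2).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in ratings):
        raise ValueError("every item must be rated by the same number of raters")

    # Observed agreement: mean proportion of agreeing rater pairs per item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts: 10 judges, columns are (Win, Tie, Loss) for GAPS-Agent.
demo_counts = [
    [8, 1, 1],
    [6, 2, 2],
    [3, 3, 4],
    [9, 0, 1],
    [2, 1, 7],
]
print(f"Fleiss' kappa: {fleiss_kappa(demo_counts):.3f}")
```

In the study this calculation would be repeated once per evaluation domain, each time over the full set of paired comparisons.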

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Sponsor

Peking University People's Hospital

Study Record Dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

June 10, 2026

Primary Completion (Estimated)

June 21, 2026

Study Completion (Estimated)

June 21, 2026

Study Registration Dates

First Submitted

June 10, 2026

First Submitted That Met QC Criteria

June 13, 2026

First Posted (Actual)

June 17, 2026

Study Record Updates

Last Update Posted (Actual)

June 17, 2026

Last Update Submitted That Met QC Criteria

June 13, 2026

Last Verified

June 1, 2026

More Information

Terms related to this study

Keywords

Additional Relevant MeSH Terms

Other Study ID Numbers

2026PHB458-001

Plan for Individual Participant Data (IPD)

Plan to Share Individual Participant Data (IPD)?

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

Studies a U.S. FDA-regulated device product

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove, or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, it will be updated automatically on our website as well.

