AI-Assisted Staging and Treatment Decision-Making for Hepatocellular Carcinoma

April 13, 2026 updated by: Jiahong Dong, MD, Beijing Tsinghua Chang Gung Hospital

A Prospective, Randomized, Controlled, Crossover Study of Artificial Intelligence-Assisted Multi-Dimensional Staging and Treatment Decision-Making for Hepatocellular Carcinoma

The precise treatment of primary hepatocellular carcinoma (HCC) depends heavily on accurate disease staging (CNLC, TNM, BCLC) and sound treatment decision-making, both of which require integrating imaging and clinical baseline data. This study prospectively recruits HCC patients, together with clinical physicians from different hospital tiers, to evaluate the clinical value of a self-developed artificial intelligence (AI) model in assisting multi-dimensional comprehensive assessment and treatment decision-making. Using a Multi-Rater Multi-Case (MRMC) balanced crossover design, the study compares the accuracy of clinical evaluations performed by physicians under "unassisted (without AI)" and "AI-assisted" conditions. A key focus is whether AI can significantly improve the comprehensive assessment capabilities of physicians in primary/secondary care hospitals and thereby reduce diagnostic and therapeutic heterogeneity across institutional levels.

Study Overview

Detailed Description

  1. Study Description

    Gold Standard (Reference Standard): The reference standard (ground truth) for all prospectively enrolled cases is established by an independent panel of 3 authoritative experts. The panel determines the final reference answers for the four classification tasks through blinded independent evaluation followed by joint discussion (voting), drawing on complete prospective imaging data, clinical baseline data, multidisciplinary team (MDT) consensus, and final pathological or clinical follow-up results.
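The registry describes the panel as combining blinded independent evaluation with joint discussion and voting, but it does not spell out the decision rule. A minimal sketch, assuming that a strict majority of the 3 experts fixes each task label and that any three-way split goes back to joint discussion, could look like this; the function name `panel_majority` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def panel_majority(votes: Sequence[str]) -> Optional[str]:
    """Label chosen by a strict majority of the panel, or None when no label
    has a majority (the case would then be settled by joint discussion)."""
    if not votes:
        raise ValueError("at least one expert vote is required")
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


# Hypothetical BCLC votes from a 3-expert panel for two cases.
print(panel_majority(["B", "B", "C"]))  # B
print(panel_majority(["A", "B", "C"]))  # None
```

With three experts, any two agreeing votes settle the label; only a complete split returns None.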

  2. Eligibility Criteria

2.1 Evaluator Eligibility:

  1. Senior Physicians in Tertiary Hospitals: Employed in the department of hepato-pancreato-biliary surgery, oncology, or related departments in Class III Grade A (tertiary) hospitals, with the professional title of attending physician or above.
  2. Junior Physicians in Tertiary Hospitals: Employed in related departments in Class III Grade A (tertiary) hospitals, with the professional title of resident physician.
  3. Physicians in Primary/Secondary Care Hospitals: Clinical physicians employed in county-level or Class II general hospitals.
  4. Informed Consent: Must voluntarily agree to participate in the assessment and sign the informed consent form.

2.2 Patient/Case Eligibility:

Inclusion Criteria:

  1. Age >= 18 years.
  2. Patients prospectively presenting with suspected or newly diagnosed primary hepatocellular carcinoma (HCC), with the diagnosis later confirmed by pathology or by the China Liver Cancer (CNLC) guideline criteria.
  3. Complete baseline clinical data acquired during the prospective enrollment period, including complete history of present/past illness, ECOG PS score, comprehensive laboratory tests (liver function, coagulation, tumor markers such as AFP, etc.), and baseline abdominal contrast-enhanced CT.
  4. Patients (or their legal representatives) must provide written informed consent for their clinical data to be used in this trial.

Exclusion Criteria:

  1. Patients with secondary (metastatic) liver cancer or concurrent severe malignancies of other systems.
  2. Patients who fail to complete the required baseline imaging or laboratory tests, preventing accurate staging calculation (e.g., missing data for Child-Pugh score).
  3. Patients who have received anti-tumor therapy for liver cancer before enrollment.

3. Study Design

Intervention Model: Crossover Assignment

Masking: Single Blind. Participating evaluators are blinded to the gold standard answers of the cases and to the evaluation results of other participating physicians.

Arms and Interventions:

Case Set Partition: 108 prospectively and consecutively enrolled eligible HCC cases are batched and randomly divided into Set A (54 cases) and Set B (54 cases), with allocation balanced so that the two sets show no statistically significant differences in tumor burden, liver function grade, or stage distribution.
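The registry does not say how the balanced split is produced. One way to realize a stratified random split, sketched here under assumptions, is to shuffle cases within each stratum and deal them alternately to the two sets. The function name `stratified_split` and the single made-up stage variable used as the stratum are hypothetical; in practice the stratum would presumably combine tumor burden, liver function grade, and stage.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def stratified_split(strata: Dict[str, Hashable], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split case IDs into two sets, balanced within each stratum.

    `strata` maps each case ID to its stratum label. Cases are shuffled inside
    each stratum, the strata are laid end to end, and cases are then dealt
    alternately to Set A and Set B. Each stratum therefore differs by at most
    one case between the sets, and an even total splits exactly in half.
    """
    rng = random.Random(seed)  # fixed seed makes the allocation reproducible
    by_stratum: Dict[Hashable, List[str]] = defaultdict(list)
    for case_id, key in strata.items():
        by_stratum[key].append(case_id)

    ordered: List[str] = []
    for key in sorted(by_stratum, key=repr):  # deterministic stratum order
        members = sorted(by_stratum[key])
        rng.shuffle(members)
        ordered.extend(members)

    return ordered[0::2], ordered[1::2]


# Hypothetical example: 108 cases labelled with a made-up stage stratum.
stages = ["0", "A", "B", "C", "D"]
cases = {f"case{i:03d}": stages[i % 5] for i in range(108)}
set_a, set_b = stratified_split(cases, seed=42)
print(len(set_a), len(set_b))  # 54 54
```

Balance by construction does not replace the protocol's check that the two sets show no statistically significant differences; that comparison would still be run on the resulting sets.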

Evaluator Grouping: A total of 12 prospectively recruited clinical physicians are included, comprising 4 in the tertiary hospital senior group, 4 in the tertiary hospital junior group, and 4 in the primary/secondary hospital group. They are divided into two evaluation groups based on stratified randomization:

Group A (6 evaluators): 2 tertiary senior, 2 tertiary junior, 2 primary/secondary hospital.

Group B (6 evaluators): 2 tertiary senior, 2 tertiary junior, 2 primary/secondary hospital.
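The stratified randomization of evaluators amounts to shuffling each tier of 4 physicians and sending 2 to each group. A minimal sketch follows; the tier keys, physician IDs, and the helper name `assign_evaluators` are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def assign_evaluators(tiers: Dict[str, List[str]], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly send half of each tier's physicians to Group A and half to Group B."""
    rng = random.Random(seed)
    group_a: List[str] = []
    group_b: List[str] = []
    for tier in sorted(tiers):
        members = list(tiers[tier])
        if len(members) % 2:
            raise ValueError(f"tier {tier!r} needs an even number of physicians")
        rng.shuffle(members)
        half = len(members) // 2
        group_a.extend(members[:half])
        group_b.extend(members[half:])
    return group_a, group_b


# Hypothetical physician IDs, 4 per tier as in the protocol.
tiers = {
    "tertiary_senior": ["S1", "S2", "S3", "S4"],
    "tertiary_junior": ["J1", "J2", "J3", "J4"],
    "primary_secondary": ["P1", "P2", "P3", "P4"],
}
group_a, group_b = assign_evaluators(tiers, seed=7)
print(len(group_a), len(group_b))  # 6 6
```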

Arm 1 - Group A Evaluators:

Phase 1 Intervention (Control): Independent evaluation of Set A (54 cases) using clinical text and imaging data, recording the 4 classification results without AI assistance.

Phase 2 Intervention (Experimental): Evaluation of Set B (54 cases). The system presents the AI model's 4 prediction results and related evidence; physicians make the final judgment after considering them together with the case data.

Arm 2 - Group B Evaluators:

Phase 1 Intervention (Control): Independent evaluation of Set B (54 cases) using clinical text and imaging data, recording the 4 classification results without AI assistance.

Phase 2 Intervention (Experimental): Evaluation of Set A (54 cases). The system presents the AI model's 4 prediction results and related evidence; physicians make the final judgment after considering them together with the case data.
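The two arms can be expanded into an explicit reading list, which makes it easy to confirm that every physician reads each case exactly once and that every case is read by 6 physicians in each condition. The group and set structure follows the arms above; the helper `reading_schedule` and all physician and case IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One reading: (physician, phase, case set, case ID, AI-assisted?)
Reading = Tuple[str, int, str, str, bool]


def reading_schedule(groups: Dict[str, List[str]],
                     case_sets: Dict[str, List[str]]) -> List[Reading]:
    """Expand the crossover: each group reads its own set unaided in Phase 1
    and the other set with AI assistance in Phase 2."""
    other = {"A": "B", "B": "A"}
    rows: List[Reading] = []
    for group, physicians in groups.items():
        for physician in physicians:
            for phase, set_name, ai in ((1, group, False), (2, other[group], True)):
                for case_id in case_sets[set_name]:
                    rows.append((physician, phase, set_name, case_id, ai))
    return rows


# Hypothetical IDs: 6 physicians per group, 54 cases per set.
groups = {"A": [f"GA{i}" for i in range(1, 7)], "B": [f"GB{i}" for i in range(1, 7)]}
case_sets = {"A": [f"A{i:02d}" for i in range(1, 55)],
             "B": [f"B{i:02d}" for i in range(1, 55)]}
rows = reading_schedule(groups, case_sets)

unaided = sum(1 for r in rows if not r[4])
aided = sum(1 for r in rows if r[4])
print(unaided, aided)  # 648 648

# Every case is read by 6 physicians per condition, and no physician sees a case twice.
reads_per_case = Counter((r[3], r[4]) for r in rows)
assert set(reads_per_case.values()) == {6}
assert len(set((r[0], r[3]) for r in rows)) == len(rows)
```

Because the two phases use different case sets, no physician ever re-reads a case, which is what lets the design skip a washout period between phases.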

4. Outcome Measures

Primary Outcome:

Improvement in Overall Accuracy: The difference in average accuracy across the 4 classification tasks between AI-assisted evaluation (experimental group) and independent evaluation (control group).
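The registry does not define "average accuracy across the 4 classification tasks" in detail. The sketch below assumes that accuracy is first computed per task against the reference standard and then averaged with equal weight over tasks. The tuple layout, task names, and answers are hypothetical, and the code yields only the point estimate; the registry does not name the MRMC significance test to be applied.

```python
from collections import defaultdict
from typing import Iterable, Tuple

# One answer: (AI-assisted?, task name, physician's label, reference label)
Answer = Tuple[bool, str, str, str]


def mean_task_accuracy(answers: Iterable[Answer], ai_assisted: bool) -> float:
    """Accuracy of each task under one condition, averaged with equal task weights."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ai, task, predicted, reference in answers:
        if ai != ai_assisted:
            continue
        total[task] += 1
        correct[task] += predicted == reference
    per_task = [correct[t] / total[t] for t in total]
    return sum(per_task) / len(per_task)


def accuracy_improvement(answers: Iterable[Answer]) -> float:
    """Primary outcome point estimate: AI-assisted minus unassisted accuracy."""
    answers = list(answers)
    return mean_task_accuracy(answers, True) - mean_task_accuracy(answers, False)


# Made-up answers for two of the four tasks, for illustration only.
answers = [
    (False, "task1", "B", "B"), (False, "task1", "C", "B"),
    (False, "task2", "x", "x"), (False, "task2", "x", "x"),
    (True, "task1", "B", "B"), (True, "task1", "B", "B"),
    (True, "task2", "x", "x"), (True, "task2", "x", "x"),
]
print(accuracy_improvement(answers))  # 0.25
```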

Secondary Outcomes:

Homogenization Effect: Assessment of whether the difference in clinical evaluation accuracy between physicians in the primary/secondary hospital group and the tertiary hospital groups is significantly reduced under AI assistance.
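The registry gives no formula for the homogenization effect. One plausible operationalization, assumed here, is to compute the accuracy gap between the pooled tertiary groups and the primary/secondary group in each condition and report how much that gap shrinks with AI. The tier labels and accuracy values below are made up for illustration and are not study results.

```python
from statistics import mean
from typing import Iterable, Tuple

# One physician-level result: (tier, AI-assisted?, accuracy)
Result = Tuple[str, bool, float]


def tier_gap(results: Iterable[Result], ai_assisted: bool) -> float:
    """Mean tertiary accuracy minus mean primary/secondary accuracy in one condition."""
    chosen = [(tier, acc) for tier, ai, acc in results if ai == ai_assisted]
    tertiary = [acc for tier, acc in chosen if tier != "primary_secondary"]
    primary = [acc for tier, acc in chosen if tier == "primary_secondary"]
    return mean(tertiary) - mean(primary)


def homogenization_effect(results: Iterable[Result]) -> float:
    """Reduction of the tier gap under AI assistance (positive = gap narrowed)."""
    rows = list(results)
    return tier_gap(rows, False) - tier_gap(rows, True)


# Hypothetical physician-level accuracies.
results = [
    ("tertiary_senior", False, 0.75), ("tertiary_junior", False, 0.75),
    ("primary_secondary", False, 0.5),
    ("tertiary_senior", True, 0.875), ("tertiary_junior", True, 0.875),
    ("primary_secondary", True, 0.75),
]
print(homogenization_effect(results))  # 0.125
```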

Evaluation Efficiency: Comparison of the average evaluation time per case between physicians with and without AI assistance.

Inter-rater Agreement: Comparison of the consistency of evaluation results among physicians (e.g., using Kappa statistics), with and without AI assistance.
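The registry mentions Kappa statistics only as an example. For agreement among several physicians who all read the same cases, Fleiss' kappa is one standard choice; the sketch below assumes nominal labels and would be run separately for each task and each condition. Because stages are ordered, a weighted kappa could also be argued for; the source does not specify which variant will be used. The labels in the example are hypothetical.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for nominal labels when every case has the same number of raters.

    `ratings[i]` holds all raters' labels for case i.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")

    category_totals: Counter = Counter()
    observed = 0.0
    for labels in ratings:
        counts = Counter(labels)
        category_totals.update(counts)
        # Share of agreeing rater pairs for this case.
        observed += (sum(c * c for c in counts.values()) - n_raters) / (n_raters * (n_raters - 1))
    observed /= n_cases

    # Chance agreement from the overall category proportions.
    total = n_cases * n_raters
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (observed - expected) / (1 - expected)


# Hypothetical labels from 6 physicians on 2 cases (perfect agreement).
print(fleiss_kappa([["A"] * 6, ["B"] * 6]))  # 1.0
# Hypothetical labels from 3 physicians on 2 cases (split votes).
print(round(fleiss_kappa([["A", "A", "B"], ["A", "B", "B"]]), 4))  # -0.3333
```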

5. Statistical Analysis Plan & Sample Size

Sample Size Justification:

The sample size calculation is based on the expected change in overall average accuracy across all levels of prospectively recruited physicians. The overall average accuracy is assumed to be 0.60 without AI assistance and 0.70 with AI assistance.

With a two-sided significance level of 0.05 (Z ≈ 1.96) and statistical power of 0.80 (Z ≈ 0.84), the sample size was determined using the standard normal-approximation method for comparing two independent proportions. Assuming no clustering effect from multiple case evaluations by the same physician, this calculation indicates that each condition (with and without AI) requires at least 353 independent evaluations.
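The figure of 353 can be reproduced with the unpooled normal-approximation formula for two independent proportions, n = (z_alpha + z_beta)^2 × [p1(1 − p1) + p2(1 − p2)] / (p1 − p2)^2, using the rounded Z-values quoted above. That this is the exact formula the investigators used is an inference from the arithmetic, not something the registry states; the function name `n_per_condition` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_condition(p1: float, p2: float, z_alpha: float, z_beta: float) -> int:
    """Evaluations per condition for comparing two independent proportions
    (unpooled normal approximation, no clustering):
        n = (z_alpha + z_beta)^2 * (p1*(1-p1) + p2*(1-p2)) / (p1 - p2)^2
    """
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Rounded Z-values quoted in the protocol.
print(n_per_condition(0.60, 0.70, 1.96, 0.84))  # 353

# Exact quantiles for a two-sided alpha of 0.05 and power of 0.80.
z_a = NormalDist().inv_cdf(0.975)
z_b = NormalDist().inv_cdf(0.80)
print(n_per_condition(0.60, 0.70, z_a, z_b))  # 354
```

Using unrounded quantiles raises the requirement by one evaluation to 354, which does not change the comparison with the 648 evaluations available per condition.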

Power Verification:

In the actual configuration of this study, there are 12 physicians in total.

Total independent evaluations for the control condition (without AI) = Group A (6 evaluators) x Set A (54 cases) + Group B (6 evaluators) x Set B (54 cases) = 324 + 324 = 648 independent evaluations.

Total independent evaluations for the experimental condition (with AI) = Group A (6 evaluators) x Set B (54 cases) + Group B (6 evaluators) x Set A (54 cases) = 648 independent evaluations.

Since 648 evaluations exceeds the required base of 353, the current configuration of cases and physicians is expected to provide sufficient statistical power. The surplus (approximately 1.8 times the base requirement) provides a margin for the clustering effect (intra-class correlation) arising from multiple case evaluations by the same physician in this MRMC design.
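A common way to quantify how much clustering that surplus can absorb, assumed here rather than taken from the protocol, is the design effect 1 + (m − 1)ρ, where m is the number of evaluations per physician in a condition (54) and ρ is the intra-class correlation. Under this approximation, which considers clustering by physician only and ignores clustering by case and the paired crossover structure, the roughly 1.8-fold margin covers an ICC of up to about 0.016. The ICC of 0.05 in the last line is a hypothetical value, and the helper names are illustrative.

```python
def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def max_tolerable_icc(available: int, required: int, cluster_size: int) -> float:
    """Largest ICC for which available / design_effect still reaches `required`."""
    return (available / required - 1) / (cluster_size - 1)


m = 54  # evaluations per physician in each condition
print(round(648 / 353, 2))                       # 1.84
print(round(max_tolerable_icc(648, 353, m), 4))  # 0.0158

# Effective sample size if the ICC were, hypothetically, 0.05.
print(round(648 / design_effect(m, 0.05), 1))    # 177.5
```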

Study Type

Observational

Enrollment (Estimated)

108

Contacts and Locations

Study Locations

    • Changping
      • Beijing, Changping, China, 102218
        • Recruiting
        • Beijing Tsinghua Changgung Hospital

Participation Criteria

Eligibility Criteria

Ages Eligible for Study

  • Adult
  • Older Adult

Accepts Healthy Volunteers

No

Sampling Method

Non-Probability Sample

Study Population

The study population comprises 108 prospectively and consecutively enrolled adult patients with newly diagnosed primary hepatocellular carcinoma (HCC). Following enrollment and confirmation of complete baseline clinical and imaging data, these 108 patient cases are randomly divided into two equal datasets: Set A (54 cases) and Set B (54 cases). Randomization is stratified to ensure no statistically significant differences between the two sets regarding baseline characteristics such as tumor burden, liver function grading, and staging distribution.

In the context of this multi-rater multi-case (MRMC) crossover design, the patient cases are allocated to distinct evaluation conditions. Set A cases are evaluated by the first group of physicians without AI assistance (control condition) and by the second group of physicians with AI assistance (experimental condition). Conversely, Set B cases are evaluated by the second group of physicians without AI assistance (control condition) and by the first group of physicians with AI assistance (experimental condition).

Description

Inclusion Criteria:

  • Age >= 18 years.
  • Patients prospectively presenting with suspected or newly diagnosed primary hepatocellular carcinoma (HCC), with the diagnosis later confirmed by pathology or by the China Liver Cancer (CNLC) guideline criteria.
  • Complete baseline clinical data acquired during the prospective enrollment period, including complete history of present/past illness, ECOG PS score, comprehensive laboratory tests (liver function, coagulation, tumor markers such as AFP, etc.), and baseline abdominal contrast-enhanced CT.
  • Patients (or their legal representatives) must provide written informed consent for their clinical data to be used in this trial.

Exclusion Criteria:

  • Patients with secondary (metastatic) liver cancer or concurrent severe malignancies of other systems.
  • Patients who fail to complete the required baseline imaging or laboratory tests, preventing accurate staging calculation (e.g., missing data for Child-Pugh score).
  • Patients who have received anti-tumor therapy for liver cancer before enrollment.

Study Plan

How is the study designed?

Design Details

Cohorts and Interventions

Group A Evaluators
  • Cohort: A prospectively recruited group of 6 physicians (2 tertiary senior, 2 tertiary junior, and 2 primary/secondary hospital physicians). In Phase 1 (Control), they independently evaluate HCC case Set A without AI assistance. In Phase 2 (Experimental), they evaluate case Set B with the assistance of the AI model.
  • Intervention (Control, without AI): Physicians independently evaluate the HCC cases and provide staging and treatment decisions using only complete clinical baseline data and imaging data, without any assistance from the AI model.
  • Intervention (Experimental, with AI): Physicians evaluate the HCC cases and provide final staging and treatment decisions after reviewing the initial predictions and related evidence generated by the self-developed artificial intelligence (AI) model, alongside the clinical baseline and imaging data.

Group B Evaluators
  • Cohort: A prospectively recruited group of 6 physicians (2 tertiary senior, 2 tertiary junior, and 2 primary/secondary hospital physicians). In Phase 1 (Control), they independently evaluate HCC case Set B without AI assistance. In Phase 2 (Experimental), they evaluate case Set A with the assistance of the AI model.
  • Intervention (Control, without AI): Physicians independently evaluate the HCC cases and provide staging and treatment decisions using only complete clinical baseline data and imaging data, without any assistance from the AI model.
  • Intervention (Experimental, with AI): Physicians evaluate the HCC cases and provide final staging and treatment decisions after reviewing the initial predictions and related evidence generated by the self-developed artificial intelligence (AI) model, alongside the clinical baseline and imaging data.

What is the study measuring?

Primary Outcome Measures

Improvement in Overall Accuracy
  • Description: The difference in average accuracy across the 4 classification tasks between AI-assisted evaluation (experimental) and independent evaluation (control). Accuracy is determined by comparing physicians' predictions against the reference standard (ground truth) established by the independent expert panel.
  • Time Frame: Up to 1 week (assessed upon completion of all case evaluations)

Secondary Outcome Measures

Homogenization Effect on Evaluation Accuracy
  • Description: Assessment of whether the difference in clinical evaluation accuracy between physicians in the primary/secondary hospital group and the tertiary hospital groups is significantly reduced under AI assistance compared to unassisted independent evaluation.
  • Time Frame: Up to 1 week (assessed upon completion of all case evaluations)

Evaluation Efficiency (Average Time per Case)
  • Description: Comparison of the average evaluation time per case (e.g., measured in minutes) required by participating physicians when using AI assistance versus performing unassisted independent evaluation.
  • Time Frame: Up to 1 week (assessed upon completion of all case evaluations)

Inter-rater Agreement
  • Description: Comparison of the consistency of evaluation results (staging and treatment decisions) among all participating physicians, assessed using appropriate statistical measures (e.g., Kappa statistics), under AI-assisted versus unassisted conditions.
  • Time Frame: Up to 1 week (assessed upon completion of all case evaluations)

Study record dates

Study Major Dates

Study Start (Estimated)

April 13, 2026

Primary Completion (Estimated)

May 13, 2026

Study Completion (Estimated)

May 20, 2026

Study Registration Dates

First Submitted

April 13, 2026

First Submitted That Met QC Criteria

April 13, 2026

First Posted (Actual)

April 20, 2026

Study Record Updates

Last Update Posted (Actual)

April 20, 2026

Last Update Submitted That Met QC Criteria

April 13, 2026

Last Verified

April 1, 2026

More Information

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

UNDECIDED

IPD Plan Description

The sharing of individual participant data, particularly high-resolution medical imaging (CT scans) and clinical baseline data, is subject to strict institutional data security policies and national regulations regarding patient privacy. Therefore, a definitive plan for public data sharing is currently undecided. However, fully de-identified clinical data and AI model evaluation results may be made available upon reasonable request to researchers who provide a methodologically sound proposal. Any such sharing will be strictly subject to approval by the Institutional Review Board (IRB) of Beijing Tsinghua Changgung Hospital and the execution of a formal Data Sharing Agreement.

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No
