Sex-Specific Machine Learning Models to Predict Distant Metastasis in Liver Cancer (GENDER-HCC-MET)

January 29, 2026 updated by: Linlin Liu, Tongji University

Gender-Specific Prediction Models for Hepatocellular Carcinoma Metastasis: A Machine Learning-Based Retrospective Cohort Study

This study looked at whether male and female patients with liver cancer (hepatocellular carcinoma, HCC) have different risks of the cancer spreading to distant parts of the body (distant metastasis). Liver cancer is much more common in men than in women, and women often have better survival rates. However, it was unclear if the factors that predict this spread are the same for both sexes.

To answer this question, researchers analyzed information from a large, national cancer database (SEER) from 2004 to 2022, including 19,019 patients diagnosed with liver cancer. They studied factors like age, race, tumor stage, treatment received, and where patients lived. The team used advanced computer models (machine learning) to build separate prediction tools for men and women to estimate their risk of distant metastasis at the time of diagnosis.

Study Overview

Detailed Description

Study Design and Objective:

This was a retrospective, population-based cohort study using data from the Surveillance, Epidemiology, and End Results (SEER) database. The primary objective was to compare the incidence of distant metastasis between male and female patients with hepatocellular carcinoma (HCC) and to identify its sex-specific determinants. A secondary objective was to develop and validate separate machine learning (ML) prediction models for distant metastasis risk, one for male and one for female patients.

Data Source and Participants:

Data were extracted from 22 SEER registries covering patients diagnosed with HCC between 2004 and 2022. Inclusion required a pathological diagnosis of HCC. Key exclusion criteria were: missing data on race, marital status, tumor grade, or surgical status; non-first primary malignancy; and incomplete TNM staging data. After applying criteria, 19,019 patients were included in the final analysis (14,575 males, 4,444 females).
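
The selection steps above can be sketched as a simple filter. This is a minimal illustration only: the `Record` fields and the example rows are hypothetical and do not reflect actual SEER column names or study data. The final line merely checks that the reported male and female counts sum to the reported total.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record layout; real SEER exports use different field names.
@dataclass
class Record:
    sex: str
    year: int
    path_confirmed: bool          # pathologically confirmed HCC
    race: Optional[str]
    marital_status: Optional[str]
    grade: Optional[str]
    surgery: Optional[str]
    first_primary: bool
    t_stage: Optional[str]
    n_stage: Optional[str]
    m_stage: Optional[str]

def is_eligible(r: Record) -> bool:
    """Apply the inclusion and exclusion criteria described in the text."""
    if not r.path_confirmed or not (2004 <= r.year <= 2022):
        return False
    # Exclude records missing race, marital status, grade, or surgical status.
    if any(v is None for v in (r.race, r.marital_status, r.grade, r.surgery)):
        return False
    # Exclude non-first primary malignancies.
    if not r.first_primary:
        return False
    # Exclude incomplete TNM staging.
    if any(v is None for v in (r.t_stage, r.n_stage, r.m_stage)):
        return False
    return True

# Invented example rows, one per exclusion reason plus one eligible case.
records = [
    Record("Male", 2010, True, "White", "Married", "G2", "Resection", True, "T2", "N0", "M0"),
    Record("Female", 2003, True, "Black", "Single", "G1", "None", True, "T1", "N0", "M0"),   # outside 2004-2022
    Record("Male", 2015, True, None, "Married", "G3", "None", True, "T3", "N1", "M1"),       # missing race
    Record("Female", 2020, True, "Asian", "Widowed", "G2", "Local", False, "T1", "N0", "M0"),  # not first primary
    Record("Female", 2018, True, "White", "Married", "G2", "None", True, "T4", None, "M1"),  # incomplete TNM
]
cohort = [r for r in records if is_eligible(r)]

# Consistency check on the counts reported in the record.
reported_total_matches = (14_575 + 4_444 == 19_019)
```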

Variables and Definitions:

The outcome variable was distant metastasis status at diagnosis, dichotomized as M0 (no metastasis) or M1 (metastasis) based on consistent AJCC criteria. Predictor variables included: age, sex, race, tumor grade, marital status, surgical treatment (categorized as non-surgery, local therapy, surgical resection, or liver transplantation), radiotherapy, chemotherapy, annual household income, and residential location (based on population size). To ensure comparability across different editions of the AJCC staging manual, T stage was grouped as T0-2 (localized) vs. T3-4 (locally advanced), and N stage as N0 vs. N1.
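
As a rough illustration of the stage groupings described above, the sketch below collapses T stage into two groups and encodes the M0/M1 outcome as 0/1. The function names and the accepted input strings (for example "T3a") are assumptions made for this example; the record does not state how the raw SEER codes were parsed.

```python
def group_t_stage(t: str) -> str:
    """Collapse T stage into T0-2 (localized) vs. T3-4 (locally advanced)."""
    t = t.strip().upper()
    if len(t) < 2 or t[0] != "T" or not t[1].isdigit():
        raise ValueError(f"unrecognized T stage: {t!r}")
    level = int(t[1])  # sub-stage letters such as 'a' or 'b' are ignored
    if level > 4:
        raise ValueError(f"unrecognized T stage: {t!r}")
    return "T0-2" if level <= 2 else "T3-4"

def group_n_stage(n: str) -> str:
    """Keep N stage as N0 vs. N1."""
    n = n.strip().upper()
    if n[:2] not in ("N0", "N1"):
        raise ValueError(f"unrecognized N stage: {n!r}")
    return n[:2]

def encode_metastasis(m: str) -> int:
    """Outcome variable: 0 for M0 (no metastasis), 1 for M1 (metastasis)."""
    m = m.strip().upper()
    if m[:2] not in ("M0", "M1"):
        raise ValueError(f"unrecognized M stage: {m!r}")
    return int(m[1])

grouped = [group_t_stage(t) for t in ("T1", "T2", "T3a", "T4")]
outcome = [encode_metastasis(m) for m in ("M0", "M1")]
```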

Statistical and Machine Learning Analysis:

Univariate and multivariable logistic regression analyses were performed to identify factors independently associated with distant metastasis, stratified by sex.
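
For a single binary predictor, the odds ratio from a univariate logistic regression equals the cross-product ratio of the corresponding 2x2 table, so the sex-stratified univariate step can be illustrated without a modeling library. The sketch below assumes that simplification and uses invented counts that are not study results. The multivariable step, which adjusts for the other predictors, is not shown.

```python
import math

def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Wald (Woolf) confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Hypothetical counts: exposure = T3-4 stage, outcome = M1 at diagnosis.
tables_by_sex = {
    "Male": (300, 700, 100, 900),
    "Female": (80, 220, 40, 360),
}
results = {sex: odds_ratio(*counts) for sex, counts in tables_by_sex.items()}

for sex, (or_, lower, upper) in results.items():
    print(f"{sex}: OR={or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```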

For predictive modeling, the dataset was randomly split into a training set (80%) and an internal testing set (20%). Eight machine learning algorithms were trained and compared: Logistic Regression, Random Forest, XGBoost, LightGBM, AdaBoost, Decision Tree, Gradient Boosting Decision Tree (GBDT), and Multilayer Perceptron. Model hyperparameters were optimized using 10-fold cross-validation on the training set. The final models were evaluated on the held-out internal testing set. Model performance was assessed using the Area Under the Receiver Operating Characteristic Curve (AUC), accuracy, sensitivity, specificity, F1 score, calibration curves, and Decision Curve Analysis (DCA). The best-performing model was interpreted using SHapley Additive exPlanations (SHAP).
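
The 80/20 split and several of the listed metrics can be sketched in plain Python. This is an illustration under stated assumptions, not the study's implementation: the random seed, the 0.5 classification threshold, and the labels and scores below are all invented for the example. AUC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of the items and return (train, test)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def auc(y_true, scores):
    """Pairwise AUC; ties between a positive and a negative count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity, and F1 at a fixed threshold."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, y_true))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, y_true))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, y_true))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, y_true))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Split 100 hypothetical patient indices into 80 training and 20 testing.
train, test = train_test_split(range(100))

# Invented labels (1 = M1) and predicted probabilities for eight test cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
example_auc = auc(y_true, scores)
metrics = threshold_metrics(y_true, scores)
```

The metric functions do not guard against a test set with no positive or no negative cases, which would cause a division by zero; real data with a rare outcome may need a stratified split to avoid that.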

Study Type

Observational

Enrollment (Actual)

19019

Participation Criteria

Eligibility Criteria

Ages Eligible for Study

  • Adult
  • Older Adult

Accepts Healthy Volunteers

No

Sampling Method

Non-Probability Sample

Study Population

This study is a retrospective analysis of a pre-existing, de-identified, population-based cancer registry (SEER). The study population consists of all adult patients diagnosed with hepatocellular carcinoma (HCC) between 2004 and 2022 who met the eligibility criteria. Participants were not prospectively recruited for this study.

Description

Inclusion Criteria:

  • Pathologically confirmed diagnosis of Hepatocellular Carcinoma (HCC).
  • Diagnosis year between 2004 and 2022, inclusive.
  • Case identified within the 22 registries of the Surveillance, Epidemiology, and End Results (SEER) database.

Exclusion Criteria:

  • Missing information on race, marital status, tumor grade, or surgical status.
  • Non-first primary malignancy or presence of multiple primary tumors.
  • Incomplete TNM staging data.

Study Plan

How is the study designed?

Design Details

Cohorts and Interventions

Group / Cohort
Male HCC Patients
A cohort of male patients (n=14,575) diagnosed with hepatocellular carcinoma (HCC) between 2004 and 2022, identified from the SEER database. This group was analyzed separately to identify sex-specific determinants and build a prediction model for distant metastasis.

Female HCC Patients
A cohort of female patients (n=4,444) diagnosed with hepatocellular carcinoma (HCC) between 2004 and 2022, identified from the SEER database. This group was analyzed separately to identify sex-specific determinants and build a prediction model for distant metastasis.

What is the study measuring?

Primary Outcome Measures

Outcome Measure: Presence of Distant Metastasis at Diagnosis
Measure Description: The area under the receiver operating characteristic curve (AUC) of the best-performing machine learning model for predicting distant metastasis in the male cohort, evaluated on the internal testing set.
Time Frame: At completion of data analysis (2024).

Secondary Outcome Measures

Outcome Measure: Presence of distant metastasis at initial diagnosis
Measure Description: The primary outcome is the occurrence of distant metastasis (coded as AJCC M1 stage) at the time of initial hepatocellular carcinoma (HCC) diagnosis, as recorded in the Surveillance, Epidemiology, and End Results (SEER) database.
Time Frame: At diagnosis (Data from SEER registries covering years 2004-2022).

Study record dates

Study Major Dates

Study Start

January 1, 2004

Primary Completion (Actual)

January 1, 2023

Study Completion (Actual)

December 1, 2025

Study Registration Dates

First Submitted

January 23, 2026

First Submitted That Met QC Criteria

January 29, 2026

First Posted (Actual)

February 4, 2026

Study Record Updates

Last Update Posted (Actual)

February 4, 2026

Last Update Submitted That Met QC Criteria

January 29, 2026

Last Verified

January 1, 2026

More Information

Terms related to this study

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

No

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.
