Predicting Pathologic Complete Response to Neoadjuvant Chemotherapy in Breast Cancer Using Machine Learning Models.

Updated February 15, 2026 by: Enver Özkurt, Florence Nightingale Hospital, Istanbul

Clinicopathology-based Machine Learning Model for Prediction of Pathologic Complete Response to Neoadjuvant Chemotherapy in Breast Cancer

This retrospective observational study aims to develop and validate a clinicopathology-based machine learning model to predict pathological complete response (pCR) following neoadjuvant chemotherapy in patients with breast cancer. Clinical and pathological data collected between 2010 and 2025 were used to train and evaluate multiple machine learning algorithms using cross-validation and independent holdout testing. The primary outcome was pathological complete response after neoadjuvant chemotherapy. Model performance was assessed using discrimination and classification metrics, including ROC-AUC, precision-recall AUC, F1-score, and Matthews correlation coefficient. The resulting model is intended to support clinical decision-making by providing individualized probability estimates of treatment response.

Study Overview

Status

Completed

Conditions

Detailed Description

This retrospective observational study was conducted using a breast cancer registry containing clinical and pathological data from patients who received neoadjuvant chemotherapy between January 2010 and December 2025. The objective of the study was to develop and validate a machine learning-based predictive model for pathological complete response (pCR) using routinely available clinicopathological variables.

An initial dataset consisting of 298 patients and 144 recorded variables was curated by breast oncology experts to identify clinically relevant predictors. A total of 20 established clinicopathological variables were selected, representing demographic characteristics, tumor staging, biomarker profiles, and treatment-related factors. Feature engineering techniques, including ordinal encoding, one-hot encoding, and binary mapping, were applied to prepare the dataset for model development. Missing values were handled using median imputation within a cross-validation pipeline to prevent data leakage.
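A minimal sketch of this preprocessing in plain Python. The record does not list the 20 selected variables or their codings, so the variables below (tumor grade, molecular subtype, Ki-67, clinical node status) and their category orders are hypothetical. The point it illustrates is that medians are learned only from the training fold and then applied unchanged to the held-out fold, which is what keeps imputation from leaking information across a cross-validation split.

```python
import statistics
from typing import Optional

# Hypothetical codings for illustration only; not the study's actual variables.
GRADE_ORDER = {"G1": 0, "G2": 1, "G3": 2}             # ordinal encoding
SUBTYPES = ["HR+/HER2-", "HER2+", "Triple-negative"]  # one-hot encoding
YES_NO = {"no": 0, "yes": 1}                          # binary mapping


def encode(row: dict) -> list[Optional[float]]:
    """Turn one raw patient record into a numeric feature vector.

    Missing grade, Ki-67, or node status becomes None (imputed later);
    a missing subtype yields an all-zero one-hot block.
    """
    grade = GRADE_ORDER.get(row.get("grade"))
    subtype = row.get("subtype")
    one_hot = [float(subtype == s) for s in SUBTYPES]
    ki67 = row.get("ki67")
    node_pos = YES_NO.get(row.get("clinical_node_positive"))
    return [
        None if grade is None else float(grade),
        *one_hot,
        None if ki67 is None else float(ki67),
        None if node_pos is None else float(node_pos),
    ]


def fit_medians(train: list[list[Optional[float]]]) -> list[float]:
    """Learn one median per column from the TRAINING fold only."""
    medians = []
    for col in zip(*train):
        observed = [v for v in col if v is not None]
        medians.append(statistics.median(observed) if observed else 0.0)
    return medians


def impute(rows: list[list[Optional[float]]], medians: list[float]) -> list[list[float]]:
    """Fill missing entries with medians learned on the training fold."""
    return [[m if v is None else v for v, m in zip(r, medians)] for r in rows]
```

In each fold, `fit_medians` would be called on the training rows and `impute` on both the training and validation rows, so validation data never influence the fill values.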

Feature selection was performed using a hybrid importance framework integrating mutual information analysis, SHAP-based attribution from gradient boosting models, and L1-regularized logistic regression coefficients. Sequential feature subset evaluation identified an optimal subset of 10 predictors for model development.
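The hybrid importance step and the sequential subset search can be sketched as follows, assuming each importance source (mutual information, mean absolute SHAP value from a boosted model, absolute L1 logistic coefficient) has already been computed per feature. Equal-weight averaging of min-max normalized scores is an assumption; the record does not state how the three sources were combined. The `evaluate` callable is hypothetical and would return, for example, the mean cross-validated ROC-AUC of a model trained on the candidate subset.

```python
from typing import Callable, Sequence


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one importance source to [0, 1] so sources are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_importance(*sources: dict[str, float]) -> list[str]:
    """Average normalized scores across sources; return features best-first."""
    normalized = [min_max(s) for s in sources]
    features = set().union(*(s.keys() for s in sources))
    combined = {
        f: sum(n.get(f, 0.0) for n in normalized) / len(normalized)
        for f in features
    }
    # Highest combined score first; ties broken by name for determinism.
    return sorted(combined, key=lambda f: (-combined[f], f))


def best_prefix(ranked: Sequence[str],
                evaluate: Callable[[list[str]], float],
                max_k: int) -> tuple[list[str], float]:
    """Score the top-1, top-2, ..., top-max_k subsets in turn.

    Because the comparison is strict, the smallest subset reaching the
    highest score is kept.
    """
    best: list[str] = []
    best_score = float("-inf")
    for k in range(1, min(max_k, len(ranked)) + 1):
        subset = list(ranked[:k])
        score = evaluate(subset)
        if score > best_score:
            best, best_score = subset, score
    return best, best_score
```

With 20 candidate variables, `best_prefix(ranked, evaluate, max_k=20)` would return the subset size at which the chosen score peaks, analogous to the 10-predictor subset reported above.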

Several machine learning algorithms (logistic regression, random forest, gradient boosting models, support vector machines, k-nearest neighbors, and ensemble learning approaches) were trained and evaluated using 5-fold stratified cross-validation. Final performance was assessed on independent validation and holdout datasets using ROC-AUC, precision-recall AUC, F1-score, and Matthews correlation coefficient.
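To illustrate the evaluation protocol, the dependency-free sketch below builds stratified folds (each fold keeps roughly the overall pCR proportion) and computes the four reported metrics from predicted probabilities. It is not the study's code: PR-AUC is approximated here by step-wise average precision, and the 0.5 cutoff used for F1 and MCC is an assumption.

```python
import math
import random


def stratified_kfold(y: list[int], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Split sample indices into k folds that each keep roughly the pCR ratio."""
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        # Deal each class round-robin so every fold gets a near-equal share.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return [sorted(f) for f in folds]


def roc_auc(y: list[int], p: list[float]) -> float:
    """Probability that a random pCR case outscores a random non-pCR case
    (ties count one half); this equals the area under the ROC curve."""
    pos = [s for s, t in zip(p, y) if t == 1]
    neg = [s for s, t in zip(p, y) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y: list[int], p: list[float]) -> float:
    """Step-wise average precision, a common estimate of PR-AUC.
    Tied scores keep their input order (no tie correction)."""
    order = sorted(range(len(p)), key=lambda i: -p[i])
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / sum(y)


def confusion(y: list[int], p: list[float], threshold: float = 0.5):
    """Counts (tp, fp, fn, tn) when probabilities >= threshold predict pCR."""
    pred = [1 if s >= threshold else 0 for s in p]
    tp = sum(1 for t, q in zip(y, pred) if t == 1 and q == 1)
    fp = sum(1 for t, q in zip(y, pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y, pred) if t == 1 and q == 0)
    tn = sum(1 for t, q in zip(y, pred) if t == 0 and q == 0)
    return tp, fp, fn, tn


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def mcc_from_counts(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient; 0.0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Each candidate model would be fitted on four folds and scored on the fifth, and the per-fold metrics averaged to compare algorithms before the final holdout assessment.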

The primary outcome was pathological complete response following neoadjuvant chemotherapy. Threshold optimization was performed to identify a clinically meaningful probability cutoff that balanced sensitivity and specificity for predicting treatment response. Model performance was compared against a prevalence-adjusted stochastic baseline using Monte Carlo simulation to confirm predictive validity beyond chance.
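A sketch of the two steps in this paragraph, under stated assumptions. The record does not name the threshold criterion, so Youden's J (sensitivity + specificity - 1) is used here as one common way to balance the two. The stochastic baseline is modeled as a classifier that calls pCR at random with probability equal to the observed pCR prevalence; the one-sided Monte Carlo p-value then asks how often such random guessing matches or beats the model's MCC.

```python
import math
import random


def confusion(y, p, threshold):
    """Counts (tp, fp, fn, tn) when scores >= threshold predict pCR."""
    tp = fp = fn = tn = 0
    for t, s in zip(y, p):
        predicted = s >= threshold
        if t == 1 and predicted:
            tp += 1
        elif t == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def youden_threshold(y, p):
    """Cutoff maximizing J = sensitivity + specificity - 1 over observed
    scores. Assumes both classes are present in y."""
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(p)):
        tp, fp, fn, tn = confusion(y, p, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:  # strict: keeps the lowest cutoff among ties
            best_t, best_j = t, j
    return best_t, best_j


def random_baseline_mcc(y, n_sim=10_000, seed=0):
    """MCC values of a random classifier that calls pCR with probability
    equal to the observed pCR prevalence (prevalence-adjusted baseline)."""
    rng = random.Random(seed)
    prevalence = sum(y) / len(y)
    sims = []
    for _ in range(n_sim):
        guesses = [1.0 if rng.random() < prevalence else 0.0 for _ in y]
        sims.append(mcc(*confusion(y, guesses, 0.5)))
    return sims


def monte_carlo_p_value(observed, sims):
    """One-sided p-value: share of random runs at least as good as observed,
    with the usual +1 correction so it is never exactly zero."""
    return (1 + sum(s >= observed for s in sims)) / (1 + len(sims))
```

A small p-value from `monte_carlo_p_value` would indicate that the model's MCC at the chosen cutoff is unlikely to arise from prevalence-matched guessing alone.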

This study evaluates the feasibility of applying clinicopathology-based machine learning models to predict treatment response in breast cancer and to support individualized clinical decision-making in the neoadjuvant setting.

Study Type

Observational

Enrollment (Actual)

298

Participation Criteria

Eligibility Criteria

Ages Eligible for Study

  • Adult
  • Older Adult

Accepts Healthy Volunteers

No

Sampling Method

Non-Probability Sample

Study Population

The study population consists of patients with breast cancer treated with neoadjuvant chemotherapy at a single tertiary breast care center between January 2010 and December 2025. The registry includes demographic, tumor staging, biomarker, and treatment-related clinicopathological data collected as part of routine clinical care. All patients underwent surgery following neoadjuvant chemotherapy with documented pathological response assessment.

Description

Inclusion Criteria:

  • Histologically confirmed breast cancer
  • Receipt of neoadjuvant chemotherapy
  • Available clinicopathological data required for model development
  • Surgical treatment performed following neoadjuvant chemotherapy
  • Pathological response assessment available
  • Recorded pathological details

Exclusion Criteria:

  • Missing pathological response information
  • Incomplete clinicopathological data required for model analysis
  • Patients not treated with neoadjuvant chemotherapy
  • Non-invasive breast cancer without indication for neoadjuvant treatment

Study Plan

How is the study designed?

Design Details

Cohorts and Interventions

Group / Cohort
Patients with no residual invasive cancer in surgical pathology following neoadjuvant chemotherapy

What is the study measuring?

Primary Outcome Measures

Outcome Measure: Pathological Complete Response (pCR)
Measure Description: Pathological complete response is defined as the absence of residual invasive cancer in the breast and axillary lymph nodes at the time of surgery following completion of neoadjuvant chemotherapy.
Time Frame: At time of surgery following completion of neoadjuvant chemotherapy (approximately 4-6 months after treatment initiation)

Collaborators and Investigators

Investigators

  • Principal Investigator: Enver Özkurt, Assoc. Prof., Demiroğlu Bilim University, Faculty of Medicine

Study record dates

Study Major Dates

Study Start (Actual)

January 1, 2010

Primary Completion (Actual)

December 31, 2025

Study Completion (Actual)

December 31, 2025

Study Registration Dates

First Submitted

February 15, 2026

First Submitted That Met QC Criteria

February 15, 2026

First Posted (Actual)

February 23, 2026

Study Record Updates

Last Update Posted (Actual)

February 23, 2026

Last Update Submitted That Met QC Criteria

February 15, 2026

Last Verified

February 1, 2026

More Information

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

No

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved from clinicaltrials.gov.
