Data-driven Clustering in Hemorrhoid Surgery: Retrospective Monocentric Study for the Identification of Clinical Phenotypes (PROCTO-CLUSTER)

February 16, 2026 updated by: IRCCS Policlinico S. Donato
This retrospective, single-center observational study will use routinely collected perioperative data from adults undergoing surgery for symptomatic hemorrhoidal disease to identify data-driven clinical phenotypes. Unsupervised machine learning will be applied to characterize clusters of patients based on demographic, clinical, anatomical, and surgical variables. The study will explore whether the resulting phenotypes differ in operative complexity and postoperative course, and will generate hypotheses to inform future predictive models and personalized surgical planning.

Study Overview

Status

Active, not recruiting

Detailed Description

Hemorrhoidal disease presents with heterogeneous symptom patterns, anatomical findings, and operative strategies that are not fully captured by traditional degree-based classifications. This study aims to identify latent, clinically interpretable phenotypes among surgical patients using a fully unsupervised machine learning pipeline applied to routinely collected perioperative data from a high-volume tertiary referral center.

This is a retrospective, observational analysis of de-identified institutional records. The analytic dataset will include routinely documented variables spanning baseline demographics/anthropometrics, symptom profile and relevant clinical history, operative technique and intraoperative descriptors, and routinely captured postoperative follow-up information. Data will be extracted using a predefined data dictionary and standardized preprocessing rules to support reproducibility and reduce variability in variable definitions.
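
A predefined data dictionary can also serve as an executable validation step during extraction. The sketch below assumes a simple structure (variable name, type, plausible range or allowed categories). The four entries and their limits are illustrative only, since the study's dictionary is not published, and `VariableSpec` and `validate_record` are hypothetical names. Records returned with problems would presumably fall under the "Incomplete or missing clinical data" exclusion criterion.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariableSpec:
    """One data-dictionary entry (hypothetical structure)."""
    name: str
    kind: str  # "numeric" or "categorical"
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    allowed: Optional[frozenset] = None


# Hypothetical entries for illustration; the study's actual dictionary is not published.
DATA_DICTIONARY = [
    VariableSpec("age_years", "numeric", 18, 110),
    VariableSpec("bmi", "numeric", 10, 80),
    VariableSpec("sex", "categorical", allowed=frozenset({"F", "M"})),
    VariableSpec("operative_minutes", "numeric", 1, 600),
]


def validate_record(raw: dict) -> tuple[dict, list[str]]:
    """Coerce one raw record to the dictionary's types and report any violations."""
    clean: dict = {}
    problems: list[str] = []
    for spec in DATA_DICTIONARY:
        value = raw.get(spec.name)
        if value is None or value == "":
            problems.append(f"{spec.name}: missing")
            continue
        if spec.kind == "numeric":
            try:
                number = float(value)
            except (TypeError, ValueError):
                problems.append(f"{spec.name}: not numeric ({value!r})")
                continue
            too_low = spec.min_value is not None and number < spec.min_value
            too_high = spec.max_value is not None and number > spec.max_value
            if too_low or too_high:
                problems.append(f"{spec.name}: out of range ({number})")
                continue
            clean[spec.name] = number
        else:
            label = str(value).strip().upper()  # normalize coding, e.g. "f" -> "F"
            if spec.allowed is not None and label not in spec.allowed:
                problems.append(f"{spec.name}: unexpected category ({label})")
                continue
            clean[spec.name] = label
    return clean, problems
```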

The primary analytic approach will be unsupervised clustering. Variables will be cleaned and standardized prior to modeling. Dimensionality reduction will be performed using t-distributed stochastic neighbor embedding (t-SNE), initialized with principal component analysis to improve stability. Cluster discovery will then be conducted using k-means clustering on the reduced feature space. A range of cluster solutions will be explored, and the final solution will be selected using internal validity metrics (e.g., silhouette-based measures) together with assessment of clinical interpretability. Model robustness will be evaluated through repeated runs across multiple random seeds and key parameter settings to assess stability of cluster assignments.
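
The clustering and model-selection step can be sketched in plain Python. This is a minimal illustration, not the study's implementation: it assumes the variables have already been standardized and embedded by t-SNE upstream (for example with scikit-learn's `StandardScaler` and `TSNE(init="pca")`), uses Euclidean distance, and picks k by the mean silhouette averaged over several seeds. The helpers `kmeans`, `silhouette`, and `select_k` are hypothetical names.

```python
import math
import random


def kmeans(points, k, seed, max_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns one integer label per point."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # k-means++: sample the next center with probability proportional to D(x)^2
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        by_cluster = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                by_cluster[labels[j]].append(math.dist(p, q))
        own = by_cluster[labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def select_k(points, k_values, seeds):
    """Score each candidate k by mean silhouette over seeds; return (best_k, scores)."""
    scores = {}
    for k in k_values:
        runs = [silhouette(points, kmeans(points, k, seed)) for seed in seeds]
        scores[k] = sum(runs) / len(runs)
    return max(scores, key=scores.get), scores
```

In practice scikit-learn's `KMeans` and `silhouette_score` would replace these helpers. The point of the sketch is the selection loop: each candidate k is scored across seeds, and the clinical-interpretability review described above would still be applied to the top-scoring solutions.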

After cluster assignment, clusters will be characterized using descriptive and comparative statistics to identify variables that most differentiate phenotypes. Post-hoc feature relevance/importance approaches will be used to explore which demographic, clinical, and surgical factors most strongly contribute to cluster formation, with emphasis on effect sizes and clinically meaningful patterns rather than hypothesis-testing alone. Findings will be used to generate hypotheses regarding phenotypes that may be associated with greater operative complexity and different postoperative trajectories, supporting future work on predictive modeling and individualized surgical decision support.
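
The registration does not name the comparative tests or effect sizes. One common rank-based choice for a continuous or ordinal variable (such as operative duration or pain score) across more than two clusters is the Kruskal-Wallis H statistic with epsilon-squared, computed as H / (n - 1), as its effect size. The sketch below assumes that choice; `average_ranks` and `kruskal_wallis` are hypothetical helpers, and the p-value is omitted (`scipy.stats.kruskal` returns both the statistic and a p-value).

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and epsilon-squared effect size H / (n - 1)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction > 0:  # correction is 0 only if every value is identical
        h /= correction
    return h, h / (n - 1)
```

Each inner list would hold one cluster's values, for example hypothetical operative durations in minutes: `kruskal_wallis([[35, 40, 42], [55, 60, 58], [38, 41, 39]])`.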

All analyses will be conducted within a controlled institutional environment using validated statistical and data-mining software, with documented parameter settings and version tracking to enable reproducibility. Only de-identified data will be used for analysis, and results will be reported in aggregate to protect patient privacy.
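
One lightweight way to honor the commitment to documented parameter settings and version tracking is to save a run manifest next to each analysis output. The registration does not describe the institution's tooling, so this is an assumption: `build_manifest`, `write_manifest`, and the values in `EXAMPLE_PARAMETERS` are hypothetical.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone

# Hypothetical settings; the registration does not list the actual values used.
EXAMPLE_PARAMETERS = {
    "tsne_perplexity": [10, 20, 30],
    "k_range": [2, 3, 4, 5, 6],
    "seeds": [0, 1, 2, 3, 4],
}


def build_manifest(dataset_bytes, parameters):
    """Record what is needed to re-run an analysis: data fingerprint, settings, environment."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "parameters": parameters,
    }


def write_manifest(manifest, path):
    """Write the manifest as stable, human-readable JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2, sort_keys=True)
```

Hashing the de-identified extract ties every reported result to one exact dataset version without storing any patient data in the manifest itself.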

Study Type

Observational

Enrollment (Estimated)

100

Contacts and Locations

Study Locations

    • Milan
      • San Donato Milanese, Milan, Italy, 20097
        • IRCCS Policlinico San Donato

Participation Criteria

Eligibility Criteria

Ages Eligible for Study

  • Adult
  • Older Adult

Accepts Healthy Volunteers

No

Sampling Method

Non-Probability Sample

Study Population

Patients undergoing surgery for symptomatic hemorrhoidal disease

Description

Inclusion criteria

  • Age ≥ 18 years
  • Clinical and/or intraoperative diagnosis of symptomatic hemorrhoidal disease
  • Availability of complete perioperative data: demographic, clinical, surgical, and postoperative variables

Exclusion criteria

  • Incomplete or missing clinical data
  • Presence of anorectal neoplastic conditions (e.g., anal or rectal carcinoma)
  • Anorectal surgery within the previous 6 months (to avoid confounding effects on symptoms and anatomy)
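
The inclusion and exclusion rules above translate directly into a record-level filter. In this minimal sketch the field names (`age_years`, `hemorrhoidal_disease_dx`, `anorectal_neoplasm`, `last_anorectal_surgery`, and so on) are hypothetical, and "within the previous 6 months" is approximated as 183 days, which is an assumption rather than the protocol's definition. Returning a reason alongside each decision makes it straightforward to tabulate exclusions.

```python
from __future__ import annotations

from datetime import date, timedelta

# Hypothetical field names; the actual extraction schema is not published.
REQUIRED_FIELDS = ("age_years", "sex", "bmi", "procedure_type", "operative_minutes", "surgery_date")
# Assumption: "within the previous 6 months" approximated as 183 days before surgery.
PRIOR_SURGERY_WINDOW = timedelta(days=183)


def eligibility(patient: dict) -> tuple[bool, str]:
    """Apply the registered inclusion/exclusion rules to one record; return (eligible, reason)."""
    age = patient.get("age_years")
    if age is None or age < 18:
        return False, "age < 18 or missing"
    if not patient.get("hemorrhoidal_disease_dx"):
        return False, "no diagnosis of symptomatic hemorrhoidal disease"
    missing = [field for field in REQUIRED_FIELDS if patient.get(field) in (None, "")]
    if missing:
        return False, "incomplete data: " + ", ".join(missing)
    if patient.get("anorectal_neoplasm"):
        return False, "anorectal neoplastic condition"
    prior = patient.get("last_anorectal_surgery")  # a datetime.date or None
    if prior is not None and patient["surgery_date"] - prior < PRIOR_SURGERY_WINDOW:
        return False, "anorectal surgery within the previous 6 months"
    return True, "eligible"
```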

Study Plan

How is the study designed?

Cohorts and Interventions

Group / Cohort: Patients who underwent surgery for symptomatic hemorrhoidal disease

Consecutive patients who underwent surgery for symptomatic hemorrhoidal disease at IRCCS Policlinico San Donato between December 2024 and June 2025. Consecutive enrollment was chosen to minimize selection bias and to represent the full spectrum of disease severity in the surgical setting. Inclusion and exclusion criteria are as listed under Eligibility Criteria.

Intervention / Treatment: Standard hemorrhoidectomy, advanced hemorrhoidectomy, prolapsectomy, Doppler-guided procedures, or combined techniques

What is the study measuring?

Primary Outcome Measures

Internal validity of the unsupervised clustering solution (silhouette coefficient)
Measure Description: Silhouette coefficient of the final k-means clustering solution derived from t-SNE-reduced perioperative data. The silhouette coefficient will be used as the primary internal validity metric to quantify cluster cohesion and separation for the selected number of clusters.
Time Frame: From completion of dataset extraction/cleaning through completion of clustering analysis (retrospective analysis of surgeries performed December 2024 to June 2025)
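
For reference, the silhouette coefficient of patient i in cluster C_i compares the mean distance to the other members of its own cluster, a(i), with the mean distance to the members of the nearest other cluster, b(i). The distance d is assumed here to be Euclidean in the t-SNE embedding; the registration does not state the metric explicitly.

```latex
a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\, j \neq i} d(i, j), \qquad
b(i) = \min_{C \neq C_i} \frac{1}{|C|} \sum_{j \in C} d(i, j)

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \quad (s(i) = 0 \text{ if } |C_i| = 1), \qquad
S = \frac{1}{n} \sum_{i=1}^{n} s(i) \in [-1, 1]
```

Values of S near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values of s(i) suggest that patient i sits closer to a neighboring cluster than to its own.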

Secondary Outcome Measures

Cluster stability and reproducibility across model runs
Measure Description: Stability of cluster assignments across multiple random seeds and t-SNE parameter settings (including perplexity), summarized by the consistency of cluster membership and the stability of internal validity metrics across runs.
Time Frame: From completion of dataset extraction/cleaning through completion of clustering robustness analyses (retrospective analysis of surgeries performed December 2024 to June 2025)

Operative duration (proxy of operative complexity)
Measure Description: Operative duration (minutes) recorded in the operative report/perioperative database; compared across identified phenotypes.
Time Frame: Intraoperative (day of surgery)

Postoperative pain intensity
Measure Description: Pain intensity as documented in routine postoperative records/follow-up notes (e.g., numeric rating scale when available, or clinician-documented pain status), analyzed as a pain trajectory/pattern across early and intermediate follow-up and compared across clusters.
Time Frame: From surgery to 6 months postoperatively

Postoperative complications (Clavien-Dindo classification)
Measure Description: Any postoperative complication recorded in routine follow-up, graded according to the Clavien-Dindo system; complication rates and severity compared across clusters.
Time Frame: From surgery to 1 month postoperatively (early complications) and up to 6 months postoperatively (late complications)

Time to return to routine activities
Measure Description: Time to return to routine activities/work, when documented in follow-up notes; compared across phenotypes.
Time Frame: From surgery to 1 month postoperatively

Recurrence
Measure Description: Recurrence, or persistence/return of hemorrhoid-related symptoms (e.g., bleeding, prolapse, other symptoms), as documented in routine follow-up, together with any need for re-evaluation or additional intervention; compared across clusters.
Time Frame: 1 month and 6 months postoperatively
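
For the cluster-stability outcome, agreement between the cluster assignments of two runs can be quantified with the adjusted Rand index (ARI), which ignores how clusters happen to be numbered. The registration does not name a specific agreement measure, so ARI is an assumption here, and `adjusted_rand_index` and `mean_pairwise_ari` are hypothetical helpers. Each run is the label vector from one seed/perplexity combination over the same ordered set of patients.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index; 1.0 means identical partitions up to relabeling."""
    n = len(labels_a)
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:  # degenerate case, e.g. both partitions all singletons
        return 1.0
    return (pairs_both - expected) / (maximum - expected)


def mean_pairwise_ari(runs):
    """Average ARI over all pairs of label vectors from repeated runs."""
    scores = [adjusted_rand_index(a, b) for a, b in combinations(runs, 2)]
    return sum(scores) / len(scores)
```

scikit-learn's `sklearn.metrics.adjusted_rand_score` computes the same quantity and would be the usual choice in practice.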

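The complication outcome combines a severity scale with two follow-up windows. The sketch below shows one way to derive per-cluster counts from graded events. The input structure (cluster label plus a list of `(postoperative_day, grade)` pairs), the day cut-offs (30 and 183 days, with "late" taken as days 31 to 183), and the use of grade IIIa or higher as "major" are all assumptions, not protocol definitions; `worst_grade` and `complication_summary` are hypothetical names.

```python
from collections import defaultdict

# Clavien-Dindo grades from least to most severe.
CD_ORDER = ["I", "II", "IIIa", "IIIb", "IVa", "IVb", "V"]
CD_RANK = {grade: rank for rank, grade in enumerate(CD_ORDER)}
EARLY_MAX_DAY = 30   # assumption: "1 month" taken as postoperative days 0-30
LATE_MAX_DAY = 183   # assumption: "6 months" taken as postoperative day 183


def worst_grade(events, max_day, min_day=0):
    """Most severe grade among (postoperative_day, grade) events in [min_day, max_day]."""
    grades = [grade for day, grade in events if min_day <= day <= max_day]
    return max(grades, key=CD_RANK.__getitem__) if grades else None


def complication_summary(patients):
    """Count patients per cluster with early, late, and grade >= IIIa complications."""
    summary = defaultdict(lambda: {"n": 0, "early": 0, "late": 0, "major": 0})
    for cluster, events in patients:
        row = summary[cluster]
        row["n"] += 1
        row["early"] += worst_grade(events, EARLY_MAX_DAY) is not None
        row["late"] += worst_grade(events, LATE_MAX_DAY, EARLY_MAX_DAY + 1) is not None
        overall = worst_grade(events, LATE_MAX_DAY)
        row["major"] += overall is not None and CD_RANK[overall] >= CD_RANK["IIIa"]
    return dict(summary)
```

Dividing each count by `n` gives per-cluster rates; keeping the ordinal grade (rather than a yes/no flag) preserves severity for the comparison across clusters.
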
Other Outcome Measures

Cluster separation metrics (beyond silhouette)
Measure Description: Additional internal cluster separation metrics (e.g., measures of between-cluster separation/within-cluster dispersion as implemented in the analytic workflow) reported to support interpretability of the phenotype solution.
Time Frame: From completion of dataset extraction/cleaning through completion of clustering analysis (retrospective analysis of surgeries performed December 2024 to June 2025)

Between-cluster differences in clinical/anatomical/surgical characteristics
Measure Description: Differences across clusters in routinely collected demographic and clinical history variables (e.g., age, sex, BMI, comorbidity burden, medications, symptom profile, bowel habit characteristics), anatomical descriptors (when documented), and procedure type/technique selection.
Time Frame: Baseline (preoperative assessment) and intraoperative (day of surgery)

Post-hoc feature relevance for cluster formation
Measure Description: Relative contribution/importance of demographic, clinical, and surgical variables to cluster formation, assessed using post-hoc feature relevance approaches; used to interpret the drivers of phenotype structure.
Time Frame: From completion of dataset extraction/cleaning through completion of post-hoc feature relevance analyses (retrospective analysis of surgeries performed December 2024 to June 2025)
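
Two widely used between-/within-cluster metrics are the Calinski-Harabasz index (higher is better) and the Davies-Bouldin index (lower is better). The registration only says "as implemented in the analytic workflow", so these two are examples rather than the confirmed choice. The sketch assumes Euclidean distance in the embedding; `calinski_harabasz` and `davies_bouldin` are hypothetical helpers mirroring scikit-learn's `calinski_harabasz_score` and `davies_bouldin_score`.

```python
import math


def _centroids(points, labels):
    """Group points by label and return (centroid per label, members per label)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    centers = {lab: tuple(sum(axis) / len(g) for axis in zip(*g)) for lab, g in groups.items()}
    return centers, groups


def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion, each scaled by its degrees of freedom."""
    n = len(points)
    centers, groups = _centroids(points, labels)
    k = len(centers)
    if not 2 <= k < n:
        raise ValueError("need 2 <= number of clusters < number of points")
    overall = tuple(sum(axis) / n for axis in zip(*points))
    between = sum(len(groups[lab]) * math.dist(c, overall) ** 2 for lab, c in centers.items())
    within = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    if within == 0:
        return math.inf  # perfectly compact clusters; libraries treat this case differently
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (scatter_a + scatter_b) / centroid-distance ratio."""
    centers, groups = _centroids(points, labels)
    scatter = {lab: sum(math.dist(p, centers[lab]) for p in g) / len(g) for lab, g in groups.items()}
    worst = []
    for a in centers:
        worst.append(max((scatter[a] + scatter[b]) / math.dist(centers[a], centers[b])
                         for b in centers if b != a))
    return sum(worst) / len(worst)
```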

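A simple post-hoc relevance measure is the share of each variable's variance explained by cluster membership (eta-squared), used to rank variables. The registration does not specify its feature-relevance approach, so this univariate ranking is an assumption; it ignores interactions, and model-based alternatives (such as permutation importance with a surrogate classifier) exist. Because the clusters were built from these same variables, the ranking describes what drives the partition rather than providing independent validation. `eta_squared` and `rank_features` are hypothetical names, and categorical variables would need a different measure or encoding.

```python
def eta_squared(values, labels):
    """Share of a numeric variable's variance explained by cluster membership (0 to 1)."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # a constant variable carries no information about clusters
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    return ss_between / ss_total


def rank_features(table, labels):
    """Rank numeric features by eta-squared; table maps feature name -> values per patient."""
    scores = {name: eta_squared(values, labels) for name, values in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```
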
Study record dates

Study Major Dates

Study Start (Actual)

December 1, 2024

Primary Completion (Actual)

December 31, 2025

Study Completion (Estimated)

April 1, 2026

Study Registration Dates

First Submitted

February 5, 2026

First Submitted That Met QC Criteria

February 16, 2026

First Posted (Actual)

February 23, 2026

Study Record Updates

Last Update Posted (Actual)

February 23, 2026

Last Update Submitted That Met QC Criteria

February 16, 2026

Last Verified

January 1, 2026

More Information

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

NO

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.

Clinical Trials on Hemorrhoid Prolapse

Clinical Trials on Any surgical procedure for hemorrhoidal disease
