Cross-sectional Functional Stratification Based on Psychometric Profiling and Machine Learning in Patients With Substance Use Disorders (SUD) (SISAP-TUS)

May 14, 2026 updated by: Lauro Gutiérrez Castro

Unsupervised Deep Representation Learning for Clinical Stratification in Substance Use Disorders

Substance use disorders (SUDs) show considerable clinical heterogeneity that limits the usefulness of traditional categorical diagnoses. This observational, cross-sectional study aims to apply an unsupervised deep learning method, an autoencoder, to learn continuous latent representations from standardised psychometric data and to explore whether those representations can help stratify clinical subpopulations. The investigators will recruit 155 adults undergoing residential treatment for SUD. Participants will complete six validated instruments assessing impulsivity (BIS-11), anger regulation (STAXI-2), behavioural activation/avoidance (BADS), borderline symptomatology (BSL-23), generalised anxiety (GAD-7), and environmental reward (EROS). Demographic and clinical variables (age, sex, primary substance, years of use, prior treatments) will also be recorded.

After data cleaning and standardisation (z-scores), a symmetric autoencoder with a 12-dimensional bottleneck (architecture 21-32-24-12-24-32-21) will be trained using mean squared error loss. Regularisation includes L2 weight decay and dropout. The model will be trained 30 times with different random seeds to assess stability; the five best models (by validation pseudo-R²) will be combined into a weighted ensemble. Five-fold cross-validation will evaluate generalisation. For comparison, principal component analysis (PCA) will be applied to the same data. Gaussian mixture models (GMM) will be fitted on the latent space to explore potential clinical subgroups.
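The architecture string 21-32-24-12-24-32-21 can be turned into a quick count of trainable parameters. The Python sketch below assumes every layer is fully connected with a bias vector, which the record does not state explicitly; the function and variable names are illustrative only.

```python
# Layer widths taken from the architecture string in the summary.
LAYER_WIDTHS = [21, 32, 24, 12, 24, 32, 21]


def count_dense_parameters(widths):
    """Count weights and biases for a stack of fully connected layers.

    Assumption: each layer is dense and has a bias vector.
    """
    total = 0
    for n_in, n_out in zip(widths, widths[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


n_params = count_dense_parameters(LAYER_WIDTHS)
n_values = 155 * 21  # participants times input variables
print(n_params, n_values)
```

Under these assumptions the network has 3601 parameters, compared with 3255 observed values (155 participants by 21 variables). This may be part of the reason the protocol pairs the model with weight decay, dropout, and repeated runs.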

The primary outcome is the stability of the latent representation (coefficient of variation of validation MSE across runs). Secondary outcomes include reconstruction performance (pseudo-R²) of the ensemble, comparison with PCA, and the interpretability of latent dimensions via correlations with original variables. GMM results will be described using BIC, silhouette width, bootstrap stability, and clinical characterisation of clusters.

This study does not involve any intervention. Results will be hypothesis-generating and require external validation. No automated clinical decisions will be made.

Study Overview

Status

Completed

Conditions

Intervention / Treatment

Other: No intervention (observational only)

Detailed Description

Substance use disorders (SUDs) are characterised by substantial heterogeneity in clinical presentation, behavioural patterns, emotional regulation difficulties, impulsivity, and treatment response. Individuals with the same categorical diagnosis may differ considerably in symptom severity, comorbid psychopathology, and psychosocial functioning. This variability limits the explanatory value of traditional diagnostic classifications and supports the development of dimensional and data-driven approaches for patient characterisation.

Recent advances in machine learning provide methods capable of identifying latent structures within complex clinical datasets. Autoencoders, a form of unsupervised deep learning, can learn compact nonlinear representations of multidimensional data while preserving relevant information from the original variables. Compared with traditional linear dimensionality reduction methods such as principal component analysis (PCA), autoencoders may better capture complex interactions among psychological and behavioural variables. When combined with probabilistic clustering approaches such as Gaussian mixture models (GMM), these latent representations may facilitate the identification of clinically meaningful patient subgroups.

The purpose of this observational study is to apply an autoencoder model to psychometric and clinical data obtained from adults receiving residential treatment for substance use disorders. The study aims to explore latent dimensions underlying symptom and behavioural variability and to evaluate whether these dimensions support stable subgroup identification.

Primary Objective:

To learn a 12-dimensional latent representation from standardised psychometric and clinical variables using an autoencoder model and evaluate the stability of this representation across repeated training procedures.

Secondary Objectives:

To compare the reconstruction performance of the autoencoder with principal component analysis (PCA).

To characterise the clinical meaning of the latent dimensions through correlations with the original variables.

To explore potential patient subgroups using Gaussian mixture models (GMM) applied to the latent space.

To assess the stability and interpretability of the identified subgroups.

Study Design:

This is a single-centre, observational, cross-sectional, non-interventional study conducted in a residential addiction treatment facility. Recruitment is planned from February 2024 through December 2025. The study is registered prior to dissemination of results.

Study Population:

Approximately 155 adults diagnosed with substance use disorder according to DSM-5 criteria will be included. Eligible participants must be 18 years of age or older, currently receiving residential treatment, capable of completing study questionnaires, and willing to provide written informed consent.

Participants with active psychotic disorders, severe cognitive impairment, significant language or literacy barriers, or imminent discharge from treatment will be excluded.

Measures and Data Collection:

Participants will complete a battery of validated self-report instruments assessing impulsivity, anger regulation, behavioural activation and avoidance, borderline symptomatology, anxiety, and environmental reward. Additional demographic and clinical variables will include age, sex, primary substance of use, years of substance use, and prior treatment history.

Questionnaires include:

Barratt Impulsiveness Scale (BIS-11)
State-Trait Anger Expression Inventory-2 (STAXI-2)
Behavioral Activation for Depression Scale (BADS)
Borderline Symptom List-23 (BSL-23)
Generalized Anxiety Disorder-7 (GAD-7)
Environmental Reward Observation Scale (EROS)

Data Analysis:

Clinical variables will be standardised prior to analysis. Missing values are expected to be minimal and will be handled using median imputation procedures. Redundant variables with excessive multicollinearity may be removed before modelling.
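The preprocessing described above can be sketched for a single variable as follows. This is a minimal Python illustration, not the study's code (the record states that the analysis code is in R). It assumes imputation happens before standardisation and that the sample standard deviation (n - 1) is used; neither detail is specified in the record, and the values are made up.

```python
import statistics


def impute_and_standardise(column):
    """Median-impute missing entries (None), then convert to z-scores."""
    observed = [v for v in column if v is not None]
    median = statistics.median(observed)
    filled = [median if v is None else v for v in column]
    mean = statistics.fmean(filled)
    sd = statistics.stdev(filled)  # sample SD (n - 1); an assumption
    return [(v - mean) / sd for v in filled]


toy_scores = [10.0, 12.0, None, 14.0, 20.0]  # made-up questionnaire totals
z = impute_and_standardise(toy_scores)
print([round(v, 3) for v in z])
```

When the data are split for validation, a common precaution is to compute the median, mean, and standard deviation on the training portion only and reuse them for the held-out portion.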

An autoencoder neural network will be trained to generate a reduced latent representation of the clinical data. Model performance and stability will be evaluated across repeated training runs and cross-validation procedures. Reconstruction accuracy will be compared with PCA using equivalent dimensionality.
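The outcome measures specify 5-fold cross-validation repeated 3 times. A minimal Python sketch of how such splits could be generated for 155 participants is shown below; the shuffling scheme, the seed, and the function name are assumptions made for illustration.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=3, seed=0):
    """Yield (train_idx, val_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random order for every repeat
        for k in range(n_folds):
            val = order[k::n_folds]  # every n_folds-th index, offset by k
            val_set = set(val)
            train = [i for i in order if i not in val_set]
            yield train, val


splits = list(repeated_kfold(155))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

With 155 participants and five folds, each validation fold holds 31 participants and each training set 124, giving 15 train/validation pairs in total.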

The resulting latent space will subsequently be analysed using Gaussian mixture models to explore potential patient subgroups. Model selection will consider statistical fit, cluster stability, and clinical interpretability. Correlations between latent dimensions and original clinical variables will be examined to facilitate interpretation of the learned representations.
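Because the record specifies full covariance matrices on a 12-dimensional latent space and model selection by BIC, it can help to see how quickly the BIC penalty grows with the number of clusters. The Python sketch below uses the standard free-parameter count for a Gaussian mixture and the usual BIC definition; the function names are illustrative, and no log-likelihood values from the study are assumed.

```python
import math


def gmm_full_cov_parameters(n_components, n_dims):
    """Free parameters of a Gaussian mixture with full covariance matrices."""
    weights = n_components - 1  # mixing proportions sum to 1
    means = n_components * n_dims
    covariances = n_components * n_dims * (n_dims + 1) // 2  # symmetric matrices
    return weights + means + covariances


def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion; lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


for k in (1, 2, 3):
    p = gmm_full_cov_parameters(k, n_dims=12)
    print(k, p, round(p * math.log(155), 1))  # clusters, parameters, BIC penalty
```

Each additional full-covariance component in 12 dimensions adds 91 free parameters, so with 155 participants the penalty term rises steeply as clusters are added.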

Ethical Considerations:

The study protocol has been approved by the corresponding Institutional Ethics Committee. All participants will provide written informed consent prior to participation. Data will be anonymised after collection, and no direct identifiers will be retained.

This study is observational and will not modify routine clinical treatment. No automated clinical decisions will be made based on model outputs. Participants may experience mild emotional discomfort or fatigue while completing questionnaires; psychological support will be available if needed.

The study will be conducted in accordance with the Declaration of Helsinki and applicable local ethical regulations.

Dissemination:

Results will be submitted for publication in peer-reviewed scientific journals and presented at academic conferences. De-identified data and analysis code may be shared publicly after publication to support transparency and reproducibility.

Study Type

Observational

Enrollment (Actual)

155

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

Mexico
- Jalisco
  - Ajijic, Jalisco, Mexico, 45920
    - Under The Tree

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

Adult

Accepts Healthy Volunteers

No

Sampling Method

Non-Probability Sample

Study Population

Adult patients (≥18 years) with a diagnosis of Substance Use Disorder (SUD) admitted to a residential detoxification and rehabilitation center. Consecutive recruitment between February 2024 and March 2026. Estimated final sample size is 155 participants. No healthy volunteers are included.

Description

Inclusion Criteria:

DSM-5 diagnosis of Substance Use Disorder (SUD), confirmed by a psychiatrist or clinical psychologist.
Age ≥ 18 years.
Currently admitted to a residential addiction treatment center at the time of assessment.
Ability to complete the psychometric questionnaires independently.
Written informed consent.

Exclusion Criteria:

Active psychotic disorder (e.g., schizophrenia, delusional disorder) not stabilized pharmacologically.
Severe cognitive impairment (dementia, severe brain injury) that prevents understanding the questionnaire items.
Language barriers or illiteracy that prevent self-administration of the scales.
Scheduled discharge from the center within 7 days of the assessment date.

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Number of groups / cohorts

Cohorts and Interventions

Group / Cohort	Intervention / Treatment
Total sample (residential treatment): adult patients (N=155) with DSM-5-TR substance use disorder receiving residential treatment. All participants completed six psychometric scales (BIS-11, STAXI-2, BADS, BSL-23, GAD-7, EROS) and provided demographic/clinical data in a single cross-sectional session. No intervention was administered.	Other: No intervention (observational only). This is a purely observational study. No drug, device, behavioral therapy, or other intervention was assigned. The study only involved standardized psychometric measurements.

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Latent dimension scores	Twelve continuous latent dimensions derived from the bottleneck layer of a symmetric autoencoder trained on 21 standardized clinical variables. Each dimension represents a compressed, nonlinear combination of the original psychometric indicators (impulsivity, emotion regulation, behavioral activation, borderline symptoms, anxiety, and environmental reward). The dimensions are extracted for each participant after averaging the predictions of an ensemble of the five best autoencoder runs. Unit of Measure: Standardized z-score (mean = 0, SD = 1 in the training sample).	Baseline (single assessment, cross-sectional)
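The ensemble step (keep the five best runs, then average their predictions) can be sketched in Python as below. The record describes a weighted ensemble but does not give the weighting scheme, so weights proportional to each run's validation pseudo-R² are an assumption here, as are the function name and the made-up scores and predictions.

```python
def weighted_top_k_ensemble(runs, k=5):
    """Average the predictions of the k runs with the highest validation score.

    `runs` is a list of (validation_score, prediction_vector) pairs.
    Assumption: weights are proportional to the validation score, and the
    selected scores are positive.
    """
    best = sorted(runs, key=lambda r: r[0], reverse=True)[:k]
    total = sum(score for score, _ in best)
    n_out = len(best[0][1])
    return [
        sum(score * pred[j] for score, pred in best) / total
        for j in range(n_out)
    ]


# Made-up scores and two-element predictions for six hypothetical runs.
toy_runs = [(0.50, [1.0, 0.0]), (0.40, [0.0, 1.0]), (0.30, [1.0, 1.0]),
            (0.20, [0.0, 0.0]), (0.10, [2.0, 2.0]), (0.05, [9.0, 9.0])]
ensemble = weighted_top_k_ensemble(toy_runs)
print([round(v, 3) for v in ensemble])
```

The sixth run, with the lowest score, is dropped before averaging, so its outlying prediction does not influence the result.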

Secondary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Gaussian mixture model cluster membership	Categorical assignment of each participant to one of the clusters obtained by fitting a Gaussian mixture model with full covariance matrices to the 12-dimensional latent space. The number of clusters is determined by the Bayesian Information Criterion (BIC) and clinical interpretability. This outcome is exploratory and does not imply discrete subtypes. Unit of Measure: Nominal (cluster number: 1, 2, …).	Baseline
Autoencoder reconstruction pseudo-R²	Proportion of variance in the original 21 clinical variables that is explained by the autoencoder's reconstructions, defined as 1 - (MSE_model / MSE_null), where MSE_null is the mean squared error of a model predicting only the mean. This metric is calculated for the ensemble of the five best models and for each of the 30 independent runs separately. Unit of Measure: Proportion (range 0 to 1).	Baseline (computed on the validation split and on the full sample after training)
Autoencoder reconstruction mean squared error	Average squared difference between the original 21 standardized input variables and the reconstructed outputs produced by the autoencoder. Lower values indicate better reconstruction. Reported for the ensemble model and for each independent run. Unit of Measure: Mean squared error (dimensionless, as data are z-standardized).	Baseline
Coefficient of variation of reconstruction MSE	Coefficient of variation (CV = standard deviation / mean) of the reconstruction MSE computed over 30 independent autoencoder training runs with different random seeds. This metric assesses the stability and reproducibility of the model. Unit of Measure: Percentage (%).	Baseline (after all runs are completed)
Cross-validated reconstruction R²	Mean R² (and standard deviation) obtained from 5-fold cross-validation repeated 3 times, using the same autoencoder architecture and hyperparameters. This evaluates how well the model generalises to unseen patients. Unit of Measure: Proportion (range 0 to 1).	Baseline
Explained variance by 12 principal components	Total proportion of variance explained by the first 12 principal components obtained from PCA applied to the same 21 standardized variables. This serves as a comparator for the autoencoder's reconstruction performance. Unit of Measure: Proportion (range 0 to 1).	Baseline
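The reconstruction metrics defined in the table (MSE, pseudo-R² as 1 - MSE_model / MSE_null, and the coefficient of variation across runs) can be written out directly. The Python sketch below follows those definitions; the use of column means for the null model and of the sample standard deviation in the CV are assumptions, and all numbers are made up for illustration.

```python
import statistics


def mse(original, reconstructed):
    """Mean squared error over all cells of two equal-shaped lists of rows."""
    squared = [(o - r) ** 2
               for row_o, row_r in zip(original, reconstructed)
               for o, r in zip(row_o, row_r)]
    return statistics.fmean(squared)


def pseudo_r2(original, reconstructed):
    """1 - MSE_model / MSE_null, where the null model predicts column means."""
    n_cols = len(original[0])
    col_means = [statistics.fmean([row[j] for row in original])
                 for j in range(n_cols)]
    null_prediction = [col_means for _ in original]
    return 1.0 - mse(original, reconstructed) / mse(original, null_prediction)


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


x = [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]]      # made-up standardized data
x_hat = [[0.5, -0.5], [-0.5, 0.5], [0.0, 0.0]]  # made-up reconstruction
run_mses = [0.30, 0.32, 0.28]                   # made-up MSEs from three runs
print(pseudo_r2(x, x_hat), coefficient_of_variation(run_mses))
```

Defined this way, pseudo-R² equals 1 for a perfect reconstruction and 0 for a reconstruction no better than the mean; it falls below 0 when the reconstruction is worse than the mean.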

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Sponsor

Lauro Gutiérrez Castro

Investigators

Principal Investigator: Lauro Gutiérrez Castro, Under The Tree

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

March 25, 2024

Primary Completion (Actual)

February 18, 2026

Study Completion (Actual)

April 22, 2026

Study Registration Dates

First Submitted

May 9, 2026

First Submitted That Met QC Criteria

May 9, 2026

First Posted (Actual)

May 15, 2026

Study Record Updates

Last Update Posted (Actual)

May 18, 2026

Last Update Submitted That Met QC Criteria

May 14, 2026

Last Verified

May 1, 2026

More Information

Terms related to this study

Keywords

Additional Relevant MeSH Terms

Other Study ID Numbers

SISAP-TUS-EFTP-ML-2026

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

YES

IPD Plan Description

Individual participant data (IPD) that underlie the results reported in the manuscript will be shared after de-identification (anonymization). The data will include the 21 standardized clinical variables and the 12-dimensional latent representations for all 155 participants. Study protocol, statistical analysis plan, and R code will also be made available.

IPD Sharing Time Frame

Beginning 9 months and ending 36 months after article publication

IPD Sharing Access Criteria

Data will be available to researchers who provide a methodologically sound proposal for purposes of replicating the results or conducting secondary analyses. Proposals should be directed to the corresponding author. Requestors will need to sign a data access agreement.

IPD Sharing Supporting Information Type

Study Protocol
Statistical Analysis Plan (SAP)
Informed Consent Form (ICF)
Clinical Study Report (CSR)

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

Studies a U.S. FDA-regulated device product

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.
