- Clinical Trial NCT07590154
Cross-sectional Functional Stratification Based on Psychometric Profiling and Machine Learning in Patients With Substance Use Disorders (SUD) (SISAP-TUS)
Unsupervised Deep Representation Learning for Clinical Stratification in Substance Use Disorders
Substance use disorders (SUDs) show considerable clinical heterogeneity that limits the usefulness of traditional categorical diagnoses. This observational, cross-sectional study aims to apply an unsupervised deep learning method, an autoencoder, to learn continuous latent representations from standardised psychometric data and to explore whether those representations can help stratify clinical subpopulations. The investigators will recruit 155 adults undergoing residential treatment for SUD. Participants will complete six validated instruments assessing impulsivity (BIS-11), anger regulation (STAXI-2), behavioural activation/avoidance (BADS), borderline symptomatology (BSL-23), generalised anxiety (GAD-7), and environmental reward (EROS). Demographic and clinical variables (age, sex, primary substance, years of use, prior treatments) will also be recorded.
After data cleaning and standardisation (z-scores), a symmetric autoencoder with a 12-dimensional bottleneck (architecture 21-32-24-12-24-32-21) will be trained using mean squared error loss. Regularisation includes L2 weight decay and dropout. The model will be trained 30 times with different random seeds to assess stability; the five best models (by validation pseudo-R²) will be combined into a weighted ensemble. Five-fold cross-validation will evaluate generalisation. For comparison, principal component analysis (PCA) will be applied to the same data. Gaussian mixture models (GMM) will be fitted on the latent space to explore potential clinical subgroups.
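To give a sense of model size relative to the planned sample, the following minimal sketch counts trainable parameters for the stated layer widths. It assumes fully connected layers with one bias per unit, which the record does not specify; `LAYER_SIZES` and `count_parameters` are illustrative names.

```python
# Layer widths as stated in the record: 21 inputs, 12-dimensional bottleneck.
LAYER_SIZES = [21, 32, 24, 12, 24, 32, 21]


def count_parameters(sizes):
    """Count weights and biases, assuming dense layers with one bias per unit."""
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


n_params = count_parameters(LAYER_SIZES)
n_values = 155 * 21  # planned participants times input variables
print(n_params, n_values)
```

Under these assumptions the network has 3601 trainable parameters against 3255 observed values, which is one reason the weight decay, dropout, and cross-validation described above matter at this sample size.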
The primary outcome is the stability of the latent representation (coefficient of variation of validation MSE across runs). Secondary outcomes include reconstruction performance (pseudo-R²) of the ensemble, comparison with PCA, and the interpretability of latent dimensions via correlations with original variables. GMM results will be described using BIC, silhouette width, bootstrap stability, and clinical characterisation of clusters.
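The stability metric named above can be sketched as follows. The MSE values are hypothetical placeholders rather than study results, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the record does not say which estimator is used.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), an assumption
    return 100.0 * sd / mean


# Hypothetical validation MSEs from a handful of seeds, not study results.
example_mse = [0.40, 0.42, 0.38, 0.41, 0.39]
cv_percent = coefficient_of_variation(example_mse)
print(f"CV = {cv_percent:.2f}%")
```

In the study the same calculation would run over the 30 validation MSE values, one per random seed.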
This study does not involve any intervention. Results will be hypothesis-generating and require external validation. No automated clinical decisions will be made.
Study Overview
Status
Conditions
Intervention / Treatment
Detailed Description
Substance use disorders (SUDs) are characterised by substantial heterogeneity in clinical presentation, behavioural patterns, emotional regulation difficulties, impulsivity, and treatment response. Individuals with the same categorical diagnosis may differ considerably in symptom severity, comorbid psychopathology, and psychosocial functioning. This variability limits the explanatory value of traditional diagnostic classifications and supports the development of dimensional and data-driven approaches for patient characterisation.
Recent advances in machine learning provide methods capable of identifying latent structures within complex clinical datasets. Autoencoders, a form of unsupervised deep learning, can learn compact nonlinear representations of multidimensional data while preserving relevant information from the original variables. Compared with traditional linear dimensionality reduction methods such as principal component analysis (PCA), autoencoders may better capture complex interactions among psychological and behavioural variables. When combined with probabilistic clustering approaches such as Gaussian mixture models (GMM), these latent representations may facilitate the identification of clinically meaningful patient subgroups.
The purpose of this observational study is to apply an autoencoder model to psychometric and clinical data obtained from adults receiving residential treatment for substance use disorders. The study aims to explore latent dimensions underlying symptom and behavioural variability and to evaluate whether these dimensions support stable subgroup identification.
Primary Objective:
To learn a 12-dimensional latent representation from standardised psychometric and clinical variables using an autoencoder model and evaluate the stability of this representation across repeated training procedures.
Secondary Objectives:
To compare the reconstruction performance of the autoencoder with principal component analysis (PCA).
To characterise the clinical meaning of the latent dimensions through correlations with the original variables.
To explore potential patient subgroups using Gaussian mixture models (GMM) applied to the latent space.
To assess the stability and interpretability of the identified subgroups.
Study Design:
This is a single-centre, observational, cross-sectional, non-interventional study conducted in a residential addiction treatment facility. Recruitment is planned from February 2024 through December 2025. The study is registered prior to dissemination of results.
Study Population:
Approximately 155 adults diagnosed with substance use disorder according to DSM-5 criteria will be included. Eligible participants must be 18 years of age or older, currently receiving residential treatment, capable of completing study questionnaires, and willing to provide written informed consent.
Participants with active psychotic disorders, severe cognitive impairment, significant language or literacy barriers, or imminent discharge from treatment will be excluded.
Measures and Data Collection:
Participants will complete a battery of validated self-report instruments assessing impulsivity, anger regulation, behavioural activation and avoidance, borderline symptomatology, anxiety, and environmental reward. Additional demographic and clinical variables will include age, sex, primary substance of use, years of substance use, and prior treatment history.
Questionnaires include:
- Barratt Impulsiveness Scale (BIS-11)
- State-Trait Anger Expression Inventory-2 (STAXI-2)
- Behavioral Activation for Depression Scale (BADS)
- Borderline Symptom List-23 (BSL-23)
- Generalized Anxiety Disorder-7 (GAD-7)
- Environmental Reward Observation Scale (EROS)
Data Analysis:
Clinical variables will be standardised prior to analysis. Missing values are expected to be minimal and will be handled using median imputation procedures. Redundant variables with excessive multicollinearity may be removed before modelling.
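A minimal sketch of the preprocessing described above, for a single variable, might look like this. The scores are hypothetical, missing values are represented as `None`, and the population standard deviation is an assumption; the function name `impute_and_standardise` is illustrative.

```python
import statistics


def impute_and_standardise(column):
    """Median-impute missing entries (None), then convert to z-scores."""
    observed = [x for x in column if x is not None]
    median = statistics.median(observed)
    filled = [median if x is None else x for x in column]
    mean = statistics.fmean(filled)
    sd = statistics.pstdev(filled)  # population SD; the record does not say which
    return [(x - mean) / sd for x in filled]


# Hypothetical GAD-7 totals with one missing response, not study data.
gad7 = [4, 10, None, 15, 7]
z = impute_and_standardise(gad7)
print([round(v, 2) for v in z])
```

In a cross-validated workflow the median, mean, and standard deviation would normally be estimated on the training fold only and then applied to held-out participants, so that no information leaks from the validation data.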
An autoencoder neural network will be trained to generate a reduced latent representation of the clinical data. Model performance and stability will be evaluated across repeated training runs and cross-validation procedures. Reconstruction accuracy will be compared with PCA using equivalent dimensionality.
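The cross-validation procedure is listed elsewhere in this record as 5-fold cross-validation repeated 3 times. The index bookkeeping for such a scheme could be sketched as below; the shuffling strategy, the seed, and the function name `repeated_kfold` are assumptions made for illustration only.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=3, seed=0):
    """Yield (train_idx, val_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        for k in range(n_folds):
            val = order[k::n_folds]  # every n_folds-th shuffled index
            val_set = set(val)
            train = [i for i in order if i not in val_set]
            yield train, val


splits = list(repeated_kfold(155))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

With 155 participants each of the 15 splits holds out 31 participants and trains on the remaining 124.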
The resulting latent space will subsequently be analysed using Gaussian mixture models to explore potential patient subgroups. Model selection will consider statistical fit, cluster stability, and clinical interpretability. Correlations between latent dimensions and original clinical variables will be examined to facilitate interpretation of the learned representations.
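Because statistical fit is assessed with the Bayesian Information Criterion, it can help to see how quickly the penalty term grows for a full-covariance mixture in a 12-dimensional space. The sketch below uses the standard parameter count for a Gaussian mixture and the usual BIC formula; the function names are illustrative and no study data are involved.

```python
import math


def gmm_free_parameters(n_components, n_dims):
    """Free parameters of a full-covariance Gaussian mixture."""
    means = n_components * n_dims
    covariances = n_components * n_dims * (n_dims + 1) // 2
    weights = n_components - 1  # mixing weights sum to one
    return means + covariances + weights


def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion; lower values indicate a better trade-off."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


for k in (1, 2, 3):
    print(k, gmm_free_parameters(k, n_dims=12))
```

Each additional component adds 91 free parameters in this setting, so with 155 participants BIC would be expected to favour a small number of clusters unless the likelihood improves substantially.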
Ethical Considerations:
The study protocol has been approved by the corresponding Institutional Ethics Committee. All participants will provide written informed consent prior to participation. Data will be anonymised after collection, and no direct identifiers will be retained.
This study is observational and will not modify routine clinical treatment. No automated clinical decisions will be made based on model outputs. Participants may experience mild emotional discomfort or fatigue while completing questionnaires; psychological support will be available if needed.
The study will be conducted in accordance with the Declaration of Helsinki and applicable local ethical regulations.
Dissemination:
Results will be submitted for publication in peer-reviewed scientific journals and presented at academic conferences. De-identified data and analysis code may be shared publicly after publication to support transparency and reproducibility.
Study Type
Enrollment (Actual)
Contacts and Locations
Study Locations
- Under The Tree, Ajijic, Jalisco, Mexico, 45920
Participation Criteria
Eligibility Criteria
Ages Eligible for Study
- Adult
Accepts Healthy Volunteers
Sampling Method
Study Population
Description
Inclusion Criteria:
- DSM-5 diagnosis of Substance Use Disorder (SUD), confirmed by a psychiatrist or clinical psychologist.
- Age ≥ 18 years.
- Currently admitted to a residential addiction treatment center at the time of assessment.
- Ability to complete the psychometric questionnaires independently.
- Written informed consent.
Exclusion Criteria:
- Active psychotic disorder (e.g., schizophrenia, delusional disorder) not stabilized pharmacologically.
- Severe cognitive impairment (dementia, severe brain injury) that prevents understanding the questionnaire items.
- Language barriers or illiteracy that prevent self-administration of the scales.
- Scheduled discharge from the center within 7 days of the assessment date.
Study Plan
How is the study designed?
Design Details
Cohorts and Interventions
| Group / Cohort | Intervention / Treatment |
|---|---|
| Total sample (residential treatment): Adult patients (N=155) with DSM-5 TR substance use disorder receiving residential treatment. All participants completed six psychometric scales (BIS-11, STAXI-2, BADS, BSL-23, GAD-7, EROS) and provided demographic/clinical data in a single cross-sectional session. No intervention was administered. | This is a purely observational study. No drug, device, behavioral therapy, or other intervention was assigned. The study only involved standardized psychometric measurements. |
What is the study measuring?
Primary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Latent dimension scores | Twelve continuous latent dimensions derived from the bottleneck layer of a symmetric autoencoder trained on 21 standardized clinical variables. Each dimension represents a compressed, nonlinear combination of the original psychometric indicators (impulsivity, emotion regulation, behavioral activation, borderline symptoms, anxiety, and environmental reward). The dimensions are extracted for each participant after averaging the predictions of an ensemble of the five best autoencoder runs. Unit of Measure: Standardized z-score (mean = 0, SD = 1 in the training sample) | Baseline (single assessment, cross-sectional) |
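The selection of the five best runs and the combination of their outputs could be sketched as follows. The record does not state how ensemble weights are derived, so weights proportional to validation pseudo-R² are an assumption, as are the function names and the example scores, which are hypothetical and not study results.

```python
def top_k_weights(val_scores, k=5):
    """Pick the k runs with the highest validation score and normalise weights."""
    ranked = sorted(range(len(val_scores)), key=lambda i: val_scores[i], reverse=True)
    chosen = ranked[:k]
    total = sum(val_scores[i] for i in chosen)
    # Assumption: weights proportional to validation pseudo-R² (must be positive).
    return {i: val_scores[i] / total for i in chosen}


def weighted_average(outputs_by_run, weights):
    """Weighted average of per-run output vectors for one participant."""
    n_values = len(next(iter(outputs_by_run.values())))
    return [
        sum(weights[run] * outputs_by_run[run][j] for run in weights)
        for j in range(n_values)
    ]


# Hypothetical validation pseudo-R² for seven runs (the study plans 30).
val_r2 = [0.50, 0.62, 0.58, 0.40, 0.65, 0.61, 0.55]
weights = top_k_weights(val_r2)
outputs = {run: [0.1 * run, -0.1 * run] for run in weights}  # placeholder vectors
combined = weighted_average(outputs, weights)
print(sorted(weights), combined)
```

A plain unweighted mean over the selected runs is the special case in which all five weights equal 0.2.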
Secondary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Gaussian mixture model cluster membership | Categorical assignment of each participant to one of the clusters obtained by fitting a Gaussian mixture model with full covariance matrices to the 12-dimensional latent space. The number of clusters is determined by the Bayesian Information Criterion (BIC) and clinical interpretability. This outcome is exploratory and does not imply discrete subtypes. Unit of Measure: Nominal (cluster number: 1, 2, …) | Baseline |
| Autoencoder reconstruction pseudo-R² | Proportion of variance in the original 21 clinical variables that is explained by the autoencoder's reconstructions, defined as 1 - (MSE_model / MSE_null), where MSE_null is the mean squared error of a model predicting only the mean. This metric is calculated for the ensemble of the five best models and for each of the 30 independent runs separately. Unit of Measure: Proportion (range 0 to 1) | Baseline (computed on the validation split and on the full sample after training) |
| Autoencoder reconstruction mean squared error | Average squared difference between the original 21 standardized input variables and the reconstructed outputs produced by the autoencoder. Lower values indicate better reconstruction. Reported for the ensemble model and for each independent run. Unit of Measure: Mean squared error (dimensionless, as data are z-standardized) | Baseline |
| Coefficient of variation of reconstruction MSE | Coefficient of variation (CV = standard deviation / mean) of the reconstruction MSE computed over 30 independent autoencoder training runs with different random seeds. This metric assesses the stability and reproducibility of the model. Unit of Measure: Percentage (%) | Baseline (after all runs are completed) |
| Cross-validated reconstruction R² | Mean R² (and standard deviation) obtained from 5-fold cross-validation repeated 3 times, using the same autoencoder architecture and hyperparameters. This evaluates how well the model generalises to unseen patients. Unit of Measure: Proportion (range 0 to 1) | Baseline |
| Explained variance by 12 principal components | Total proportion of variance explained by the first 12 principal components obtained from PCA applied to the same 21 standardized variables. This serves as a comparator for the autoencoder's reconstruction performance. Unit of Measure: Proportion (range 0 to 1) | Baseline |
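The pseudo-R² definition given in the table, 1 - (MSE_model / MSE_null), can be written out as a short sketch. The data below are toy values, not study data; the function name `pseudo_r2` and the use of per-variable means as the null model are assumptions consistent with the stated definition.

```python
def pseudo_r2(original, reconstructed):
    """Compute 1 - MSE_model / MSE_null over rows of standardised variables."""
    n_rows, n_cols = len(original), len(original[0])
    col_means = [sum(row[j] for row in original) / n_rows for j in range(n_cols)]
    # Both MSEs share the same denominator, so sums of squares suffice.
    sse_model = sum(
        (o - r) ** 2
        for orow, rrow in zip(original, reconstructed)
        for o, r in zip(orow, rrow)
    )
    sse_null = sum(
        (row[j] - col_means[j]) ** 2 for row in original for j in range(n_cols)
    )
    return 1.0 - sse_model / sse_null


# Toy data: three participants, two variables.
X = [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]]
half = [[0.5, -0.5], [-0.5, 0.5], [0.0, 0.0]]
worse = [[-1.0, 1.0], [1.0, -1.0], [0.0, 0.0]]
print(pseudo_r2(X, X), pseudo_r2(X, half), pseudo_r2(X, worse))
```

The third case shows that the statistic becomes negative when reconstructions are worse than predicting the mean, so the 0 to 1 range applies only when the model is at least as accurate as that baseline, which is not guaranteed on held-out data.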
Collaborators and Investigators
Sponsor
Investigators
- Principal Investigator: Lauro Gutiérrez Castro, Under The Tree
Other Study ID Numbers
- SISAP-TUS-EFTP-ML-2026
Plan for Individual participant data (IPD)
Plan to Share Individual Participant Data (IPD)?
IPD Plan Description
IPD Sharing Time Frame
IPD Sharing Access Criteria
IPD Sharing Supporting Information Type
- STUDY_PROTOCOL
- SAP
- ICF
- CSR
Drug and device information, study documents
Studies a U.S. FDA-regulated drug product
Studies a U.S. FDA-regulated device product