Machine Learning and Pregnancy Success Prediction in Fertility Treatments (MaLIV-PMA)

October 2, 2025 updated by: Enrico Papaleo, IRCCS San Raffaele

Machine Learning-based Evaluation of Pregnancy Success Indicators in Assisted Reproductive Technology (ART) Cycles

Infertility, as defined by the World Health Organization (WHO), is a disorder of the male or female reproductive system characterized by the inability to achieve a clinical pregnancy after 12 months or more of regular, unprotected sexual intercourse. In modern fertility treatment, assisted reproductive technologies (ART), including in vitro fertilization (IVF), have become a standard approach for addressing complex fertility issues and sterility. In Italy, infertility affects approximately 16.5% of couples.

Despite advancements in ART, pregnancies achieved through ART in Italy still differ significantly from spontaneous pregnancies, particularly in success rates, miscarriage rates, and embryo implantation outcomes.

In this context, AI-based models have shown promising potential in predicting IVF success by analyzing complex datasets that include patient demographics, hormonal levels, and embryo morphology. Research indicates that AI can enhance embryo selection, predict the optimal timing for embryo transfer, and advance personalized medicine approaches in reproductive health.

This study aims to use Machine Learning to identify patterns and factors associated with successful pregnancy outcomes by analyzing large-scale, anonymized ART data. The resulting predictive model could enable clinicians to better personalize treatment protocols for each patient, optimizing medication dosages, timing, and embryo selection. It could also improve pregnancy success rates while reducing the emotional and financial burden on patients, thus advancing the standard of care in ART.

Study Overview

Status

Active, not recruiting

Detailed Description

This multicentric, observational, retrospective, non-profit study, coordinated by the IRCCS San Raffaele Hospital, aims to analyze anonymized data collected between 2019 and 2024 from approximately 5,000 couples undergoing Assisted Reproductive Technology (ART) procedures across three participating centers. The study will examine key variables, including age, medical history, treatment protocols, ART techniques (such as In Vitro Fertilization [IVF] and Intracytoplasmic Sperm Injection [ICSI]), embryo quality, and pregnancy outcomes, to develop a machine learning-based predictive model for pregnancy outcomes. The selected timeframe ensures a sufficiently large dataset to facilitate robust development and validation of the predictive model.

By leveraging machine learning techniques, this study aims to enhance the accuracy of pregnancy outcome predictions, thereby improving patient counseling and treatment planning in ART procedures. The comprehensive dataset, encompassing a diverse range of variables and a substantial number of cases, will provide a robust foundation for developing a predictive model with high clinical applicability.

The primary objective of this study is to develop a Machine Learning-based predictive model for pregnancy outcomes in assisted reproductive technologies (ART), by analyzing large-scale, anonymized data, for scientific research purposes. The model aims to identify key patterns and factors that correlate with successful pregnancy outcomes to optimize individualized treatment protocols for patients undergoing ART.

SAMPLE SIZE:

The sample size will be approximately 5,000 pairs of subjects (one woman and one man per pair), based on the total number of ART cycles recorded at the participating centers during this period and on the number of patients whose complete data records provide sufficient information for analysis. We expect approximately 1,650 pairs in the "success" class and 3,350 in the "not success" class of the IVF treatment outcome. The Machine Learning-based predictive model can therefore be trained using a multi-parametric approach on a balanced set of 1,350 pairs of subjects, with the remaining pairs used to test the performance of the model.
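The split described above can be illustrated with a minimal sketch. The record does not say whether the balanced set of 1,350 pairs is a total or a per-class count, nor how pairs are drawn; the sketch assumes 1,350 is the total, divided evenly between the two classes and sampled at random. The function name `balanced_split` and the label encoding (1 for success, 0 otherwise) are hypothetical.

```python
import random


def balanced_split(labels, n_train_total, seed=0):
    """Draw a class-balanced training set; all remaining pairs form the test set.

    Assumption: n_train_total is shared equally between the two classes.
    """
    rng = random.Random(seed)
    per_class = n_train_total // 2

    # Group pair indices by outcome class (1 = success, 0 = not success).
    by_class = {0: [], 1: []}
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train = []
    for label, members in by_class.items():
        if len(members) < per_class:
            raise ValueError(f"class {label} has fewer than {per_class} pairs")
        train.extend(rng.sample(members, per_class))

    train_set = set(train)
    test = [idx for idx in range(len(labels)) if idx not in train_set]
    return sorted(train), test


# Expected class sizes taken from the record: 1,650 success, 3,350 not success.
labels = [1] * 1650 + [0] * 3350
train_idx, test_idx = balanced_split(labels, n_train_total=1350)
```

Under these assumptions the test set keeps the remaining 3,650 pairs and stays imbalanced, which matters when interpreting accuracy on it.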

The minimum sample size for the retrospective study is 295 pairs of subjects, calculated to yield a 95% confidence interval of ± 2.5% around an expected sensitivity of 94% and an expected specificity of 15% for the prediction model, assuming a prevalence of IVF treatment success of 30% and a dropout rate of 2%. The assumed prevalence reflects the clinical site's experience with IVF treatments. The dropout rate is set low, at 2%, because the study is retrospective and software-based, leaving only a residual possibility that complete data records prove invalid. Sensitivity, specificity, and positive and negative predictive values will be calculated with their 95% confidence intervals.
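The record does not name the formula behind the figure of 295. A minimal sketch, assuming the prevalence-adjusted formula commonly attributed to Buderer for diagnostic accuracy studies, with the total inflated by dividing by (1 - dropout): the function names and the `margin` parameter (half-width of the confidence interval) are hypothetical. Under these assumptions a margin of 0.05 reproduces 295, while a margin of 0.025 gives a larger number, so the margin is left as an explicit parameter rather than fixed.

```python
import math
from statistics import NormalDist


def accuracy_sample_size(expected, margin, share, confidence=0.95, dropout=0.0):
    """Total sample size so that `expected` is estimated within +/- `margin`.

    `share` is the fraction of the sample the estimate is computed on:
    the prevalence for sensitivity, 1 - prevalence for specificity.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n_in_class = z ** 2 * expected * (1 - expected) / margin ** 2
    n_total = n_in_class / share
    return math.ceil(n_total / (1 - dropout))


def min_sample_size(sensitivity, specificity, prevalence, margin, dropout):
    """Take the larger of the two requirements so both estimates are covered."""
    n_sens = accuracy_sample_size(sensitivity, margin, prevalence, dropout=dropout)
    n_spec = accuracy_sample_size(specificity, margin, 1 - prevalence, dropout=dropout)
    return max(n_sens, n_spec)


# Inputs from the record: sensitivity 94%, specificity 15%, prevalence 30%, dropout 2%.
n_margin_5 = min_sample_size(0.94, 0.15, 0.30, margin=0.05, dropout=0.02)
n_margin_2_5 = min_sample_size(0.94, 0.15, 0.30, margin=0.025, dropout=0.02)
```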

Considering the minimum sample size and the available number of samples, we expect to achieve statistical significance from the hypothesis testing and to obtain a multi-modal signature of predictors of IVF success.

STATISTICAL DESIGN:

A structured methodology will be employed to develop and assess machine learning models for binary classification tasks, specifically distinguishing between "Success" and "Not Success." The process will commence with the selection of informative and non-redundant features, eliminating those with low variance and high correlation. Subsequently, three distinct classifier models, namely Random Forest, Support Vector Machines (SVM), and K-Nearest Neighbors (KNN), will be trained and evaluated using k-fold cross-validation to ensure robust performance assessment. To address potential data imbalances, the ADASYN technique will be applied, generating synthetic samples for the minority class. Model performance will be quantified using various metrics, including accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (ROC-AUC), to identify the most effective model. Finally, a statistical analysis of the most pertinent features will be conducted using non-parametric tests and corrections for multiple comparisons, aiming to elucidate class differences and ensure result reliability.
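The feature filtering and model comparison steps can be sketched with scikit-learn. This is an illustration on synthetic data, not the study's pipeline: the thresholds, hyperparameters, number of folds, use of stratified folds, and the helper `drop_correlated` are all assumptions. The ADASYN step is omitted here; in the imbalanced-learn package it is available as `imblearn.over_sampling.ADASYN` and would normally sit inside an `imblearn.pipeline.Pipeline` so that synthetic samples are generated from training folds only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def drop_correlated(X, threshold=0.9):
    """Return indices of columns to keep, skipping any column that is
    highly correlated with a column already kept."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep


# Synthetic stand-in for the ART dataset (roughly one third positives).
X, y = make_classification(n_samples=400, n_features=12, n_informative=5,
                           n_redundant=3, weights=[0.67, 0.33], random_state=0)

# Feature selection: remove constant columns, then highly correlated ones.
X = VarianceThreshold(threshold=0.0).fit_transform(X)
X = X[:, drop_correlated(X, threshold=0.9)]

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Compare the three classifiers by mean ROC-AUC over stratified 5-fold CV.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
mean_auc = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}
best_model = max(mean_auc, key=mean_auc.get)
```

Scaling is placed inside the SVM and KNN pipelines so that it is refit on each training fold, which avoids leaking information from the held-out fold.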

This structured approach will ensure that the models are meticulously tuned and validated through rigorous testing and analysis, leading to accurate and reliable machine learning models for binary classification tasks.
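The performance metrics named in the statistical design, together with the positive and negative predictive values, can all be derived from the four confusion-matrix counts. The sketch below is a minimal illustration: the record does not state which confidence interval method is used, so the Wilson score interval is an assumption, and the function names and toy labels are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if total == 0:
        raise ValueError("total must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def classification_metrics(y_true, y_pred):
    """Return {metric: (estimate, ci_low, ci_high)} for binary labels (1 = success)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    # Each metric is a proportion: (numerator, denominator).
    counts = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
        "accuracy": (tp + tn, len(pairs)),
    }
    return {name: (k / n, *wilson_interval(k, n)) for name, (k, n) in counts.items()}


# Toy example: 4 true successes, 6 true non-successes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```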

INFORMED CONSENT AND DATA PROTECTION:

In accordance with data protection regulations, the study will utilize anonymized data previously collected through routine clinical practice and stored in MedITEX IVF, a management software used at the participating assisted reproduction centers. No direct patient interaction or intervention will occur as part of the study. All data will be anonymized following best practice guidelines to ensure patient confidentiality, adhering to ethical standards and applicable data privacy regulations.

The Investigator (or the Center receiving the data) commits to processing the data solely for the purposes of the study, storing it in a secure network system, and restricting access to authorized personnel who have signed confidentiality agreements. If external suppliers are involved, they will be appointed as Data Processors with appropriate agreements in place. The Investigator will also facilitate the exercise of data subject rights, including access, rectification, erasure, restriction, objection, and portability, within 30 days of receiving the relevant request. If data are communicated outside the institution in pseudonymized form, efforts will be made to prevent the identification of data subjects. Within 30 days following the end of the study, the Investigator will ensure the deletion or irreversible anonymization of the communicated data and promptly confirm this in writing.

A study-specific Data Protection Impact Assessment (DPIA), reviewed by the Data Protection Officer (DPO) of the coordinating institution, has been conducted in accordance with applicable data protection laws.

Study Type

Observational

Enrollment (Estimated)

5000

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

    • Milano
      • Milan, Milano, Italy, 20132
        • IRCCS San Raffaele Hospital

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

  • Adult

Accepts Healthy Volunteers

No

Sampling Method

Non-Probability Sample

Study Population

Couples who received IVF treatment between 2019 and 2024

Description

Inclusion Criteria:

  • Patients who underwent ART procedures, including IVF and ICSI, between 2019 and 2024.
  • Women aged between 18 and 43 years.

Exclusion Criteria:

  • Patients with incomplete or missing data records that do not provide sufficient information for analysis.
  • Women outside the 18 to 43 age range.
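The eligibility criteria above can be expressed as a simple record filter. This is a sketch under stated assumptions: the `CycleRecord` fields are hypothetical and do not reflect the MedITEX IVF schema, "incomplete data" is approximated as a missing age or missing outcome, and the set of accepted techniques is limited to IVF and ICSI even though the inclusion criterion names them only as examples of ART procedures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CycleRecord:
    """Hypothetical minimal record for one ART cycle."""
    cycle_year: int
    technique: str
    female_age: Optional[int]
    clinical_pregnancy: Optional[bool]


# Assumption: only the two techniques named in the record are accepted.
ELIGIBLE_TECHNIQUES = {"IVF", "ICSI"}


def is_eligible(record: CycleRecord) -> bool:
    """Apply the inclusion and exclusion criteria to a single record."""
    # Exclusion: incomplete records (approximated by missing key fields).
    if record.female_age is None or record.clinical_pregnancy is None:
        return False
    return (
        2019 <= record.cycle_year <= 2024
        and record.technique in ELIGIBLE_TECHNIQUES
        and 18 <= record.female_age <= 43
    )


records = [
    CycleRecord(2021, "ICSI", 35, True),
    CycleRecord(2018, "IVF", 30, False),   # cycle before the study window
    CycleRecord(2022, "IVF", 44, False),   # above the age limit
    CycleRecord(2023, "IVF", 38, None),    # outcome missing
]
eligible = [r for r in records if is_eligible(r)]
```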

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Cohorts and Interventions

Group / Cohort
IVF patients

What is the study measuring?

Primary Outcome Measures

Outcome Measure: Pregnancy rate
Measure Description: The primary endpoint of the study will be clinical pregnancy, defined as a pregnancy confirmed by an increasing level of hCG and the presence of a gestational sac or heartbeat detected by ultrasound.
Time Frame: Data will be extracted for all ART cycles conducted between 2019 and 2024 to allow for the comprehensive development of the Machine Learning-based model.

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Publications and helpful links

The person responsible for entering information about the study voluntarily provides these publications. These may be about anything related to the study.

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

April 16, 2025

Primary Completion (Estimated)

March 1, 2026

Study Completion (Estimated)

March 1, 2026

Study Registration Dates

First Submitted

March 13, 2025

First Submitted That Met QC Criteria

March 18, 2025

First Posted (Actual)

March 19, 2025

Study Record Updates

Last Update Posted (Estimated)

October 7, 2025

Last Update Submitted That Met QC Criteria

October 2, 2025

Last Verified

October 1, 2025

More Information

Terms related to this study

Additional Relevant MeSH Terms

Other Study ID Numbers

  • MaLIV-PMA

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

NO

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.
