AI-Based Phenome Data Analysis for Predicting the Onset of Major Diseases

Updated May 12, 2026 by: Jae Yong Jeon, MD

This study aims to develop and validate an artificial intelligence (AI)-based predictive model to estimate the risk of incident onset of five major diseases or conditions: cardiovascular disease, type 2 diabetes mellitus, breast cancer, low back pain, and osteoarthritis, in adults aged 30 to 60 years.

For each participant, an index date will be defined as the date of a prior health screening or another protocol-defined baseline clinical date. Incident disease status for each target disease or condition will be ascertained by retrospective review of electronic medical records for up to 10 years after the index date.

The study integrates retrospective clinical, health screening, laboratory, imaging, and electronic medical record data with prospectively collected biospecimen, proteomic, genomic, questionnaire, lifestyle, and digital health data. Prospective study procedures will be completed over approximately 1 week, with up to 2 additional weeks if needed.

By combining multimodal data, this study seeks to improve disease risk prediction and to identify clinical and biological factors associated with disease onset, ultimately supporting personalized risk stratification and preventive healthcare strategies.

Study Overview

Detailed Description

This observational study aims to develop and validate an artificial intelligence (AI)-based predictive model for assessing the risk of incident onset of five major diseases or conditions: cardiovascular disease, type 2 diabetes mellitus, breast cancer, low back pain, and osteoarthritis, in adults aged 30 to 60 years.

The study uses a hybrid retrospective and prospective data collection design. Retrospective clinical, health screening, laboratory, imaging, and electronic medical record data will be combined with prospectively collected biospecimen, proteomic, genomic, questionnaire, lifestyle, and digital health data.

For disease-onset analyses, an index date will be defined for each participant as the date of a prior health screening or another protocol-defined baseline clinical date. For each target disease or condition, participants without that target disease or condition at the index date will be classified as incident cases if a new diagnosis is identified in electronic medical records up to 10 years after the index date. Participants without a diagnosis of that target disease or condition through the available observation period will be classified as persistent controls. Disease occurrence will be ascertained through retrospective electronic medical record review rather than through new prospective long-term follow-up.
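The case and control definitions above can be sketched as a small labeling function. This is only an illustration under stated assumptions: the function and label names are hypothetical, a diagnosis dated on the index date is treated as already present (the record does not say how that boundary is handled), and diagnoses found after the 10-year window are flagged separately because the record does not specify their handling.

```python
from datetime import date
from typing import Optional

FOLLOW_UP_YEARS = 10  # from the record: up to 10 years after the index date


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def classify(index_date: date, first_diagnosis: Optional[date]) -> str:
    """Label one participant for ONE target disease or condition.

    `first_diagnosis` is the earliest EMR diagnosis date for that disease,
    or None if no diagnosis exists in the available observation period.
    All label strings are hypothetical names chosen for this sketch.
    """
    if first_diagnosis is None:
        return "persistent_control"
    if first_diagnosis <= index_date:
        # Assumption: a diagnosis on or before the index date counts as prevalent.
        return "prevalent_excluded"
    if first_diagnosis <= add_years(index_date, FOLLOW_UP_YEARS):
        return "incident_case"
    # Assumption: the record does not say how later diagnoses are handled.
    return "outside_window"
```

Because the labels are assigned per disease, the same participant can be an incident case for one condition and a persistent control for another.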

A total of approximately 1,000 participants will be enrolled. The disease group will include approximately 880 adults aged 30 to 60 years with a confirmed diagnosis of one or more of the five target diseases or conditions. The healthy control group will include approximately 120 adults aged 30 to 60 years without a prior diagnosis of any of the five target diseases or conditions.

Retrospective data collection will include medical records, health screening results, laboratory results, and imaging-related data. Prospective data collection will include blood samples for proteomic and genomic analyses, questionnaires, lifestyle and behavioral data, and digital health assessments. App-based questionnaires and digital assessments will be performed at home over approximately 7 days. If app-based sleep assessment or other digital assessments are not completed within this period, up to 2 additional weeks may be provided.

These multimodal data will be integrated to create a high-dimensional phenomic and omics dataset for AI model development. Machine learning and deep learning approaches will be applied to predict disease risk for each target disease or condition. Model performance will be evaluated using discrimination, diagnostic performance, and calibration metrics. Reclassification metrics will be evaluated only if a prespecified comparator risk score is available for the relevant target disease or condition.
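As a rough illustration of the discrimination and diagnostic performance metrics named above, the following sketch computes AUROC, PR-AUC, and threshold-based metrics from binary labels and predicted risks. It is not the study's analysis code. The function names are hypothetical, PR-AUC is approximated by average precision (one common estimator; the record does not state which will be used), and the average-precision function assumes no tied scores.

```python
import math


def auroc(y_true, y_score):
    """Probability that a random case is scored above a random control (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Average precision as a PR-AUC estimate; assumes no tied scores."""
    ranked = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled case
    return total / n_pos


def _ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def threshold_metrics(y_true, y_score, threshold):
    """Sensitivity, specificity, PPV, and NPV at a given risk threshold."""
    pred = [int(s >= threshold) for s in y_score]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

For example, with labels `[0, 0, 1, 1]` and predicted risks `[0.1, 0.4, 0.35, 0.8]`, `auroc` returns 0.75. With few incident cases, PR-AUC is usually the more informative of the two summaries, since AUROC can look favorable even when most flagged participants are false positives.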

The study aims to improve prediction of disease onset and to enhance understanding of biological and clinical factors associated with disease risk. The resulting model is expected to support personalized risk stratification and preventive healthcare strategies.

Study Type

Observational

Enrollment (Estimated)

1000

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Contact

Study Locations

    • Seoul Special City
      • Seoul, Seoul Special City, South Korea, 05505
        • Recruiting
        • Seoul ASAN Medical Center
        • Contact:

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

  • Adult

Accepts Healthy Volunteers

Yes

Sampling Method

Non-Probability Sample

Study Population

This study will involve two distinct groups of participants: a disease group and a healthy control group.

  1. Disease Group:

    The disease group will consist of adults aged 30 to 60 years who have been diagnosed with at least one of the following conditions:

    • Type 2 diabetes mellitus
    • Breast cancer
    • Cardiovascular disease
    • Osteoarthritis
    • Low back pain

    These participants will undergo questionnaires, digital assessments, physical examinations, and blood tests as part of the study.

  2. Healthy Control Group:

    The healthy control group will consist of adults aged 30 to 60 years with no prior diagnosis of any of the conditions listed above (type 2 diabetes mellitus, breast cancer, cardiovascular disease, osteoarthritis, or low back pain).

    This group will undergo similar assessments, including questionnaires, physical examinations, and blood tests.

Description

Inclusion Criteria:

  1. Adults aged 30 to 60 years.
  2. Disease group: Participants with a confirmed diagnosis of at least one of the following conditions: type 2 diabetes mellitus, breast cancer, cardiovascular disease, osteoarthritis, or low back pain.
  3. Healthy control group: Participants with no prior diagnosis of type 2 diabetes mellitus, breast cancer, cardiovascular disease, osteoarthritis, or low back pain.
  4. No history or current diagnosis of major medical conditions that may affect study outcomes, including but not limited to chronic kidney disease or liver cirrhosis.
  5. Ability to understand the study procedures and provision of written informed consent prior to participation.

Exclusion Criteria:

  1. Participants with incomplete or insufficient clinical or health screening data.
  2. Participants considered inappropriate for study participation by the investigator.
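The inclusion and exclusion criteria above could be encoded as a simple screening check. The sketch below is a hypothetical illustration, not a study tool: the `Candidate` fields, the lowercase condition names, and the `assign_group` function are all assumptions made for this example, and the investigator's judgment is reduced to a single boolean.

```python
from dataclasses import dataclass

TARGET_CONDITIONS = {
    "cardiovascular disease",
    "type 2 diabetes mellitus",
    "breast cancer",
    "low back pain",
    "osteoarthritis",
}


@dataclass
class Candidate:
    age: int
    diagnoses: set                 # confirmed diagnoses (hypothetical lowercase encoding)
    has_major_comorbidity: bool    # e.g. chronic kidney disease or liver cirrhosis
    consented: bool                # written informed consent provided
    data_complete: bool            # clinical and health screening data sufficient
    investigator_ok: bool = True   # investigator judgment (exclusion criterion 2)


def assign_group(c: Candidate):
    """Return "disease", "healthy_control", or None when the candidate is not eligible."""
    if not (30 <= c.age <= 60):
        return None
    if c.has_major_comorbidity or not c.consented:
        return None
    if not c.data_complete or not c.investigator_ok:
        return None
    # Any confirmed target condition places the candidate in the disease group.
    return "disease" if c.diagnoses & TARGET_CONDITIONS else "healthy_control"
```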

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Cohorts and Interventions

Group / Cohort
Disease Group

Adults aged 30 to 60 years with one or more of the five major diseases.

The five major diseases are cardiovascular disease, type 2 diabetes mellitus, breast cancer, low back pain, and osteoarthritis.

Healthy Control Group
Adults aged 30 to 60 years without any of the five major diseases.

What is the study measuring?

Primary Outcome Measures

Outcome Measure
Measure Description
Time Frame
Number of Participants With Incident Target Disease or Condition Identified Up to 10 Years After the Index Date
Incident target disease or condition will be assessed for five prespecified target diseases or conditions: cardiovascular disease, type 2 diabetes mellitus, breast cancer, low back pain, and osteoarthritis. For each target disease or condition, incident occurrence will be defined as a new diagnosis recorded in electronic medical records after the index date among participants without that target disease or condition at the index date. Results will be summarized separately for each target disease or condition as the number and percentage of participants with incident disease or condition.
Time Frame: Up to 10 years after the index date
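The per-disease summary described for the primary outcome (number and percentage of participants with incident disease) might be tabulated as in the sketch below. The input layout and label strings are hypothetical, and the denominator is assumed to be participants without the disease at the index date, that is, incident cases plus persistent controls.

```python
from collections import Counter


def summarize_incidence(records):
    """Summarize incident cases per target disease.

    `records` is an iterable of (disease, label) pairs, one per participant
    and disease. Labels other than "incident_case" and "persistent_control"
    (hypothetical names) are treated as not at risk and are ignored.
    Returns {disease: (n_incident, n_at_risk, percent_incident)}.
    """
    incident, at_risk = Counter(), Counter()
    for disease, label in records:
        if label in ("incident_case", "persistent_control"):
            at_risk[disease] += 1
            if label == "incident_case":
                incident[disease] += 1
    return {
        d: (incident[d], n, round(100.0 * incident[d] / n, 1))
        for d, n in at_risk.items()
    }
```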

Secondary Outcome Measures

Outcome Measure
Measure Description
Time Frame
Discriminative Performance of the Artificial Intelligence Model in Distinguishing Between Disease and Control Groups Using Baseline Data From Health Screenings and Clinical Records (AUROC, PR-AUC)
Discriminative performance will be assessed using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (PR-AUC). These metrics will evaluate the ability of the artificial intelligence model to distinguish participants with incident target disease or condition from persistent controls. Incident disease status will be ascertained by retrospective electronic medical record review up to 10 years after the index date.
Time Frame: Through study completion, approximately 9 months
Diagnostic Performance of the Artificial Intelligence Model for Predicting Incident Target Diseases or Conditions
Diagnostic performance will be assessed using sensitivity, specificity, positive predictive value, and negative predictive value at a prespecified risk threshold. These metrics will evaluate the ability of the artificial intelligence model to classify participants with incident target disease or condition and persistent controls. Incident disease status will be ascertained by retrospective electronic medical record review up to 10 years after the index date.
Time Frame: Through study completion, approximately 9 months
Brier Score of the Artificial Intelligence Model for Predicting Incident Target Diseases or Conditions
The Brier score will be calculated as the mean squared difference between predicted risk and observed incident disease status for the five target diseases or conditions. Incident disease status will be ascertained by retrospective electronic medical record review up to 10 years after the index date. Lower values indicate better prediction accuracy.
Time Frame: Through study completion, approximately 9 months
Calibration Slope of the Artificial Intelligence Model for Predicting Incident Target Diseases or Conditions
Calibration slope will be estimated by comparing predicted risk with observed incident disease status for the five target diseases or conditions. Incident disease status will be ascertained by retrospective electronic medical record review up to 10 years after the index date. A value close to 1 indicates better calibration.
Time Frame: Through study completion, approximately 9 months
Calibration Intercept of the Artificial Intelligence Model for Predicting Incident Target Diseases or Conditions
Calibration intercept will be estimated by comparing predicted risk with observed incident disease status for the five target diseases or conditions. Incident disease status will be ascertained by retrospective electronic medical record review up to 10 years after the index date. A value close to 0 indicates better calibration-in-the-large.
Time Frame: Through study completion, approximately 9 months
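The Brier score and the two calibration measures listed above can be sketched as follows. The record does not state the estimation method, so this example assumes the conventional logistic recalibration approach: the slope is the coefficient b in logit(P(y=1)) = a + b * logit(p), and the intercept (calibration-in-the-large) is a in the same model with b fixed at 1. All function names are hypothetical, and the Newton-Raphson fits are minimal, with no safeguards for separable data or constant predictions.

```python
import math


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted risk and observed status (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def _logit(p, eps=1e-12):
    p = min(max(p, eps), 1 - eps)  # keep the log finite for p equal to 0 or 1
    return math.log(p / (1 - p))


def _sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calibration_slope(y_true, p_pred, n_iter=50, tol=1e-10):
    """Slope b from logit(P(y=1)) = a + b * logit(p), fitted by Newton-Raphson."""
    x = [_logit(p) for p in p_pred]
    a, b = 0.0, 1.0
    for _ in range(n_iter):
        mu = [_sigmoid(a + b * xi) for xi in x]
        g0 = sum(y - m for y, m in zip(y_true, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(y_true, mu, x))
        w = [m * (1 - m) for m in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01  # zero if all predictions are identical
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return b


def calibration_intercept(y_true, p_pred, n_iter=50, tol=1e-10):
    """Intercept a from logit(P(y=1)) = a + logit(p), with the slope fixed at 1."""
    x = [_logit(p) for p in p_pred]
    a = 0.0
    for _ in range(n_iter):
        mu = [_sigmoid(a + xi) for xi in x]
        step = sum(y - m for y, m in zip(y_true, mu)) / sum(m * (1 - m) for m in mu)
        a += step
        if abs(step) < tol:
            break
    return a
```

Under these definitions, a slope below 1 suggests predictions that are too extreme, and a negative intercept suggests risks that are overestimated on average.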

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

April 2, 2026

Primary Completion (Estimated)

October 31, 2026

Study Completion (Estimated)

December 31, 2026

Study Registration Dates

First Submitted

April 18, 2026

First Submitted That Met QC Criteria

May 12, 2026

First Posted (Actual)

May 19, 2026

Study Record Updates

Last Update Posted (Actual)

May 19, 2026

Last Update Submitted That Met QC Criteria

May 12, 2026

Last Verified

May 1, 2026

More Information

Terms related to this study

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

NO

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

