- Clinical Trial NCT07595718
AI-Based Phenome Data Analysis for Predicting the Onset of Major Diseases
This study aims to develop and validate an artificial intelligence (AI)-based predictive model to estimate the risk of incident onset of five major diseases or conditions: cardiovascular disease, type 2 diabetes mellitus, breast cancer, low back pain, and osteoarthritis, in adults aged 30 to 60 years.
For each participant, an index date will be defined as the date of a prior health screening or another protocol-defined baseline clinical date. Incident disease status for each target disease or condition will be ascertained by retrospective review of electronic medical records for up to 10 years after the index date.
The study integrates retrospective clinical, health screening, laboratory, imaging, and electronic medical record data with prospectively collected biospecimen, proteomic, genomic, questionnaire, lifestyle, and digital health data. Prospective study procedures will be completed over approximately 1 week, with up to 2 additional weeks if needed.
By combining multimodal data, this study seeks to improve disease risk prediction and to identify clinical and biological factors associated with disease onset, ultimately supporting personalized risk stratification and preventive healthcare strategies.
Study Overview
Status
Detailed Description
This observational study aims to develop and validate an artificial intelligence (AI)-based predictive model for assessing the risk of incident onset of five major diseases or conditions: cardiovascular disease, type 2 diabetes mellitus, breast cancer, low back pain, and osteoarthritis, in adults aged 30 to 60 years.
The study uses a hybrid retrospective and prospective data collection design. Retrospective clinical, health screening, laboratory, imaging, and electronic medical record data will be combined with prospectively collected biospecimen, proteomic, genomic, questionnaire, lifestyle, and digital health data.
For disease-onset analyses, an index date will be defined for each participant as the date of a prior health screening or another protocol-defined baseline clinical date. For each target disease or condition, participants without that target disease or condition at the index date will be classified as incident cases if a new diagnosis is identified in electronic medical records up to 10 years after the index date. Participants without a diagnosis of that target disease or condition through the available observation period will be classified as persistent controls. Disease occurrence will be ascertained through retrospective electronic medical record review rather than through new prospective long-term follow-up.
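The labelling rule described above can be sketched in a few lines. This is a minimal illustration, not the study's actual procedure: the function names, the label strings, and the handling of a diagnosis recorded after the 10-year window are assumptions, since the record does not specify them.

```python
from datetime import date
from typing import Optional

FOLLOW_UP_YEARS = 10  # from the record: "up to 10 years after the index date"


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def classify(index_date: date, diagnosis_date: Optional[date]) -> str:
    """Label one participant for one target disease or condition.

    diagnosis_date is the first diagnosis found in the electronic medical
    record, or None if no diagnosis was found in the observation period.
    """
    if diagnosis_date is None:
        return "persistent_control"
    if diagnosis_date <= index_date:
        # Disease already present at the index date: not at risk of incident onset.
        return "prevalent_at_index"
    if diagnosis_date <= add_years(index_date, FOLLOW_UP_YEARS):
        return "incident_case"
    # Assumption: the record does not say how later diagnoses are handled.
    return "outside_window"


print(classify(date(2015, 3, 1), date(2020, 6, 1)))
```

The rule is applied once per target disease or condition, so the same participant can be an incident case for one target and a persistent control for another.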
A total of approximately 1,000 participants will be enrolled. The disease group will include approximately 880 adults aged 30 to 60 years with a confirmed diagnosis of one or more of the five target diseases or conditions. The healthy control group will include approximately 120 adults aged 30 to 60 years without a prior diagnosis of any of the five target diseases or conditions.
Retrospective data collection will include medical records, health screening results, laboratory results, and imaging-related data. Prospective data collection will include blood samples for proteomic and genomic analyses, questionnaires, lifestyle and behavioral data, and digital health assessments. App-based questionnaires and digital assessments will be performed at home over approximately 7 days. If app-based sleep assessment or other digital assessments are not completed within this period, up to 2 additional weeks may be provided.
These multimodal data will be integrated to create a high-dimensional phenomic and omics dataset for AI model development. Machine learning and deep learning approaches will be applied to predict disease risk for each target disease or condition. Model performance will be evaluated using discrimination, diagnostic performance, and calibration metrics. Reclassification metrics will be evaluated only if a prespecified comparator risk score is available for the relevant target disease or condition.
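As a rough illustration of the discrimination and diagnostic performance metrics named above, the following sketch computes AUROC, a step-wise PR-AUC (average precision), and threshold-based metrics in plain Python. The function names, the toy data, and the 0.5 threshold are assumptions for illustration; the record does not state which software or threshold the study will use.

```python
from typing import Dict, Optional, Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Tied scores are ranked in input order, which is a simplification.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one case")
    order = sorted(range(len(y_true)), key=lambda i: -y_score[i])
    true_pos, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at each recalled case
    return total / n_pos


def _ratio(num: int, den: int) -> Optional[float]:
    return num / den if den else None


def threshold_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float) -> Dict[str, Optional[float]]:
    """Sensitivity, specificity, PPV, and NPV at a fixed risk threshold."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    return {"sensitivity": _ratio(tp, tp + fn), "specificity": _ratio(tn, tn + fp),
            "ppv": _ratio(tp, tp + fp), "npv": _ratio(tn, tn + fn)}


# Toy data (hypothetical): 1 = incident case, 0 = persistent control.
y = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, risk), average_precision(y, risk), threshold_metrics(y, risk, 0.5))
```

Each metric would be computed separately for each target disease or condition, since the model produces a separate risk prediction for each.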
The study aims to improve prediction of disease onset and to enhance understanding of biological and clinical factors associated with disease risk. The resulting model is expected to support personalized risk stratification and preventive healthcare strategies.
Study Type
Enrollment (Estimated)
Contacts and Locations
Study Contact
- Name: Jaeyong Jeon
- Phone Number: +82-2-3010-3791
- Email: jyjeon71@gmail.com
Study Locations
- Seoul, Seoul Special City, South Korea, 05505
- Recruiting
- Seoul ASAN Medical Center
- Contact: Jaeyong Jeon
- Phone Number: +82-10-9970-3791
- Email: jyjeon71@gmail.com
Participation Criteria
Eligibility Criteria
Ages Eligible for Study
- Adult
Accepts Healthy Volunteers
Sampling Method
Study Population
This study will involve two distinct groups of participants: a disease group and a healthy control group.
Disease Group:
The disease group will consist of adults aged 30 to 60 years who have been diagnosed with at least one of the following conditions:
- Type 2 diabetes mellitus
- Breast cancer
- Cardiovascular disease
- Osteoarthritis
- Low back pain
These participants will undergo questionnaires, digital assessments, physical examinations, and blood tests as part of the study.
Healthy Control Group:
The healthy control group will consist of adults aged 30 to 60 years with no prior diagnosis of any of the conditions listed above (type 2 diabetes mellitus, breast cancer, cardiovascular disease, osteoarthritis, or low back pain).
This group will undergo similar assessments, including questionnaires, physical examinations, and blood tests.
Description
Inclusion Criteria:
- Adults aged 30 to 60 years.
- Disease group: Participants with a confirmed diagnosis of at least one of the following conditions: type 2 diabetes mellitus, breast cancer, cardiovascular disease, osteoarthritis, or low back pain.
- Healthy control group: Participants with no prior diagnosis of type 2 diabetes mellitus, breast cancer, cardiovascular disease, osteoarthritis, or low back pain.
- No history or current diagnosis of major medical conditions that may affect study outcomes, including but not limited to chronic kidney disease or liver cirrhosis.
- Ability to understand the study procedures and provision of written informed consent prior to participation.
Exclusion Criteria:
- Participants with incomplete or insufficient clinical or health screening data.
- Participants considered inappropriate for study participation by the investigator.
Study Plan
How is the study designed?
Design Details
Cohorts and Interventions
| Group / Cohort |
|---|
| Disease Group: Adults aged 30 to 60 years with one or more of the five major diseases. The five major diseases are cardiovascular diseases, type 2 diabetes mellitus, breast neoplasms, low back pain, and osteoarthritis. |
| Healthy Control Group: Adults aged 30 to 60 years without any of the five major diseases. |
What is the study measuring?
Primary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Number of Participants With Incident Target Disease or Condition Identified Up to 10 Years After the Index Date | Incident target disease or condition will be assessed for five prespecified target diseases or conditions: cardiovascular disease, type 2 diabetes mellitus, breast cancer, low back pain, and osteoarthritis. For each target disease or condition, incident occurrence will be defined as a new diagnosis recorded in electronic medical records after the index date among participants without that target disease or condition at the index date. Results will be summarized separately for each target disease or condition as the number and percentage of participants with incident disease or condition. | Up to 10 years after the index date |
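The per-disease summary described for the primary outcome (number and percentage of participants with incident disease) could be tabulated along the following lines. This is a sketch under assumptions: the label strings, the function name, and the choice of participants at risk (those without the target disease at the index date) as the denominator are illustrative, and the example records are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

TARGETS = (
    "cardiovascular disease",
    "type 2 diabetes mellitus",
    "breast cancer",
    "low back pain",
    "osteoarthritis",
)


def summarize_incidence(
    records: Iterable[Tuple[str, str]],
) -> Dict[str, Tuple[int, int, Optional[float]]]:
    """Return {target: (n_incident, n_at_risk, percent)} from (target, label) pairs."""
    counts = {t: [0, 0] for t in TARGETS}  # [incident, at risk]
    for target, label in records:
        if label not in ("incident_case", "persistent_control"):
            continue  # e.g. disease already present at the index date
        counts[target][1] += 1
        if label == "incident_case":
            counts[target][0] += 1
    return {
        t: (n, d, round(100 * n / d, 1) if d else None)
        for t, (n, d) in counts.items()
    }


# Hypothetical example records, not study data.
example = [
    ("breast cancer", "incident_case"),
    ("breast cancer", "persistent_control"),
    ("breast cancer", "persistent_control"),
    ("breast cancer", "persistent_control"),
    ("osteoarthritis", "incident_case"),
    ("osteoarthritis", "prevalent_at_index"),
]
for target, row in summarize_incidence(example).items():
    print(target, row)
```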
Secondary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Discriminative performance of the artificial intelligence model in distinguishing between disease and control groups using baseline data from health screenings and clinical records (AUROC, PR-AUC) | Discriminative performance will be assessed using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (PR-AUC). These metrics will evaluate the ability of the artificial intelligence model to distinguish participants with incident target disease or condition from persistent controls. Incident disease status will be ascertained by retrospective electronic medical record review up to 10 years after the index date. | Through study completion, approximately 9 months |
| Diagnostic Performance of the Artificial Intelligence Model for Predicting Incident Target Diseases or Conditions | Diagnostic performance will be assessed using sensitivity, specificity, positive predictive value, and negative predictive value at a prespecified risk threshold. These metrics will evaluate the ability of the artificial intelligence model to classify participants with incident target disease or condition and persistent controls. Incident disease status will be ascertained by retrospective electronic medical record review up to 10 years after the index date. | Through study completion, approximately 9 months |
| Brier Score of the Artificial Intelligence Model for Predicting Incident Target Diseases or Conditions | The Brier score will be calculated as the mean squared difference between predicted risk and observed incident disease status for the five target diseases or conditions. Incident disease status will be ascertained by retrospective electronic medical record review up to 10 years after the index date. Lower values indicate better prediction accuracy. | Through study completion, approximately 9 months |
| Calibration Slope of the Artificial Intelligence Model for Predicting Incident Target Diseases or Conditions | Calibration slope will be estimated by comparing predicted risk with observed incident disease status for the five target diseases or conditions. Incident disease status will be ascertained by retrospective electronic medical record review up to 10 years after the index date. A value close to 1 indicates better calibration. | Through study completion, approximately 9 months |
| Calibration Intercept of the Artificial Intelligence Model for Predicting Incident Target Diseases or Conditions | Calibration intercept will be estimated by comparing predicted risk with observed incident disease status for the five target diseases or conditions. Incident disease status will be ascertained by retrospective electronic medical record review up to 10 years after the index date. A value close to 0 indicates better calibration-in-the-large. | Through study completion, approximately 9 months |
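To make the Brier score and calibration measures concrete, here is a minimal plain-Python sketch. The record does not say how the slope and intercept will be estimated, so the approach below is an assumption: it follows the common convention of fitting a logistic regression of the observed outcome on the logit of the predicted risk for the slope, and refitting with the slope fixed at 1 for the intercept (calibration-in-the-large). All function names and the toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _logit(p: float, eps: float = 1e-12) -> float:
    p = min(max(p, eps), 1 - eps)  # keep the logit finite
    return math.log(p / (1 - p))


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed status."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def calibration_slope(y: Sequence[int], p: Sequence[float],
                      iters: int = 50) -> Tuple[float, float]:
    """Fit y ~ a + b * logit(p) by Newton-Raphson; return (a, b), b is the slope."""
    x = [_logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):
        mu = [_sigmoid(a + b * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        w = [mi * (1 - mi) for mi in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break  # degenerate fit, e.g. all predictions identical
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def calibration_in_the_large(y: Sequence[int], p: Sequence[float],
                             iters: int = 50) -> float:
    """Intercept of y ~ a + logit(p), with the slope fixed at 1."""
    x = [_logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        mu = [_sigmoid(a + xi) for xi in x]
        h = sum(mi * (1 - mi) for mi in mu)
        if h <= 1e-12:
            break
        step = sum(yi - mi for yi, mi in zip(y, mu)) / h
        a += step
        if abs(step) < 1e-10:
            break
    return a


# Toy data (hypothetical): observed rates are 20% and 80%, predictions 10% and 90%.
y = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
overconfident = [0.1] * 5 + [0.9] * 5
print(brier_score(y, overconfident))
print(calibration_slope(y, overconfident)[1])  # below 1: predictions too extreme
print(calibration_in_the_large(y, overconfident))
```

A slope below 1 indicates predictions that are too extreme, and a slope above 1 indicates predictions that are not extreme enough. Newton-Raphson can fail to converge when cases and controls are perfectly separated by the predicted risk, so a production analysis would more likely rely on an established statistics package.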
Collaborators and Investigators
Sponsor
Study record dates
Study Major Dates
Study Start (Actual)
Primary Completion (Estimated)
Study Completion (Estimated)
Study Registration Dates
First Submitted
First Submitted That Met QC Criteria
First Posted (Actual)
Study Record Updates
Last Update Posted (Actual)
Last Update Submitted That Met QC Criteria
Last Verified
More Information
Terms related to this study
Additional Relevant MeSH Terms
- Pain
- Neurologic Manifestations
- Endocrine System Diseases
- Musculoskeletal Diseases
- Neoplasms by Site
- Neoplasms
- Arthritis
- Joint Diseases
- Rheumatic Diseases
- Metabolic Diseases
- Glucose Metabolism Disorders
- Diabetes Mellitus
- Skin Diseases
- Breast Diseases
- Back Pain
- Pathological Conditions, Signs and Symptoms
- Nutritional and Metabolic Diseases
- Skin and Connective Tissue Diseases
- Signs and Symptoms
- Diabetes Mellitus, Type 2
- Osteoarthritis
- Cardiovascular Diseases
- Breast Neoplasms
- Low Back Pain
Other Study ID Numbers
- 2026-0205
Plan for Individual participant data (IPD)
Plan to Share Individual Participant Data (IPD)?
Drug and device information, study documents
Studies a U.S. FDA-regulated drug product
Studies a U.S. FDA-regulated device product
This information was retrieved directly from the website clinicaltrials.gov without any changes.