Performance of an OCR-Prompt-LLM Integrated Workflow for Extracting Multi-dimensional Clinical Data in Ischemic Heart Disease (OPAL-CAD)

This study evaluates an AI-driven workflow for clinical data extraction and diagnostic classification in coronary artery disease (CAD). Using OCR and large language models (LLMs), the system extracts ten key clinical parameters (such as LVEF and lab results) and assigns a diagnostic subtype (UA, STEMI, NSTEMI, CCS) directly from unstructured inpatient records. In a human-machine comparison on a test set of 308 patients, the LLM-based workflow will be benchmarked against the average diagnostic accuracy and processing time of seven clinical physicians. The findings will provide evidence on the feasibility of using LLMs to improve clinical data structuring and diagnostic efficiency in cardiology.

Study Overview

Study Type

Observational

Enrollment (Actual)

308

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

      • Beijing, China, 100037
        • Fuwai Hospital

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

  • Adult
  • Older Adult

Accepts Healthy Volunteers

Yes

Sampling Method

Probability Sample

Study Population

The study population consists of 308 patients diagnosed with various subtypes of coronary artery disease (CAD). The cohort is derived from two major clinical studies: the AIM-CHD study (for pilot testing and prompt optimization) and the SMART-CHD study (for internal validation), both conducted at Fuwai Hospital. Additionally, an external validation cohort is included, comprising patients from 8 independent clinical sub-centers across China to ensure geographical and institutional diversity. The population covers a spectrum of CAD presentations, including Unstable Angina (UA), STEMI, NSTEMI, and Chronic Coronary Syndrome (CCS), providing a robust dataset for evaluating AI-driven diagnostic and data extraction performance.

Description

Inclusion Criteria:

  1. Patients aged 18 years and older.
  2. Clinical records of patients who were previously enrolled in the AIM-CHD (for the pilot/prompt optimization set) or SMART-CHD (for the internal validation cohort) studies.
  3. Patients diagnosed with, or suspected of having, coronary artery disease (CAD), including subtypes: Unstable Angina (UA), STEMI, NSTEMI, and Chronic Coronary Syndrome (CCS).

Exclusion Criteria:

  1. Clinical records with severe data fragmentation or missing more than 50% of the key clinical indicators.
  2. Handwritten medical records or low-quality scans that are illegible for Optical Character Recognition (OCR) processing.
  3. Duplicate records or records with conflicting "Gold Standard" labels that cannot be reconciled by the expert committee.
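
The first exclusion criterion (more than 50% of key indicators missing) can be illustrated with a minimal Python sketch. The record does not enumerate all ten indicators or say how missingness was checked, so the placeholder names `indicator_1` to `indicator_10`, the function names, and the treatment of empty strings as missing are all assumptions made for illustration.

```python
from typing import Mapping, Optional, Sequence


def missing_fraction(record: Mapping[str, Optional[str]],
                     indicators: Sequence[str]) -> float:
    """Fraction of the listed indicators that are absent or empty in one record."""
    if not indicators:
        raise ValueError("indicators must not be empty")
    missing = sum(1 for name in indicators if record.get(name) in (None, ""))
    return missing / len(indicators)


def is_excluded_for_missingness(record: Mapping[str, Optional[str]],
                                indicators: Sequence[str],
                                threshold: float = 0.5) -> bool:
    """True when strictly more than `threshold` of the indicators are missing."""
    return missing_fraction(record, indicators) > threshold


# Placeholder names: the registry record does not list all ten indicators.
INDICATORS = [f"indicator_{i}" for i in range(1, 11)]

record_a = {name: "value" for name in INDICATORS[:5]}  # 5 of 10 missing
record_b = {name: "value" for name in INDICATORS[:4]}  # 6 of 10 missing

excluded_a = is_excluded_for_missingness(record_a, INDICATORS)
excluded_b = is_excluded_for_missingness(record_b, INDICATORS)
print(excluded_a, excluded_b)
```

Because the criterion reads "more than 50%", the sketch uses a strict comparison, so a record missing exactly half of the indicators is retained.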

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Cohorts and Interventions

Group / Cohort

  • Test Cohort: This group consists of 50 patient records from the AIM-CHD Study at Fuwai Hospital. These data are specifically utilized for refining OCR processing and optimizing Prompt Engineering for the LLM-based workflow.
  • Internal Validation Cohort: This cohort includes 188 clinical cases sourced from the SMART-CHD Study at Fuwai Hospital. These records serve as the primary internal benchmark to evaluate the diagnostic and extraction accuracy of the LLM workflow against the established ground truth.
  • External Validation Cohort: This cohort comprises 70 patient records collected from 8 independent sub-centers (excluding Fuwai Hospital) to assess the generalizability and robustness of the model across diverse clinical environments and different medical record formats.

Intervention / Treatment (the same two entries are listed for each cohort)

  • The intervention is an automated clinical data management system integrating Optical Character Recognition (OCR), optimized Prompt Engineering, and Large Language Models (LLMs). The workflow processes unstructured inpatient records to extract 10 key clinical indicators (e.g., LVEF, CAD subtypes, medications) and classifies the patient into specific coronary artery disease categories (UA, STEMI, NSTEMI, CCS).
  • Standard manual process where experienced clinical physicians collect and interpret patient information from medical records. This serves as the human benchmark for comparing diagnostic accuracy and operational efficiency.
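
The three stages named in the intervention (OCR, prompt construction, LLM call) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the record does not name the OCR engine, the model, the prompt wording, or the output format, so `PROMPT_TEMPLATE`, `run_workflow`, the JSON output with a `cad_subtype` key, and the stand-in functions `fake_ocr` and `fake_llm` are all hypothetical.

```python
import json
from typing import Callable, Dict

# The four diagnostic categories named in the record.
CAD_SUBTYPES = {"UA", "STEMI", "NSTEMI", "CCS"}

# Hypothetical prompt wording; the study's optimized prompt is not published here.
PROMPT_TEMPLATE = (
    "Extract the requested clinical indicators from the inpatient record below "
    "and classify the CAD subtype as one of UA, STEMI, NSTEMI, CCS. "
    "Answer with a JSON object.\n\nRecord:\n{record_text}"
)


def run_workflow(scan: bytes,
                 ocr: Callable[[bytes], str],
                 llm: Callable[[str], str]) -> Dict[str, str]:
    """OCR the scan, build the prompt, call the LLM, and validate the subtype."""
    record_text = ocr(scan)
    prompt = PROMPT_TEMPLATE.format(record_text=record_text)
    result = json.loads(llm(prompt))
    if result.get("cad_subtype") not in CAD_SUBTYPES:
        raise ValueError(f"unexpected CAD subtype: {result.get('cad_subtype')!r}")
    return result


# Stand-ins for a real OCR engine and a real LLM client (assumptions).
def fake_ocr(scan: bytes) -> str:
    return scan.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return json.dumps({"lvef": "55", "cad_subtype": "UA"})


result = run_workflow(b"LVEF 55%. Diagnosis: unstable angina.", fake_ocr, fake_llm)
print(result)
```

Passing the OCR and LLM steps in as callables keeps the sketch independent of any particular vendor or library, which suits a record that does not specify either.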

What is the study measuring?

Primary Outcome Measures

Outcome Measure: Overall Diagnostic and Extraction Accuracy Rate

Measure Description: The overall accuracy rate of the LLM-based workflow across 308 cases (the pilot set, internal validation cohort, and external validation cohort) for 10 clinical indicators (e.g., LVEF, blood glucose) and 4 diagnostic subtypes of coronary artery disease. Accuracy is defined as the proportion of cases in which the LLM's extraction or diagnostic result exactly matches the 'Gold Standard' established by human clinical experts.

Time Frame: Through study completion, an average of 3 months.
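
The accuracy definition above (proportion of cases whose output exactly matches the gold standard) can be written as a short Python sketch. The field names and the four toy cases are invented for illustration and are not study data. The record also does not say how results for individual indicators are pooled into one overall rate, so the sketch reports accuracy per field only.

```python
from typing import Dict, List, Mapping


def exact_match_accuracy(predictions: List[Mapping[str, str]],
                         gold: List[Mapping[str, str]]) -> Dict[str, float]:
    """Per-field proportion of cases where the prediction equals the gold label."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equally long")
    return {
        field: sum(p.get(field) == g[field] for p, g in zip(predictions, gold)) / len(gold)
        for field in gold[0]
    }


# Toy values for illustration only (not study data).
gold = [
    {"lvef": "55", "cad_subtype": "UA"},
    {"lvef": "40", "cad_subtype": "STEMI"},
    {"lvef": "60", "cad_subtype": "NSTEMI"},
    {"lvef": "35", "cad_subtype": "CCS"},
]
predictions = [
    {"lvef": "55", "cad_subtype": "UA"},
    {"lvef": "45", "cad_subtype": "STEMI"},   # LVEF mismatch
    {"lvef": "60", "cad_subtype": "NSTEMI"},
    {"lvef": "35", "cad_subtype": "UA"},      # subtype mismatch
]

accuracy = exact_match_accuracy(predictions, gold)
print(accuracy)
```

Exact string comparison is the strictest reading of "perfectly consistent"; a real evaluation would need an agreed rule for units, rounding, and synonyms before comparing values.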

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

February 23, 2026

Primary Completion (Actual)

March 1, 2026

Study Completion (Actual)

March 2, 2026

Study Registration Dates

First Submitted

March 24, 2026

First Submitted That Met QC Criteria

March 24, 2026

First Posted (Actual)

March 30, 2026

Study Record Updates

Last Update Posted (Actual)

March 30, 2026

Last Update Submitted That Met QC Criteria

March 24, 2026

Last Verified

February 1, 2026

More Information

Terms related to this study

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

NO

IPD Plan Description

To protect patient privacy and comply with the data management policies of the participating institutions (Fuwai Hospital and sub-centers), individual participant data will not be made publicly available. However, aggregated study results and statistical analyses will be included in the final publication.

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.
