A Privacy-Preserving OCR-LLM System for Coronary Syndrome Subtyping From Admission HPI: Multicenter Validation in China and the US (OCR-LLM-CHD)

February 27, 2026 updated by: China National Center for Cardiovascular Diseases

Development and Multicenter Validation of a Privacy-Preserving OCR-LLM Pipeline for Four-Subtype Coronary Syndrome Classification Using Admission HPI Across Heterogeneous EHR Systems

This study develops and validates a privacy-preserving OCR-LLM pipeline that converts admission history of present illness (HPI) records into structured coronary syndrome subtypes (STEMI, NSTEMI, unstable angina, and chronic coronary syndrome). The system first extracts text from de-identified HPI images using locally deployed OCR, then applies large language models with a fixed diagnostic prompt to generate subtype classification and evidence. Performance is evaluated in an internal validation cohort and multiple external datasets covering heterogeneous EHR templates, emergency department cases, and an English dataset from MIMIC-IV. A clinician usability study assesses changes in diagnostic accuracy and time with and without tool assistance.
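The two-stage flow described above (local OCR, then an LLM call with a fixed prompt that returns a subtype plus evidence) can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the prompt wording, the JSON reply format, and the names `FIXED_PROMPT`, `SubtypeResult`, `parse_response`, and `classify_hpi` are all assumptions, and the OCR engine and LLM are passed in as plain callables because the record does not name them.

```python
import json
from dataclasses import dataclass
from typing import Callable

# The four target subtypes named in the record.
SUBTYPES = ("STEMI", "NSTEMI", "UA", "CCS")

# Hypothetical prompt text; the study's actual diagnostic prompt is not published here.
FIXED_PROMPT = (
    "Classify the history of present illness below into exactly one of "
    "STEMI, NSTEMI, UA, or CCS. Reply as JSON with keys 'subtype' and 'evidence'.\n\n"
    "HPI:\n{hpi}"
)


@dataclass
class SubtypeResult:
    subtype: str
    evidence: str


def parse_response(raw: str) -> SubtypeResult:
    """Parse the model reply and reject anything outside the four subtypes."""
    data = json.loads(raw)
    subtype = str(data.get("subtype", "")).strip().upper()
    if subtype not in SUBTYPES:
        raise ValueError(f"unexpected subtype: {subtype!r}")
    return SubtypeResult(subtype, str(data.get("evidence", "")).strip())


def classify_hpi(
    image: bytes,
    ocr: Callable[[bytes], str],   # assumed: locally deployed OCR, image -> text
    llm: Callable[[str], str],     # assumed: LLM client, prompt -> raw reply
) -> SubtypeResult:
    """Run OCR on a de-identified HPI image, then classify with the fixed prompt."""
    text = ocr(image)
    prompt = FIXED_PROMPT.format(hpi=text)
    return parse_response(llm(prompt))
```

Keeping the OCR and LLM behind callables lets the same prompt and parsing logic be reused unchanged across cohorts, and validating the reply against the closed label set means a malformed model output fails loudly instead of being counted as a prediction.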

Study Overview

Status

Not yet recruiting

Conditions

Intervention / Treatment

Study Type

Observational

Enrollment (Estimated)

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Contact

Name: Xiaojin Gao, Dr
Phone Number: +86010 88322415
Email: sophie_gao@sina.com

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

Adult
Older Adult

Accepts Healthy Volunteers

N/A

Sampling Method

Probability Sample

Study Population

The study population includes (1) de-identified clinical encounter records and (2) physician participants for usability testing. For record-based cohorts, eligible encounters contain an admission/presentation History of Present Illness (HPI) (text or image-derived text) sufficient for 4-class coronary syndrome subtyping (e.g., STEMI, NSTEMI, unstable angina, chronic coronary syndrome). Records are analyzed at the encounter level and are organized into five dataset-based cohorts (internal validation, multicenter external validation, ED external validation, English EHR external validation, and clinician usability). Reference labels are assigned using a prespecified clinical adjudication process.

Description

Inclusion Criteria:

Hospital encounters with admission HPI documenting symptoms … relevant to coronary syndrome subtyping.

Cases with sufficient documentation to assign one of four target subtypes (STEMI, NSTEMI, UA, CCS) by adjudication.

Exclusion Criteria: Unclear subtype or incomplete/uncertain time information preventing gold standard assignment.

Non-CHD primary reason for admission after screening (for MIMIC-IV cohort).

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Number of groups / cohorts

Cohorts and Interventions

Group / Cohort	Intervention / Treatment
Internal Development and Validation Cohort: Retrospective cohort used for model development and internal validation. Inputs are de-identified admission HPI records (image or text) from the AIM-CHD dataset. Expert adjudication provides the reference standard labels for 4-class coronary syndrome subtyping (STEMI, NSTEMI, unstable angina, chronic coronary syndrome).	Device: OCR-Prompt-LLM Information Extraction and Classification Workflow (OCR-Prompt-LLM); Device: Manual Clinical Data Review
Multicenter External Validation Cohort: Retrospective multicenter cohort used for external validation across heterogeneous EHR templates and documentation styles. De-identified admission HPI records are processed through the same OCR-LLM pipeline, and predictions are compared with expert adjudicated reference labels to assess generalizability.	Device: OCR-Prompt-LLM Information Extraction and Classification Workflow (OCR-Prompt-LLM); Device: Manual Clinical Data Review
Emergency Department External Validation Cohort: Retrospective cohort representing real-world emergency department workflow. De-identified ED admission HPI records are used to evaluate model performance under time-sensitive, information-limited conditions and assess robustness to ED documentation variability.	Device: OCR-Prompt-LLM Information Extraction and Classification Workflow (OCR-Prompt-LLM); Device: Manual Clinical Data Review
English EHR External Validation Cohort: Retrospective cohort derived from the public de-identified MIMIC-IV database. English admission notes/HPI text are used to evaluate cross-language transportability and performance of the same classification prompts and post-processing rules against reference labels derived by adjudication/structured diagnosis mapping (as prespecified in the protocol).	Device: OCR-Prompt-LLM Information Extraction and Classification Workflow (OCR-Prompt-LLM); Device: Manual Clinical Data Review
Clinician Usability Cohort: Prospective usability evaluation cohort. Physicians complete a structured coronary syndrome subtyping task using admission HPI cases. Outcomes include diagnostic accuracy and time to completion; within-participant comparisons may be performed between unassisted and tool-assisted conditions as prespecified.	Device: OCR-Prompt-LLM Information Extraction and Classification Workflow (OCR-Prompt-LLM); Device: Manual Clinical Data Review

Intervention descriptions (identical for all five cohorts):

Device: OCR-Prompt-LLM Information Extraction and Classification Workflow (OCR-Prompt-LLM). An automated clinical data management workflow integrating Optical Character Recognition (OCR), optimized prompt engineering, and large language models (LLMs). The system processes unstructured inpatient/ED records (primarily admission history of present illness and related narrative text) to extract prespecified key clinical indicators (e.g., left ventricular ejection fraction, coronary syndrome subtype, medications) and to classify cases into prespecified coronary artery disease categories (e.g., unstable angina, STEMI, NSTEMI, chronic coronary syndrome). The workflow outputs structured fields and a classification result with supporting evidence excerpts.

Device: Manual Clinical Data Review. Standard manual process in which experienced clinicians review patient medical records and extract the same prespecified clinical indicators and coronary artery disease categories using routine clinical judgment and documentation review. This manual abstraction serves as the human benchmark for comparing diagnostic accuracy, completeness, and operational efficiency against the automated OCR-Prompt-LLM workflow.
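The within-participant comparison mentioned for the Clinician Usability Cohort could be summarized along the lines below. This is only a sketch under stated assumptions: the record does not specify the statistical analysis, so a simple mean of paired differences is used here, and the names `PhysicianResult` and `paired_differences` and all field names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class PhysicianResult:
    """One physician's results under both conditions (hypothetical layout)."""
    physician_id: str
    accuracy_unassisted: float   # proportion of cases subtyped correctly
    accuracy_assisted: float
    seconds_unassisted: float    # time to complete the task
    seconds_assisted: float


def paired_differences(results: List[PhysicianResult]) -> Dict[str, float]:
    """Mean within-participant change (assisted minus unassisted)."""
    if not results:
        raise ValueError("at least one physician result is required")
    accuracy_change = [r.accuracy_assisted - r.accuracy_unassisted for r in results]
    seconds_change = [r.seconds_assisted - r.seconds_unassisted for r in results]
    return {
        "mean_accuracy_change": mean(accuracy_change),
        "mean_seconds_change": mean(seconds_change),
    }
```

Differencing within each physician before averaging removes between-physician variation in baseline skill and speed, which is the usual reason for a within-participant design. A formal analysis would typically add a paired test or confidence interval as prespecified in the protocol.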

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Overall classification accuracy	Proportion of cases with correct subtype (STEMI/NSTEMI/UA/CCS) compared with expert-adjudicated gold standard. Assessed up to completion of dataset evaluation (internal + external cohorts).	1 month
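The primary outcome above is a simple proportion, which could be computed as in the sketch below. The function names are hypothetical. The Wilson score interval and the confusion counts are additions for illustration only, since the record does not state how uncertainty or per-class errors will be reported.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def overall_accuracy(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Proportion of cases whose predicted subtype matches the adjudicated label."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("reference and predicted must be non-empty and equal length")
    correct = sum(r == p for r, p in zip(reference, predicted))
    return correct / len(reference)


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (assumed choice, about 95% for z=1.96)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def confusion_counts(
    reference: Sequence[str], predicted: Sequence[str]
) -> Dict[Tuple[str, str], int]:
    """Count (reference, predicted) pairs to show which subtypes get confused."""
    return dict(Counter(zip(reference, predicted)))
```

Overall accuracy alone can hide poor performance on a rare subtype, so the per-pair counts are a cheap companion to the headline number when class sizes differ across cohorts.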

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Sponsor

China National Center for Cardiovascular Diseases

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Estimated)

February 28, 2026

Primary Completion (Estimated)

March 8, 2026

Study Completion (Estimated)

March 8, 2026

Study Registration Dates

First Submitted

February 27, 2026

First Submitted That Met QC Criteria

February 27, 2026

First Posted (Actual)

March 4, 2026

Study Record Updates

Last Update Posted (Actual)

March 4, 2026

Last Update Submitted That Met QC Criteria

February 27, 2026

Last Verified

February 1, 2026

More Information

Terms related to this study

Additional Relevant MeSH Terms

Other Study ID Numbers

CAD-LLM-002
Sponsor (Other Grant/Funding Number: Sponsor)

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

IPD Plan Description

To protect patient privacy and comply with the data management policies of the participating institutions (Fuwai Hospital and sub-centers), individual participant data will not be made publicly available. However, aggregated study results and statistical analyses will be included in the final publication.

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

Studies a U.S. FDA-regulated device product


