Artificial Intelligence as a Decision Making Tool in Emergency Department

April 16, 2026 updated by: Shahar Shelly MD, Rambam Health Care Campus

Artificial Intelligence as a Decision Making Tool in Emergency Medicine

This study will evaluate the performance of a large language model (LLM)-based clinical decision support system in the emergency department at Rambam Health Care Campus. The system analyzes structured patient data from the electronic health record and generates diagnostic and treatment recommendations for physicians.

The study will assess the system's ability to support diagnostic reasoning, its impact on diagnostic accuracy when used by physicians, and its perceived clinical usefulness. In addition, a retrospective analysis of de-identified patient records will be conducted to compare LLM-generated recommendations with actual clinical outcomes, including diagnosis, disposition decisions, and length of stay.

The study will also examine the performance of the system in a multilingual clinical environment where both Hebrew and English are used in medical documentation and communication.

Study Overview

Detailed Description

This is a mixed-methods study combining a prospective controlled component and a retrospective chart review.

Prospective Component

  • Setting: Emergency Department, Rambam Health Care Campus
  • The LLM will receive structured patient input (chief complaint, vitals, relevant history, laboratory and imaging results) via a secure interface.
  • LLM-generated recommendations will be logged and made available to the treating physician; final clinical decisions remain entirely with the physician.
  • The system operates in decision-support mode only; it does not autonomously initiate any clinical action.

Retrospective Component

  • De-identified historical ED records will be used to evaluate LLM performance against documented clinical outcomes.

Primary metrics: diagnostic concordance, appropriateness of suggested workup, and disposition accuracy.

Study Type

Observational

Enrollment (Estimated)

20000

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

      • Haifa, Israel, 3109601
        • Rambam Health Care Campus

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

  • Adult
  • Older Adult

Accepts Healthy Volunteers

No

Sampling Method

Non-Probability Sample

Study Population

All adult patients (≥18 years) receiving care in emergency department wings A and B

Description

Inclusion Criteria:

Adults (≥18 years) presenting to the emergency department

Exclusion Criteria:

None

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Cohorts and Interventions

Group / Cohort
Evaluation With AI
A scenario in which the physician receives real-time recommendations only from the model before making the final decision (the final decision is made by the senior attending and the treating physician).
Evaluation Without AI
A scenario in which the physician is not exposed to the model's recommendations.

What is the study measuring?

Primary Outcome Measures

Outcome Measure
Measure Description
Time Frame
Length of Stay in Emergency Department
Time Frame: From ED registration until discharge from the emergency department or admission to a hospital ward, assessed up to 24 hours
Time from ED registration to discharge from the emergency department or admission to a hospital ward, with additional attention to consultation cycle time.
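As a rough illustration of how this measure could be derived from two timestamps, the sketch below computes ED length of stay in hours and applies the 24-hour assessment window. The function name, its parameters, and the choice to return None for stays beyond the window are assumptions made for illustration; the record does not describe how the study computes or censors this value.

```python
from datetime import datetime, timedelta
from typing import Optional

# "Assessed up to 24 hours" per the time frame stated in the record.
ASSESSMENT_WINDOW = timedelta(hours=24)


def ed_length_of_stay_hours(registered_at: datetime,
                            left_ed_at: datetime) -> Optional[float]:
    """Hours from ED registration to ED departure (discharge or ward admission).

    Returns None when the stay exceeds the assessment window. That handling
    is an assumption; the record does not say how longer stays are treated.
    """
    if left_ed_at < registered_at:
        raise ValueError("departure precedes registration")
    stay = left_ed_at - registered_at
    if stay > ASSESSMENT_WINDOW:
        return None
    return stay.total_seconds() / 3600.0


# Hypothetical encounter: registered at 08:00, left the ED at 13:30.
los = ed_length_of_stay_hours(datetime(2025, 1, 1, 8, 0),
                              datetime(2025, 1, 1, 13, 30))
print(los)
```

Consultation cycle time could be computed the same way from consult-request and consult-completion timestamps, if those are recorded.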

Secondary Outcome Measures

Outcome Measure
Measure Description
Time Frame
Cost of Running the LLM
Time Frame: Up to 3 years

Average daily cost (in USD) to operate the large language model.

Description:

This outcome will measure the direct computational and licensing costs incurred by running the LLM in the clinical setting.

Unit of Measure: US Dollars (per day).

How It Is Assessed: The total daily spend for the LLM (e.g., cloud-compute fees, licensing fees) will be recorded and divided by the total number of patient encounters that day to derive an average cost per encounter.
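The calculation described here is a simple ratio. A minimal sketch follows, with hypothetical dollar amounts and encounter counts; the function name and the split of spend into compute and licensing components are illustrative assumptions, not details from the record.

```python
def cost_per_encounter(total_daily_spend_usd: float, encounters: int) -> float:
    """Average LLM cost per patient encounter for one day, in USD."""
    if encounters <= 0:
        raise ValueError("at least one encounter is required")
    return total_daily_spend_usd / encounters


# Hypothetical day: 120 USD cloud compute plus 30 USD licensing, 300 encounters.
daily_spend = 120.0 + 30.0
avg_cost = cost_per_encounter(daily_spend, 300)
print(f"{avg_cost:.2f} USD per encounter")
```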

Staff Compliance With AI
Time Frame: Up to 3 years

This outcome will assess how often (compliance) and how willingly (acceptance) physicians, residents, and nurses use the LLM in eligible clinical scenarios.

Description: Determines how often physicians, residents, and nurses use the LLM for eligible patient encounters.

Unit of Measure: Proportion (0.0-1.0) or percent (0-100%).

How Assessed: The number of encounters in which the LLM is actually used is divided by the total number of eligible encounters. Higher percentages indicate greater compliance.
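A minimal sketch of the compliance ratio, assuming per-period counts of LLM-used and eligible encounters are available; the function name and the example counts are hypothetical.

```python
def compliance_proportion(encounters_with_llm: int, eligible_encounters: int) -> float:
    """Share of eligible encounters in which the LLM was actually used (0.0-1.0)."""
    if eligible_encounters <= 0:
        raise ValueError("no eligible encounters")
    if not 0 <= encounters_with_llm <= eligible_encounters:
        raise ValueError("used count must lie between 0 and the eligible count")
    return encounters_with_llm / eligible_encounters


# Hypothetical counts: the LLM was used in 45 of 60 eligible encounters.
proportion = compliance_proportion(45, 60)
percent = 100 * proportion
print(proportion, percent)
```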

Transparency/Explainability
Time Frame: 3 years

Description: Evaluates how clearly the LLM's reasoning is communicated to clinical staff.

Unit of Measure: Score on a 5-point Likert scale (1 = not at all clear, 5 = extremely clear).

How Assessed: After each LLM-guided decision, the resident or attending physician rates the clarity of the model's explanation. A mean score is reported. Higher scores indicate better clarity.

Quality of LLM-Generated Clinical Reports
Time Frame: Up to 3 years

Accuracy and Completeness Score of the LLM-Generated Clinical Report. This outcome evaluates how accurately and comprehensively the LLM summarizes a patient's clinical encounter and recommended plan in a structured report (e.g., discharge instructions, progress notes).

Unit(s) of Measure: Quality Score based on a modified SOAP (Subjective, Objective, Assessment, Plan) Note Rating Scale, ranging from 0 to 10, where higher scores indicate better documentation quality. The score reflects the inclusion of required elements and correctness of clinical details.

Assessment Method: Each LLM-generated report is independently evaluated by a panel of attending physicians or trained clinical staff. Using a standardized checklist, reviewers assess completeness (e.g., inclusion of symptoms, physical findings, diagnosis, and plan) and accuracy (e.g., correct medications, identifiers, and diagnoses), then assign a score based on the predefined rubric.

Staff Acceptance of AI
Time Frame: Up to 3 years

Description: Evaluates how willingly staff integrate the LLM into their clinical workflow.

Unit of Measure: 5-point Likert scale (1 = strongly disagree, 5 = strongly agree).

How Assessed: After each LLM-assisted encounter, staff complete a brief survey about perceived helpfulness and willingness to reuse the LLM. Higher scores indicate greater acceptance.


Other Outcome Measures

Outcome Measure
Measure Description
Time Frame
The study is organized around four pre-specified aims:
Time Frame: 3 years
Aim 1: LLM Diagnostic and Treatment Recommendation Appropriateness. Appropriateness of LLM recommendations rated by senior clinicians (1 = inappropriate, 5 = appropriate). Time frame: ED registration to discharge or inpatient admission, up to 24 hours.

Aim 2: Diagnostic Accuracy Rate, LLM-Assisted vs. Standard Care Clinical Decision-Making. Proportion of correct diagnoses (%) in LLM-assisted vs. standard care, matched to discharge diagnosis. Time frame: ED registration to final diagnosis, up to 24 hours.

Aim 3: Clinician-Rated Utility and Usability of LLM Outputs (SUS and Likert Scale). Utility measured via the System Usability Scale (SUS, 0-100) and a 5-point Likert rating, collected post-encounter with qualitative feedback. Time frame: end of each clinical encounter, up to 36 months.

Aim 4: LLM Retrospective Benchmark (Percent Agreement and Cohen's Kappa vs. Actual Clinical Outcomes). Agreement between LLM recommendations and actual outcomes (diagnosis, disposition, LOS) in de-identified records. Time frame: records up to 36 months prior to study initiation.
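For readers unfamiliar with the agreement statistics named in Aim 4, the sketch below computes percent agreement and Cohen's kappa for two sequences of categorical labels using the standard textbook definitions. The function names and the toy disposition labels are hypothetical; the record does not specify the study's analysis code or label categories.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of paired items on which the two label sequences agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: sum over labels of the product of marginal proportions.
    labels = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example with hypothetical disposition labels (not study data).
llm_disposition = ["admit", "discharge", "admit", "discharge"]
actual_disposition = ["admit", "discharge", "discharge", "discharge"]

agreement = percent_agreement(llm_disposition, actual_disposition)
kappa = cohens_kappa(llm_disposition, actual_disposition)
print(agreement, kappa)
```

Kappa is lower than raw agreement here because some matches would be expected by chance given how often each label occurs.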

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Investigators

  • Principal Investigator: Shahar Shelly, MD, Rambam Health Care Campus

Publications and helpful links

The person responsible for entering information about the study voluntarily provides these publications. These may be about anything related to the study.

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

January 1, 2000

Primary Completion (Estimated)

September 1, 2026

Study Completion (Estimated)

September 1, 2026

Study Registration Dates

First Submitted

November 13, 2024

First Submitted That Met QC Criteria

March 27, 2025

First Posted (Actual)

March 30, 2025

Study Record Updates

Last Update Posted (Actual)

April 21, 2026

Last Update Submitted That Met QC Criteria

April 16, 2026

Last Verified

April 1, 2026

More Information

Terms related to this study

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

NO

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No
