Can Feedback From a Large Language Model Improve Health Care Quality?

January 30, 2026 updated by: Jason Abaluck, Yale University

A Pilot Study of an LLM Tool to Support Frontline Health Workers in Low-Resource Settings

The goal of this study is to learn if computer-assisted advice can help improve patient care in Nigerian health clinics. The main question it aims to answer is: does giving healthcare workers instant computer feedback help them make better decisions about patient care?

Researchers will compare patient care notes written by healthcare workers before and after they receive computer feedback to see if the feedback improves care quality. A doctor who doesn't know if feedback was given will review these notes.

Participants will:

Be seen by a community healthcare worker who uses the computer feedback system
Be treated by a fully trained medical doctor
Get tested for malaria, anemia, or urinary tract infections if they have certain symptoms

Study Overview

Status

Completed

Conditions

All Conditions

Intervention / Treatment

Other: Large Language Model Clinical Decision Support

Detailed Description

This project tests whether Large Language Models (LLMs) can improve patient care in Nigerian primary care clinics by giving customized and instant feedback to the provider in natural language. An LLM-based tool integrated into an electronic patient record management system provides "second opinions" to community health extension workers (CHEWs) at two clinics in Nigeria. These second opinions are intended to mirror what a reviewing physician might advise the CHEWs after seeing or hearing their initial report on a patient.

For the main analysis, this study employs a within-patient comparison of two patient notes created by the CHEW: one during the initial patient consultation, and one after the LLM feedback was received. The patient is also seen by a fully trained medical officer (MO) who is in charge of patient care. The MO conducts a blinded review of the CHEW's patient notes to measure changes in the CHEW's care as a result of the LLM feedback. The data come from the information captured in the electronic medical record (EMR) of the patient and from survey data collected from CHEWs, reviewing MOs, and a panel of reviewing Medical Doctors (MDs).

Study Type

Interventional

Enrollment (Actual)

491

Phase

Not Applicable

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

Nigeria
- Kano State
  - Kano, Kano State, Nigeria
    - EHA Clinics REACH Community Clinic, Gyadi Gyadi
  - Kano, Kano State, Nigeria
    - EHA Clinics, 33 Lamido Crescent

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

Child
Adult
Older Adult

Accepts Healthy Volunteers

Description

Inclusion Criteria:

Patient is at the clinic for outpatient consultation
Parent/guardian consent is required for individuals under 18

Exclusion Criteria:

Patient requires emergency care
Patient is at the clinic for a checkup (e.g. weight, blood pressure, follow-up after recovery)
Patient is a trauma patient (visit is for an accident, wound, or injury)
Patient is at the clinic for a scheduled procedure or a birth

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Primary Purpose: Health Services Research
Allocation: N/A
Interventional Model: Single Group Assignment
Masking: None (Open Label)

Number of Arms

Arms and Interventions

Participant Group / Arm

Intervention / Treatment

Experimental: Clinical Assessment with and without LLMs

The investigators employ a within-patient design. Patients receive two sequential assessments from a Community Health Extension Worker: first without and then with Large Language Model assistance.

Other: Large Language Model Clinical Decision Support

A Large Language Model (LLM) integrated into the clinic's Electronic Medical Record system provides real-time feedback on patient assessments. Community Health Extension Workers first create a standard SOAP note, submit it to the LLM, and receive detailed feedback and key recommendations. They can then update their assessment based on this feedback. All final treatment decisions are made by Medical Officers who independently evaluate patients.

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Indicator for an Error in the Treatment Plan (with the Potential for Harm)	During SOAP note evaluation, the MO is asked to indicate whether the treatment plan for the patient contains any errors, conditional on the MO's own diagnosis. This is coded as 1 if the MO indicates there is an error and 0 otherwise. The introductory text (here for SOAP Note A) is: Please evaluate whether the treatment in SOAP Note A is appropriate for this patient's condition. Please base this on your own diagnosis, not the CHEW's diagnosis in SOAP Note A. This is followed by the question: Is the treatment plan for the patient in SOAP Note A completely appropriate given your own diagnosis (accounting for conditional treatments based on medical tests)? Answer "No" if the patient should receive different medical care given your diagnosis. This can include both minor differences (for example, the patient should be advised to rest) and major errors (for example, the patient should receive a completely different set of medications). (Answer options: yes/no/unsure)	Through study completion, an average of six months
Indicator for an Error in the Treatment Plan that Causes a Loss of at least X Quality-Adjusted Life Days	This variable is coded as 1 if the MO indicates there is such an error and 0 otherwise. X is defined to be the highest benchmark on the appropriate DALY scale so that at least 5% of patients have an error that large in the unassisted SOAP note. In other words, severe errors are any errors that generate a harm rating at or above the 95th percentile of harm on the unassisted scale (pooling child and adult scales).	Through study completion, an average of six months
Indicator for the Better Treatment Plan (as Determined by the MOs)	Based on the DALY rating of SOAP Note A vs. B (counting instances with no errors as 0 DALY loss), the indicator is coded as 1 if the SOAP note has the better treatment plan (lower DALY loss) and 0 if MOs judge both notes to be the same in response to the following question: Are there any meaningful differences in the treatment plans of SOAP Note A and B?	Through study completion, an average of six months
Indicator for whether Treatment is Consistent with a Predetermined "Standard of Care"	At-risk patients receive malaria, anemia and UTI screening in accordance with certain demographic criteria. A dataset is then constructed with one observation for each (patient, screening test, note), up to six per patient. The indicator of treatment misallocation records whether a patient was incorrectly treated for a condition based on the test result or lack of symptoms. The variable is coded as 1 if the patient tested positive and received either inappropriate treatment or no treatment. It is also coded as 1 if the patient tested negative, or was not tested based on the symptom screen, but received treatment for the condition. The variable is coded as 0 only if the patient tested negative and was correctly not treated for the corresponding condition, or if they tested positive and received the correct treatment.	Through study completion, an average of six months
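
As a rough illustration of two of the codings described in the table above, the following Python sketch shows one way to pick the severity threshold X and one way to code the standard-of-care misallocation indicator for a single (patient, screening test, note) observation. The function names, the enum labels, and the example benchmark values are hypothetical and are not taken from the study. Treating an untested, untreated patient as missing (None) is also an assumption, since the record only states when the variable is coded 0 or 1.

```python
from enum import Enum
from typing import Optional, Sequence


class ScreenResult(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NOT_TESTED = "not_tested"  # screened out by the symptom screen


class Treatment(Enum):
    CORRECT = "correct"              # appropriate treatment for the condition
    INAPPROPRIATE = "inappropriate"  # treated, but with the wrong drug or dose
    NONE = "none"                    # condition not treated


def severe_threshold(benchmarks: Sequence[float],
                     unassisted_ratings: Sequence[float],
                     min_share: float = 0.05) -> Optional[float]:
    """Highest benchmark reached by at least `min_share` of unassisted notes.

    Ratings of 0 stand for notes with no error. Returns None if no
    benchmark is reached by a large enough share of patients.
    """
    n = len(unassisted_ratings)
    if n == 0:
        return None
    qualifying = [
        b for b in benchmarks
        if sum(r >= b for r in unassisted_ratings) / n >= min_share
    ]
    return max(qualifying) if qualifying else None


def misallocation(result: ScreenResult, treatment: Treatment) -> Optional[int]:
    """Code the misallocation indicator for one (patient, test, note)."""
    if result is ScreenResult.POSITIVE:
        # Positive test: only the correct treatment counts as appropriate.
        return 0 if treatment is Treatment.CORRECT else 1
    if treatment is not Treatment.NONE:
        # Negative or untested, yet the condition was treated.
        return 1
    if result is ScreenResult.NEGATIVE:
        # Negative and correctly not treated.
        return 0
    # Not tested and not treated: not covered by the stated rule (assumption).
    return None


# Hypothetical example: 20 unassisted notes rated on a made-up benchmark scale.
ratings = [0] * 16 + [1, 7, 30, 30]
x = severe_threshold([1, 7, 30, 365], ratings)
severe_flags = [int(r >= x) for r in ratings]
```

In this made-up example, two of the twenty notes reach the benchmark of 30 and none reaches 365, so `x` is 30 and exactly those two notes are flagged as severe errors.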

Secondary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Indicators Denoting Diagnosis and Treatment Alignment Between CHEWs and MOs	For each medication in the CHEW's treatment plan, there is a "clinical indication" (the diagnosis associated with the drug) along with an indicator that specifies if a given prescription is conditional on a medical test result. The research team will consider three indicators of a match: any match of the contents of the "clinical indication" field across medications; any match of the contents of the "medication" field across indications, including whether the medication is conditional on a test or not; a match of both medication and indication (and test conditionality).	Through study completion, an average of six months
Alternative Indicators for Treatment Misallocation	The research team will construct the following indicators of treatment misallocation: misallocation due to overprescription (a condition is treated that the patient is confirmed not to have); misallocation due to underprescription (a condition the patient is confirmed to have is not treated); and misallocation due to incorrect dosing or drug choice (a condition the patient has is treated but the dosing or medication chosen is inappropriate).	Through study completion, an average of six months
Relationship of QALY Loss to Severity of Patient Condition	In patients with only mild illnesses, the scope for QALY loss from mistakes may be limited relative to patients with more severe illnesses. With this in mind, QALY loss is regressed on indicators for mild, moderate, and severe illnesses (as assessed by the MO) each interacted with the assisted note indicator, controlling for patient fixed effects. Results will be shown graphically.	Through study completion, an average of six months
Indicators for the Appropriateness of Medical Testing Decisions	The potential misallocation of medical testing is operationalized in two ways. First, for each test type, the research team will construct an indicator that is coded as 1 if the CHEW recommends conducting a test that turns out to be negative, and a second indicator that is 1 if the CHEW neglects to request a test that turns out to be positive. Second, the research team will construct an indicator at the level of (patient, test, note) that measures whether the CHEW and MO requested the same or a comparable medical test (e.g. the CHEW requested a malaria RDT whereas the MO requested a malaria blood smear). Combining these indicators, a mismatch occurs if and only if either: i) a test was not requested by the CHEW but was positive, or ii) the test was requested by the CHEW but the result was negative and no equivalent test was ordered by the MO.	Through study completion, an average of six months
Average and Distribution of DALY Lost	The effect of LLM assistance on DALY lost is measured directly rather than indirectly (as in the probability of error, the probability of severe error, and which note is the better note). The full distribution of DALY ratings for the assisted and unassisted notes will also be shown in the results.	Through study completion, an average of six months
MO Evaluation of SOAP Notes: Deviations from the MO's SOAP	The MO is asked to assess for each SOAP note whether medical tests ordered were necessary or clinically useful, whether there are missing or incorrect/unnecessary diagnoses, and whether there are missing or incorrect/unnecessary treatment plan elements.	Through study completion, an average of six months
MO Evaluation of SOAP Notes: Types of Harm Incurred	The MO is asked to assess any short-term harm (additional symptoms or discomfort for some period), and any long-term serious harm (risk of impairment, death etc.) from the treatment plan in the SOAP note.	Through study completion, an average of six months
MO Evaluation of SOAP Notes: Measuring Healthy Time Lost in DALY	The MO also provides an overall rating that is intended to reflect the "healthy time lost" from any errors in treatment in the SOAP note. For each assessment and plan constructed by a CHEW (with or without LLM advice), an MO will assess the expected magnitude of healthy life that would be lost if the CHEW plan were implemented instead of the MO's plan.	Through study completion, an average of six months
MD Evaluation of CHEW and MO Notes: Flagging MO Error	In a first step, the MDs will review the MO notes only and record whether there is any error in the diagnosis or treatment proposed in the conditional note or in the final note. If an error is identified, the MDs will rate the error by severity to distinguish medical mistakes from differences in opinion about a patient who is not present.	Through study completion, an average of six months
MD Evaluation of CHEW and MO Notes: SOAP Note Rating	The MD is asked to assess any short-term harm (additional symptoms or discomfort for some period), and any long-term serious harm (risk of impairment, death etc.) from the treatment plan in the SOAP note. The MD also provides an overall rating that is intended to reflect the "healthy time lost" from any errors in treatment in the SOAP note. For each assessment and plan constructed by a CHEW (with or without LLM advice), an MO will assess the expected magnitude of healthy life that would be lost if the CHEW plan were implemented instead of the MO's plan. Healthy time is measured in units of disability-adjusted life years (DALYs), which reflect both length and quality of life.	Through study completion, an average of six months
MD Evaluation of CHEW and MO Notes: LLM Review	The MDs will also review the LLM feedback and answer the following questions: "Did the CHEW follow all, some, or none of the LLM recommendations?" If some or none: "Imagine the CHEW had followed all the recommendations of the LLM. Would the resulting treatment plan be an improvement over their assisted note?" (Yes/no) If yes: "Please explain." "Did the LLM make any mistakes?" (Yes/no) If yes: "Was any aspect of the CHEW's assisted treatment plan worse than the unassisted plan because the CHEW followed the LLM's erroneous recommendation?" If yes: "Please explain."	Through study completion, an average of six months
Indicator for the Appropriateness of Triage Decisions	For each (patient, note), an indicator records whether the CHEW triage decision (an intent to triage indicated in the SOAP note) and the MO suggested triage decision align.	Through study completion, an average of six months

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Sponsor

Yale University

Collaborators

University of Pennsylvania

George Washington University

World Bank

EHA Clinics Nigeria

Investigators

Principal Investigator: Jason Abaluck, Yale University

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

January 30, 2025

Primary Completion (Actual)

October 17, 2025

Study Completion (Actual)

October 17, 2025

Study Registration Dates

First Submitted

January 23, 2025

First Submitted That Met QC Criteria

February 6, 2025

First Posted (Actual)

February 12, 2025

Study Record Updates

Last Update Posted (Actual)

February 3, 2026

Last Update Submitted That Met QC Criteria

January 30, 2026

Last Verified

April 1, 2025

More Information

Terms related to this study

Keywords

Other Study ID Numbers

2000035990

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

YES

IPD Plan Description

The following de-identified individual participant data (IPD) will be shared:

Patient demographics and vitals
Symptoms and clinical findings documented by CHEWs and MOs
Test results (malaria, anemia, UTI)
Treatment plans and prescriptions
SOAP notes with and without LLM assistance from both CHEWs and MOs
Provider assessments and DALY ratings
Survey responses from CHEWs and MD panel reviews

IPD Sharing Time Frame

Data will be available to other researchers beginning 3 months after publication and will remain available with no end date.

IPD Sharing Access Criteria

Academic researchers with a formal appointment at a research institution must submit a research proposal detailing intended analyses and sign a data use agreement.

IPD Sharing Supporting Information Type

STUDY_PROTOCOL
SAP
ICF
ANALYTIC_CODE
CSR

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

Studies a U.S. FDA-regulated device product

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.
