Can Feedback From a Large Language Model Improve Health Care Quality?

January 30, 2026 updated by: Jason Abaluck, Yale University

A Pilot Study of an LLM Tool to Support Frontline Health Workers in Low-Resource Settings

The goal of this study is to learn if computer-assisted advice can help improve patient care in Nigerian health clinics. The main question it aims to answer is: does giving healthcare workers instant computer feedback help them make better decisions about patient care?

Researchers will compare patient care notes written by healthcare workers before and after they receive computer feedback to see if the feedback improves care quality. A doctor who doesn't know if feedback was given will review these notes.

Participants will:

  • Be seen by a community healthcare worker who uses the computer feedback system
  • Be treated by a fully trained medical doctor
  • Get tested for malaria, anemia, or urinary tract infections if they have certain symptoms

Study Overview

Status

Completed

Conditions

Detailed Description

This project tests whether Large Language Models (LLMs) can improve patient care in Nigerian primary care clinics by giving customized and instant feedback to the provider in natural language. An LLM-based tool integrated into an electronic patient record management system provides "second opinions" to community health extension workers (CHEWs) at two clinics in Nigeria. These second opinions are intended to mirror what a reviewing physician might advise the CHEWs after seeing or hearing their initial report on a patient.

For the main analysis, this study uses a within-patient comparison of two patient notes written by the CHEW: one during the initial patient consultation and one after the LLM feedback was received. The patient is also seen by a fully trained medical officer (MO) who is in charge of patient care. The MO conducts a blinded review of the CHEW's patient notes to measure changes in the CHEW's care as a result of the LLM feedback. The data come from the information captured in the patient's electronic medical record (EMR) and from survey data collected from CHEWs, reviewing MOs, and a panel of reviewing Medical Doctors (MDs).

Study Type

Interventional

Enrollment (Actual)

491

Phase

  • Not Applicable

Contacts and Locations

Study Locations

    • Kano State
      • Kano, Kano State, Nigeria
        • EHA Clinics REACH Community Clinic, Gyadi Gyadi
      • Kano, Kano State, Nigeria
        • EHA Clinics, 33 Lamido Crescent

Participation Criteria

Eligibility Criteria

Ages Eligible for Study

  • Child
  • Adult
  • Older Adult

Accepts Healthy Volunteers

No

Description

Inclusion Criteria:

  • Patient is at the clinic for outpatient consultation
  • Parent/guardian consent is required for individuals under 18

Exclusion Criteria:

  • Patient requires emergency care
  • Patient is at the clinic for a checkup (e.g. weight, blood pressure, follow up after recovery)
  • Patient is a trauma patient (visit is for an accident, wound or injury)
  • Patient is at the clinic for a scheduled procedure or a birth

Study Plan

How is the study designed?

Design Details

  • Primary Purpose: Health Services Research
  • Allocation: N/A
  • Interventional Model: Single Group Assignment
  • Masking: None (Open Label)

Arms and Interventions

Experimental: Clinical Assessment with and without LLMs

  • Participant Group / Arm: The investigators employ a within-patient design. Patients receive two sequential assessments from a Community Health Extension Worker: first without and then with Large Language Model assistance.
  • Intervention / Treatment: A Large Language Model (LLM) integrated into the clinic's Electronic Medical Record system provides real-time feedback on patient assessments. Community Health Extension Workers first create a standard SOAP note, submit it to the LLM, and receive detailed feedback and key recommendations. They can then update their assessment based on this feedback. All final treatment decisions are made by Medical Officers who independently evaluate patients.

What is the study measuring?

Primary Outcome Measures

Indicator for an Error in the Treatment Plan (with the Potential for Harm)
Time Frame: Through study completion, an average of six months

During SOAP note evaluation, the MO is asked to indicate whether the treatment plan for the patient contains any errors, conditional on the MO's own diagnosis. This is coded as 1 if the MO indicates there is an error and 0 otherwise.

The introductory text (here for SOAP Note A) is: Please evaluate whether the treatment in SOAP Note A is appropriate for this patient's condition. Please base this on your own diagnosis, not the CHEW's diagnosis in SOAP Note A.

This is followed by the question: Is the treatment plan for the patient in SOAP Note A completely appropriate given your own diagnosis (accounting for conditional treatments based on medical tests)? Answer "No" if the patient should receive different medical care given your diagnosis. This can include both minor differences (for example, the patient should be advised to rest) and major errors (for example, the patient should receive a completely different set of medications). (Answer options: yes/no/unsure)

Indicator for an Error in the Treatment Plan that Causes a Loss of at least X Quality-Adjusted Life Days
Time Frame: Through study completion, an average of six months
This variable is coded as 1 if the MO indicates there is such an error and 0 otherwise. X is defined as the highest benchmark on the appropriate DALY scale such that at least 5% of patients have an error at least that large in the unassisted SOAP note. In other words, severe errors are any errors that generate a harm rating at or above the 95th percentile of harm on the unassisted scale (pooling child and adult scales).
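The threshold rule above can be sketched in Python. This is a minimal illustration under stated assumptions: harm ratings for the unassisted notes are already on one pooled scale, notes without errors are rated 0, and the scale's benchmark values are available as a list. The function names and the benchmark values in the example are hypothetical, not taken from the study's analysis code.

```python
from typing import Sequence


def severe_error_threshold(
    unassisted_harm: Sequence[float],
    benchmarks: Sequence[float],
    min_share: float = 0.05,
) -> float:
    """Highest benchmark X such that at least `min_share` of unassisted
    notes have a harm rating at or above X (error-free notes count as 0)."""
    n = len(unassisted_harm)
    if n == 0:
        raise ValueError("need at least one unassisted note")
    eligible = [
        b for b in benchmarks
        if sum(h >= b for h in unassisted_harm) / n >= min_share
    ]
    if not eligible:
        raise ValueError("no benchmark is reached by the required share of notes")
    return max(eligible)


def severe_error(harm: float, threshold: float) -> int:
    """1 if a note's harm rating is at or above the threshold X, else 0."""
    return int(harm >= threshold)


# Hypothetical example: 100 unassisted notes, benchmarks in healthy days lost.
harms = [0.0] * 90 + [1.0] * 5 + [7.0] * 3 + [30.0] * 2
benchmarks = [1.0, 7.0, 30.0, 365.0]
x = severe_error_threshold(harms, benchmarks)
print(x)  # 7.0: 5% of notes reach 7 days, only 2% reach 30
print(severe_error(30.0, x), severe_error(1.0, x))  # 1 0
```

In this sketch the threshold is computed on the unassisted notes only and then applied unchanged to both notes, so assisted and unassisted notes are compared against the same cutoff.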
Indicator for the Better Treatment Plan (as Determined by the MOs)
Time Frame: Through study completion, an average of six months
Based on the DALY rating of SOAP Note A vs. B (counting instances with no errors as 0 DALY loss), the indicator is coded as 1 if the SOAP note has the better treatment plan (lower DALY loss) and 0 if MOs judge both notes to be the same in response to the following question: Are there any meaningful differences in the treatment plans of SOAP Note A and B?
Indicator for whether Treatment is Consistent with a Predetermined "Standard of Care"
Time Frame: Through study completion, an average of six months

At-risk patients receive malaria, anemia and UTI screening in accordance with certain demographic criteria. A dataset is then constructed with one observation for each (patient, screening test, note), up to six per patient.

The indicator of treatment misallocation records whether a patient was incorrectly treated for a condition given the test result or the absence of symptoms. The variable is coded as 1 if the patient tested positive and received either inappropriate treatment or no treatment. It is also coded as 1 if the patient tested negative, or was not tested based on the symptom screen, but received treatment for the condition. The variable is coded as 0 only if the patient tested negative and was correctly not treated for the corresponding condition, or tested positive and received the correct treatment.
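A minimal sketch of the (patient, screening test, note) dataset and the misallocation indicator follows. It assumes each screening result is recorded as "positive", "negative", or missing (not tested), and that each note records whether the condition was treated and whether that treatment was appropriate; the class, field, and function names are hypothetical. An untested patient who is also not treated is left uncoded (None), because the description above assigns that case neither value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

TESTS = ("malaria", "anemia", "uti")
NOTES = ("unassisted", "assisted")


@dataclass
class Observation:
    patient_id: str
    test: str                # one of TESTS
    note: str                # one of NOTES
    result: Optional[str]    # "positive", "negative", or None if not tested
    treated: bool            # the note treats this condition
    correct_treatment: bool  # the treatment chosen is appropriate


def build_rows(
    patient_id: str,
    results: Dict[str, Optional[str]],
    treatments: Dict[Tuple[str, str], Tuple[bool, bool]],
) -> List[Observation]:
    """One row per (patient, screening test, note). This sketch always
    builds all 3 x 2 = 6 rows; a real dataset may keep fewer."""
    rows = []
    for test in TESTS:
        for note in NOTES:
            treated, correct = treatments.get((test, note), (False, False))
            rows.append(Observation(patient_id, test, note,
                                    results.get(test), treated, correct))
    return rows


def misallocation(obs: Observation) -> Optional[int]:
    if obs.result == "positive":
        # Positive test: only the correct treatment avoids misallocation.
        return 0 if obs.treated and obs.correct_treatment else 1
    if obs.treated:
        # Negative, or not tested per the symptom screen, but treated anyway.
        return 1
    if obs.result == "negative":
        # Negative and correctly not treated.
        return 0
    # Not tested and not treated: no code is assigned in the description.
    return None


rows = build_rows(
    "p1",
    {"malaria": "positive", "anemia": "negative", "uti": None},
    {("malaria", "assisted"): (True, True), ("uti", "assisted"): (True, True)},
)
print([(r.test, r.note, misallocation(r)) for r in rows])
```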

Secondary Outcome Measures

Indicators Denoting Diagnosis and Treatment Alignment Between CHEWs and MOs
Time Frame: Through study completion, an average of six months

For each medication in the CHEW's treatment plan, there is a "clinical indication" (the diagnosis associated with the drug) along with an indicator that specifies if a given prescription is conditional on a medical test result. The research team will consider three indicators of a match:

  • any match of the contents of the "clinical indication" field across medications;
  • any match of the contents of the "medication" field across indications, including whether the medication is conditional on a test or not;
  • a match of both medication and indication (and test conditionality).
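The three match indicators might be computed as below. This sketch assumes that field values have already been standardized (so the same drug or diagnosis is spelled identically in CHEW and MO plans) and reads "a match" as at least one overlapping plan item; the names and the drug labels are hypothetical placeholders.

```python
from typing import Dict, Iterable, NamedTuple


class PlanItem(NamedTuple):
    indication: str    # "clinical indication" field (diagnosis for the drug)
    medication: str    # "medication" field
    conditional: bool  # prescription depends on a medical test result


def match_indicators(chew: Iterable[PlanItem],
                     mo: Iterable[PlanItem]) -> Dict[str, bool]:
    chew_items, mo_items = set(chew), set(mo)
    return {
        # any shared clinical indication, whatever the medication
        "indication": bool({p.indication for p in chew_items}
                           & {p.indication for p in mo_items}),
        # any shared medication with the same test conditionality
        "medication": bool({(p.medication, p.conditional) for p in chew_items}
                           & {(p.medication, p.conditional) for p in mo_items}),
        # any item matching on indication, medication and conditionality
        "both": bool(chew_items & mo_items),
    }


chew_plan = [PlanItem("malaria", "drug_a", True)]
mo_plan = [PlanItem("malaria", "drug_b", True), PlanItem("uti", "drug_a", False)]
print(match_indicators(chew_plan, mo_plan))
```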
Alternative Indicators for Treatment Misallocation
Time Frame: Through study completion, an average of six months

The research team will construct the following indicators of treatment misallocation:

  • Misallocation due to overprescription: a condition is treated that the patient is confirmed not to have
  • Misallocation due to underprescription: a condition the patient is confirmed to have is not treated
  • Misallocation due to incorrect dosing or drug choice: a condition the patient has is treated but the dosing or medication chosen is inappropriate.
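The three alternative indicators can be sketched as a classifier for a single (patient, condition, note). It assumes "confirmed" means a test result is available, with None standing for a condition that was neither confirmed nor ruled out; the function and argument names are hypothetical.

```python
from typing import Optional


def misallocation_type(has_condition: Optional[bool], treated: bool,
                       appropriate: bool) -> Optional[str]:
    """Classify one (patient, condition, note). Returns None when there is
    no misallocation or the condition was not confirmed either way."""
    if has_condition is False and treated:
        return "overprescription"
    if has_condition is True and not treated:
        return "underprescription"
    if has_condition is True and treated and not appropriate:
        return "incorrect dosing or drug choice"
    return None


print(misallocation_type(False, True, True))  # overprescription
```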
Relationship of QALY Loss to Severity of Patient Condition
Time Frame: Through study completion, an average of six months

In patients with only mild illnesses, the scope for QALY loss from mistakes may be limited relative to patients with more severe illnesses.

With this in mind, QALY loss is regressed on indicators for mild, moderate, and severe illness (as assessed by the MO), each interacted with the assisted-note indicator, controlling for patient fixed effects. Results will be shown graphically.
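Because severity is rated once per patient, its main effects are absorbed by the patient fixed effects. Under the additional assumption that every patient contributes exactly one unassisted and one assisted note, the fixed-effects coefficient on each severity × assisted interaction equals the average within-patient difference (assisted minus unassisted) among patients in that severity group. The sketch below computes those point estimates directly; it gives no standard errors, and the row layout and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def assisted_effect_by_severity(
    rows: Iterable[Tuple[str, str, bool, float]],
) -> Dict[str, float]:
    """rows: (patient_id, severity, assisted, qaly_loss), two notes per patient.

    Returns, for each severity group, the mean of (assisted - unassisted)
    QALY loss, which matches the patient-fixed-effects interaction
    coefficient in this balanced two-note design.
    """
    loss: Dict[str, Dict[bool, float]] = defaultdict(dict)
    severity: Dict[str, str] = {}
    for patient_id, sev, assisted, qaly_loss in rows:
        loss[patient_id][assisted] = qaly_loss
        severity[patient_id] = sev
    diffs: Dict[str, List[float]] = defaultdict(list)
    for patient_id, notes in loss.items():
        # Patients without both notes carry no within-patient information.
        if True in notes and False in notes:
            diffs[severity[patient_id]].append(notes[True] - notes[False])
    return {sev: mean(d) for sev, d in diffs.items()}


example = [
    ("p1", "mild", False, 2.0), ("p1", "mild", True, 1.0),
    ("p2", "mild", False, 3.0), ("p2", "mild", True, 3.0),
    ("p3", "severe", False, 10.0), ("p3", "severe", True, 4.0),
]
print(assisted_effect_by_severity(example))
```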

Indicators for the Appropriateness of Medical Testing Decisions
Time Frame: Through study completion, an average of six months

The potential misallocation of medical testing is operationalized in two ways:

  1. For each test type, the research team will construct an indicator that is coded as 1 if the CHEW recommends conducting a test that turns out to be negative, and a second indicator that is coded as 1 if the CHEW neglects to request a test that turns out to be positive.
  2. The research team will also construct an indicator at the level of (patient, test, note) that measures whether the CHEW and MO requested the same or a comparable medical test (e.g. the CHEW requested a malaria RDT whereas the MO requested a malaria blood smear).

Combining these indicators, a mismatch occurs if and only if either: i) a test was not requested by the CHEW but was positive, or ii) the test was requested by the CHEW but the result was negative and no equivalent test was ordered by the MO.

Average and Distribution of DALY Lost
Time Frame: Through study completion, an average of six months
The effect of LLM assistance on DALYs lost is measured directly, rather than indirectly as with the probability of an error or a severe error, or with which note is the better note. The full distribution of DALY ratings for the assisted and unassisted notes will also be shown in the results.
MO Evaluation of SOAP Notes: Deviations from the MO's SOAP
Time Frame: Through study completion, an average of six months
The MO is asked to assess for each SOAP note whether medical tests ordered were necessary or clinically useful, whether there are missing or incorrect/unnecessary diagnoses, and whether there are missing or incorrect/unnecessary treatment plan elements.
MO Evaluation of SOAP Notes: Types of Harm Incurred
Time Frame: Through study completion, an average of six months
The MO is asked to assess any short-term harm (additional symptoms or discomfort for some period), and any long-term serious harm (risk of impairment, death etc.) from the treatment plan in the SOAP note.
MO Evaluation of SOAP Notes: Measuring Healthy Time Lost in DALY
Time Frame: Through study completion, an average of six months
The MO also provides an overall rating that is intended to reflect the "healthy time lost" from any errors in treatment in the SOAP note. For each assessment and plan constructed by a CHEW (with or without LLM advice), an MO will assess the expected magnitude of healthy life that would be lost if the CHEW plan were implemented instead of the MO's plan.
MD Evaluation of CHEW and MO Notes: Flagging MO Error
Time Frame: Through study completion, an average of six months
In a first step, the panel of MDs will review only the MO notes and record whether there is any error in the diagnosis or treatment proposed in the conditional note or in the final note. If an error is identified, the MDs will rate it by severity to distinguish medical mistakes from differences of opinion about a patient who is not present.
MD Evaluation of CHEW and MO Notes: SOAP Note Rating
Time Frame: Through study completion, an average of six months

The MD is asked to assess any short-term harm (additional symptoms or discomfort for some period), and any long-term serious harm (risk of impairment, death etc.) from the treatment plan in the SOAP note.

The MD also provides an overall rating that is intended to reflect the "healthy time lost" from any errors in treatment in the SOAP note. For each assessment and plan constructed by a CHEW (with or without LLM advice), the MD will assess the expected magnitude of healthy life that would be lost if the CHEW plan were implemented instead of the MO's plan. Healthy time is measured in disability-adjusted life years (DALYs), which reflect both length and quality of life.

MD Evaluation of CHEW and MO Notes: LLM Review
Time Frame: Through study completion, an average of six months

The MDs will also review the LLM feedback and answer the following questions:

  • "Did the CHEW follow all, some, or none of the LLM recommendations?" If some or none: "Imagine the CHEW had followed all the recommendations of the LLM. Would the resulting treatment plan be an improvement over their assisted note?" (Yes/no) If yes: "Please explain."
  • "Did the LLM make any mistakes?" (Yes/no) If yes: "Was any aspect of the CHEW's assisted treatment plan worse than the unassisted plan because the CHEW followed the LLM's erroneous recommendation?" If yes: "Please explain."

Indicator for the Appropriateness of Triage Decisions
Time Frame: Through study completion, an average of six months
For each (patient, note), an indicator records whether the CHEW triage decision (an intent to triage indicated in the SOAP note) and the MO suggested triage decision align.

Collaborators and Investigators

Sponsor

Investigators

  • Principal Investigator: Jason Abaluck, Yale University

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

January 30, 2025

Primary Completion (Actual)

October 17, 2025

Study Completion (Actual)

October 17, 2025

Study Registration Dates

First Submitted

January 23, 2025

First Submitted That Met QC Criteria

February 6, 2025

First Posted (Actual)

February 12, 2025

Study Record Updates

Last Update Posted (Actual)

February 3, 2026

Last Update Submitted That Met QC Criteria

January 30, 2026

Last Verified

April 1, 2025

More Information

Terms related to this study

Other Study ID Numbers

  • 2000035990

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

YES

IPD Plan Description

The following de-identified individual participant data (IPD) will be shared:

  • Patient demographics and vitals
  • Symptoms and clinical findings documented by CHEWs and MOs
  • Test results (malaria, anemia, UTI)
  • Treatment plans and prescriptions
  • SOAP notes with and without LLM assistance from both CHEWs and MOs
  • Provider assessments and DALY ratings
  • Survey responses from CHEWs and MD panel reviews

IPD Sharing Time Frame

Data will be available to other researchers beginning 3 months after publication and will remain available with no end date.

IPD Sharing Access Criteria

Academic researchers with a formal appointment at a research institution must submit a research proposal detailing intended analyses and sign a data use agreement.

IPD Sharing Supporting Information Type

  • Study Protocol
  • Statistical Analysis Plan (SAP)
  • Informed Consent Form (ICF)
  • Analytic Code
  • Clinical Study Report (CSR)

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.
