Improving the Reliability of LLMs as Medical Assistants for the General Public (LAMP-1)

June 11, 2026, updated by: Ji Xunming, MD, PhD, Capital Medical University

Improving the Reliability of LLMs as Medical Assistants for the General Public: a Proof of Concept Simulation Trial

This study will evaluate whether three-minute six-dimensions education (3M-6D education) can improve the reliability of large language models as medical assistants for the general public. Participants will be randomly assigned to receive or not receive 3M-6D education and will then use ChatGPT, Gemini, or non-AI information resources. The study will assess relevant condition identification, disposition concordance, red-flag identification, and NASA-TLX score.

Study Overview

Status

Not yet recruiting

Conditions

Relevant Conditions Identification

Intervention / Treatment

Detailed Description

This randomized, controlled, proof-of-concept simulation trial will evaluate whether three-minute six-dimensions education (3M-6D education) can improve the reliability of large language models as medical assistants for the general public.

Eligible participants will be randomly assigned in a 1:1:1:1:1 ratio to one of five study groups: the 3M-6D education GPT group, the GPT group, the 3M-6D education Gemini group, the Gemini group, or the control group. Participants in the 3M-6D education GPT and 3M-6D education Gemini groups will receive approximately three minutes of education before using ChatGPT or Gemini. Each participant will be randomly assigned one of 10 standardized clinical scenarios and will complete a simulated counseling task in unrestricted natural language within approximately 10 minutes. The study will assess relevant condition identification, disposition concordance, red-flag identification, and NASA-TLX score.
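The allocation described above can be sketched as follows. This is only an illustration: the record does not state the randomization method, so the permuted blocks of five, the fixed seed, and the uniform scenario draw are all assumptions, and the function name `allocate` is hypothetical.

```python
import random

# Arm labels follow the five study groups named in the record.
ARMS = [
    "3M-6D education GPT",
    "GPT",
    "3M-6D education Gemini",
    "Gemini",
    "Control",
]
N_SCENARIOS = 10  # the record states 10 standardized clinical scenarios


def allocate(n_participants, seed=0):
    """Return (arm, scenario) pairs using permuted blocks of five (assumed method)."""
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_participants:
        block = ARMS[:]      # one slot per arm keeps the ratio at 1:1:1:1:1
        rng.shuffle(block)
        for arm in block:
            # Assumed: each scenario is drawn uniformly at random.
            scenario = rng.randint(1, N_SCENARIOS)
            assignments.append((arm, scenario))
    return assignments[:n_participants]


allocation = allocate(525)  # estimated enrollment from the record
counts = {arm: sum(1 for a, _ in allocation if a == arm) for arm in ARMS}
print(counts)
```

Because 525 is a multiple of five, this blocked scheme places exactly 105 participants in each arm; a simple (unblocked) randomization would only approximate that balance.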

Study Type

Interventional

Enrollment (Estimated)

525

Phase

Not Applicable

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Contact

Name: Xunming Ji
Phone Number: 01083198962
Email: jixm@ccmu.edu.cn

Study Contact Backup

Name: Chuanjie Wu
Phone Number: 01083199439
Email: wuchuanjie@ccmu.edu.cn

Study Locations

China
- Beijing Municipality
  - Beijing, Beijing Municipality, China, 100053
    - Xuanwu Hospital, Capital Medical University
    - Contact:
      Chuanjie Wu
      Phone Number: 010-83199439
      Email: wuchuanjie@ccmu.edu.cn

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

Adult
Older Adult

Accepts Healthy Volunteers

Yes

Description

Inclusion Criteria:

Age 18 years or older, male or female;
Completed primary school or higher education;
Able to use a smartphone or computer to complete online interaction;
No history of acute ischemic stroke, systemic lupus erythematosus, gastric ulcer, pneumonia, acute cardiac infarction, urinary tract infection, uterine fibroids, diabetes, osteoarthritis, or migraine;
Able to understand and comply with study procedures and to provide written informed consent.

Exclusion Criteria:

Currently or previously employed as a healthcare worker;
Previously received systematic medical training;
Currently involved in concurrent research that may interfere with the results of the present trial;
Any other condition that, in the investigator's judgment, might affect compliance or preclude participation.

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Primary Purpose: Health Services Research
Allocation: Randomized
Interventional Model: Parallel Assignment
Masking: Single

Arms and Interventions

Participant Group / Arm	Intervention / Treatment
Experimental: 3M-6D education GPT Group. Participants will first receive 3M-6D education, then use ChatGPT to complete a consultation task in unrestricted natural language within approximately 10 minutes.	Behavioral: three minutes six dimensions education. 3M-6D education is based on cognitive load theory and is designed to reduce the cognitive burden on patients during medical interactions with AI and to improve the clarity and completeness of symptom reporting. Guided by cognitive load theory and the natural process physicians use to take medical histories, we identified candidate information dimensions and developed a six-dimension structured expression framework for public health queries through a Delphi expert consensus process. Participants are instructed to use the framework to describe their symptoms across these six dimensions; this can typically be completed within three minutes, so we call the approach three minutes six dimensions education (3M-6D education). Other: ChatGPT. Participants use ChatGPT to complete a standardized simulated clinical scenario in unrestricted natural language.
Experimental: 3M-6D education Gemini Group. Participants will first receive 3M-6D education, then use Gemini to complete a consultation task in unrestricted natural language within approximately 10 minutes.	Behavioral: three minutes six dimensions education. 3M-6D education is based on cognitive load theory and is designed to reduce the cognitive burden on patients during medical interactions with AI and to improve the clarity and completeness of symptom reporting. Guided by cognitive load theory and the natural process physicians use to take medical histories, we identified candidate information dimensions and developed a six-dimension structured expression framework for public health queries through a Delphi expert consensus process. Participants are instructed to use the framework to describe their symptoms across these six dimensions; this can typically be completed within three minutes, so we call the approach three minutes six dimensions education (3M-6D education). Other: Gemini. Participants use Gemini to complete a standardized simulated clinical scenario in unrestricted natural language.
Active Comparator: GPT Group. Participants will use ChatGPT to complete a consultation task in unrestricted natural language within approximately 10 minutes.	Other: ChatGPT. Participants use ChatGPT to complete a standardized simulated clinical scenario in unrestricted natural language.
Active Comparator: Gemini Group. Participants will use Gemini to complete a consultation task in unrestricted natural language within approximately 10 minutes.	Other: Gemini. Participants use Gemini to complete a standardized simulated clinical scenario in unrestricted natural language.
No Intervention: Control group Participants will use non-AI tools such as internet searches and medical websites to complete a consultation task in unrestricted natural language in approximately 10 minutes.

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Relevant conditions identification of the 3M-6D education GPT group compared with the GPT group	Relevant conditions identification is defined as the proportion of participants whose final response includes the expert-defined final diagnosis or a relevant differential diagnosis.	Usually within 1 hour.
Disposition concordance of the 3M-6D education GPT group compared with the GPT group	Disposition concordance is defined as the proportion of participants whose final care recommendation matches the expert-defined level. The five levels are self-care, routine outpatient care, urgent outpatient care, emergency department visit, and emergency medical services.	Usually within 1 hour.
Relevant conditions identification of the 3M-6D education Gemini group compared with the Gemini group		Usually within 1 hour.
Disposition concordance of the 3M-6D education Gemini group compared with the Gemini group		Usually within 1 hour.
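The two primary outcome definitions above are both proportions over participants, and a minimal sketch of how they might be computed is shown here. The data classes, field names, and helper functions are hypothetical, and the scenario and responses are placeholders invented for illustration, not study data. The record does not say how a free-text final response is mapped to named conditions or to a level of care, so that step is assumed to have been done already.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    relevant_conditions: set  # expert-defined final diagnosis plus relevant differentials
    expert_disposition: str   # expert-defined level of care


@dataclass
class FinalResponse:
    named_conditions: set     # conditions named in the participant's final response
    disposition: str          # level of care the participant finally recommends


def identifies_relevant_condition(resp, scenario):
    # True when at least one named condition is in the expert-defined set.
    return len(resp.named_conditions & scenario.relevant_conditions) > 0


def is_concordant(resp, scenario):
    # True when the recommended level matches the expert-defined level exactly.
    return resp.disposition == scenario.expert_disposition


def proportion(flags):
    flags = list(flags)
    if not flags:
        raise ValueError("no participants in group")
    return sum(flags) / len(flags)


# Placeholder scenario and responses for one study group, for illustration only.
scenario = Scenario({"condition A", "condition B"}, "urgent outpatient care")
responses = [
    FinalResponse({"condition A"}, "urgent outpatient care"),
    FinalResponse({"condition C"}, "self-care"),
    FinalResponse({"condition B", "condition C"}, "routine outpatient care"),
    FinalResponse({"condition A"}, "urgent outpatient care"),
]

identification_rate = proportion(identifies_relevant_condition(r, scenario) for r in responses)
concordance_rate = proportion(is_concordant(r, scenario) for r in responses)
print(identification_rate, concordance_rate)
```

In the trial each participant has their own assigned scenario, so a fuller version would pair every response with its scenario and compute the proportions separately per study group before comparing them.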

Secondary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Relevant conditions identification of the 3M-6D education GPT group compared with the control group		Usually within 1 hour.
Relevant conditions identification of the 3M-6D education Gemini group compared with the control group		Usually within 1 hour.
Disposition concordance of the 3M-6D education GPT group compared with the control group		Usually within 1 hour.
Disposition concordance of the 3M-6D education Gemini group compared with the control group		Usually within 1 hour.
Red-flag identification in the 3M-6D education GPT group compared with the GPT group	Red-flag identification is defined as the proportion of participants whose final response includes the key warning signs that experts defined for the assigned scenario.	Usually within 1 hour.
Red-flag identification in the 3M-6D education GPT group compared with the control group		Usually within 1 hour.
Red-flag identification in the 3M-6D education Gemini group compared with the Gemini group		Usually within 1 hour.
Red-flag identification in the 3M-6D education Gemini group compared with the control group		Usually within 1 hour.
NASA Task Load Index score of the 3M-6D education GPT group compared with the GPT group	NASA-TLX score is a self-reported task-load score measured after the simulated consultation with a physician. It includes six domains: mental demand, physical demand, temporal demand, effort, frustration, and performance. Each domain is scored from 0 to 100. The total score is the mean of the six domains. Higher scores indicate greater perceived task load.	Usually within 1 hour.
NASA Task Load Index score of the 3M-6D education GPT group compared with the control group		Usually within 1 hour.
NASA Task Load Index score of the 3M-6D education Gemini group compared with the Gemini group		Usually within 1 hour.
NASA Task Load Index score of the 3M-6D education Gemini group compared with the control group		Usually within 1 hour.
Relevant conditions identification of the 3M-6D education GPT group compared with the 3M-6D education Gemini group		Usually within 1 hour.
Disposition concordance of the 3M-6D education GPT group compared with the 3M-6D education Gemini group		Usually within 1 hour.
Red-flag identification in the 3M-6D education GPT group compared with the 3M-6D education Gemini group		Usually within 1 hour.
NASA Task Load Index score of the 3M-6D education GPT group compared with the 3M-6D education Gemini group		Usually within 1 hour.
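The NASA-TLX total described in the table above (the mean of six domain ratings, each from 0 to 100) can be sketched as below. The function name `nasa_tlx_total` and the input validation are assumptions added for illustration, and the ratings are placeholders rather than participant data.

```python
# Domain names as listed in the measure description.
DOMAINS = (
    "mental demand",
    "physical demand",
    "temporal demand",
    "effort",
    "frustration",
    "performance",
)


def nasa_tlx_total(ratings):
    """Unweighted total: the mean of the six domain ratings, each on 0 to 100."""
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    for domain in DOMAINS:
        if not 0 <= ratings[domain] <= 100:
            raise ValueError(f"{domain} rating must be between 0 and 100")
    return sum(ratings[d] for d in DOMAINS) / len(DOMAINS)


# Placeholder ratings for one participant, invented for illustration only.
example = {
    "mental demand": 60,
    "physical demand": 10,
    "temporal demand": 40,
    "effort": 50,
    "frustration": 30,
    "performance": 20,
}
total = nasa_tlx_total(example)
print(total)  # prints 35.0
```

This follows the unweighted mean stated in the record; the pairwise-weighted variant of the NASA-TLX is not used here because the record does not mention it.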

Other Outcome Measures

Outcome Measure	Measure Description	Time Frame
Failure to identify red flags in the 3M-6D education GPT group compared with the GPT group	Failure to identify red flags is defined as the proportion of participants whose final response does not include the expert-defined red-flag symptoms or warning signs for the assigned standardized simulated clinical scenario.	Usually within 1 hour.
Failure to identify red flags in the 3M-6D education GPT group compared with the control group		Usually within 1 hour.
Failure to identify red flags in the 3M-6D education Gemini group compared with the Gemini group		Usually within 1 hour.
Failure to identify red flags in the 3M-6D education Gemini group compared with the control group		Usually within 1 hour.
Underestimation of disposition in the 3M-6D education GPT group compared with the GPT group	Underestimation of disposition is defined as the proportion of participants whose final care recommendation is lower than the expert-defined disposition level for the assigned standardized simulated clinical scenario.	Usually within 1 hour.
Underestimation of disposition in the 3M-6D education GPT group compared with the control group		Usually within 1 hour.
Underestimation of disposition in the 3M-6D education Gemini group compared with the Gemini group		Usually within 1 hour.
Underestimation of disposition in the 3M-6D education Gemini group compared with the control group		Usually within 1 hour.
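Underestimation of disposition, as defined in the table above, requires an ordering of the five care levels. The sketch below assumes the levels increase in acuity in the order the record lists them (self-care lowest, emergency medical services highest); the record does not state this ordering explicitly. The function names are hypothetical and the recommendation pairs are placeholders, not study data.

```python
# Assumed ordering: index 0 is the lowest level of care, index 4 the highest.
LEVELS = [
    "self-care",
    "routine outpatient care",
    "urgent outpatient care",
    "emergency department visit",
    "emergency medical services",
]
RANK = {level: i for i, level in enumerate(LEVELS)}


def is_underestimated(recommended, expert):
    # True when the participant's recommendation ranks below the expert level.
    return RANK[recommended] < RANK[expert]


def underestimation_rate(pairs):
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no participants in group")
    return sum(is_underestimated(r, e) for r, e in pairs) / len(pairs)


# (participant recommendation, expert-defined level), for illustration only.
pairs = [
    ("urgent outpatient care", "emergency department visit"),      # lower: underestimated
    ("emergency department visit", "emergency department visit"),  # concordant
    ("emergency medical services", "emergency department visit"),  # higher: overestimated
    ("self-care", "routine outpatient care"),                      # lower: underestimated
]
rate = underestimation_rate(pairs)
print(rate)  # prints 0.5
```

Separating underestimation from simple non-concordance matters because only recommendations below the expert level count toward this measure; overestimates are non-concordant but are not underestimation.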

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Sponsor

Capital Medical University

Collaborators

Xuanwu Hospital, Beijing

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Estimated)

June 20, 2026

Primary Completion (Estimated)

July 20, 2026

Study Completion (Estimated)

July 20, 2026

Study Registration Dates

First Submitted

June 11, 2026

First Submitted That Met QC Criteria

June 11, 2026

First Posted (Actual)

June 16, 2026

Study Record Updates

Last Update Posted (Actual)

June 16, 2026

Last Update Submitted That Met QC Criteria

June 11, 2026

Last Verified

June 1, 2026

More Information

Terms related to this study

Keywords

Other Study ID Numbers

LAMP-1

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

UNDECIDED

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

Studies a U.S. FDA-regulated device product
