- Clinical Trial NCT06457269
Evaluating the Potential of Large Language Models for Respiratory Disease Consultations (EPLLMMRDC)
Evaluating the Potential of Large Language Models for Respiratory Disease Consultations: A Randomized Crossover Trial
This clinical trial aims to evaluate multiple large language models (LLMs) in respiratory disease consultations by comparing their performance with that of human doctors across three major medical consultation scenarios.
The main question it aims to answer is:
- How do large language models perform in comparison to human doctors in diagnosing and consulting on respiratory diseases across various clinical scenarios?
In three clinical scenarios (the online query section, the disease diagnosis section, and the medical explanation section), research assistants or volunteers will question all LLMs and real doctors in a crossover fashion, using predefined online questions and their own issues. After each questioning session, a short washout period is implemented to reduce potential biases.
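The crossover questioning described above can be sketched as follows. This is only an illustration under stated assumptions: the registry entry does not give the randomization algorithm, the seed, or the washout length, so the function names, the per-participant seeding, and the "one washout between consecutive sessions" rule are all hypothetical.

```python
import random

# The five models come from the intervention list; each is used with and
# without search capabilities, plus the human doctor control.
MODELS = ["ChatGPT-3.5", "ChatGPT-4.0", "Claude instant", "Claude 2", "Gemini Pro"]
RESPONDERS = ["Human doctors"] + [
    f"{model} ({mode} search)" for model in MODELS for mode in ("with", "without")
]


def crossover_order(participant_id: str, seed: int = 2024) -> list[str]:
    """Return a reproducible random order of all responders for one participant.

    Assumption: every participant meets every responder exactly once and the
    order is shuffled independently per participant.
    """
    rng = random.Random(f"{seed}:{participant_id}")
    order = list(RESPONDERS)
    rng.shuffle(order)
    return order


def schedule(participant_id: str) -> list[str]:
    """Interleave questioning sessions with washout periods (hypothetical rule)."""
    steps = []
    for responder in crossover_order(participant_id):
        steps.append(f"session: {responder}")
        steps.append("washout")
    return steps[:-1]  # no washout needed after the final session
```

Calling `schedule("P001")` yields eleven sessions separated by ten washout entries, in the same order on every run for that participant.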
Study Overview
Status
Conditions
Intervention / Treatment
- Diagnostic test: Diagnosis by three human doctors
- Diagnostic test: Diagnosis by ChatGPT-3.5 (with search capabilities)
- Diagnostic test: Diagnosis by ChatGPT-3.5 (without search capabilities)
- Diagnostic test: Diagnosis by ChatGPT-4.0 (with search capabilities)
- Diagnostic test: Diagnosis by ChatGPT-4.0 (without search capabilities)
- Diagnostic test: Diagnosis by Claude instant (with search capabilities)
- Diagnostic test: Diagnosis by Claude instant (without search capabilities)
- Diagnostic test: Diagnosis by Claude 2 (with search capabilities)
- Diagnostic test: Diagnosis by Claude 2 (without search capabilities)
- Diagnostic test: Diagnosis by Gemini Pro (with search capabilities)
- Diagnostic test: Diagnosis by Gemini Pro (without search capabilities)
Study Type
Enrollment (Actual)
Phase
- Not Applicable
Contacts and Locations
Study Locations
- The Affiliated Hospital of North Sichuan Medical College, Nanchong, Sichuan, China, 637000
Participation Criteria
Eligibility Criteria
Ages Eligible for Study
- Child
- Adult
- Older Adult
Accepts Healthy Volunteers
Description
Inclusion Criteria:
- Self-reported symptoms of common respiratory diseases, such as cough, chest tightness, fever, and wheezing
- Ability to engage in LLM dialog operations independently or with minimal peer training
- A health status deemed suitable for study participation by the pulmonology experts
Exclusion Criteria:
- Excessively poor health status
Study Plan
How is the study designed?
Design Details
- Primary Purpose: Diagnostic
- Allocation: Randomized
- Interventional Model: Crossover Assignment
- Masking: Quadruple
Arms and Interventions
| Participant Group / Arm | Intervention / Treatment |
|---|---|
| Other: Cross-comparison group (the disease diagnosis section). Cross-comparison group (including human doctor controls and all LLMs) | Diagnosis by three human doctors: patient inquiries are answered by different human doctors. Each patient is randomly assigned by the system to three doctors from different provinces in China, selected from the database of doctors. The doctors all come from multiple online consultation platforms in China, and their diagnostic qualifications and medical licenses have undergone strict verification. Diagnosis by LLMs: patient inquiries are answered by ChatGPT-3.5, ChatGPT-4.0, Claude instant, Claude 2, and Gemini Pro, each with and without search capabilities (ten configurations). Before any questions are answered, the chat history from the previous patient is cleared and the predetermined initialization statement is entered. |
| Other: Cross-comparison group (the medical explanation section). Cross-comparison group (including human doctor controls and all LLMs) | Diagnosis by three human doctors: patient inquiries are answered by different human doctors. Each patient is randomly assigned by the system to three doctors from different provinces in China, selected from the database of doctors. The doctors all come from multiple online consultation platforms in China, and their diagnostic qualifications and medical licenses have undergone strict verification. Diagnosis by LLMs: patient inquiries are answered by ChatGPT-3.5, ChatGPT-4.0, Claude instant, Claude 2, and Gemini Pro, each with and without search capabilities (ten configurations). Before any questions are answered, the chat history from the previous patient is cleared and the predetermined initialization statement is entered. |
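The assignment of each patient to three doctors from different provinces could be implemented along these lines. This is a sketch, not the study's actual system: the registry only states that the assignment is random and that the three doctors come from different provinces. The `(doctor_id, province)` record shape, the function name, and the choice to sample provinces first and then one doctor per province are assumptions made for illustration.

```python
import random
from collections import defaultdict


def assign_doctors(doctors, k=3, rng=None):
    """Pick k doctors at random so that no two share a province.

    doctors: iterable of (doctor_id, province) tuples (hypothetical schema).
    Assumption: provinces are sampled uniformly first, then one doctor is
    drawn uniformly within each chosen province.
    """
    rng = rng or random.Random()
    by_province = defaultdict(list)
    for doctor_id, province in doctors:
        by_province[province].append(doctor_id)
    if len(by_province) < k:
        raise ValueError(f"need doctors from at least {k} provinces")
    chosen_provinces = rng.sample(sorted(by_province), k)
    return [(rng.choice(by_province[p]), p) for p in chosen_provinces]


# Hypothetical doctor database, for illustration only.
DOCTOR_DB = [
    ("D01", "Sichuan"), ("D02", "Sichuan"),
    ("D03", "Guangdong"), ("D04", "Hubei"), ("D05", "Zhejiang"),
]
panel = assign_doctors(DOCTOR_DB, rng=random.Random(7))
```

Sampling provinces before doctors guarantees the different-province constraint by construction; a design that weights provinces by the number of available doctors would need a different sampling step.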
What is the study measuring?
Primary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Expert indicators-Accuracy | Based on the doctors' responses to patients' issues, an expert panel will score each response on a 5-point scale. 5: The responses are completely accurate, addressing all of the patient's questions or diagnosing by identifying the key points of the patient's complaint. 4: The responses are mostly accurate, generally addressing the patient's questions or diagnosing by identifying the key points of the patient's complaint. 3: The responses are moderately accurate, addressing the patient's questions or diagnosing by identifying the key points of the patient's complaint. 2: The responses are rarely accurate, barely addressing the patient's questions or diagnosing by identifying the key points of the patient's complaint. 1: The responses are very inaccurate, not addressing the patient's questions or diagnosing by identifying the key points of the patient's complaint at all. | Each participant has up to one week to participate, starting from the day of the randomized conversation. Subjective expert indicators are evaluated within two months. |
| Expert indicators-Comprehensiveness | Based on the doctors' responses to patients' issues, an expert panel will score each response on a 5-point scale. 5: The responses are highly comprehensive, addressing various aspects of potential diseases corresponding to the patient's symptoms, providing detailed advice, and offering their own extended interpretations. 4: The responses are mostly comprehensive, covering most aspects of potential common diseases related to the patient's symptoms and providing fairly detailed advice. 3: The responses are moderately comprehensive, addressing some aspects of potential common diseases related to the patient's symptoms and offering basic advice. 2: The responses are rarely comprehensive, failing to consider various aspects of potential common diseases related to the patient's symptoms and providing very limited advice. 1: The responses are not comprehensive at all, overlooking most potential diseases related to the patient's symptoms and failing to provide any advice. | Each participant has up to one week to participate, starting from the day of the randomized conversation. Subjective expert indicators are evaluated within two months. |
| Expert indicators-Correctness | Based on the doctors' responses to patients' issues, an expert panel will score each response on a 5-point scale. 5: The responses are completely correct, with no inappropriate or ambiguous statements. 4: The responses are mostly correct, with most statements being appropriate and unambiguous. 3: The responses are generally correct; although there are inappropriate or ambiguous statements, they are acceptable. 2: The responses are partially correct, with few statements being appropriate or unambiguous. 1: The responses are completely incorrect, with nearly all statements being inappropriate and full of ambiguities. | Each participant has up to one week to participate, starting from the day of the randomized conversation. Subjective expert indicators are evaluated within two months. |
| Expert indicators-Ethical compliance | Based on the doctor's response to the patient's question, an expert panel will review each item in accordance with the Declaration of Helsinki and the International Code of Medical Ethics to determine whether any responses or suggestions could potentially harm the patient or violate ethical guidelines. The findings will be recorded as a binary variable. True: The responses are completely ethical. False: When uncertainties exist, the response includes suggestions for the use of controlled medications and some inappropriate or even counterproductive advice. | Each participant has up to one week to participate, starting from the day of the randomized conversation. Subjective expert indicators are evaluated within two months. |
| Empathy indicators | Results from the CARE (Consultation and Relational Empathy) scale concerning the doctor-patient relationship, completed by patients after each diagnostic session. The CARE scale is not applied in the online query section. | Each participant has up to one week to participate, starting from the day of the randomized conversation. Subjective empathy indicators are evaluated within two months. |
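To make the expert indicators concrete, the sketch below summarizes panel ratings per responder: a mean for a 5-point item and a proportion for the binary ethical-compliance item. The registry entry does not describe how ratings are aggregated or analyzed, so the record layout, the function name, and the use of simple means are assumptions for illustration only.

```python
from statistics import mean


def summarize_expert_scores(ratings):
    """Summarize expert panel ratings per responder.

    ratings: list of dicts with hypothetical keys
      'responder' (str), 'accuracy' (int, 1-5), 'ethical' (bool).
    Returns mean accuracy and the share of responses rated ethical.
    """
    by_responder = {}
    for rating in ratings:
        if not 1 <= rating["accuracy"] <= 5:
            raise ValueError("accuracy must be on the 5-point scale (1-5)")
        by_responder.setdefault(rating["responder"], []).append(rating)
    return {
        name: {
            "mean_accuracy": mean(r["accuracy"] for r in rows),
            "ethical_rate": sum(r["ethical"] for r in rows) / len(rows),
        }
        for name, rows in by_responder.items()
    }


# Made-up ratings, for illustration only.
example = [
    {"responder": "Claude 2 (with search)", "accuracy": 5, "ethical": True},
    {"responder": "Claude 2 (with search)", "accuracy": 4, "ethical": True},
    {"responder": "Human doctors", "accuracy": 3, "ethical": False},
]
summary = summarize_expert_scores(example)
```

The same pattern extends to the comprehensiveness and correctness items, which use the same 5-point range.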
Secondary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Regular indicators-Total number of questions | The number of follow-up questions asked by the LLM or real doctor to the patient after providing basic answers in a complete conversation. | Each participant has up to one week to participate, starting from the day of the randomized conversation. After the dialogues are completed, the system automatically summarizes all objective indicators and dialogue information. |
| Regular indicators-Follow-up words | The number of words in follow-up questions asked by the LLM or real doctor to the patient after providing basic answers in a complete conversation. | Each participant has up to one week to participate, starting from the day of the randomized conversation. After the dialogues are completed, the system automatically summarizes all objective indicators and dialogue information. |
| Regular indicators-Total number of conversations | The total number of dialogs in a complete conversation between a user and an LLM or a real doctor, where each dialog consists of one question and one answer. | Each participant has up to one week to participate, starting from the day of the randomized conversation. After the dialogues are completed, the system automatically summarizes all objective indicators and dialogue information. |
| Regular indicators-Total conversation cost ($) | The total cost in dollars for completing the entire conversation. | Each participant has up to one week to participate, starting from the day of the randomized conversation. After the dialogues are completed, the system automatically summarizes all objective indicators and dialogue information. |
| Regular indicators-Total conversation time (min) | Timing starts from the user's input and stops when the LLM or real doctor completes the output of the last sentence. | Each participant has up to one week to participate, starting from the day of the randomized conversation. After the dialogues are completed, the system automatically summarizes all objective indicators and dialogue information. |
| Regular indicators-Number of output statements | The total number of words output by the LLMs or real doctors. | Each participant has up to one week to participate, starting from the day of the randomized conversation. After the dialogues are completed, the system automatically summarizes all objective indicators and dialogue information. |
| Regular indicators-Number of input statements | The total number of characters entered by the user. | Each participant has up to one week to participate, starting from the day of the randomized conversation. After the dialogues are completed, the system automatically summarizes all objective indicators and dialogue information. |
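Several of the regular indicators above can be computed directly from a conversation log. The following is a minimal sketch, assuming a log of alternating user and responder turns with timestamps; the `Turn` structure, field names, and whitespace-based word counting are hypothetical and are not taken from the study's system. Chinese-language text would need a proper word segmenter instead of `str.split`.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str         # "user" or "responder" (an LLM or a real doctor)
    text: str
    t_seconds: float  # user: when input was sent; responder: when output finished


def regular_indicators(turns):
    """Compute objective indicators for one complete conversation."""
    user_turns = [t for t in turns if t.role == "user"]
    resp_turns = [t for t in turns if t.role == "responder"]
    elapsed = 0.0
    if user_turns and resp_turns:
        # Timing runs from the first user input to the end of the last output.
        elapsed = (resp_turns[-1].t_seconds - user_turns[0].t_seconds) / 60
    return {
        # One dialog is one question plus one answer.
        "total_dialogs": min(len(user_turns), len(resp_turns)),
        # Whitespace word count; an assumption that only suits spaced scripts.
        "output_words": sum(len(t.text.split()) for t in resp_turns),
        "input_characters": sum(len(t.text) for t in user_turns),
        "total_time_min": elapsed,
    }


# Made-up conversation, for illustration only.
log = [
    Turn("user", "I have a cough", 0.0),
    Turn("responder", "How long has it lasted?", 30.0),
    Turn("user", "Two weeks", 60.0),
    Turn("responder", "Please see a doctor soon.", 120.0),
]
indicators = regular_indicators(log)
```

Follow-up question counts and conversation cost are left out because the record does not say how follow-up questions are identified or how cost is priced.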
Collaborators and Investigators
Sponsor
Collaborators
Investigators
- Principal Investigator: Jiebin Xie, Doctor, North Sichuan Medical College
Publications and helpful links
Study record dates
Study Major Dates
Study Start (Actual)
Primary Completion (Actual)
Study Completion (Actual)
Study Registration Dates
First Submitted
First Submitted That Met QC Criteria
First Posted (Actual)
Study Record Updates
Last Update Posted (Estimated)
Last Update Submitted That Met QC Criteria
Last Verified
More Information
Terms related to this study
Additional Relevant MeSH Terms
- Vascular Diseases
- Cardiovascular Diseases
- Pathologic Processes
- Infections
- Lung Diseases
- Bronchial Diseases
- Lung Diseases, Obstructive
- Embolism and Thrombosis
- Gram-Positive Bacterial Infections
- Bacterial Infections
- Bacterial Infections and Mycoses
- Lung Diseases, Interstitial
- Fibrosis
- Actinomycetales Infections
- Mycobacterium Infections
- Pulmonary Embolism
- Pulmonary Fibrosis
- Embolism
- Respiratory Tract Infections
- Respiratory Tract Diseases
- Tuberculosis
- Respiration Disorders
- Bronchiectasis
- Bronchitis
Other Study ID Numbers
- 1426887-2024-1
- 22XQT0309 (Other Grant/Funding Number: the cooperation of urban schools in Nanchong City)
- CBY22-QDA15 (Other Grant/Funding Number: the doctoral startup fund of North Sichuan Medical College)
- 2022LC005 (Other Grant/Funding Number: the affiliated hospital of North Sichuan Medical College)
- 23JCYJPT0014 (Other Grant/Funding Number: the scientific research project of the science and technology bureau of Nanchong)
Drug and device information, study documents
Studies a U.S. FDA-regulated drug product
Studies a U.S. FDA-regulated device product
This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.