Evaluation of AI Large Models for Diagnosis and Treatment in Real-World Cases: Multicenter Retrospective Study

This multicenter retrospective study aims to evaluate the diagnostic and therapeutic performance of three large language models (ChatGPT, Gemini, and DeepSeek) using 800 archived inpatient medical records from urology departments across four tertiary hospitals. The study will focus on the accuracy and applicability of these models in disease recognition, preliminary diagnosis, and treatment recommendation generation, in order to explore their potential value and limitations in supporting clinical decision-making in real-world settings.

Study Overview

Study Type

Observational

Enrollment (Estimated)

800

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Contact

Study Locations

      • Fuzhou, China
        • Recruiting
        • The First Affiliated Hospital of Fujian Medical University

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

  • Adult
  • Older Adult

Accepts Healthy Volunteers

No

Sampling Method

Non-Probability Sample

Study Population

The study population was drawn from the following institutions: The First Affiliated Hospital of Fujian Medical University, The Second Affiliated Hospital of Fujian Medical University, Shishi City Hospital, and Shaowu City Hospital.

Description

Inclusion Criteria:

  • The case data is sourced from the four hospitals involved in the study, with complete and authentic diagnosis and treatment records.
  • Patients must be 18 years or older, with no gender restrictions.
  • Complete medical records, including the following core information: patient's basic information, present illness history, past medical history, physical examination, and auxiliary examinations (including laboratory and imaging tests).
  • A clear discharge diagnosis and treatment plan (including therapeutic measures and follow-up arrangements).
  • Medical records have been archived, with objective and accurate information that has not been altered.
  • The patient or their legal representative has provided informed consent, agreeing to the use of their anonymized medical data for research analysis.

Exclusion Criteria:

  • Medical records with significant missing information, such as key clinical details (present illness history, diagnostic or treatment records, etc.).
  • Cases where the diagnosis or treatment plan is unclear, or where treatment for the initial diagnosis has not been completed.
  • Cases where the primary diagnosis is not urological.
  • Cases with major errors or inconsistencies in the records that could affect further assessment.
  • Medical records in special formats or images that are not readable (e.g., handwritten notes, non-standard documentation).
  • Patients who have not signed the informed consent form or who refuse to allow their medical data to be used for research.
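
The inclusion and exclusion criteria above could be applied as an automated pre-screen before manual review. The following is only a minimal sketch under stated assumptions: the `CaseRecord` fields, the section names, and the function names are hypothetical and are not taken from the study record, which does not describe how screening is implemented. Judgment-based criteria (for example, "major errors or inconsistencies") are left to human reviewers and are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hospital names as listed under Study Population.
PARTICIPATING_HOSPITALS = {
    "The First Affiliated Hospital of Fujian Medical University",
    "The Second Affiliated Hospital of Fujian Medical University",
    "Shishi City Hospital",
    "Shaowu City Hospital",
}

# Core record sections named in the inclusion criteria (keys are hypothetical).
REQUIRED_SECTIONS = {
    "basic_information",
    "present_illness_history",
    "past_medical_history",
    "physical_examination",
    "auxiliary_examinations",
}


@dataclass
class CaseRecord:
    """Hypothetical flat view of one archived inpatient record."""
    hospital: str
    age: int
    sections_present: Set[str] = field(default_factory=set)
    has_discharge_diagnosis: bool = False
    has_treatment_plan: bool = False
    primary_diagnosis_urological: bool = False
    archived: bool = False
    machine_readable: bool = False
    consent_given: bool = False


def exclusion_reasons(record: CaseRecord) -> List[str]:
    """Return every criterion the record fails; an empty list means eligible."""
    reasons = []
    if record.hospital not in PARTICIPATING_HOSPITALS:
        reasons.append("not from a participating hospital")
    if record.age < 18:
        reasons.append("patient younger than 18")
    missing = REQUIRED_SECTIONS - record.sections_present
    if missing:
        reasons.append("missing sections: " + ", ".join(sorted(missing)))
    if not (record.has_discharge_diagnosis and record.has_treatment_plan):
        reasons.append("no clear discharge diagnosis or treatment plan")
    if not record.primary_diagnosis_urological:
        reasons.append("primary diagnosis is not urological")
    if not record.archived:
        reasons.append("record not archived")
    if not record.machine_readable:
        reasons.append("record not machine-readable")
    if not record.consent_given:
        reasons.append("no informed consent")
    return reasons


def is_eligible(record: CaseRecord) -> bool:
    """True when the record passes every automated check."""
    return not exclusion_reasons(record)
```

Returning the list of failed criteria, rather than a single boolean, would let a study team report how many records were excluded for each reason.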

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

What is the study measuring?

Primary Outcome Measures

Outcome Measure: Diagnostic Accuracy, assessed by Top-1 accuracy
Measure Description: Top-1: Proportion of cases where the model's first diagnosis matches the true primary diagnosis.
Time Frame: Through study completion, an average of 3 months

Outcome Measure: Diagnostic Accuracy, assessed by Top-3 accuracy
Measure Description: Top-3: Proportion of cases where the true diagnosis appears in the model's top 3.
Time Frame: Through study completion, an average of 3 months

Outcome Measure: Diagnostic Completeness
Measure Description: Proportion of the model's diagnoses that overlap with all diagnoses (primary and secondary) in the case.
Time Frame: Through study completion, an average of 3 months

Outcome Measure: Differential Diagnosis Quality
Measure Description: Evaluated by experts using a Likert 5-point scale, considering factors like common disease coverage, logical clarity, and specificity.
Time Frame: Through study completion, an average of 3 months

Outcome Measure: Treatment Plan Quality
Measure Description: Assesses whether the model's treatment suggestions align with clinical guidelines, scored by experts on completeness, appropriateness, and safety.
Time Frame: Through study completion, an average of 3 months

Outcome Measure: Analysis Time
Measure Description: Time taken by the AI model to provide diagnoses and treatment suggestions (in seconds), reflecting real-time capability.
Time Frame: Through study completion, an average of 3 months
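
The Top-1, Top-3, and Diagnostic Completeness measures described above can be illustrated with a short sketch. The following makes several assumptions that the study record does not specify: a "match" is taken to be exact string equality after lowercasing and whitespace normalization (a real evaluation would more likely rely on expert adjudication or coded diagnoses), Diagnostic Completeness is read literally as the share of the model's diagnoses that also appear among the recorded diagnoses, and all function names and dictionary keys are hypothetical.

```python
from typing import Dict, Sequence


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(label.lower().split())


def top_k_hit(predicted: Sequence[str], true_primary: str, k: int) -> bool:
    """True if the true primary diagnosis is among the model's first k diagnoses."""
    target = normalize(true_primary)
    return any(normalize(p) == target for p in predicted[:k])


def top_k_accuracy(cases: Sequence[Dict], k: int) -> float:
    """Proportion of cases with a top-k hit.

    Each case is a dict with hypothetical keys:
      "predicted": the model's ranked diagnoses (best first)
      "primary":   the true primary (discharge) diagnosis
    """
    if not cases:
        raise ValueError("at least one case is required")
    hits = sum(top_k_hit(c["predicted"], c["primary"], k) for c in cases)
    return hits / len(cases)


def completeness(predicted: Sequence[str], recorded: Sequence[str]) -> float:
    """Share of the model's diagnoses that also appear among the recorded
    primary and secondary diagnoses (literal reading of the measure)."""
    pred = {normalize(p) for p in predicted}
    rec = {normalize(r) for r in recorded}
    if not pred:
        return 0.0
    return len(pred & rec) / len(pred)


# Illustrative, made-up cases; these are not study data.
EXAMPLE_CASES = [
    {
        "predicted": ["Ureteral calculus", "Urinary tract infection", "Hydronephrosis"],
        "primary": "ureteral calculus",
        "recorded": ["Ureteral calculus", "Hydronephrosis"],
    },
    {
        "predicted": ["Cystitis", "Bladder tumor", "Benign prostatic hyperplasia"],
        "primary": "Bladder tumor",
        "recorded": ["Bladder tumor"],
    },
]

top1 = top_k_accuracy(EXAMPLE_CASES, k=1)  # first case hits, second misses
top3 = top_k_accuracy(EXAMPLE_CASES, k=3)  # both cases hit within the top 3
```

If completeness is instead intended as the share of recorded diagnoses that the model recovers, the denominator in `completeness` would change from the model's diagnoses to the recorded ones. The record's wording leaves both readings open.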

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Estimated)

January 1, 2026

Primary Completion (Estimated)

April 1, 2026

Study Completion (Estimated)

June 1, 2026

Study Registration Dates

First Submitted

December 9, 2025

First Submitted That Met QC Criteria

January 26, 2026

First Posted (Actual)

January 30, 2026

Study Record Updates

Last Update Posted (Actual)

January 30, 2026

Last Update Submitted That Met QC Criteria

January 26, 2026

Last Verified

January 1, 2026

More Information

Terms related to this study

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

NO

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.
