Clinical Trial NCT07199231
OpenEvidence Safety and Comparative Efficacy of Four LLM's in Clinical Practice
A Comparative Performance Evaluation of Four Publicly Available Large Language Models Against Gold Standard Medical References
OpenEvidence is an online tool that aggregates and synthesizes data from peer-reviewed medical studies, then produces a response to a user's question using generative AI. While it is in use by a number of clinicians (including residents) today, there is little to no published data on whether the tool's outputs are accurate and whether this information appropriately informs clinical decision making. Similarly, a number of clinicians are turning to other large language models (LLMs) to assist in decision making when providing clinical care. While a number of studies have been published on the accuracy of these LLMs' responses to medical board questions or clinical vignettes, few studies to date have examined their performance in a real-world clinical setting, and even fewer have compared this performance.
In this study, investigators have two goals:
- To determine whether the use of the AI tool "OpenEvidence" leads to clinically appropriate decisions when utilized by family medicine, internal medicine, and psychiatry residents in the course of clinical practice.
- To determine how the output of the OpenEvidence tool compares with three other commonly-used, publicly-available large language models (OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini) in answering common questions that residents have in the course of clinical practice.
To accomplish study goal #1, investigators have enlisted residents in the above specialties to use the OpenEvidence tool in the course of clinical practice. In order to mitigate any safety risks, the residents will also use a typical reference tool for their question, which is referred to as the "Gold Standard" tool. These tools include PubMed and UpToDate. The residents will:
- State their clinical question.
- Query OpenEvidence, capturing their prompt and the OpenEvidence output for data analysis. All residents will undergo training in prompt engineering at the start of the study.
- State their clinical conclusion based on the OpenEvidence data.
- Query the Gold Standard Resource.
- State their final clinical conclusion.
- Answer a question on whether their clinical conclusion was modified by the Gold Standard reference.
- Answer a question on whether they had any clinical safety concerns on the output from OpenEvidence.
Attending physician Subject Matter Experts (SMEs), matched by specialty and with at least 5 years of post-training clinical experience, will then evaluate the residents' responses. Five years was chosen based on the book "Outliers" by Malcolm Gladwell, in which he asserts that 10,000 hours of focused practice is needed to achieve expertise in a field.
SMEs will be asked to evaluate the residents' initial clinical questions and their conclusions based only on OpenEvidence. They will be asked to rate the clinical appropriateness of those conclusions on a scale of 1-10. For questions where the SMEs rate the clinical appropriateness of the residents' conclusions poorly (< 5/10), they will be asked to review the OpenEvidence output and answer an additional question as to whether the output was incorrect or the resident misinterpreted the output from the tool.
To accomplish goal #2, the initial prompt entered by the residents into OpenEvidence will be copied by the research team into ChatGPT, Gemini, and Claude. The outputs from each tool (including OpenEvidence) will be surfaced to SMEs, who will be asked to rate each output based on accuracy, completeness, and bias. Likert scales will be used for these ratings. SMEs will also be asked an open-ended question to identify any patient safety issues from any of the outputs.
Study Overview
Detailed Description
OpenEvidence is an online tool built out of the Mayo Clinic Platform Accelerate [OpenEvidence] that aggregates and synthesizes data from peer-reviewed medical studies, then produces a response to a user's question using generative AI. While it is in use by a number of clinicians (including residents) today, there is little to no published data on whether the tool's outputs are accurate and whether this information appropriately informs clinical decision making. Similarly, a number of clinicians are turning to other large language models (LLMs) to assist in decision making when providing clinical care.
OpenEvidence is an online tool designed to aggregate and synthesize data from peer-reviewed clinical studies, subsequently generating responses to user inquiries through the application of generative AI. Although increasingly utilized by both seasoned clinicians and trainees, there is a notable absence of published data regarding the accuracy of the tool's outputs and their safety and efficacy in appropriately informing clinical decision-making. Concurrently, a growing number of clinicians are leveraging other publicly-available large language models (LLMs) to support decision-making in clinical care. While a number of studies have examined the accuracy of LLM responses to medical board questions or clinical vignettes, there is limited research on their performance in real-world clinical settings, and even fewer studies offer comparative analyses of this performance.
In a review of the literature, one article shows LLMs may be better at detecting anxiety than practitioners, but this was based on clinical vignettes. [Levkovich et al.] Another looked at the diagnostic sensitivity of LLMs using patient-reported outcome measures in a structured questionnaire. [Pagano et al.] An additional study comparing LLMs for oncology also uses fictional vignettes. [Benary et al.] A randomized controlled trial using clinical vignettes did not show any clinical improvement for providers who had access to LLMs. [Goh et al.] One case study explored integration of ChatGPT 3.5 into daily rounds and evaluated its use qualitatively, but did not compare it with other LLMs or gold standard reference tools. [Skryd et al.] Another compared ChatGPT's responses to American College of Radiology appropriateness criteria for breast pain and breast cancer screening, but again did not compare it with other LLMs. [Rao et al.] In our review, only one study evaluated LLMs in a real-world clinical setting. This was a series of papers that looked at their use for complex decision making in breast-cancer care, using a small number of actual cases and a standardized prompt template. [Griewing, Knitza et al.; Griewing, Gremke et al.] That study found issues with consistency and deterioration of accuracy (particularly with GPT 3.5), leading the authors to conclude that the clinical use of LLMs for that purpose was not yet feasible at the time of publication. Still, health system leaders see the use of these tools rapidly accelerating in clinical practice. For this reason, investigators believe it is imperative to study their safety and the clinical appropriateness of the decisions clinicians are making as a result of their use.
Cambridge Health Alliance (CHA) is a public, academic safety-net health system in the Boston area, serving a diverse population of patients. CHA has a robust primary care and outpatient psychiatry footprint, and supports a large graduate medical education program through both Harvard Medical School and Tufts University School of Medicine. Investigators chose residents as the primary study participants because many trainees are already using OpenEvidence, and found them more incentivized to participate in the study if given access to the tool at CHA (where it is otherwise blacklisted from network services and prohibited by policy until results of this study can be determined).
Study outcomes are as follows:
Determine whether the use of OpenEvidence leads to clinically appropriate decisions by residents in the course of clinical practice in a community health setting.
Determine how the output of OpenEvidence compares with three other commonly-used, publicly-available large language models (OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini) in accuracy, completeness, and bias when addressing clinical questions residents have in the course of clinical practice in a community health setting.
Methods:
Data collection is planned to take place over a 6-month period in order to minimize vendor version upgrades during the study period. Residents are grouped by specialty into "medicine" (internal medicine/family medicine) and "psychiatry" (adult/child psychiatry). In order to simplify matching to appropriate specialty subject matter experts, medicine residents are asked to use OpenEvidence only for adult primary care cases (excluding OB/GYN-related issues). Psychiatry residents are asked to use OpenEvidence only for adult psychiatry cases.
Before being accepted as participants, trainees were all asked to agree to the following:
- Cross check any OpenEvidence query against a Gold Standard tool, defined in the study protocol to include PubMed, UpToDate, Dynamed, a clinical specialty society guideline, or other similar clinical reference source that must be documented in the study form.
- Not enter any personal health information (PHI) into OpenEvidence (as defined below).
- Attempt to use OpenEvidence at least 3 times per week, if appropriate to the clinical care of the patient.
- Document 100% of their OpenEvidence queries into the study research form to avoid selection bias.
All residents will be given brief training in prompt engineering for healthcare before data collection begins. Standardized prompts will not be used, as one of the subgoals of the study is to understand what types of queries residents submit to OpenEvidence in a real world setting.
All residents will be educated on the definition of PHI, as follows:
Queries should not include any PHI, as defined by the Safe Harbor identifiers [HHS]; queries can include patient age in years (days/weeks/months for pediatrics), and legal sex; for patients age 89 or older, the user must instead use the term "over age 89" to comply with Safe Harbor standards.
Queries should not include patients suspected of having extremely rare conditions as defined by the National Organization for Rare Disorders, as these are also prone to reidentification [NORD]. If a rare condition is not initially suspected but becomes suspected through the research process of using the AI tool, the user will be asked to stop their query at that point.
Data collection will involve the use of a HIPAA-compliant Google Form within CHA's enterprise Google Workspace for Health cloud infrastructure. The data collection form will ask trainees to do the following:
- Enter their initial clinical question.
- Paste their OpenEvidence prompt (numbered sequentially for iterative prompts) and the full OpenEvidence output(s) generated.
- Enter their clinical conclusion based on the OpenEvidence output.
- Enter the Gold Standard reference tool used.
- State their final clinical conclusion based on information from both OpenEvidence and the Gold Standard Reference tool.
- Answer a question on the extent to which their initial clinical conclusion was modified by the Gold Standard reference.
- Answer an open-ended question on whether they noticed any clinical safety issues, inaccuracies, or bias in the output from OpenEvidence.
Queries will be sorted by specialty (medicine vs. psychiatry), and each query will receive a sequential study number.
Attending physician Subject Matter Experts (SMEs) Board Certified in Internal Medicine, Family Medicine, or Psychiatry with at least 5 years of post-training clinical experience were recruited. Five years of post-training clinical experience was chosen based on Malcolm Gladwell's assertion, in his book Outliers, that 10,000 hours of focused practice is needed to achieve expertise in a field.
SMEs will be asked to evaluate the residents' initial clinical questions and their conclusions based only on OpenEvidence. They will be asked to rate the clinical appropriateness of those conclusions on a 10-point Likert scale. SMEs will also be provided with the OpenEvidence output for each query, and where the SME rates the clinical appropriateness of the residents' conclusions poorly (< 5/10), the SME will additionally be asked a follow-up question to assess whether the tool's output itself provided a clinically inappropriate response, in order to ascertain whether the trainee may have misinterpreted the tool's output. SME review will include a 2.5-5% overlap between reviewers to calculate a kappa score for interrater reliability.
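As a rough illustration of the interrater reliability calculation on the overlapping reviews, the sketch below computes an unweighted Cohen's kappa for two raters. The record does not say which kappa variant will be used; for an ordinal 10-point scale a weighted kappa may be preferred, so the unweighted form and the function name `cohens_kappa` are assumptions made here for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: exactly two raters and exact-match agreement; the study
    record only says "kappa score" and does not specify the variant.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(rater_a)
    # Observed agreement: share of items given the identical score.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal score distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical score throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up scores for illustration only; not study data.
kappa = cohens_kappa([8, 8, 4, 4], [8, 8, 4, 4])
```

Values near 1 indicate agreement well beyond chance, and values near 0 indicate agreement no better than chance.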
In part two of the study, the research team will sort the OpenEvidence queries into themes, and choose a random sampling of queries from each specialty and theme for comparison between LLMs. The research team will confirm that prompts do not include any PHI according to study protocol. They will then copy the OpenEvidence prompts entered by residents for the selected queries and paste them exactly into ChatGPT, Gemini, Copilot, and Claude.
The outputs of each of the five tools (OpenEvidence, ChatGPT, Gemini, Copilot, and Claude) will be surfaced in a Google webform. SMEs will be asked to rate each output on a Likert scale for accuracy, completeness, and bias, as well as to answer a qualitative question identifying any patient safety issues in the output.
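The random sampling of queries by specialty and theme could be sketched as a stratified draw like the one below. The field names `specialty` and `theme`, the per-stratum sample size, and the fixed seed are all assumptions for illustration; the record does not state how many queries will be drawn or how the draw will be made.

```python
import random
from collections import defaultdict


def sample_queries(queries, per_stratum, seed=0):
    """Pick up to `per_stratum` queries at random from each (specialty, theme) group."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    strata = defaultdict(list)
    for q in queries:
        strata[(q["specialty"], q["theme"])].append(q)
    selected = []
    for key in sorted(strata):
        group = strata[key]
        # Take the whole group when it is smaller than the requested size.
        k = min(per_stratum, len(group))
        selected.extend(rng.sample(group, k))
    return selected


# Hypothetical query records; themes and IDs are invented for illustration.
queries = [
    {"id": 1, "specialty": "medicine", "theme": "dosing"},
    {"id": 2, "specialty": "medicine", "theme": "dosing"},
    {"id": 3, "specialty": "medicine", "theme": "screening"},
    {"id": 4, "specialty": "psychiatry", "theme": "dosing"},
]
chosen = sample_queries(queries, per_stratum=1)
```

Sampling within each stratum keeps every specialty and theme represented in the comparison set, which a simple random draw over all queries would not guarantee.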
Results:
Primary outcome results will be reported as follows:
Clinical appropriateness of decisions made by residents using OpenEvidence (mean with SD, median), by specialty
- If, in cases of low clinical appropriateness, SMEs identified that this was due not to the tool's output but instead to the resident's interpretation of the tool, metrics will also be provided with these cases excluded
- Interrater reliability (kappa value)
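A minimal sketch of the primary outcome summary follows: mean, standard deviation, and median by specialty, reported once for all cases and once with resident-misinterpretation cases excluded. The record fields `specialty`, `score`, and `misinterpreted` are hypothetical names, and the use of the sample (n-1) standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def summarize(scores):
    """Return n, mean, sample SD, and median for a list of 1-10 scores."""
    return {
        "n": len(scores),
        "mean": mean(scores),
        # Sample SD needs at least two values; report None otherwise.
        "sd": stdev(scores) if len(scores) > 1 else None,
        "median": median(scores),
    }


def summarize_by_specialty(ratings, exclude_misinterpreted=False):
    """Group ratings by specialty, optionally dropping misinterpretation cases."""
    by_specialty = defaultdict(list)
    for r in ratings:
        if exclude_misinterpreted and r["misinterpreted"]:
            continue
        by_specialty[r["specialty"]].append(r["score"])
    return {spec: summarize(scores) for spec, scores in by_specialty.items()}


# Made-up ratings for illustration only; not study data.
ratings = [
    {"specialty": "medicine", "score": 8, "misinterpreted": False},
    {"specialty": "medicine", "score": 6, "misinterpreted": False},
    {"specialty": "medicine", "score": 3, "misinterpreted": True},
    {"specialty": "psychiatry", "score": 7, "misinterpreted": False},
]
all_cases = summarize_by_specialty(ratings)
tool_only = summarize_by_specialty(ratings, exclude_misinterpreted=True)
```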
Secondary outcome results will be reported as follows:
For each specialty and each variable (accuracy, completeness, and bias), investigators will report:
- The "win" rate for each LLM and average margin from the second place score.
- The effect size (using each LLM as a separate reference) using Cohen's d test.
- Interrater reliability (kappa value)
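The win-rate scoring can be sketched as follows, using the tie rule given in the outcome measures (a sole winner gets 1 point; tied winners split the point as 0.5 or 0.33 each). How the margin is averaged is not fully specified in the record, so this sketch assumes the margin is the winner's score minus the best competing score, averaged over the queries that tool won or tied. The tool labels are placeholders.

```python
from collections import defaultdict


def win_rates(query_scores):
    """query_scores: list of {tool_name: score} dicts, one dict per query.

    Each query needs at least two tools. Scores are assumed to be the
    SME rating (averaged where reviewers overlap).
    """
    points = defaultdict(float)
    margins = defaultdict(list)
    for scores in query_scores:
        best = max(scores.values())
        winners = [tool for tool, s in scores.items() if s == best]
        for tool in winners:
            # A sole winner gets 1 point; tied winners split it equally.
            points[tool] += 1 / len(winners)
            # Margin over the best competing score (0 when tied for first).
            runner_up = max(s for other, s in scores.items() if other != tool)
            margins[tool].append(best - runner_up)
    n = len(query_scores)
    tools = {tool for scores in query_scores for tool in scores}
    return {
        tool: {
            "win_rate_pct": 100 * points[tool] / n,
            "avg_margin": (
                sum(margins[tool]) / len(margins[tool]) if margins[tool] else None
            ),
        }
        for tool in sorted(tools)
    }


# Made-up scores for illustration only; not study data.
example = win_rates([
    {"tool_a": 9, "tool_b": 7, "tool_c": 7},
    {"tool_a": 8, "tool_b": 8, "tool_c": 5},
])
```

Because tied winners share a single point, the win rates across all tools always sum to 100%.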
Contacts and Locations
Study Locations
- Cambridge Health Alliance, Cambridge, Massachusetts, United States, 02193
Participation Criteria
Eligibility Criteria
Ages Eligible for Study
- Child
- Adult
- Older Adult
Study Population
Description
Inclusion Criteria:
- Active trainees PGY-1 through PGY-6 in Internal Medicine, Family Medicine, Adult Psychiatry, or Child Psychiatry at Cambridge Health Alliance
- Must agree to the study protocol requirements outlined in the study description.
Exclusion Criteria:
- Anyone who does not meet inclusion criteria
- Residents who plan to leave CHA prior to the end of the study collection period.
Study Plan
How is the study designed?
Design Details
Cohorts and Interventions
| Group / Cohort | Intervention / Treatment |
|---|---|
| Medicine Residents: Trainees in internal medicine or family medicine | Residents will use OpenEvidence clinical reference tool in the course of routine clinical care. They must also use a Gold Standard clinical reference tool (e.g. PubMed, UpToDate) to mitigate risk. |
| Psychiatry Residents: Trainees in adult and child psychiatry | Residents will use OpenEvidence clinical reference tool in the course of routine clinical care. They must also use a Gold Standard clinical reference tool (e.g. PubMed, UpToDate) to mitigate risk. |
What is the study measuring?
Primary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Clinical Appropriateness: Mean with SD | Clinical appropriateness score of resident decisions based on OpenEvidence output. This is a numeric score on a 10-point Likert scale. (For Likert scales described in this and all of our outcome metrics, higher scores are better outcomes.) Mean score with standard deviation will be used for primary outcome. | 6 months |
| Clinical Appropriateness: Median | Clinical appropriateness score of resident decisions based on OpenEvidence output. This is a numeric score on a 10-point Likert scale. Median clinical appropriateness scores will also be reported. | 6 months |
| Clinical Appropriateness: Interrater Reliability | SMEs will evaluate Clinical Appropriateness scores of resident decisions based on OpenEvidence output on a 10-point Likert scale. Interrater reliability of SME Clinical Appropriateness scores will be calculated using kappa value. | 6 months |
Secondary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Comparative Accuracy of LLMs: Win Rate | Comparing outputs of OpenEvidence, ChatGPT, Gemini, and Claude, SMEs will rate each tool for accuracy on a 10-point Likert scale. Whichever model wins will be given a score of "1". If there is a tie, each gets 0.5 or 0.33 points, depending on the division of the tie. For each specialty (medicine and psychiatry) and each LLM, we will report the "win rate" with average margin from the second place LLM. For example, for all Medicine queries, we will report the percentage of times OpenEvidence "won" over the other LLMs on the accuracy Likert scale (where there is SME overlap to determine the kappa value, we will average the SMEs' scores). As it pertains to ACCURACY, we will report the WIN RATE as a percentage. | 6 months |
| Comparative Accuracy: Margin of Win | Comparing outputs of OpenEvidence, ChatGPT, Gemini, and Claude, SMEs will rate each tool for accuracy on a 10-point Likert scale. For each specialty (medicine and psychiatry), we will calculate the win rate for each LLM on accuracy and then also report: Average margin from 2nd place LLM for ACCURACY. | 6 months |
| Comparative Accuracy: Effect Size | Comparing outputs of OpenEvidence, ChatGPT, Gemini, and Claude, SMEs will rate each tool for accuracy on a 10-point Likert scale. For each specialty (medicine and psychiatry), we will use Cohen's d test to calculate the effect size of each LLM's average ACCURACY score in comparison to each of the other three LLMs. An effect size of 0.2 is small, 0.5 is medium, and 0.8 is considered large. | 6 months |
| Comparative Accuracy of LLMs: Interrater Reliability | Comparing outputs of OpenEvidence, ChatGPT, Gemini, and Claude, SMEs will rate each tool for accuracy on a 10-point Likert scale. Interrater reliability of Accuracy scores will be calculated using kappa value. | 6 months |
| Comparative Completeness of LLMs: Win Rate | Comparing outputs of OpenEvidence, ChatGPT, Gemini, and Claude, SMEs will rate each tool for COMPLETENESS on a 10-point Likert scale. Whichever model wins will be given a score of "1". If there is a tie, each gets 0.5 or 0.33 points, depending on the division of the tie. For each specialty (medicine and psychiatry) and each LLM, we will report the "win rate" with average margin from the second place LLM. For example, for all Medicine queries, we will report the percentage of times OpenEvidence "won" over the other LLMs on the completeness Likert scale (where there is SME overlap to determine the kappa value, we will average the SMEs' scores). As it pertains to COMPLETENESS, we will report the WIN RATE as a percentage. | 6 months |
| Comparative Completeness: Margin of Win | Comparing outputs of OpenEvidence, ChatGPT, Gemini, and Claude, SMEs will rate each tool for completeness on a 10-point Likert scale. For each specialty (medicine and psychiatry), we will calculate the win rate for each LLM on completeness and then also report: Average margin from 2nd place LLM for COMPLETENESS. | 6 months |
| Comparative Completeness: Effect Size | Comparing outputs of OpenEvidence, ChatGPT, Gemini, and Claude, SMEs will rate each tool for completeness on a 10-point Likert scale. For each specialty (medicine and psychiatry), we will use Cohen's d test to calculate the effect size of each LLM's average COMPLETENESS score in comparison to each of the other three LLMs. An effect size of 0.2 is small, 0.5 is medium, and 0.8 is considered large. | 6 months |
| Comparative Completeness of LLMs: Interrater Reliability | Comparing outputs of OpenEvidence, ChatGPT, Gemini, and Claude, SMEs will rate each tool for completeness on a 10-point Likert scale. Interrater reliability of Completeness scores will be calculated using kappa value. | 6 months |
| Comparative Bias of LLMs: Win Rate | Comparing outputs of OpenEvidence, ChatGPT, Gemini, and Claude, SMEs will rate each tool for signs of bias on a 10-point Likert scale. Whichever model wins will be given a score of "1". If there is a tie, each gets 0.5 or 0.33 points, depending on the division of the tie. For each specialty (medicine and psychiatry) and each LLM, we will report the "win rate" with average margin from the second place LLM. For example, for all Medicine queries, we will report the percentage of times OpenEvidence "won" over the other LLMs on the bias Likert scale (where there is SME overlap to determine the kappa value, we will average the SMEs' scores). As it pertains to BIAS, we will report the WIN RATE as a percentage. | 6 months |
| Comparative Bias: Margin of Win | Comparing outputs of OpenEvidence, ChatGPT, Gemini, and Claude, SMEs will rate each tool for bias on a 10-point Likert scale. For each specialty (medicine and psychiatry), we will calculate the win rate for each LLM on bias and then also report: Average margin from 2nd place LLM for BIAS. | 6 months |
| Comparative Bias: Effect Size | Comparing outputs of OpenEvidence, ChatGPT, Gemini, and Claude, SMEs will rate each tool for bias on a 10-point Likert scale. For each specialty (medicine and psychiatry), we will use Cohen's d test to calculate the effect size of each LLM's average BIAS score in comparison to each of the other three LLMs. An effect size of 0.2 is small, 0.5 is medium, and 0.8 is considered large. | 6 months |
| Comparative Bias of LLMs: Interrater Reliability | Comparing outputs of OpenEvidence, ChatGPT, Gemini, and Claude, SMEs will rate each tool for bias on a 10-point Likert scale. Interrater reliability of Bias scores will be calculated using kappa value. | 6 months |
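For the effect-size measures, a minimal sketch of Cohen's d with a pooled standard deviation is shown below, computed for every ordered pair of tools so that each tool serves as the reference in turn. The pooled-SD form is an assumption: the record names Cohen's d but not the exact formula, and because the same queries are rated for every tool, a paired variant might be chosen instead. The tool labels and scores are placeholders.

```python
from itertools import permutations
from math import sqrt
from statistics import mean, variance


def cohens_d(sample, reference):
    """Cohen's d: difference in means divided by the pooled sample SD."""
    n1, n2 = len(sample), len(reference)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    pooled_var = (
        (n1 - 1) * variance(sample) + (n2 - 1) * variance(reference)
    ) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(sample) - mean(reference)) / sqrt(pooled_var)


def pairwise_effect_sizes(scores_by_tool):
    """Return {(tool, reference_tool): d} for every ordered pair of tools."""
    return {
        (tool, ref): cohens_d(scores_by_tool[tool], scores_by_tool[ref])
        for tool, ref in permutations(scores_by_tool, 2)
    }


# Made-up scores for illustration only; not study data.
effects = pairwise_effect_sizes({
    "tool_a": [2, 4, 6],
    "tool_b": [1, 3, 5],
})
```

Under the thresholds quoted in the table, the resulting value for `tool_a` against `tool_b` would count as a medium effect.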
Collaborators and Investigators
Sponsor
Investigators
- Principal Investigator: Hannah K Galvin, MD, Cambridge Health Alliance
Publications and helpful links
General Publications
- Pagano S, Strumolo L, Michalk K, Schiegl J, Pulido LC, Reinhard J, Maderbacher G, Renkawitz T, Schuster M. Evaluating ChatGPT, Gemini and other Large Language Models (LLMs) in orthopaedic diagnostics: A prospective clinical study. Comput Struct Biotechnol J. 2024 Dec 26;28:9-15. doi: 10.1016/j.csbj.2024.12.013. eCollection 2025.
- Skryd A, Lawrence K. ChatGPT as a Tool for Medical Education and Clinical Decision-Making on the Wards: Case Study. JMIR Form Res. 2024 May 8;8:e51346. doi: 10.2196/51346.
- Rao A, Kim J, Kamineni M, Pang M, Lie W, Dreyer KJ, Succi MD. Evaluating GPT as an Adjunct for Radiologic Decision Making: GPT-4 Versus GPT-3.5 in a Breast Imaging Pilot. J Am Coll Radiol. 2023 Oct;20(10):990-997. doi: 10.1016/j.jacr.2023.05.003. Epub 2023 Jun 21.
- Levkovich I, Rabin E, Brann M, Elyoseph Z. Large language models outperform general practitioners in identifying complex cases of childhood anxiety. Digit Health. 2024 Dec 15;10:20552076241294182. doi: 10.1177/20552076241294182. eCollection 2024 Jan-Dec.
- Griewing S, Knitza J, Boekhoff J, Hillen C, Lechner F, Wagner U, Wallwiener M, Kuhn S. Evolution of publicly available large language models for complex decision-making in breast cancer care. Arch Gynecol Obstet. 2024 Jul;310(1):537-550. doi: 10.1007/s00404-024-07565-4. Epub 2024 May 29.
- Benary M, Wang XD, Schmidt M, Soll D, Hilfenhaus G, Nassir M, Sigler C, Knodler M, Keller U, Beule D, Keilholz U, Leser U, Rieke DT. Leveraging Large Language Models for Decision Support in Personalized Oncology. JAMA Netw Open. 2023 Nov 1;6(11):e2343689. doi: 10.1001/jamanetworkopen.2023.43689.
Helpful Links
Terms related to this study
Other Study ID Numbers
- CHA-IRB-25-26-444