LLM Performance in Endodontic Diagnostics

December 8, 2025 updated by: Marmara University

Evaluating ChatGPT-4o, Gemini and Claude 3.7 in Endodontic Diagnostics: A Prospective Clinical Study

The goal of this prospective observational study is to evaluate the ability of three large language models (ChatGPT-4o, Gemini Advanced, and Claude 3.7) to support diagnosis and treatment decision-making in adult patients presenting with common endodontic conditions.

The main questions the study aims to answer are:

Can LLMs accurately determine the endodontic diagnosis when provided with structured clinical information and periapical radiographs?

Can LLMs propose appropriate treatment plans comparable to decisions made by endodontic specialists?

To answer these questions, researchers will compare the diagnostic and treatment accuracy of three AI models using a consensus diagnosis from endodontic specialists as the reference standard.

Participants will:

Receive routine endodontic examination and periapical radiographs as part of standard clinical care.

Have their anonymized clinical histories and radiographs entered into the three AI models.

Not interact directly with any AI system; all evaluations will be performed by the research team.

This study aims to understand how large language models perform under real-world clinical conditions and whether these systems may play a supportive role in endodontic diagnostics in the future.

Study Overview

Status

Completed

Conditions

Endodontic Diagnosis, Endodontic Diseases, Endodontic Treatment, Endodontic Decision-making

Intervention / Treatment

Diagnostic test: AI-Based Diagnostic Assessment

Detailed Description

This prospective observational study aims to evaluate the real-time diagnostic and treatment decision-making performance of three large language models (ChatGPT-4o, Gemini Advanced, and Claude 3.7) in an endodontic clinical setting. A total of 120 patients presenting to the endodontic clinic were examined, and detailed medical/dental histories, clinical findings, and periapical radiographs were collected. Each anonymized case was then presented to the three LLMs using a standardized prompt asking for the diagnosis and the appropriate treatment plan.

All models were used in their default multimodal configurations without enabling web-search functions, plug-ins, or external data retrieval. Each question was submitted only once in isolated chat sessions to prevent memory carry-over. Responses were saved verbatim and compared with the reference diagnoses and treatment plans established by a panel of endodontic specialists.
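The comparison step described above can be sketched as follows. This is a minimal illustration, not the study's actual scoring procedure: the function name `accuracy_by_model`, the record layout, the model labels, and the toy cases are all hypothetical, and exact string matching is an assumption (free-text LLM responses would in practice need grading by the research team).

```python
from collections import defaultdict


def accuracy_by_model(responses, reference):
    """Return {model: (diagnosis_accuracy, treatment_accuracy)}.

    responses: list of dicts with keys case_id, model, diagnosis, treatment
    reference: dict mapping case_id -> (diagnosis, treatment) from the panel
    Matching is by exact string equality (an assumption for illustration).
    """
    # Per model: [diagnosis hits, treatment hits, cases scored]
    tallies = defaultdict(lambda: [0, 0, 0])
    for r in responses:
        ref_dx, ref_tx = reference[r["case_id"]]
        tally = tallies[r["model"]]
        tally[0] += r["diagnosis"] == ref_dx
        tally[1] += r["treatment"] == ref_tx
        tally[2] += 1
    return {m: (dx / n, tx / n) for m, (dx, tx, n) in tallies.items()}


# Hypothetical toy data, not taken from the study.
reference = {
    1: ("irreversible pulpitis", "root canal treatment"),
    2: ("pulp necrosis", "root canal treatment"),
}
responses = [
    {"case_id": 1, "model": "model_a", "diagnosis": "irreversible pulpitis",
     "treatment": "root canal treatment"},
    {"case_id": 2, "model": "model_a", "diagnosis": "reversible pulpitis",
     "treatment": "root canal treatment"},
    {"case_id": 1, "model": "model_b", "diagnosis": "irreversible pulpitis",
     "treatment": "extraction"},
    {"case_id": 2, "model": "model_b", "diagnosis": "pulp necrosis",
     "treatment": "root canal treatment"},
]

scores = accuracy_by_model(responses, reference)
for model, (dx_acc, tx_acc) in sorted(scores.items()):
    print(f"{model}: diagnosis {dx_acc:.2f}, treatment {tx_acc:.2f}")
```

Keeping one response per model per case mirrors the single-submission design described above; repeated submissions would require a different aggregation.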

This study was designed to mimic real-world clinical conditions as closely as possible, providing a realistic assessment of how these systems might perform when used by clinicians in everyday practice. Understanding their capabilities and limitations in authentic clinical scenarios is essential, as LLMs are expected to play an increasingly vital role in future dental care, particularly in decision support, triage, and patient education. By identifying where these models perform well and where they fall short, this research aims to inform safe and effective clinical integration as LLM technologies continue to advance.

Study Type

Observational

Enrollment (Actual)

120

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

Turkey (Türkiye)
- Istanbul
  - Maltepe, Istanbul, Turkey (Türkiye), 34856
    - Faculty of Dentistry, Marmara University

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

Adult
Older Adult

Accepts Healthy Volunteers

Sampling Method

Non-Probability Sample

Study Population

The study population consisted of adult patients attending or referred to the endodontic clinic of Marmara University. All participants presented with common endodontic conditions such as pulpitis, necrosis, primary or secondary apical periodontitis, or the need for retreatment. After obtaining consent, each patient underwent a structured paper-based medical and dental history assessment and periapical radiographic examination. A total of 120 clinically verified endodontic cases were included.

Description

Inclusion Criteria:

Adult patients (≥18 years old) presenting to or referred to the Endodontic Clinic.

Patients with a clinically verified endodontic condition requiring diagnosis and treatment planning.

Patients who agreed to participate and provided informed consent.

Patients for whom a complete paper-based medical/dental history and periapical radiograph were obtained during the clinical visit.

Exclusion Criteria:

Patients who declined participation or did not provide informed consent.

Pediatric patients (<18 years old) referred to the Pediatric Dentistry Clinic.

Patients attending the clinic with non-endodontic complaints (e.g., post-extraction alveolitis, third-molar extraction problems).

Cases with incomplete clinical information or missing radiographs.

Patients unable to undergo standard endodontic examination procedures.
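The inclusion and exclusion criteria listed above can be expressed as a simple screening check. This is a hedged sketch only: the `Candidate` class, its field names, and the `is_eligible` function are hypothetical names introduced for illustration and are not part of the study record.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # All field names are illustrative assumptions.
    age: int
    consented: bool
    endodontic_complaint: bool   # False for e.g. post-extraction alveolitis
    history_complete: bool       # paper-based medical/dental history obtained
    radiograph_available: bool   # periapical radiograph obtained
    can_be_examined: bool        # able to undergo standard examination


def is_eligible(c: Candidate) -> bool:
    """True only if every inclusion criterion holds and no exclusion applies."""
    return (
        c.age >= 18
        and c.consented
        and c.endodontic_complaint
        and c.history_complete
        and c.radiograph_available
        and c.can_be_examined
    )


adult = Candidate(34, True, True, True, True, True)
minor = Candidate(16, True, True, True, True, True)
no_radiograph = Candidate(52, True, True, True, False, True)
print(is_eligible(adult), is_eligible(minor), is_eligible(no_radiograph))
```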

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Number of groups / cohorts

Cohorts and Interventions

Group / Cohort

Intervention / Treatment

Endodontic Patients Cohort

This cohort includes 120 consecutive patients presenting to the endodontic clinic with clinically verified endodontic conditions. Clinical history and periapical radiographs were collected, and diagnostic/treatment recommendations generated by AI models were compared with expert consensus.

Diagnostic test: AI-Based Diagnostic Assessment

Participants' anonymized clinical information, including structured patient history and periapical radiographs, was used as input for three large language models (ChatGPT-4o, Gemini Advanced, Claude 3.7). The models were asked to determine the endodontic diagnosis and propose an appropriate treatment plan. No treatment, device, or drug was administered to participants. The intervention consists solely of AI-based interpretation of pre-existing clinical data.

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Clinician Diagnosis Accuracy Based on Paper-Based History and Periapical Radiograph	Assessment of the diagnostic decision made by endodontic clinicians after reviewing a paper-based patient history form and a standardized periapical radiograph. Accuracy is determined by comparing the clinician's diagnosis with the consensus diagnosis established by three independent endodontic specialists. Data will be collected for all 120 patients at the time of initial clinical evaluation.	July 7 to August 5

Secondary Outcome Measures

Outcome Measure	Measure Description	Time Frame
LLM-Generated Diagnosis and Treatment Planning Performance	Evaluation of diagnostic and treatment recommendations generated by large language models (ChatGPT-4o, Gemini Advanced, and Claude 3.7) after receiving the same paper-based patient history and periapical radiograph provided to clinicians. LLM responses will be compared to the gold-standard specialist consensus for both diagnosis and treatment decisions.	August to September

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Sponsor

Marmara University

Investigators

Study Director: Ayşe Karadayı, Asst. Prof., Marmara University Faculty of Dentistry

Publications and helpful links

The person responsible for entering information about the study voluntarily provides these publications. These may be about anything related to the study.

General Publications

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

July 7, 2025

Primary Completion (Actual)

August 5, 2025

Study Completion (Actual)

October 3, 2025

Study Registration Dates

First Submitted

November 24, 2025

First Submitted That Met QC Criteria

December 8, 2025

First Posted (Actual)

December 15, 2025

Study Record Updates

Last Update Posted (Actual)

December 15, 2025

Last Update Submitted That Met QC Criteria

December 8, 2025

Last Verified

December 1, 2025

More Information

Terms related to this study

Keywords

access cavity cleaning, air abrasion, air polishing, CLSM, ethanol, push-out bond strength

Other Study ID Numbers

2025-38

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

UNDECIDED

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

Studies a U.S. FDA-regulated device product

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.
