Performance Comparison of Large Language Models in TAP Block Ultrasound Interpretation

February 2, 2026 updated by: Engin Ihsan Turan, Kanuni Sultan Suleyman Training and Research Hospital

Performance Comparison of Large Language Models in TAP Block Ultrasound Interpretation: A Double-Blind Prospective Study

The goal of this study is to learn how accurately two artificial intelligence (AI) models, Gemini 2.5 Pro and ChatGPT-5.1, can interpret ultrasound videos of the Transversus Abdominis Plane (TAP) block, a regional anesthesia technique used for pain control after surgery.

The main questions this study aims to answer are:

  • How accurately can each AI model identify anatomical structures on TAP block ultrasound videos?
  • Can the AI models correctly evaluate the spread of local anesthetic and determine whether the block is successful?
  • How closely do the AI models' answers match the evaluations of expert anesthesiologists?

No additional procedures will be performed on patients. TAP blocks will be done as part of routine clinical care, and the ultrasound videos will be recorded and de-identified.

Participants will not need to do anything extra for the study. Experienced anesthesiologists will review the videos and provide expert answers. The AI models will be given the same videos and asked the same questions. A second expert, who does not know which answers came from humans or AI, will compare all responses.

The results will help researchers understand whether advanced AI systems can safely support clinicians in interpreting ultrasound-guided regional anesthesia procedures and improve education and decision-making in anesthesia practice.

Study Overview

Detailed Description

This study aims to evaluate how two advanced artificial intelligence (AI) models, Gemini 2.5 Pro and ChatGPT-5.1, interpret ultrasound videos of Transversus Abdominis Plane (TAP) block procedures. TAP blocks are performed as part of routine clinical care by experienced anesthesiologists. The ultrasound videos recorded during these procedures serve as the data source for this study. No additional procedures or patient involvement are required beyond standard care.

Ultrasound Video Processing

All ultrasound recordings will be fully de-identified by removing patient names, dates, and any other identifying information.

Gemini 2.5 Pro will receive original video files. ChatGPT-5.1 will receive high-resolution GIF segments generated from the same recordings.

Both models will be given identical structured prompts consisting of eight clinically relevant questions about anatomic structures, needle placement, local anesthetic spread, dermatomal effects, and potential safety concerns.

Expert Participation

Two anesthesiology experts will participate independently:

Expert A will review each ultrasound video and answer the same set of eight clinical questions. These answers will serve as the primary human clinical reference.

Expert B will independently evaluate all responses (those from Expert A, Gemini 2.5 Pro, and ChatGPT-5.1) after they have been anonymized and randomly ordered. Expert B will not know whether a response originated from an AI model or a human expert. For each answer, Expert B will assess anatomical accuracy, clarity, clinical appropriateness, and overall content quality.
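The anonymization and random ordering described above could be sketched as follows. This is only an illustration: the record does not describe the tooling used, and the function name `blind_responses`, the neutral labels, and the sample answer texts are all hypothetical.

```python
import random


def blind_responses(responses, seed=None):
    """Shuffle answers and replace their sources with neutral labels.

    `responses` maps a source name to its answer text. Returns the blinded
    answers (for the evaluator) and a separate key (kept from the evaluator)
    that maps each neutral label back to its source.
    """
    rng = random.Random(seed)
    items = list(responses.items())  # [(source, answer_text), ...]
    rng.shuffle(items)               # random presentation order

    blinded = {}
    key = {}
    for i, (source, text) in enumerate(items, start=1):
        label = f"Response {i}"      # hypothetical neutral label
        blinded[label] = text
        key[label] = source
    return blinded, key


# Placeholder texts, not real study data.
responses = {
    "Expert A": "hypothetical answer text 1",
    "Gemini 2.5 Pro": "hypothetical answer text 2",
    "ChatGPT-5.1": "hypothetical answer text 3",
}
blinded, key = blind_responses(responses, seed=0)
```

Only `blinded` would be shown to the evaluator. The `key` would be stored separately and used to unblind the scores once the evaluation is complete.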

If Expert A and Expert B disagree on the interpretation or quality assessment of any response, a third expert (Expert C), who is also experienced in ultrasound-guided regional anesthesia, will independently review the relevant responses. Expert C's evaluation will be used to resolve discrepancies and establish the final consensus.

Data Collected

For each TAP block video, the following information will be recorded:

  • Ultrasound and procedural details
  • Patient demographic descriptors (age, sex, BMI, ASA classification), used only for general characterization of the sample
  • AI-related performance features such as response completeness, relevance, confidence level, and response time

Study Type

Observational

Enrollment (Estimated)

40

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

    • Istanbul
      • Istanbul, Istanbul, Turkey (Türkiye), 34303
        • Recruiting
        • Health Science University İstanbul Kanuni Sultan Süleyman Education and Training Hospital

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

  • Adult
  • Older Adult

Accepts Healthy Volunteers

No

Sampling Method

Probability Sample

Study Population

This study will include adult surgical patients who receive a lateral Transversus Abdominis Plane (L-TAP) block as part of routine anesthesia practice during elective operations. The population consists only of patients whose block procedures are already clinically indicated and performed by experienced anesthesiologists. No additional procedures are carried out for research purposes. Ultrasound videos obtained during standard block practice are de-identified and analyzed. The study does not involve patient follow-up or any change in clinical management.

Description

Inclusion Criteria:

  • Adults aged 18-85 years
  • ASA I-III physical status
  • Undergoing elective surgery with a lateral TAP block performed as part of routine anesthesia care
  • Complete ultrasound-guided block procedure recorded on video
  • Able to provide written informed consent

Exclusion Criteria:

  • Unsuccessful or incomplete TAP block procedure
  • Poor-quality ultrasound video (needle tip or anesthetic spread not visible)
  • Missing demographic or clinical data
  • Withdrawal of consent at any time
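As a rough illustration, the inclusion and exclusion criteria above can be expressed as a single screening check. The `Candidate` record and its field names are assumptions made for this sketch and do not come from the study record.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # All field names are hypothetical; they mirror the listed criteria.
    age: int
    asa_class: int              # 1-3 corresponds to ASA I-III
    elective_lateral_tap: bool  # elective surgery with routine lateral TAP block
    full_video_recorded: bool   # complete block procedure recorded on video
    consent_given: bool         # written consent given and not withdrawn
    block_completed: bool       # False if the block was unsuccessful or incomplete
    needle_tip_visible: bool
    spread_visible: bool
    data_complete: bool         # no missing demographic or clinical data


def is_eligible(c: Candidate) -> bool:
    """Return True if the candidate meets every inclusion criterion
    and triggers no exclusion criterion."""
    included = (
        18 <= c.age <= 85
        and 1 <= c.asa_class <= 3
        and c.elective_lateral_tap
        and c.full_video_recorded
        and c.consent_given
    )
    excluded = (
        not c.block_completed
        or not c.needle_tip_visible   # poor-quality video
        or not c.spread_visible       # poor-quality video
        or not c.data_complete
    )
    return included and not excluded


example = Candidate(54, 2, True, True, True, True, True, True, True)
print(is_eligible(example))
```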

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

What is the study measuring?

Primary Outcome Measures

Anatomical Interpretation Accuracy
  • Measure Description: For each ultrasound video, the ability of both AI models (ChatGPT-5.1 and Gemini 2.5 Pro) to correctly identify key anatomical structures of the lateral TAP block (internal oblique, transversus abdominis, fascial plane, needle tip) will be evaluated. The accuracy of each model will be compared with the expert-defined reference answer.
  • Time Frame: At the time of video analysis
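One plausible way to summarize anatomical interpretation accuracy is the share of structure identifications that match the expert reference. The record does not specify how accuracy is calculated, so the `accuracy` function and the data layout below are assumptions, and the sample judgements are made up purely for illustration.

```python
STRUCTURES = ("internal oblique", "transversus abdominis", "fascial plane", "needle tip")


def accuracy(judgements):
    """Fraction of structure identifications that match the expert reference.

    `judgements` is a list with one dict per video, mapping each structure
    name to True (matches the reference) or False (does not match).
    """
    total = sum(len(video) for video in judgements)
    correct = sum(sum(video.values()) for video in judgements)
    return correct / total if total else float("nan")


# Illustrative, made-up judgements for two videos (not study results).
model_judgements = [
    {s: True for s in STRUCTURES},                         # video 1: 4 of 4 match
    {**{s: True for s in STRUCTURES}, "needle tip": False},  # video 2: 3 of 4 match
]
print(accuracy(model_judgements))  # 7 of 8 identifications match -> 0.875
```

Running the same function on each model's judgements would give per-model figures that can then be compared.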

Secondary Outcome Measures

Block Success Interpretation
  • Measure Description: Assessment of whether each AI model correctly determines block success based on needle placement and local anesthetic spread, compared with expert reference evaluation.
  • Time Frame: At the time of video analysis

Needle Plane Evaluation
  • Measure Description: Determination of whether AI models correctly assess the needle tip location and whether it is within the correct interfascial plane (IO-TA fascia), compared with the expert reference.
  • Time Frame: At the time of video analysis

Dermatomal Level Prediction
  • Measure Description: Comparison of each AI model's predicted dermatomal coverage (e.g., T10-T12) with the expert-provided reference dermatomal level.
  • Time Frame: At the time of video analysis

Risk Awareness Assessment
  • Measure Description: Evaluation of whether each AI model correctly identifies potential risks on ultrasound images (e.g., peritoneal proximity, vascular structures). Scored as 0 = no risk awareness, 1 = partial, 2 = complete and appropriate risk identification.
  • Time Frame: At the time of video analysis

Recommendation Quality
  • Measure Description: Assessment of the appropriateness of each model's suggestions (e.g., need for additional injection, repositioning) based on the ultrasound appearance. Qualitative scoring by expert evaluator (0-10).
  • Time Frame: At the time of video analysis

Agreement Between Experts
  • Measure Description: Evaluation of whether Expert A and Expert B provide consistent judgments for each parameter, with discrepancies resolved through Expert C when needed. Recorded as agreement, or as disagreement resolved by the third expert.
  • Time Frame: During expert evaluation phase

AI Response Time
  • Measure Description: Time required for each AI model to generate answers to the eight standardized questions, in seconds (continuous variable).
  • Time Frame: Captured automatically during model output
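The record does not name a statistic for agreement between experts. A common choice for two raters using categorical scores is Cohen's kappa, sketched below as an assumption rather than the study's stated method. The ratings use the 0-2 risk awareness scale and are invented for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items categorically."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: share of items given the same category by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement, from each rater's marginal category frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Made-up scores on the 0-2 scale for six responses (not study data).
rater_1 = [2, 2, 1, 0, 2, 1]
rater_2 = [2, 1, 1, 0, 2, 2]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))  # observed 4/6, expected 14/36 -> 10/22, about 0.455
```

Kappa treats all disagreements as equal. For ordinal scales such as 0-2 or 0-10, a weighted variant may be more suitable.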

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

January 15, 2026

Primary Completion (Estimated)

April 30, 2026

Study Completion (Estimated)

May 1, 2026

Study Registration Dates

First Submitted

November 21, 2025

First Submitted That Met QC Criteria

November 21, 2025

First Posted (Actual)

December 3, 2025

Study Record Updates

Last Update Posted (Actual)

February 4, 2026

Last Update Submitted That Met QC Criteria

February 2, 2026

Last Verified

November 1, 2025

More Information

Terms related to this study

Other Study ID Numbers

  • SUCCESS OF LLMs in TAP BLOCK

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

UNDECIDED

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved directly from the website clinicaltrials.gov without any changes.
