Large Language Models Versus Human Examiners for Grading Physiotherapy Clinical Cases (PACE-AI)

June 24, 2026 updated by: Alfredo Lerín Calvo, Neuron, Spain

Agreement Between Large Language Models and Faculty Assessment in the Evaluation of Clinical Reasoning Case Examinations in Undergraduate Physiotherapy Education: A Comparative Reliability Study

This study evaluates whether large language models (LLMs) can reliably assess written clinical-reasoning case examinations completed by undergraduate physiotherapy students, compared with faculty assessment. In the course "Specific Methods in Physiotherapy" (third year of the Physiotherapy Degree), students solve complex clinical cases that require clinical reasoning, technical knowledge, and therapeutic decision-making. These cases are traditionally graded by faculty, a time-consuming process that may show inter-rater variability.

A set of de-identified student case examinations will be assessed using the rubric currently applied in the course, which covers clarity and structure of clinical reasoning, integration of the biopsychosocial model (ICF and APTA frameworks), accuracy in identifying pain mechanisms, coherence between diagnosis, hypotheses, and treatment, originality and depth of analysis, and professional writing. Each examination will be scored independently by three LLMs (for example, Claude, ChatGPT, and Gemini), each receiving an identical standardized prompt that embeds the same rubric, and by faculty serving as the reference standard.

To avoid overloading faculty, full double human grading may not be feasible; the human reference will therefore consist of expert faculty grading by one independent rater or, when resources allow, two independent raters. In contrast, paired assessment is fully implemented across the AI models: each examination is scored by three LLMs, and each model is queried in duplicate, allowing the study to estimate both agreement between models and the test-retest stability of each model.

The primary aim is to quantify agreement between LLM-generated scores and the faculty reference score. Secondary aims include agreement among the LLMs, test-retest reliability of each model, criterion-level agreement, the quality and usefulness of the qualitative feedback generated, the time and cost associated with each approach, and students' perceptions of the usefulness of human versus AI feedback.

The findings will clarify the strengths and limitations of LLMs as supportive tools for formative assessment in health-professions education and will inform criteria for their responsible and effective use. No LLM output will affect students' official grades, which remain the sole responsibility of faculty.

Study Overview

Status

Not yet recruiting

Conditions

Intervention / Treatment

Detailed Description

BACKGROUND AND RATIONALE

The assessment of clinical case examinations in physiotherapy requires the appraisal of multiple dimensions, including clinical reasoning, selection of appropriate techniques, treatment dosage, ethical considerations, and patient communication. This grading process demands substantial faculty time and may be affected by evaluator fatigue and inter-rater variability. Large language models (LLMs) have shown notable capabilities in text comprehension, reasoning, and the generation of structured feedback, and preliminary evidence suggests they may provide consistent evaluations in medical education contexts. However, questions remain regarding their reliability, potential bias, and their ability to capture the complexity of clinical reasoning. There is currently limited empirical evidence on whether LLMs can complement human grading while maintaining quality standards and offering immediate formative feedback. This study addresses that gap through a systematic comparison between human assessment (reference standard) and LLM-assisted assessment using identical materials and criteria.

OBJECTIVES

Primary objective: To quantify the agreement between the scores generated by LLMs and the faculty reference score in the assessment of physiotherapy clinical-reasoning case examinations.

Secondary objectives:

To estimate the agreement among different LLMs (inter-model reliability).
To estimate the test-retest reliability of each LLM (intra-model reliability) when the same examination is scored on repeated, independent administrations.
To evaluate agreement at the level of individual rubric criteria.
To compare the quality, specificity, and formative usefulness of the qualitative feedback produced by LLMs and by faculty.
To compare the time and cost associated with human and LLM assessment.
To assess students' perceptions of the usefulness, fairness, and transparency of human versus AI feedback.

STUDY DESIGN

Cross-sectional inter-rater agreement and reliability study with repeated measures, in which the same set of de-identified student clinical case examinations is assessed independently by human and artificial-intelligence raters using a shared, predefined rubric. The study is observational and educational in nature; it does not modify the teaching or the official assessment received by students.

SETTING, PARTICIPANTS, AND MATERIALS

The study is conducted within the course "Specific Methods in Physiotherapy" (third year, Physiotherapy Degree), in which clinical cases are used as a central learning and assessment tool. The units of analysis are the written case examinations produced by enrolled students (approximately 60 to 80 examinations). All examinations are anonymized prior to assessment so that no rater can identify the author. The grading instrument is the rubric already used in the course, comprising the following criteria: (1) clarity and structure of clinical reasoning; (2) integration of the biopsychosocial model (ICF and APTA frameworks); (3) accuracy in identifying pain mechanisms; (4) coherence between diagnosis, hypotheses, and treatment; (5) originality and depth of analysis; and (6) appropriate professional writing. Each criterion yields a partial score, and the criteria sum to a global score.

RATERS AND ASSESSMENT PROCEDURE

Human assessment (reference standard): Each examination is scored by faculty with expertise in the course, applying the established rubric. The protocol is designed to accommodate two scenarios depending on faculty workload. In the preferred scenario, two faculty members score each examination independently (paired human correction), enabling estimation of human-human reliability and the use of the mean or consensus score as the reference. In the contingency scenario, to avoid overloading faculty, a single expert faculty rating (or, alternatively, the official course grade already assigned) is used as the reference standard; in that case, human-human reliability is not estimated within this study and is acknowledged as a limitation.

Artificial-intelligence assessment (paired AI correction): The same anonymized examinations are scored independently by three LLMs (for example, Claude, ChatGPT, and Gemini, in the versions available during the data-collection period). Each model receives an identical standardized prompt that embeds the same rubric and requests, for every examination, a partial score per criterion, a global score, and structured qualitative feedback. To assess intra-model (test-retest) reliability, each model is queried in duplicate in independent sessions under fixed generation parameters. This design applies the logic of paired correction to the models themselves: outputs are compared across models (inter-model agreement) and across repeated runs of the same model (intra-model stability), which remains feasible without additional faculty burden. Grading order is randomized, and raters (human and AI) are blinded to one another's scores. The time required for each evaluation and the operating cost of each LLM are recorded.
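The randomized, duplicated querying described above could be organized as a scoring schedule along the following lines. This is a minimal sketch, not the study's actual tooling: the model labels, the `build_schedule` helper, the examination identifiers, and the fixed seed are all illustrative assumptions.

```python
import itertools
import random

# Hypothetical labels; the protocol names these models only as examples.
MODELS = ["claude", "chatgpt", "gemini"]
RUNS = [1, 2]  # each model is queried in duplicate


def build_schedule(exam_ids, seed=2026):
    """Return every (exam, model, run) task once, in a reproducible random order."""
    tasks = list(itertools.product(exam_ids, MODELS, RUNS))
    random.Random(seed).shuffle(tasks)  # seeded so the order can be audited later
    return tasks


# Hypothetical identifiers for 60 anonymized examinations.
exam_ids = [f"EX{i:03d}" for i in range(1, 61)]
schedule = build_schedule(exam_ids)
print(len(schedule))  # 60 exams x 3 models x 2 runs = 360 scoring tasks
```

Each task in the schedule would then be sent in a fresh session with the same prompt and generation parameters, so that the two runs of a model cannot see each other's output.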

OUTCOME MEASURES

Primary outcome: agreement between each LLM's global score and the faculty reference global score.

Secondary outcomes: inter-model agreement among LLMs; intra-model test-retest reliability; criterion-level agreement; feedback quality (number of specific and actionable comments, coverage of case dimensions, and rated formative usefulness); efficiency (mean evaluation time and cost per evaluation); and student-perceived usefulness of human versus AI feedback (Likert scales).

STATISTICAL ANALYSIS

General approach: Analyses will be performed in R. Continuous variables will be summarized as mean and standard deviation or median and interquartile range, according to distribution (assessed with the Shapiro-Wilk test and graphical inspection); categorical variables as absolute and relative frequencies. All tests will be two-sided with an alpha of 0.05, and 95% confidence intervals (CI) will be reported for all reliability and agreement estimates.

Primary analysis (LLM vs faculty agreement): For the global score (continuous), agreement between each LLM and the faculty reference will be quantified with the intraclass correlation coefficient (ICC), two-way random-effects model, absolute-agreement definition, single-rater and average-rater forms [ICC(2,1) and ICC(2,k) in Shrout and Fleiss notation], following the conventions of McGraw and Wong and the reporting guidance of Koo and Li. ICC values will be interpreted as poor (<0.50), moderate (0.50-0.75), good (0.75-0.90), and excellent (>0.90). Systematic bias will be examined with Bland-Altman analysis, reporting the mean difference and 95% limits of agreement, with inspection for proportional bias. For criterion-level (ordinal) scores, agreement will be quantified with Cohen's weighted kappa using quadratic weights. Because kappa is sensitive to prevalence and marginal imbalance (the "kappa paradox"), Gwet's AC1/AC2 and the prevalence-adjusted bias-adjusted kappa (PABAK) will be reported as robust complements. Categorical agreement coefficients will be interpreted using the Landis and Koch benchmarks. Pre-specified targets, consistent with the project objectives, are ICC >= 0.75 for the global score and weighted kappa >= 0.60 at the criterion level.
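For readers who want to see how the global-score statistics are built, the sketch below computes the two-way random-effects, absolute-agreement ICC in its single-rater and average-rater forms from the ANOVA mean squares, plus the Bland-Altman bias and 95% limits of agreement. It is an illustration only: the protocol specifies R for the real analysis, the function names are hypothetical, confidence intervals are omitted, and the example data are the classic Shrout and Fleiss (1979) ratings rather than study data.

```python
from statistics import mean, stdev


def icc_two_way_random(scores):
    """ICC(2,1) and ICC(2,k), absolute agreement, from an n x k table.

    scores[i][j] is the global score given to examination i by rater j.
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-examination mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square

    icc_single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_average = (msr - mse) / (msr + (msc - mse) / n)
    return icc_single, icc_average


def bland_altman(a, b):
    """Mean difference (bias) and 95% limits of agreement for paired scores."""
    diffs = [x - y for x, y in zip(a, b)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Worked example from Shrout and Fleiss (1979): 6 targets rated by 4 judges.
data = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
        [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
icc_1, icc_k = icc_two_way_random(data)
print(round(icc_1, 2), round(icc_k, 2))  # 0.29 0.62
```

In the study setting each row would be one examination and the columns would be the faculty reference and one LLM (k = 2) for the primary comparison; `bland_altman` would take the LLM and faculty global scores as its two lists.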

Inter-model reliability: Agreement among the three LLMs on the global score will be estimated with a two-way random-effects ICC for multiple raters; for criterion-level categorical scores, Fleiss' kappa and Gwet's AC will be used. Pairwise model comparisons will also be reported.
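As an illustration of the multi-rater categorical coefficient named above, the following sketch computes Fleiss' kappa for one rubric criterion scored by the three models. It is a minimal, assumption-laden example: the function name and the toy ratings are hypothetical, the protocol's real analysis is planned in R, and Gwet's AC and confidence intervals are not covered here.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for N subjects, each rated by the same number of raters.

    ratings[i] lists the category assigned to examination i by each rater.
    Undefined (division by zero) if every rating falls in a single category.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        # proportion of agreeing rater pairs for this examination
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1))
    p_observed = agreement_sum / n_subjects
    p_expected = sum((t / (n_subjects * n_raters)) ** 2 for t in totals.values())
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data: three examinations, one criterion, three models (made-up scores).
toy = [[1, 1, 2], [2, 2, 2], [1, 2, 3]]
print(round(fleiss_kappa(toy), 4))  # 0.0217
```

Fleiss' kappa treats the score levels as unordered categories, so a one-point and a three-point disagreement count the same; this is one reason to read it alongside the ICC and the pairwise comparisons.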

Intra-model (test-retest) reliability: For each model, agreement between duplicate runs will be quantified with the ICC for the global score and with the percentage of exact agreement, complemented by Gwet's AC, for criterion-level scores. The standard error of measurement (SEM) and the minimal detectable change (MDC95) will be derived from the ICC to express measurement precision in score units.
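The derivation of SEM and MDC95 from the ICC uses the standard relations SEM = SD x sqrt(1 - ICC) and MDC95 = 1.96 x sqrt(2) x SEM. A short sketch follows; the function names and the input values are illustrative assumptions, and the protocol does not state which standard deviation (for example, pooled across the two runs) will be used.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement, in the same units as the scores."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% level for a difference of two scores."""
    return 1.96 * math.sqrt(2.0) * sem


# Made-up inputs: score SD of 1.2 points and a test-retest ICC of 0.84.
sem = sem_from_icc(sd=1.2, icc=0.84)
print(round(sem, 2), round(mdc95(sem), 2))  # 0.48 1.33
```

Read this way, two runs of the same model that differ by less than the MDC95 on an examination cannot be distinguished from measurement noise.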

Feedback quality (qualitative and quantitative): The qualitative feedback will be analyzed through structured content analysis with predefined categories aligned to the rubric dimensions. The number of specific, actionable comments and the coverage of case dimensions (reasoning, technique, communication, ethics) will be counted for each rater type and compared using chi-square or Fisher's exact tests for proportions and Kruskal-Wallis tests for counts, with appropriate post-hoc comparisons. Inter-coder reliability for the content-analysis coding will itself be reported (Cohen's or Gwet's coefficient) to ensure the trustworthiness of the categorization.

Efficiency: Mean evaluation time will be compared between humans and each LLM using paired t-tests or Wilcoxon signed-rank tests depending on distribution. Cost per evaluation (faculty time valued at standard institutional rates and LLM usage fees) will be summarized descriptively.

Student perception: Likert responses will be summarized as medians and IQRs and as the proportion of favorable responses. Differences in perceived usefulness between human and AI feedback will be tested with the Wilcoxon signed-rank test, and the Friedman test will be used when more than two feedback sources are compared, with appropriate post-hoc analysis.

Handling of the reference-standard contingency: If two independent faculty ratings are obtained, human-human reliability will be reported and the mean or consensus score used as the reference for all LLM comparisons; if only a single faculty rating (or the official course grade) is available, that single value serves as the reference, the absence of an in-study human-human reliability estimate is reported as a limitation, and, where available, historical course data on faculty agreement are cited as external context. The AI-centered analyses (inter-model and intra-model reliability) are unaffected by this contingency and are performed in all cases.

Sensitivity analyses and missing data: Sensitivity analyses will explore the influence of score distribution and of any examinations with extreme scores. The pattern of any missing or non-evaluable outputs (for example, an LLM failing to return a parsable score) will be described, and complete-case analysis will be the primary approach, with the proportion of missing data reported.
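The complete-case approach for non-parsable LLM outputs might look like the sketch below. Everything specific here is an assumption made for illustration: the protocol does not define the output format, so the JSON shape, the `global_score` key, and the helper names are hypothetical.

```python
import json


def parse_global_score(raw_reply):
    """Return the global score as a float, or None if the reply is not evaluable."""
    try:
        # Assumed format: a JSON object with a "global_score" field.
        return float(json.loads(raw_reply)["global_score"])
    except (ValueError, KeyError, TypeError):
        return None


def complete_cases(replies):
    """Keep parsable scores and report the proportion of missing outputs."""
    parsed = {exam: parse_global_score(raw) for exam, raw in replies.items()}
    kept = {exam: score for exam, score in parsed.items() if score is not None}
    missing = 1 - len(kept) / len(parsed)
    return kept, missing


# Made-up replies from one model run.
replies = {
    "EX001": '{"global_score": 7.5}',
    "EX002": "Sorry, I cannot grade this.",
    "EX003": '{"global_score": "8"}',
    "EX004": '{"feedback": "no score returned"}',
}
kept, missing = complete_cases(replies)
print(kept, missing)  # {'EX001': 7.5, 'EX003': 8.0} 0.5
```

Logging which examinations, models, and runs produced the non-evaluable outputs would also support the description of the missingness pattern.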

SAMPLE SIZE JUSTIFICATION

This is a reliability/agreement study, so the sample-size rationale is based on the precision of the reliability estimates rather than on hypothesis testing of a between-group difference. For an expected ICC of approximately 0.75 estimated with several raters and a target 95% CI half-width of about 0.10-0.12, established approximations for ICC precision (Bonett) indicate that on the order of 40-60 examinations are required; for kappa-based criterion agreement, the approximations of Donner and Rotnitzky yield comparable figures. The approximately 60-80 available examinations therefore provide adequate precision for the planned estimates. These figures are indicative and will be refined once the final number of evaluable examinations and raters is confirmed.
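To make the precision argument concrete, the sketch below evaluates the approximation usually attributed to Bonett (2002), n = 8 z^2 (1 - ICC)^2 (1 + (k - 1) ICC)^2 / (k (k - 1) w^2) + 1, where k is the number of raters and w is the full width of the confidence interval. The record does not state the exact formula or rater count used, so both the formula as written here and the choices of k are assumptions that should be checked against the original reference.

```python
import math
from statistics import NormalDist


def bonett_n(icc, k, half_width, conf=0.95):
    """Approximate number of examinations for a target ICC confidence-interval width."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    width = 2 * half_width  # the formula uses the full interval width
    n = (8 * z ** 2 * (1 - icc) ** 2 * (1 + (k - 1) * icc) ** 2
         / (k * (k - 1) * width ** 2)) + 1
    return math.ceil(n)


# Expected ICC of 0.75 with a half-width of 0.10, for three and four raters.
print(bonett_n(0.75, k=3, half_width=0.10))  # 52
print(bonett_n(0.75, k=4, half_width=0.10))  # 44
```

Under these assumptions the results fall in the 40-60 range quoted above; fewer raters or a narrower interval push the requirement up.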

ETHICS AND DATA PROTECTION

The study will be submitted to the Research Ethics Committee of Universidad Rey Juan Carlos for review and approval. Informed consent will be requested from participating students. All examinations are anonymized before assessment, data are processed in accordance with the EU General Data Protection Regulation (GDPR) and applicable national law, and participation does not affect students' final grades, which are determined exclusively by faculty. Students are informed that their de-identified case examinations will be assessed by both faculty and LLMs for educational-research purposes.

Study Type

Observational

Enrollment (Estimated)

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Contact

Name: Alfredo Lerín Calvo, MSc
Phone Number: +34620187457
Email: alfredo.lerin@lasallecampus.es

Study Locations

Spain
- Madrid
  - Madrid, Madrid, Spain, 28023
    - Centro Superior de Estudios Universitarios La Salle
    - Contact: Alfredo Lerín Calvo, MSc
      Phone Number: +34620187457
      Email: alfredo.lerin@lasallecampus.es

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

Adult
Older Adult

Accepts Healthy Volunteers

Yes

Sampling Method

Non-Probability Sample

Study Population

The study population comprises undergraduate students enrolled in the course "Specific Methods in Physiotherapy" (third year of the Physiotherapy Degree) during the study period, approximately 60 to 80 students. As part of the course, each student produces a written clinical-reasoning case examination. The de-identified examinations from consenting students constitute the units of analysis and are assessed independently, using the same predefined rubric, by faculty (reference standard) and by three large language models. No clinical intervention is applied and participation does not affect students' official grades, which are determined exclusively by faculty.

Description

Inclusion Criteria:

Students officially enrolled in the course "Specific Methods in Physiotherapy" (third year of the Physiotherapy Degree) during the study period.
Submission of a completed written clinical-reasoning case examination as part of the course.
Provision of informed consent for the anonymized examination to be used for educational-research purposes.

Exclusion Criteria:

Refusal to provide, or withdrawal of, informed consent.
Blank, incomplete, or non-evaluable examinations (e.g., no developed written response).
Examinations that cannot be reliably de-identified prior to assessment.

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Number of groups / cohorts

Cohorts and Interventions

Group / Cohort

Intervention / Treatment

Anonymized physiotherapy clinical case examinations

Single cohort consisting of de-identified written clinical-reasoning case examinations produced by undergraduate physiotherapy students in the course "Specific Methods in Physiotherapy." Each examination is assessed independently, using the same predefined rubric, by faculty (reference standard) and by three large language models (LLMs), with each model queried in duplicate to assess test-retest reliability. The examination is the unit of analysis; no participant follow-up is performed.

Diagnostic test: LLM-based assessment

Assessment of each anonymized examination by three large language models (for example, Claude, ChatGPT, and Gemini, in the versions available during data collection). Each model receives an identical standardized prompt embedding the study rubric and returns a score per criterion, a global score, and structured qualitative feedback. Each model is queried in duplicate in independent sessions under fixed generation parameters to estimate intra-model (test-retest) reliability, and outputs are compared across models to estimate inter-model agreement.

Diagnostic test: Faculty assessment (reference standard)

Assessment of the same anonymized examinations by faculty with expertise in the course, applying the identical rubric, serving as the reference standard. In the preferred scenario, two faculty members score each examination independently (paired human correction); if faculty workload precludes this, a single expert faculty rating, or the official course grade already assigned, is used as the reference. Faculty and LLM raters are blinded to one another's scores.

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Agreement between LLM global scores and the faculty reference global score	Agreement between the global examination score generated by each large language model (LLM) and the faculty reference global score, computed for the same anonymized examinations. Agreement is quantified with the intraclass correlation coefficient (ICC), two-way random-effects model, absolute-agreement definition, single- and average-measures forms [ICC(2,1) and ICC(2,k)], with 95% confidence intervals. Systematic bias is examined with Bland-Altman analysis (mean difference and 95% limits of agreement). ICC is interpreted as poor (<0.50), moderate (0.50-0.75), good (0.75-0.90), or excellent (>0.90). Pre-specified target: ICC >= 0.75.	Single cross-sectional assessment during the data-collection period (approximately 2 months)

Secondary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Criterion-level agreement between LLM and faculty scores	Agreement between LLM and faculty scores at the level of each individual rubric criterion (ordinal scores). Quantified with Cohen's weighted kappa (quadratic weights), with 95% confidence intervals. Because kappa is sensitive to prevalence and marginal imbalance, Gwet's AC1/AC2 and the prevalence-adjusted bias-adjusted kappa (PABAK) are reported as robust complements. Coefficients are interpreted using the Landis and Koch benchmarks. Pre-specified target: weighted kappa >= 0.60.	Single cross-sectional assessment during the data-collection period (approximately 2 months)
Intra-model test-retest reliability of each large language model	Stability of each LLM's scoring across two independent duplicate runs of the same examination under fixed generation parameters. Quantified with the ICC for the global score and percentage of exact agreement (complemented by Gwet's AC) for criterion-level scores, with 95% confidence intervals. The standard error of measurement (SEM) and the minimal detectable change (MDC95) are derived from the ICC to express measurement precision in score units.	Single cross-sectional assessment during the data-collection period (approximately 2 months)
Quality and coverage of the qualitative feedback	Quality of the qualitative feedback produced by each rater type (LLMs and faculty), assessed through structured content analysis with predefined categories aligned to the rubric dimensions. Outcomes include the number of specific, actionable comments per evaluation and the coverage of case dimensions (clinical reasoning, technique, communication, ethics). Counts and proportions are compared across rater types (chi-square or Fisher's exact tests for proportions; Kruskal-Wallis for counts, with post-hoc comparisons). Inter-coder reliability of the content-analysis coding is reported (Cohen's or Gwet's coefficient).	Assessed after completion of all evaluations, during the analysis period (approximately 3 months)
Mean evaluation time per examination: faculty versus LLM	Mean time required to evaluate one examination, recorded separately for faculty and for each LLM. Compared between humans and LLMs using paired t-tests or Wilcoxon signed-rank tests according to distribution, with 95% confidence intervals for the mean difference. Reported in minutes per examination.	Single cross-sectional assessment during the data-collection period (approximately 2 months)
Cost per evaluation: faculty versus LLM	Operating cost of evaluating one examination, comparing faculty time valued at standard institutional rates against LLM usage fees. Summarized descriptively (mean and standard deviation or median and interquartile range) per rater type and reported in euros per examination.	Single cross-sectional assessment during the data-collection period (approximately 2 months)

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Sponsor

Neuron, Spain

Collaborators

Centro Universitario La Salle

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Estimated)

August 1, 2026

Primary Completion (Estimated)

August 10, 2026

Study Completion (Estimated)

August 10, 2026

Study Registration Dates

First Submitted

June 24, 2026

First Submitted That Met QC Criteria

June 24, 2026

First Posted (Actual)

June 30, 2026

Study Record Updates

Last Update Posted (Actual)

June 30, 2026

Last Update Submitted That Met QC Criteria

June 24, 2026

Last Verified

June 1, 2026

More Information

Terms related to this study

Keywords

Other Study ID Numbers

ALCNR005

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

YES

IPD Plan Description

De-identified individual participant data (anonymized examination scores per rubric criterion and global score from all human and LLM raters, including duplicate LLM runs) and the corresponding data dictionary will be shared. The standardized LLM prompt, the scoring rubric, and the statistical analysis code will also be made available. Data will be deposited in the Zenodo open-access repository and assigned a permanent DOI. No directly identifying information will be shared; all examinations are anonymized prior to assessment.

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

Studies a U.S. FDA-regulated device product

