cRCT of AI Agent in Clinical Reasoning Training

June 1, 2026 updated by: Yue Li, Peking Union Medical College Hospital

AI Clinical Reasoning Training Agent on Medical Students' Clinical Reasoning Skills and Case-based Learning Experience: A Cluster Randomized Controlled Trial

Traditional medical education has long emphasized one-way transmission of theoretical knowledge, which limits the systematic cultivation of clinical reasoning skills among medical students. Miller's pyramid of clinical competence emphasizes the gradual transformation from theoretical knowledge to clinical practice ability. Case-based learning (CBL), a teaching method centered on real or simulated clinical cases, is a key strategy to address these limitations. Artificial intelligence (AI)-assisted clinical reasoning training tools can overcome time and space constraints and offer students repeatable, adaptive case training with real-time feedback, thereby reinforcing the sustained role of CBL in clinical reasoning development. High-quality evidence from randomized controlled trials on the impact of AI agents on medical students' clinical reasoning skills is still lacking.

This study plans to evaluate the impact of an AI clinical reasoning training agent on students' clinical reasoning training outcomes and CBL learning experience.

Primary Objective: To evaluate the impact of the AI agent on student learning outcomes (course examination scores and clinical reasoning test scores).

Secondary Objective: To investigate students' AI acceptance (perceived usefulness, perceived ease of use, satisfaction, and intention to use).

This study adopts a two-arm parallel cluster randomized controlled trial design. The trial is designed and reported in accordance with the CONSORT statement.

The study will recruit Class of 2021 medical students (8-year program) from Peking Union Medical College and Class of 2020 medical students (8-year program) from Tsinghua University School of Medicine. Both cohorts are officially enrolled in the "Comprehensive Clinical Course" for the 2025-2026 academic year, have consistent foundational knowledge in basic medicine and diagnostics, and are in the phase of clinical medicine theory learning, not yet having entered clinical practice.

Using PASS 2025 software, the sample size per arm for the cRCT is 39, with the number of clusters per arm K = N/M = 13 (average cluster size M = 3). Considering a 10% attrition or exclusion rate, the target recruitment is 88 participants.

Considering potential baseline heterogeneity between students from the two schools, and possible contamination due to discussions among dormitory mates during the intervention, this study will adopt stratified cluster randomization, first stratifying by school, then using dormitory as the smallest randomization unit. Dormitories will be sorted by a randomly generated number, with the first half allocated to the intervention group and the second half to the control group. Participants' group assignment will be revealed via unique student ID only after baseline data collection and informed consent are completed.

This study will select five topics from the "Comprehensive Clinical Course": "Infectious Diarrhea," "Viral Hepatitis," "Bloodstream Infection," "Infective Endocarditis," and "Central Nervous System Infection". Standardized cases will be provided by the teaching faculty, with two cases per topic, totaling 10 cases. These cases will be used to develop the AI agent. After class, the AI agent training tasks will be sent to the intervention group, and study materials will be distributed to the control group.

Course examination scores and clinical reasoning test scores are the primary outcomes. AI technology acceptance, including perceived usefulness, perceived ease of use, satisfaction, and intention to use, is the secondary outcome.

This study has been approved by the Research Ethics Committee of Peking Union Medical College Hospital (Approval No.: I-26PJ0851).

Study Overview

Status

Enrolling by invitation

Detailed Description

  1. Background

    Traditional medical education has long emphasized one-way transmission of theoretical knowledge, which presents limitations in the systematic cultivation of clinical reasoning skills among medical students. Miller's pyramid of clinical competence divides the learning process into four levels: knows, knows how, shows how, and does. It emphasizes the gradual transformation from theoretical knowledge to clinical practice ability. Within this framework, effective training of clinical reasoning not only relies on theoretical instruction, but also requires repeated case-based training and feedback.

    Case-based learning (CBL), as a teaching method centered on real or simulated clinical cases, is a key strategy to address the above limitations. By guiding students to analyze, discuss, and solve clinical problems in cases, CBL promotes the deep integration of theoretical knowledge and clinical decision-making, thereby systematically exercising students' clinical reasoning skills. CBL is not merely a supplement to theoretical teaching; it is an indispensable practical step in the transition of clinical reasoning from "knowing" to "doing".

    In recent years, artificial intelligence (AI) has gradually emerged in medical education, showing particular potential in simulating clinical scenarios and providing personalized feedback. AI-assisted clinical reasoning training tools can overcome time and space constraints and offer students repeatable, adaptive case training with real-time feedback, thereby reinforcing the sustained role of CBL in clinical reasoning development. High-quality evidence from randomized controlled trials on the impact of AI agents on medical students' clinical reasoning skills is still lacking.

    This study plans to evaluate the impact of an AI clinical reasoning training agent on students' clinical reasoning training outcomes and CBL learning experience. We will conduct a cluster randomized controlled trial (cRCT) in a real teaching environment. We hope to provide empirical evidence and replicable implementation experience for the integration of AI and medical education.

  2. Objectives

    Primary Objective: To evaluate the impact of the AI agent on student learning outcomes (course examination scores and clinical reasoning test scores).

    Secondary Objective: To investigate students' AI acceptance (perceived usefulness, perceived ease of use, satisfaction, and intention to use).

  3. Methods

3.1 Study Design

This study adopts a two-arm parallel cluster randomized controlled trial design. The trial is designed and reported in accordance with the CONSORT statement.

3.2 Study Population and Sample Size

The study will recruit Class of 2021 medical students (8-year program) from Peking Union Medical College and Class of 2020 medical students (8-year program) from Tsinghua University School of Medicine. Both cohorts are officially enrolled in the "Comprehensive Clinical Course" for the 2025-2026 academic year, have consistent foundational knowledge in basic medicine and diagnostics, and are in the phase of clinical medicine theory learning, not yet having entered clinical practice.

Inclusion criteria:

  • full-time registered and enrolled in the "Comprehensive Clinical Course"
  • signed informed consent, voluntary participation in this study and completion of relevant tests and questionnaires

Exclusion criteria:

  • planned suspension of studies, withdrawal, or major transfer during the study period

The sample size per arm is calculated as $N = DE \times \left[ 2 \times (Z_{1-\alpha/2} + Z_{1-\beta})^2 \times \sigma^2 / \delta^2 \right]$. Considering clinical reasoning test scores and clinical scenario simulation scores as primary outcomes, the difference in mean scores between the intervention and control groups is set at $\delta = 2$ points (out of 100) based on teaching experience; the standard deviation $\sigma = 3$ points is set based on students' baseline diagnostic scores. Dormitory is used as the smallest cluster unit, with average cluster size $M = 3$. Based on previous studies and considering that small cluster sizes have little impact on total sample size, the intracluster correlation coefficient (ICC, $\rho$) is set at 0.02, giving a design effect $DE = 1 + (M - 1) \times ICC = 1.04$. The significance level is $\alpha = 0.05$ (two-sided) and the statistical power is $1 - \beta = 0.8$, so $Z_{1-\alpha/2} = 1.96$ and $Z_{1-\beta} = 0.84$. Using PASS 2025 software, the sample size per arm for the cRCT is 39, with the number of clusters per arm $K = N/M = 13$. Considering a 10% attrition or exclusion rate, the target recruitment is 88 participants.
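The arithmetic in the sample size paragraph can be checked by hand. The sketch below is not the PASS 2025 procedure, whose internals are not described in this record; it is a minimal hand calculation that applies the stated formula and then makes two rounding assumptions (round the number of clusters up to a whole dormitory, and inflate each arm separately for attrition) that happen to reproduce the reported figures of 13 clusters, 39 per arm, and 88 in total. The function names are hypothetical.

```python
import math


def per_arm_sample_size(delta, sigma, cluster_size, icc,
                        z_alpha=1.96, z_beta=0.84):
    """Apply N = DE * [2 * (Z_a + Z_b)^2 * sigma^2 / delta^2].

    Returns (design_effect, adjusted_n, clusters_per_arm, per_arm_n).
    Assumption: clusters are rounded up to whole dormitories.
    """
    design_effect = 1 + (cluster_size - 1) * icc
    n_individual = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    adjusted_n = design_effect * n_individual
    clusters_per_arm = math.ceil(adjusted_n / cluster_size)
    per_arm_n = clusters_per_arm * cluster_size
    return design_effect, adjusted_n, clusters_per_arm, per_arm_n


def recruitment_target(per_arm_n, attrition, arms=2):
    """Inflate each arm for attrition, then sum over arms.

    Assumption: inflation is n / (1 - attrition), rounded up per arm.
    """
    return arms * math.ceil(per_arm_n / (1 - attrition))


de, adjusted, clusters, n_arm = per_arm_sample_size(
    delta=2, sigma=3, cluster_size=3, icc=0.02)
print(round(de, 2), round(adjusted, 2), clusters, n_arm)
print(recruitment_target(n_arm, attrition=0.10))
```

With the protocol's inputs the formula gives roughly 36.7 participants per arm before rounding; rounding up to 13 dormitories of 3 yields 39, and inflating each arm for 10% attrition yields 44 per arm, or 88 overall.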

Recruitment will be conducted through class information sessions, where the study purpose and design are explained in detail. Participating students who enroll and complete the study will receive an incentive. Upon enrollment, all students in both intervention and control groups will complete a questionnaire on AI literacy and AI use interest as baseline data. The AI literacy section is designed based on the "Expert Consensus on the Artificial Intelligence Proficiency Competency List and Assessment Framework for Medical Students (2025 Edition)".

3.3 Randomization and Blinding

Considering potential baseline heterogeneity between students from the two schools, and possible contamination due to discussions among dormitory mates during the intervention, this study will adopt stratified cluster randomization, first stratifying by school, then using dormitory as the smallest randomization unit. A random number will be generated for each dormitory using a WPS spreadsheet; dormitories will be sorted by the random number, with the first half allocated to the intervention group and the second half to the control group. The allocation scheme will be kept by research personnel not directly involved in teaching. Participants' group assignment will be revealed via unique student ID only after baseline data collection and informed consent are completed. Participants cannot be blinded to their use of the AI tool during the intervention but will be instructed not to share accounts or learning materials across groups. The primary outcome assessor will be blinded.
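The allocation procedure described above (stratify by school, draw a random number per dormitory, sort, split in half) can be sketched in a few lines. The protocol uses a WPS spreadsheet; the Python version below is only an illustrative equivalent. The function name, the dormitory identifiers, and the seed are hypothetical, and the handling of a stratum with an odd number of dormitories (the extra dormitory goes to the intervention group) is an assumption, since the record does not specify it.

```python
import random


def randomize_dormitories(dorms_by_school, seed=None):
    """Stratified cluster randomization with dormitory as the unit.

    dorms_by_school maps a school name to a list of dormitory IDs
    (IDs are assumed unique across schools). Within each school,
    every dormitory gets a random number, dormitories are sorted by
    that number, and the first half goes to the intervention group.
    """
    rng = random.Random(seed)
    allocation = {}
    for school in sorted(dorms_by_school):
        keyed = sorted((rng.random(), dorm) for dorm in dorms_by_school[school])
        # Assumption: with an odd count, the extra dormitory is
        # allocated to the intervention group.
        half = (len(keyed) + 1) // 2
        for rank, (_, dorm) in enumerate(keyed):
            allocation[dorm] = "intervention" if rank < half else "control"
    return allocation


# Hypothetical dormitory lists for the two strata.
example = {
    "PUMC": ["P01", "P02", "P03", "P04"],
    "Tsinghua": ["T01", "T02", "T03", "T04", "T05", "T06"],
}
for dorm, arm in sorted(randomize_dormitories(example, seed=2026).items()):
    print(dorm, arm)
```

Fixing the seed makes the allocation reproducible for audit, which serves the same purpose as archiving the spreadsheet of random numbers.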

3.4 Intervention and Control

This study will select five topics from the "Comprehensive Clinical Course": "Infectious Diarrhea," "Viral Hepatitis," "Bloodstream Infection," "Infective Endocarditis," and "Central Nervous System Infection". Standardized cases will be provided by the teaching faculty, with two cases per topic, totaling 10 cases. The case design template consists of case information, supplementary diagnostic and treatment process information, clinical reasoning questions, and answer keys, breaking down the entire diagnostic and treatment process into multiple steps. The cases will be reviewed and approved by two senior clinical faculty members to ensure appropriate content and difficulty level. These 10 cases will be developed into an AI agent on the Rain Classroom platform. The agent guides students step-by-step through history taking, physical examination, ancillary test selection, diagnostic reasoning, etc., through interactive dialogue, providing real-time personalized feedback. It is accessible on mobile and computer devices and supports voice interaction. In developing the agent, special emphasis is placed on the "reasoning guidance" function, limiting scenarios where "students ask for answers and get them directly". The answer key (script) will be released systematically after training, highlighting the roles of the learning materials as a "reference book" and the agent as a "practice field". Multiple rounds of testing with teachers and students will be conducted to fix potential issues in logic, interaction, stability, etc.

3.4.1 Intervention Measures

After class, the AI agent training tasks will be sent to students in the intervention group. Each intervention group student is required to complete training on all 10 cases. The system backend will automatically record each student's AI interaction logs (e.g., training duration, number of interactions, logical order of history taking, types and frequency of AI feedback, etc.). These data will serve as adherence verification evidence and process materials for reasoning training, used for quality control and subsequent outcome analysis. Before the intervention begins, students will receive training on how to use the AI agent. The technical team will provide full technical support throughout the study period.

3.4.2 Control Measures

The same 10 cases including case information, questions, and answer keys will be distributed as learning materials to the control group after class. Control group students are also required to complete self-study of these cases. At the end of the study, information on control group students' material learning duration, learning frequency, and use of external AI tools will be collected via questionnaire for comparison with the intervention group. The AI agent will be made available to all students after the study concludes to ensure educational equity.

3.6 Outcomes and Data Collection

3.6.1 Primary Outcomes

  1. Course examination scores: The examination will be administered one week after the course ends. The Infection module includes 8 A1 multiple-choice questions (MCQs), 8 case summary MCQs, 2 A3 case-cluster MCQs, 3 A4 case-series best-answer MCQs, and 1 case analysis question, with a total score of 25. All test questions are provided by the teaching faculty, covering core diseases from the five topics. The questions will be reviewed by the same two senior clinical faculty members to ensure appropriate case selection and difficulty. The case analysis question is subjective and will be graded by two single-blinded assessors.
  2. Clinical reasoning test score: Students will complete a clinical reasoning test one month after the course. It is based on a case of infective endocarditis (IE) and is to be completed in 30 minutes. There are five questions in total. Information and questions will be presented step by step, with a total score of 25. The rating scale for the IE case consists of five items (corresponding to the five questions), and each item is scored 1 (novice level), 3 (basic level), or 5 (expert level). Two single-blinded assessors will score each student's answers. The final test score is the average of the two assessors' scores.
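The scoring rule for the clinical reasoning test (five items, each rated 1, 3, or 5, with the final score taken as the mean of two assessors) can be illustrated as follows. This is a minimal sketch with hypothetical function names, and the example ratings are invented for illustration, not study data.

```python
ALLOWED_LEVELS = {1, 3, 5}  # novice, basic, expert
ITEM_COUNT = 5              # one item per test question


def assessor_total(item_scores):
    """Sum one assessor's five item ratings after validating them."""
    if len(item_scores) != ITEM_COUNT:
        raise ValueError("expected exactly five item scores")
    if any(score not in ALLOWED_LEVELS for score in item_scores):
        raise ValueError("each item must be rated 1, 3, or 5")
    return sum(item_scores)


def final_test_score(assessor_a, assessor_b):
    """Average the two assessors' totals (maximum 25)."""
    return (assessor_total(assessor_a) + assessor_total(assessor_b)) / 2


# Invented ratings for one student from two assessors.
print(final_test_score([1, 3, 5, 3, 1], [3, 3, 5, 3, 1]))  # 14.0
```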

This approach follows the principle of fairness in educational research and minimizes the Hawthorne effect and evaluation anxiety on study results, aiming to improve internal validity.

3.6.2 Secondary Outcomes

AI technology acceptance: After the test, an electronic questionnaire of AI technology acceptance will be sent to intervention group students to assess their acceptance of the AI agent. The questionnaire is based on the Technology Acceptance Model (TAM) and includes four dimensions: perceived usefulness (perceived enhancement of clinical reasoning training by the AI agent), perceived ease of use (interface friendliness and operational convenience of the AI agent), satisfaction (satisfaction with the AI agent), and intention to use (willingness to use and recommend the AI agent in the future). A 5-point Likert scale is used (1 = strongly disagree, 5 = strongly agree). The questionnaire will be pilot-tested for reliability and validity. After the questionnaire survey, purposive sampling will be used to recruit participants for qualitative interviews. The final number of interviewees will be determined based on the principle of information saturation. Semi-structured interviews will be conducted to supplement understanding of intervention group students' experiences with the AI agent, including usage issues and reasoning guidance capabilities.

3.7 Data Analysis

Quantitative data will be analyzed using SPSS 25.0 software, with significance level α=0.05. Qualitative data will be analyzed using NVivo 14.0 software.

  1. Baseline data comparison: Descriptive statistics will be used to describe baseline characteristics and to compare demographic information, diagnostic scores, AI literacy, and AI use interest between the intervention and control groups. Normally distributed continuous variables will be expressed as mean ± standard deviation, with independent samples t-test for between-group comparisons. Categorical variables will be expressed as frequency (percentage), with chi-square test for between-group comparisons. If P > 0.05, randomization is considered successful, with no significant baseline differences. If P < 0.05, these variables will be included as covariates in subsequent analyses.
  2. Primary outcome analysis: Independent samples t-test will be used to compare mean scores between intervention and control groups. Multiple linear regression models will be constructed with the score as the dependent variable, intervention group as the main independent variable, and baseline AI literacy and diagnostic score as covariates. Intention-to-treat (ITT) and per-protocol (PP) analyses will be conducted as sensitivity analyses. Criteria for PP analysis:

    • completion of all 10 case trainings
    • at least 6 interaction rounds per case
    • completion of all preset questions
  3. Secondary outcome analysis: Likert scale scores for AI technology acceptance are expected to be non-normally distributed continuous variables; Mann-Whitney U test will be used for between-group comparisons. Thematic analysis will be applied to interview transcripts to extract core themes such as usefulness, ease of use, satisfaction, barriers, and suggestions. Two researchers will independently code and check theme consistency.
  4. Exploratory analysis: Correlations among AI literacy, AI interaction performance, and AI technology acceptance will be explored.
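The three per-protocol criteria listed under the primary outcome analysis above (all 10 cases completed, at least 6 interaction rounds per case, all preset questions completed) could be checked against the backend interaction logs along the following lines. The log structure is not described in this record, so the `CaseLog` fields and the function name are hypothetical, and the sketch assumes one log entry per completed case.

```python
from dataclasses import dataclass


@dataclass
class CaseLog:
    """Hypothetical per-case summary derived from backend logs."""
    case_id: str
    interaction_rounds: int
    questions_answered: int
    questions_total: int


def meets_per_protocol(logs, required_cases=10, min_rounds=6):
    """Return True if a student satisfies all three PP criteria."""
    # Criterion 1: training completed on all required cases.
    if len({log.case_id for log in logs}) < required_cases:
        return False
    # Criteria 2 and 3: enough rounds and every preset question done.
    return all(
        log.interaction_rounds >= min_rounds
        and log.questions_answered >= log.questions_total
        for log in logs
    )


# Invented example: ten cases, each with 8 rounds and 5 of 5 questions.
complete = [CaseLog(f"case{i:02d}", 8, 5, 5) for i in range(1, 11)]
print(meets_per_protocol(complete))       # True
print(meets_per_protocol(complete[:9]))   # False, one case missing
```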

4 Quality Control

  1. Personnel training: All research personnel involved will receive standardized training to ensure they understand the study protocol, including recruitment, informed consent, data collection, and AI agent usage guidance.
  2. Data quality: Before the formal study, questionnaires will be pilot-tested for reliability and validity. All data entry will be double-checked to ensure accuracy. AI usage behavior data recorded in the backend will be verified for completeness and accuracy.
  3. Intervention adherence: Adherence of intervention group students will be assessed using data from the backend.
  4. Study process monitoring: Regular research team meetings will be held to discuss progress, resolve issues, and monitor study procedures to ensure compliance with the protocol.

5 Ethics and Privacy Protection

All participants will sign an informed consent form, clearly stating the study purpose, procedures, and right to withdraw. Student scores, questionnaire results, and other data collected during the study will be stored and analyzed anonymously. Study results will be used only for academic publication, without involving personal privacy or commercial interests, and will not affect students' final course assessments.

This study has been approved by the Research Ethics Committee of Peking Union Medical College Hospital (Approval No.: I-26PJ0851).

Study Type

Interventional

Enrollment (Estimated)

88

Phase

  • Not Applicable

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

    • Beijing Municipality
      • Beijing, Beijing Municipality, China, 100730
        • Peking Union Medical College Hospital

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

  • Child
  • Adult
  • Older Adult

Accepts Healthy Volunteers

No

Description

Inclusion criteria:

  • full-time registered and enrolled in the "Comprehensive Clinical Course"
  • signed informed consent, voluntary participation in this study and completion of relevant tests and questionnaires

Exclusion criteria:

  • planned suspension of studies, withdrawal, or major transfer during the study period

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

  • Primary Purpose: Other
  • Allocation: Randomized
  • Interventional Model: Parallel Assignment
  • Masking: Single

Arms and Interventions

Participant Group / Arm | Intervention / Treatment
Experimental: intervention group (AI Agent) | AI agent tasks for clinical reasoning training
Experimental: control group (study materials) | Study materials including case information, questions, and answer keys

What is the study measuring?

Primary Outcome Measures

Outcome Measure | Measure Description | Time Frame
Course examination scores | The examination includes 8 A1 multiple-choice questions (MCQs), 8 case summary MCQs, 2 A3 case-cluster MCQs, 3 A4 case-series best-answer MCQs, and 1 case analysis question, with a total score of 25. All test questions are provided by the teaching faculty, covering core diseases from the five topics. The questions will be reviewed by the same two senior clinical faculty members to ensure appropriate case selection and difficulty. | One week after the course
Clinical reasoning test score | The test is based on a case of infective endocarditis (IE) and is to be completed in 30 minutes. There are five questions in total. Information and questions will be presented step by step, with a total score of 25. | One month after the course

Secondary Outcome Measures

Outcome Measure | Measure Description | Time Frame
AI technology acceptance | An electronic questionnaire of AI technology acceptance will be administered to intervention group students to assess their acceptance of the AI agent technology. The questionnaire is based on the Technology Acceptance Model (TAM) and includes four dimensions: perceived usefulness (perceived enhancement of clinical reasoning training by the AI agent), perceived ease of use (interface friendliness and operational convenience of the AI agent), satisfaction (satisfaction with the AI agent), and intention to use (willingness to use and recommend the AI agent in the future). A 5-point Likert scale is used (1 = strongly disagree, 5 = strongly agree). | Immediately after the clinical reasoning test


Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

April 1, 2026

Primary Completion (Actual)

May 8, 2026

Study Completion (Estimated)

August 31, 2026

Study Registration Dates

First Submitted

April 22, 2026

First Submitted That Met QC Criteria

June 1, 2026

First Posted (Actual)

June 3, 2026

Study Record Updates

Last Update Posted (Actual)

June 3, 2026

Last Update Submitted That Met QC Criteria

June 1, 2026

Last Verified

June 1, 2026

More Information

Terms related to this study

Other Study ID Numbers

  • K10357

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

UNDECIDED

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.
