- Clinical Trial NCT07481162
AI vs Human Exam Assessment and Development (AHEAD Trial) (AHEAD)
Psychometric Performance and Student Perceptions of AI- Versus Human-Generated Multiple-Choice Question Development in Medical Education: The AHEAD Randomized Controlled Trial
The Artificial Intelligence (AI) vs Human Exam Assessment and Development (AHEAD) Trial is a participant-blinded randomized controlled trial conducted among first-year medical students at the University of British Columbia. The study evaluates whether multiple-choice examination questions generated using large language models (LLMs) perform comparably to traditionally human-written questions in medical education.
Participants were randomized to complete one of two versions of a formative mock final examination consisting of 112 case-based single-best-answer multiple-choice questions (MCQs) aligned with the same course learning objectives. One exam version contained AI-generated questions produced using a structured LLM workflow with independent AI verification, while the other contained questions authored by senior medical students using conventional methods.
The study evaluates exam feasibility, psychometric reliability, validity, student acceptability, and educational impact. Outcomes include exam performance, item discrimination indices, distractor efficiency, student perceptions of exam quality and difficulty, and changes in perceived preparedness for the upcoming summative examination.
Study Overview
Status
Conditions
Intervention / Treatment
Detailed Description
The AHEAD Trial (AI vs Human Exam Assessment and Development) is a single-center, participant-blinded randomized controlled trial conducted among first-year Doctor of Medicine (MD) students enrolled in the Foundations of Medical Practice I (MEDD 411) course at the University of British Columbia.
Participants were randomized in a 1:1 ratio to complete either an AI-generated or a human-generated mock final examination. Both exams consisted of 112 case-based single-best-answer multiple-choice questions (MCQs) aligned with the same MEDD 411 curricular objectives.
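The registry entry does not describe how the allocation sequence was generated. As an illustration only, a simple shuffled 1:1 allocation could be sketched as follows; the function name `randomize_1_to_1`, the arm labels, and the fixed seed are assumptions made for this example, not details from the trial.

```python
import random


def randomize_1_to_1(participant_ids, seed=0):
    """Assign participants to two arms in a 1:1 ratio.

    Hypothetical helper: the trial's actual randomization procedure
    is not stated in the registry record.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    # First half of the shuffled list goes to one arm, the rest to the other.
    allocation = {pid: "AI-generated" for pid in ids[:half]}
    allocation.update({pid: "Human-generated" for pid in ids[half:]})
    return allocation


if __name__ == "__main__":
    arms = randomize_1_to_1(range(1, 11))
    for pid in sorted(arms):
        print(pid, arms[pid])
```

With an odd number of participants this sketch places the extra participant in the second arm; a real trial would more likely use a pre-specified scheme such as block randomization.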
AI-generated questions were produced using a structured workflow involving ChatGPT for question generation and Google Gemini for independent verification. Human-generated questions were authored by senior medical students without AI assistance and underwent independent peer review. Both exams followed identical formatting guidelines and assessed the same learning objectives.
All participants completed identical pre-exam and post-exam surveys assessing demographic characteristics, familiarity with artificial intelligence in education, and perceptions of the examination experience. The study evaluates the utility of AI-generated assessments using van der Vleuten's Assessment Utility Framework, including feasibility, reliability, validity, acceptability, and educational impact.
The trial aims to determine whether large language models can accelerate the development of formative medical examinations while maintaining comparable psychometric quality and educational value relative to traditional human-authored questions.
Study Type
Enrollment (Actual)
Phase
- Not Applicable
Contacts and Locations
Study Locations
- University of British Columbia Faculty of Medicine, Vancouver, British Columbia, Canada, V5Z 1M9
Participation Criteria
Eligibility Criteria
Ages Eligible for Study
- Adult
- Older Adult
Accepts Healthy Volunteers
Description
Inclusion Criteria:
- Enrolled first-year medical students in the University of British Columbia MD undergraduate program.
- Can voluntarily consent to participate in the formative mock examination study.
Exclusion Criteria:
- Students who declined participation.
- Students who did not complete the mock examination or required surveys.
Study Plan
How is the study designed?
Design Details
- Primary Purpose: Other
- Allocation: Randomized
- Interventional Model: Parallel Assignment
- Masking: Single
Arms and Interventions
| Participant Group / Arm | Intervention / Treatment |
|---|---|
| Experimental: AI-Generated MCQ Examination. Participants completed a 112-item case-based single-best-answer mock examination composed of AI-generated multiple-choice questions. Questions were generated using a structured large language model workflow with ChatGPT-4 for generation and Google Gemini for independent validation. | A formative mock examination composed of 112 case-based multiple-choice questions generated using large language models aligned with course learning objectives. |
| Active Comparator: Human-Generated MCQ Examination. Participants completed a 112-item case-based single-best-answer mock examination composed of human-authored multiple-choice questions developed by senior medical students using traditional item-writing methods and peer review. | A formative mock examination composed of 112 case-based multiple-choice questions written by senior medical students using conventional item-writing methods aligned with the same course learning objectives. |
What is the study measuring?
Primary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Student performance on the mock examination | Comparison of mean examination scores between students randomized to the AI-generated versus human-generated mock examinations. | Immediately after completion of the mock examination |
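The record names the comparison (mean scores between arms) but not the statistical test. The sketch below assumes a Welch-style comparison purely for illustration; the function `compare_mean_scores` and the score lists are hypothetical and are not trial data.

```python
import math
import statistics


def compare_mean_scores(ai_scores, human_scores):
    """Return the mean difference (AI minus human) and a Welch t statistic.

    Assumption: Welch's unequal-variance t statistic is used here as one
    plausible choice; the registry does not specify the analysis method.
    """
    mean_ai = statistics.mean(ai_scores)
    mean_human = statistics.mean(human_scores)
    # Sample variances (n - 1 denominator) for each arm.
    var_ai = statistics.variance(ai_scores)
    var_human = statistics.variance(human_scores)
    standard_error = math.sqrt(
        var_ai / len(ai_scores) + var_human / len(human_scores)
    )
    difference = mean_ai - mean_human
    return difference, difference / standard_error


if __name__ == "__main__":
    # Made-up percentage scores for demonstration only.
    diff, t_stat = compare_mean_scores([80, 85, 90], [70, 75, 80])
    print(f"mean difference = {diff:.2f}, Welch t = {t_stat:.3f}")
```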
Secondary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Item discrimination index | Item-level discrimination index comparing AI-generated and human-generated multiple-choice questions, representing the difference in the proportion of correct responses between high-performing and low-performing students. | Immediately after the completion of the mock examination |
| Distractor efficiency | Proportion of distractors selected by at least 5% of participants, comparing AI-generated and human-generated questions. | Immediately after the completion of the mock examination |
| Student-rated examination quality and acceptability | Student ratings of exam difficulty, clarity, relevance to course material, adequacy of time, multiple-choice question quality, understanding of clinical concepts, identification of knowledge gaps, retention for future clinical practice, and preparedness for the upcoming summative exam, measured immediately after exam completion using 10-point Likert scales (1 = lowest rating, 10 = highest rating). For most domains, higher scores indicate greater endorsement of the construct being measured; for the difficulty item, higher scores indicate greater perceived difficulty. | Immediately after completion of the mock examination |
| Efficiency ratio of MCQ development time per matched learning objective | Development efficiency of artificial intelligence (AI)-generated versus human-generated multiple-choice questions (MCQs). The efficiency ratio was calculated as human-generated MCQ development time divided by AI-generated MCQ development time for matched learning objectives. | Baseline (prior to participant testing) |
| Change in perceived preparedness for the summative examination | Change from pre-exam to post-exam in self-rated preparedness for the upcoming summative examination, measured on a 10-point Likert scale (1 = not at all prepared; 10 = extremely prepared), with higher scores indicating greater perceived preparedness. | Before and immediately after completion of the mock examination |
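The item-level definitions above translate into short calculations. The following is a minimal sketch under stated assumptions: the 27% upper and lower groups are a common convention that the record does not confirm, and all function names and example values are hypothetical.

```python
from collections import Counter


def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Proportion correct among high scorers minus that among low scorers.

    item_correct: 0/1 per student for one item.
    total_scores: total exam score per student, in the same order.
    fraction: share of students in each extreme group (assumed 27%).
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    p_high = sum(item_correct[i] for i in high) / k
    p_low = sum(item_correct[i] for i in low) / k
    return p_high - p_low


def distractor_efficiency(responses, correct_option, options, threshold=0.05):
    """Share of incorrect options chosen by at least `threshold` of students."""
    counts = Counter(responses)
    n = len(responses)
    distractors = [o for o in options if o != correct_option]
    functional = [o for o in distractors if counts[o] / n >= threshold]
    return len(functional) / len(distractors)


def efficiency_ratio(human_minutes, ai_minutes):
    """Human development time divided by AI development time."""
    return human_minutes / ai_minutes


if __name__ == "__main__":
    # Made-up data for demonstration only.
    correct = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    totals = [90, 85, 80, 40, 35, 30, 70, 50, 75, 45]
    print(discrimination_index(correct, totals))
    answers = ["A"] * 12 + ["B"] * 6 + ["C"] * 2
    print(distractor_efficiency(answers, "A", ["A", "B", "C", "D"]))
    print(efficiency_ratio(120.0, 30.0))
```

An efficiency ratio above 1 means the human-written items took longer to develop than the AI-generated items for the same learning objectives.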
Collaborators and Investigators
Sponsor
Investigators
- Principal Investigator: Anita Palepu, MD, MPH, FRCPC, University of British Columbia
Study record dates
Study Major Dates
Study Start (Actual)
Primary Completion (Actual)
Study Completion (Actual)
Study Registration Dates
First Submitted
First Submitted That Met QC Criteria
First Posted (Actual)
Study Record Updates
Last Update Posted (Actual)
Last Update Submitted That Met QC Criteria
Last Verified
More Information
Terms related to this study
Keywords
Other Study ID Numbers
- Med_AHEAD_Trial
- H16-00044 (Other Identifier: University of British Columbia)
Plan for Individual participant data (IPD)
Plan to Share Individual Participant Data (IPD)?
IPD Plan Description
IPD Sharing Time Frame
IPD Sharing Access Criteria
IPD Sharing Supporting Information Type
- STUDY_PROTOCOL
- SAP
- ICF
- CSR
Drug and device information, study documents
Studies a U.S. FDA-regulated drug product
Studies a U.S. FDA-regulated device product