AI vs Human Exam Assessment and Development (AHEAD Trial) (AHEAD)

March 16, 2026 updated by: Anita Palepu, University of British Columbia

Psychometric Performance and Student Perceptions of AI- Versus Human-Generated Multiple-Choice Question Development in Medical Education: The AHEAD Randomized Controlled Trial

The Artificial Intelligence (AI) vs Human Exam Assessment and Development (AHEAD) Trial is a participant-blinded randomized controlled trial conducted among first-year medical students at the University of British Columbia. The study evaluates whether multiple-choice examination questions generated using large language models (LLMs) perform comparably to traditionally human-written questions in medical education.

Participants were randomized to complete one of two versions of a formative mock final examination consisting of 112 case-based single-best-answer multiple-choice questions (MCQs) aligned with the same course learning objectives. One exam version contained AI-generated questions produced using a structured LLM workflow with independent AI verification, while the other contained questions authored by senior medical students using conventional methods.

The study evaluates exam feasibility, psychometric reliability, validity, student acceptability, and educational impact. Outcomes include exam performance, item discrimination indices, distractor efficiency, student perceptions of exam quality and difficulty, and changes in perceived preparedness for the upcoming summative examination.

Study Overview

Status

Completed

Conditions

Medical Education Assessment

Intervention / Treatment

Detailed Description

The AHEAD Trial (AI vs Human Exam Assessment and Development) is a single-center, participant-blinded randomized controlled trial conducted among first-year Doctor of Medicine (MD) students enrolled in the Foundations of Medical Practice I (MEDD 411) course at the University of British Columbia.

Participants were randomized in a 1:1 ratio to complete either an AI-generated or a human-generated mock final examination. Both exams consisted of 112 case-based single-best-answer multiple-choice questions (MCQs) aligned with the same MEDD 411 curricular objectives.

AI-generated questions were produced using a structured workflow involving ChatGPT for question generation and Google Gemini for independent verification. Human-generated questions were authored by senior medical students without AI assistance and underwent independent peer review. Both exams followed identical formatting guidelines and assessed the same learning objectives.

All participants completed identical pre-exam and post-exam surveys assessing demographic characteristics, familiarity with artificial intelligence in education, and perceptions of the examination experience. The study evaluates the utility of AI-generated assessments using van der Vleuten's Assessment Utility Framework, including feasibility, reliability, validity, acceptability, and educational impact.

The trial aims to determine whether large language models can accelerate the development of formative medical examinations while maintaining comparable psychometric quality and educational value relative to traditional human-authored questions.

Study Type

Interventional

Enrollment (Actual)

258

Phase

Not Applicable

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

Canada
- British Columbia
  - Vancouver, British Columbia, Canada, V5Z 1M9
    - University of British Columbia Faculty of Medicine

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

Adult
Older Adult

Accepts Healthy Volunteers

Yes

Description

Inclusion Criteria:

- Enrolled first-year medical students in the University of British Columbia MD undergraduate program.
- Able to give voluntary consent to participate in the formative mock examination study.

Exclusion Criteria:

- Students who declined participation.
- Students who did not complete the mock examination or required surveys.

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

Primary Purpose: Other
Allocation: Randomized
Interventional Model: Parallel Assignment
Masking: Single

Number of Arms

2

Arms and Interventions

Participant Group / Arm	Intervention / Treatment
Experimental: AI-Generated MCQ Examination. Participants completed a 112-item case-based single-best-answer mock examination composed of AI-generated multiple-choice questions. Questions were generated using a structured large language model workflow with ChatGPT-4 for generation and Google Gemini for independent validation.	Other: AI-generated MCQ examination. A formative mock examination composed of 112 case-based multiple-choice questions generated using large language models and aligned with course learning objectives.
Active Comparator: Human-Generated MCQ Examination. Participants completed a 112-item case-based single-best-answer mock examination composed of human-authored multiple-choice questions developed by senior medical students using traditional item-writing methods and peer review.	Other: Human-generated MCQ examination. A formative mock examination composed of 112 case-based multiple-choice questions written by senior medical students using conventional item-writing methods and aligned with the same course learning objectives.

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Student performance on the mock examination	Comparison of mean examination scores between students randomized to the AI-generated versus human-generated mock examinations.	Immediately after completion of the mock examination

Secondary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Item discrimination index	Item-level discrimination index comparing AI-generated and human-generated multiple-choice questions, representing the difference in the proportion of correct responses between high-performing and low-performing students.	Immediately after the completion of the mock examination
Distractor efficiency	Proportion of distractors selected by at least 5% of participants, comparing AI-generated and human-generated questions.	Immediately after the completion of the mock examination
Student-rated examination quality and acceptability	Student ratings of exam difficulty, clarity, relevance to course material, adequacy of time, multiple-choice question quality, understanding of clinical concepts, identification of knowledge gaps, retention for future clinical practice, and preparedness for the upcoming summative exam, measured immediately after exam completion using 10-point Likert scales (1 = lowest rating, 10 = highest rating). For most domains, higher scores indicate greater endorsement of the construct being measured; for the difficulty item, higher scores indicate greater perceived difficulty.	Immediately after completion of the mock examination
Efficiency ratio of MCQ development time per matched learning objective	Development efficiency of artificial intelligence (AI)-generated versus human-generated multiple-choice questions (MCQs). The efficiency ratio was calculated as human-generated MCQ development time divided by AI-generated MCQ development time for matched learning objectives.	Baseline (prior to participant testing)
Change in perceived preparedness for the summative examination	Change from pre-exam to post-exam in self-rated preparedness for the upcoming summative examination, measured on a 10-point Likert scale (1 = not at all prepared; 10 = extremely prepared), with higher scores indicating greater perceived preparedness.	Before and immediately after completion of the mock examination
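
The three formula-based secondary measures above (item discrimination index, distractor efficiency, and efficiency ratio) can be illustrated with a minimal Python sketch. This is not the trial's analysis code. The record does not state how the high-performing and low-performing groups were defined, so the 27% upper and lower groups used here are an assumption, and all function names, parameters, and input shapes are hypothetical.

```python
from collections import Counter


def discrimination_index(item_correct, total_scores, group_fraction=0.27):
    """Upper-lower discrimination index for a single item.

    item_correct: list of 0/1 flags, one per student, for this item.
    total_scores: total exam scores in the same student order.
    group_fraction: share of students in each extreme group
        (27% is an assumed convention, not taken from the trial record).
    """
    n = len(total_scores)
    k = max(1, round(n * group_fraction))
    # Rank students from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    p_high = sum(item_correct[i] for i in high) / k
    p_low = sum(item_correct[i] for i in low) / k
    # Difference in proportion correct between high and low performers.
    return p_high - p_low


def distractor_efficiency(responses, correct_option, options, threshold=0.05):
    """Proportion of distractors selected by at least `threshold` of students."""
    counts = Counter(responses)
    n = len(responses)
    distractors = [o for o in options if o != correct_option]
    functional = [o for o in distractors if counts[o] / n >= threshold]
    return len(functional) / len(distractors)


def efficiency_ratio(human_minutes, ai_minutes):
    """Human development time divided by AI development time."""
    if ai_minutes <= 0:
        raise ValueError("ai_minutes must be positive")
    return human_minutes / ai_minutes
```

Under these definitions, an efficiency ratio above 1 means the human-written items took longer to develop than the AI-generated items for the same learning objectives, and a distractor counts as functional only if at least 5% of students chose it.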

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Sponsor

University of British Columbia

Investigators

Principal Investigator: Anita Palepu, MD, MPH, FRCPC, University of British Columbia

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

December 8, 2024

Primary Completion (Actual)

December 9, 2024

Study Completion (Actual)

December 9, 2024

Study Registration Dates

First Submitted

March 7, 2026

First Submitted That Met QC Criteria

March 16, 2026

First Posted (Actual)

March 18, 2026

Study Record Updates

Last Update Posted (Actual)

March 18, 2026

Last Update Submitted That Met QC Criteria

March 16, 2026

Last Verified

March 1, 2026

More Information

Terms related to this study

Keywords

Other Study ID Numbers

Med_AHEAD_Trial
H16-00044 (Other Identifier: University of British Columbia)

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

YES

IPD Plan Description

Deidentified participant-level examination scores and survey responses will be available upon reasonable request.

IPD Sharing Time Frame

Beginning 6 months after publication and ending 5 years after publication.

IPD Sharing Access Criteria

Data will be shared with researchers who provide a methodologically sound proposal. Requests should be directed to the corresponding author.

IPD Sharing Supporting Information Type

STUDY_PROTOCOL
SAP
ICF
CSR
