Efficiency of Verbal Intelligent Tutor Instruction in Neurosurgical Simulation

Updated March 3, 2024, by Rolando Del Maestro, McGill University

Efficiency of Verbal Intelligent Tutor Instruction in Neurosurgical Simulation: A Randomized Controlled Trial

At the Neurosurgical Simulation and Artificial Intelligence Learning Centre, we seek to provide surgical trainees with innovative technologies that allow them to improve their surgical technical skills in risk-free environments, potentially improving patient operative outcomes. The Intelligent Continuous Expertise Monitoring System (ICEMS), a deep learning application that assesses and trains neurosurgical technical skill and provides continuous intraoperative feedback, is one such technology that may improve surgical education.

In this randomized controlled trial, medical students from four Quebec universities will be blinded and randomized to one of three groups (one control and two experimental). Group 1 (control) will receive verbal AI tutor feedback based on ICEMS error detection. Group 2 will be tutored by a human instructor who will receive ICEMS error data and deliver verbal instruction identical to that of the AI tutor. Group 3 will be tutored by a human instructor who will be provided with ICEMS error data but may deliver feedback in whatever way they feel is appropriate to correct the error.

The aim of this study is to determine how the method of delivery of verbal surgical error instruction influences trainee response to instruction and overall surgical performance. Evaluating trainee responses to AI instructor verbal feedback as compared to feedback from human instructors will allow for further development, testing, and optimization of the ICEMS and other AI tutoring systems.

Study Overview

Detailed Description

Background: Expert surgical technical skill is linked with improved patient outcomes; however, training novices to master these skills remains challenging. The Intelligent Continuous Expertise Monitoring System (ICEMS) is a deep learning application that was developed at the Neurosurgical Simulation and Artificial Intelligence Learning Centre to improve neurosurgical education. The ICEMS assesses and trains bimanual surgical performance by providing continuous feedback via verbal instructions in order to improve trainee performance and mitigate errors.

Rationale: A previous randomized controlled trial (RCT) performed at our centre demonstrated that intelligent tutoring is more effective than expert tutoring in a simulated neurosurgical procedure (NCT05168150). Another RCT revealed that medical students' performance in response to ICEMS instruction to decrease bipolar force application was variable (NCT04700384). An agglomerative clustering algorithm classified these variable student responses into three groups: 53% successfully obeyed the instruction to correct the error, 36% did not obey the instruction, and 11% over-responded to the instruction. This response variability could significantly limit the utility of the ICEMS and may be attributed to different learning styles, stress levels, or misinterpretation of AI instruction. During that trial, expert trainers were not provided with ICEMS error data. Conducting a new RCT in which expert trainers are provided with ICEMS error data will help clarify why many trainees did not respond to the AI instruction.

This report follows the Consolidated Standards of Reporting Trials-Artificial Intelligence (CONSORT-AI) as well as the Machine Learning to Assess Surgical Expertise (MLASE) checklist.

Hypotheses:

  1. Verbal AI feedback will yield significantly lower success response rates among trainees than identical error feedback provided by human instructors.
  2. Trainee performance assessment scores will be significantly higher in the two human instruction groups.
  3. Instruction delivered by the AI tutor will result in increased stress levels and cognitive load as compared to verbal error feedback delivered by human instructors.

Primary Objectives: To determine how the method of delivery of surgical error instruction influences:

  1. Trainee response to instruction, i.e., whether they corrected, did not correct, or over-corrected the error (data collected by the ICEMS).
  2. Trainee overall surgical performance (average expertise score on practice scenarios calculated by the ICEMS, Objective Structured Assessment of Technical Skills (OSATS) score on realistic scenario determined by two blinded expert raters).

Secondary Objective: To determine how the method of delivery of surgical error instruction influences trainee affective cognitive responses (self-reported via questionnaires on 5-point Likert scales).

Setting: McGill University's Neurosurgical Simulation and Artificial Intelligence Learning Centre.

Participants: Students enrolled in their preparatory, first, or second year at one of four Quebec medical schools.

Design: A three-arm randomized controlled trial.
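The record specifies only that allocation is randomized, with 26 participants per arm (n = 78). As a purely illustrative sketch, not the trial's actual allocation procedure, equal assignment to the three arms could look like this (arm labels and the seed are assumptions for the example):

```python
import random

ARMS = ("AI tutor", "Human instructor, AI wording", "Human instructor, own wording")

def allocate(participant_ids, arms=ARMS, seed=0):
    """Shuffle participants and split them evenly across the study arms."""
    rng = random.Random(seed)          # fixed seed for a reproducible illustration
    ids = list(participant_ids)
    rng.shuffle(ids)
    per_arm = len(ids) // len(arms)    # 78 // 3 = 26 per arm
    return {arm: ids[i * per_arm:(i + 1) * per_arm] for i, arm in enumerate(arms)}

groups = allocate(range(1, 79))
print({arm: len(members) for arm, members in groups.items()})
```

Each arm receives exactly 26 participants, and every participant is assigned to exactly one arm.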

Intervention: Participants will undergo a training session of approximately 90 minutes on the NeuroVR (CAE Healthcare), a virtual reality (VR) surgical simulator that simulates a subpial brain tumor resection. The NeuroVR has two possible scenarios: a simple practice scenario and a complex realistic scenario. Participants will perform six repetitions of the practice scenario (5 minutes each) followed by the realistic scenario (13 minutes). The ICEMS will continuously assess performance throughout the trial. All participants will receive verbal feedback when the ICEMS detects an error in their performance; however, the method of delivery of this verbal feedback will differ between groups.

  • Group 1 (control) will receive verbal feedback directly from the ICEMS when an error is detected.
  • Group 2 (experimental) will receive verbal feedback from an expert instructor delivered in the same words as the ICEMS.
  • Group 3 (experimental) will receive verbal feedback from an expert instructor delivered in their own words.

Verbal feedback will be based on the following six metrics:

  1. Tissue injury risk: When a trainee receives feedback on this metric, healthy brain tissue has been damaged.
  2. Bleeding risk: When a trainee receives feedback on this metric, there is bleeding that must be cauterized.
  3. Instrument tip separation distance: Refers to the distance between the tip of the ultrasonic aspirator and the tips of the bipolar forceps. When a trainee receives feedback on this metric, their instruments are too far apart.
  4. High bipolar force: Refers to the amount of force applied to the tissue by the bipolar forceps. When a trainee receives feedback on this metric, they are applying too much force with the bipolar.
  5. Low bipolar force: Refers to the amount of force applied to the tissue by the bipolar forceps. When a trainee receives feedback on this metric, they are not applying enough force with the bipolar.
  6. High aspirator force: Refers to the amount of force applied to the tissue by the ultrasonic aspirator. When a trainee receives feedback on this metric, they are applying too much force with the aspirator.

These metrics will be continuously evaluated by the ICEMS. The ICEMS will detect only one error at a time, according to a predetermined hierarchy (in the order listed above). For example, if a trainee simultaneously makes errors on both bleeding risk (2) and high aspirator force (6), the ICEMS will report only the bleeding risk error, since that metric ranks higher in the hierarchy.
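The one-error-at-a-time selection described above can be sketched as a simple priority scan. The metric identifiers below are illustrative labels, not the ICEMS's internal names:

```python
# Priority order of the six error metrics, highest priority first,
# matching the numbered list in the protocol.
ERROR_HIERARCHY = [
    "tissue_injury_risk",
    "bleeding_risk",
    "instrument_tip_separation",
    "high_bipolar_force",
    "low_bipolar_force",
    "high_aspirator_force",
]

def select_error(active_errors):
    """Return the single highest-priority active error, or None if none is active."""
    for metric in ERROR_HIERARCHY:
        if metric in active_errors:
            return metric
    return None

# Bleeding risk outranks high aspirator force, so it alone triggers feedback.
print(select_error({"bleeding_risk", "high_aspirator_force"}))  # bleeding_risk
```

Because the scan stops at the first match, a lower-ranked error (such as high aspirator force) only produces feedback once all higher-ranked errors have been resolved.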

The first practice scenario will serve as a baseline; thus, no feedback will be given. In the second, third, fourth, and fifth repetitions, feedback will be given according to ICEMS error detection. In the sixth repetition as well as the realistic scenario, no feedback will be provided.
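The session structure and feedback schedule described above (six 5-minute practice repetitions plus a 13-minute realistic scenario, with feedback only in practice repetitions two through five) can be summarized in a small sketch; the scenario labels are illustrative:

```python
# Session schedule: (scenario, repetition, minutes).
SESSION = [("practice", rep, 5) for rep in range(1, 7)] + [("realistic", 1, 13)]

def feedback_given(scenario, repetition):
    """Feedback is delivered only in practice repetitions 2-5; the baseline
    (repetition 1), repetition 6, and the realistic scenario are feedback-free."""
    return scenario == "practice" and 2 <= repetition <= 5

print(feedback_given("practice", 3))   # True
print(feedback_given("realistic", 1))  # False
```

The total simulated operating time per participant under this schedule is 6 × 5 + 13 = 43 minutes, consistent with the approximately 90-minute session once setup and questionnaires are included.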

Significance: With surgical education approaches beginning to shift towards competency-based frameworks, the implementation of effective AI educational feedback into surgical training becomes crucial for optimizing surgical learning. The results of this RCT will allow for the evaluation and reengineering of the ICEMS and other AI tutoring systems, which may advance the development of not only standardized competency-based surgical education training curricula, but any AI tutor technology dependent on verbal instruction.

Study Type

Interventional

Enrollment (Estimated)

78

Phase

  • Not Applicable

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Contact

Study Contact Backup

Study Locations

    • Quebec
      • Montréal, Quebec, Canada, H2X 4B3

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

  • Adult
  • Older Adult

Accepts Healthy Volunteers

Yes

Description

Inclusion Criteria:

  • Medical students actively enrolled at any Quebec medical school who do not meet the exclusion criteria.
  • Preparatory-year (premedical) students actively enrolled at any Quebec medical school who do not meet the exclusion criteria.

Exclusion Criteria:

  • Prior use of the NeuroVR (CAE Healthcare) simulator.

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

  • Primary Purpose: Health Services Research
  • Allocation: Randomized
  • Interventional Model: Parallel Assignment
  • Masking: Double

Arms and Interventions

Participant Group / Arm
Intervention / Treatment
No Intervention: Tutored by AI
26 participants allocated. During their second, third, fourth, and fifth repetition of the practice subpial brain tumor resection scenario, participants will receive verbal ICEMS feedback when the system detects an error in their performance.
Experimental: Tutored by human instructor using AI's words
26 participants allocated. During their second, third, fourth, and fifth repetition of the practice subpial brain tumor resection scenario, participants will receive verbal feedback from an expert instructor. The expert instructor will deliver this feedback using the same words as the ICEMS.
The expert instructor assigned to tutor this group will receive error detection data from the ICEMS, along with the list of commands the ICEMS uses. When the system detects an error in a student's performance for a given metric, the expert instructor must deliver the corresponding command in the same words as the ICEMS.
Experimental: Tutored by human instructor using wording of choice
26 participants allocated. During their second, third, fourth, and fifth repetition of the practice subpial brain tumor resection scenario, participants will receive verbal feedback from an expert instructor. The expert instructor will deliver this feedback using any wording they feel is appropriate to correct the error.
The expert instructor assigned to tutor this group will receive error detection data from the ICEMS. When the system detects an error in a student's performance for a given metric, the expert instructor will deliver feedback in their own words.

What is the study measuring?

Primary Outcome Measures

Outcome Measure
Measure Description
Time Frame
Response to instruction
Time Frame: 1 day of study
After a trainee receives verbal feedback on a specific metric, the ICEMS will record their response to this instruction, i.e., whether they corrected, did not correct, or over-corrected the error.
1 day of study
Average Intelligent Continuous Expertise Monitoring System (ICEMS) expertise score
Time Frame: 1 day of study
The ICEMS will continuously assess the trainee's performance and calculate an average expertise score between -1.00 (novice) and 1.00 (expert).
1 day of study
Objective Structured Assessment of Technical Skills (OSATS) global rating
Time Frame: Approximately 5 months after start of study
While performing the complex realistic scenario, participants will be video recorded. Two blinded expert raters will evaluate these videos using the OSATS global rating scale between 1 (novice) and 7 (expert).
Approximately 5 months after start of study

Secondary Outcome Measures

Outcome Measure
Measure Description
Time Frame
Difference in the strength of emotions elicited
Time Frame: 1 day of study
Measured using Duffy's Medical Emotions Scale (MES) before, during, and after the intervention (self-reported via questionnaires on 5-point Likert scales).
1 day of study
Difference in cognitive load
Time Frame: 1 day of study
Measured using Leppink's Cognitive Load Index (CLI) after the intervention (self-reported via questionnaire on 5-point Likert scales).
1 day of study

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Publications and helpful links

The person responsible for entering information about the study voluntarily provides these publications. These may be about anything related to the study.

General Publications

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Estimated)

March 1, 2024

Primary Completion (Estimated)

October 1, 2024

Study Completion (Estimated)

December 1, 2024

Study Registration Dates

First Submitted

February 7, 2024

First Submitted That Met QC Criteria

February 16, 2024

First Posted (Actual)

February 22, 2024

Study Record Updates

Last Update Posted (Estimated)

March 5, 2024

Last Update Submitted That Met QC Criteria

March 3, 2024

Last Verified

March 1, 2024

More Information

Terms related to this study

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

YES

IPD Plan Description

Data obtained from the primary and secondary outcomes may be shared with other researchers who express interest in the data.

IPD Sharing Time Frame

Data will be available for 5 years following the completion of the trial.

IPD Sharing Access Criteria

Researchers who wish to access the data must contact the principal investigator of the trial, Dr. Rolando F. Del Maestro.

IPD Sharing Supporting Information Type

  • STUDY_PROTOCOL
  • SAP
  • ICF
  • ANALYTIC_CODE
  • CSR

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

