Artificial Intelligence-powered Virtual Assistant for Emergency Triage in Neurology (AIDEN)

Phase 1 Trial of the Implementation of an Artificial Intelligence-powered Virtual Assistant for Emergency Triage in Neurology

This study examines the use of an AI-powered virtual assistant for quickly identifying and handling neurological emergencies, particularly in places with limited medical resources. The research aimed to determine whether this AI tool is safe and accurate enough to move on to more advanced testing stages. In a first-of-its-kind trial, the virtual assistant was tested with patients who had urgent neurological issues. Neurologists first reviewed the AI's recommendations using clinical records and then assessed its performance directly with patients. The findings were as follows: neurologists agreed with the AI's decisions nearly all the time, and the AI outperformed ChatGPT (versions 3.5 and 4) in every tested aspect. Patients and doctors found the AI to be highly effective, rating it as excellent or very good in most cases. This suggests the AI could significantly improve how quickly and accurately neurological emergencies are handled, although further trials are needed before it can be widely used.

Study Overview

Detailed Description

Background and Objectives: Neurological emergencies pose significant challenges in medical care, especially in resource-limited countries. Artificial Intelligence (AI), particularly health chatbots, offers a promising solution. However, rigorous validation is required to ensure safety and accuracy. The objective of our work is to evaluate the diagnostic accuracy and resolution effectiveness of an AI-powered virtual assistant designed for the triage of emergency neurological pathologies, ensuring the minimum standard of safety required to progress to successive validation tests.

Methods: This Phase 1 trial evaluates the performance of an AI-powered virtual assistant for emergency neurological triage. Ten patients over 18 years old with urgent neurological pathologies were selected. In the first stage, nine neurologists assessed the safety of the virtual assistant using the patients' clinical records. In the second stage, the assistant's accuracy when used by patients was evaluated. Finally, its performance was compared with that of ChatGPT 3.5 and 4.

Study Type

Interventional

Enrollment (Actual)

10

Phase

  • Early Phase 1

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

      • Buenos Aires, Argentina, 1428
        • FLENI

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

  • Adult
  • Older Adult

Accepts Healthy Volunteers

No

Description

Inclusion Criteria:

  • Patients over 18 years old consulting in the ER due to a neurological emergency

Exclusion Criteria:

  • Pregnancy

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

  • Primary Purpose: Diagnostic
  • Allocation: Non-Randomized
  • Interventional Model: Single Group Assignment
  • Masking: None (Open Label)

Arms and Interventions

Participant Group / Arm: Experimental: Virtual Assistant
Intervention / Treatment: Patients answer questions with a virtual assistant about their recent visit to the ER.

Participant Group / Arm: Active Comparator: ChatGPT 3.5
Intervention / Treatment: Patients answer questions with ChatGPT 3.5 about their recent visit to the ER.

Participant Group / Arm: Active Comparator: ChatGPT 4
Intervention / Treatment: Patients answer questions with ChatGPT 4 about their recent visit to the ER.

The following intervention description applies to all three arms. Stage 1 focused on safety, using only medical information from clinical records as input for the virtual assistant. In Stage 2, which evaluated accuracy, participants interacted with the virtual assistant after medical stabilization; participants also provided initial symptom details as input for ChatGPT. Nine neurologists specializing in emergency care took part in the study: in Stage 1 they assessed the virtual assistant's performance using clinical history information, and in Stage 2 they analyzed the results of participant interactions with the assistant and performed a comparative evaluation of ChatGPT. The virtual assistant functioned as a chatbot on WhatsApp and Telegram, operating in Spanish and incorporating advanced algorithms, decision trees, and large language models for interaction. For comparison, we used ChatGPT versions 3.5 and 4 with two prompt types in natural Spanish: one incorporating clinical record data and the other based on participant narratives.
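
To make the decision-tree component of such a triage flow concrete, here is a minimal, hypothetical sketch of a rule-based triage step. The symptom list, the 72-hour threshold, and all function and variable names are illustrative assumptions rather than the study's actual logic; only the three urgency categories mirror those defined in the secondary outcome measure below. The study's assistant reportedly combined such rules with large language models for the conversational layer, which this sketch does not cover.

    from enum import Enum

    class Urgency(Enum):
        IMMEDIATE = "immediate"
        SHORT_TERM = "within 48 hours"
        NON_URGENT = "non-urgent"

    # Hypothetical red-flag symptoms that always escalate to immediate care.
    RED_FLAGS = {"facial droop", "arm weakness", "slurred speech",
                 "worst headache of life"}

    def triage(symptoms: set[str], onset_hours: float) -> Urgency:
        """Map reported symptoms onto the three urgency categories."""
        if symptoms & RED_FLAGS:      # any red flag: go to the ER now
            return Urgency.IMMEDIATE
        if onset_hours < 72:          # recent onset: see a doctor within 48 h
            return Urgency.SHORT_TERM
        return Urgency.NON_URGENT

    print(triage({"slurred speech"}, onset_hours=1.0))   # Urgency.IMMEDIATE
    print(triage({"tingling"}, onset_hours=120.0))       # Urgency.NON_URGENT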

What is the study measuring?

Primary Outcome Measures

Outcome Measure: Diagnostic performance

Measure Description: Refers to the accuracy and effectiveness of medical tests or diagnostic tools in correctly identifying a disease or condition in patients.

Syndromic diagnosis agreement: the evaluating neurologists considered a syndromic diagnosis accurate when the AI tools identified a condition based on a set of commonly coexisting signs and symptoms, rather than a specific disease. This approach applies when the precise disease causing the symptoms is not immediately identifiable, allowing healthcare providers to effectively monitor and treat the patient's presenting symptoms.

Differential diagnosis agreement: a differential diagnosis was considered accurate when the differentials provided by each AI tool matched those presented by the participants.

The gold standard for diagnosis was the diagnosis given in the emergency department, unchanged over a one-month period.

Time Frame: The first interaction between participants and the virtual assistant occurred within one year of the event; outcome measures were evaluated immediately after the interaction between patients and the virtual assistant.
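
As a worked illustration of how these two agreement measures could be tallied across the ten cases, here is a minimal sketch. The CaseReview structure, its field names, and the demo data are assumptions introduced for illustration; the study itself relied on neurologists' judgments against the emergency-department gold standard.

    from dataclasses import dataclass

    @dataclass
    class CaseReview:
        # True if the neurologists judged the AI's syndromic diagnosis accurate.
        syndromic_match: bool
        # True if the AI's differentials matched those presented by the participant.
        differential_match: bool

    def agreement_rates(reviews):
        """Return (syndromic, differential) agreement as proportions of cases."""
        n = len(reviews)
        syndromic = sum(r.syndromic_match for r in reviews) / n
        differential = sum(r.differential_match for r in reviews) / n
        return syndromic, differential

    demo = [CaseReview(True, True), CaseReview(True, False),
            CaseReview(True, True), CaseReview(False, True)]
    print(agreement_rates(demo))  # (0.75, 0.75)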

Secondary Outcome Measures

Outcome Measure: Appropriate medical conduct or recommendation

Measure Description: Case resolution was evaluated based on appropriate medical conduct or recommendation, categorizing urgency as 1) immediate, 2) short-term (within 48 hours), or 3) non-urgent.

The recommendations provided by each AI tool were assessed using information gathered from clinical histories and from participant input.

The gold standard for appropriate medical conduct or recommendation was that given in the emergency department, unchanged over a one-month period.

Time Frame: The first interaction between participants and the virtual assistant occurred within one year of the event; outcome measures were evaluated immediately after the interaction between patients and the virtual assistant.

Outcome Measure: Assessment of Usability and Satisfaction

Measure Description: Usability was measured by the time and the number of questions needed to reach a final diagnosis and resolution, for both neurologists and participants. For ChatGPT, we evaluated the time taken to draft the reason for consultation.

A satisfaction scale from 1 to 5 was implemented, with 1 indicating a negative experience ("poor", potentially risky for the patient) and 5 a highly positive one ("excellent", potentially surpassing non-specialized human triage). A simple yes/no survey was also administered to participants, asking whether the assistant's questions were comprehensible, whether the referral was adequate for the level of urgency, and whether they considered that the assistant could replace non-specialized triage or reduce the time to arrival at the emergency department.

Time Frame: The first interaction between participants and the virtual assistant occurred within one year of the event; outcome measures were evaluated immediately after the interaction between patients and the virtual assistant.
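
As a small illustration of how the satisfaction scale and yes/no survey could be summarized, here is a minimal sketch. The field names and demo responses are hypothetical; only the 1-to-5 scale and the three survey questions come from the description above.

    from statistics import mean

    SURVEY_ITEMS = ("questions_comprehensible",  # were the questions understandable?
                    "referral_adequate",         # did the referral match the urgency?
                    "could_replace_triage")      # could it replace non-specialized triage?

    def summarize(ratings, answers):
        """Aggregate 1-5 satisfaction ratings and yes/no survey responses."""
        summary = {
            "mean_satisfaction": mean(ratings),
            # share of "excellent" (5) or "very good" (4) ratings
            "rated_4_or_5": sum(r >= 4 for r in ratings) / len(ratings),
        }
        for item in SURVEY_ITEMS:
            summary[item] = sum(a[item] for a in answers) / len(answers)
        return summary

    demo = summarize([5, 4, 5], [{"questions_comprehensible": True,
                                  "referral_adequate": True,
                                  "could_replace_triage": False}] * 3)
    print(demo)  # mean_satisfaction: 4.67, rated_4_or_5: 1.0, ...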

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Collaborators

Investigators

  • Principal Investigator: Mauricio F Farez, MD MPH, Fundación para la Lucha contra las Enfermedades Neurológicas de la Infancia

Publications and helpful links

The person responsible for entering information about the study voluntarily provides these publications. These may be about anything related to the study.

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

October 1, 2023

Primary Completion (Actual)

January 1, 2024

Study Completion (Actual)

January 1, 2024

Study Registration Dates

First Submitted

March 6, 2024

First Submitted That Met QC Criteria

March 21, 2024

First Posted (Actual)

March 28, 2024

Study Record Updates

Last Update Posted (Actual)

March 28, 2024

Last Update Submitted That Met QC Criteria

March 21, 2024

Last Verified

March 1, 2024

