Evaluating the Effectiveness and Acceptability of a GPT-4o and RAG-Based Voice Chatbot for Depression Screening Using PHQ-9 (GPT4-RAG-PHQ)

January 24, 2025 updated by: University College, London

This study aims to assess the feasibility and acceptability of a voice-based chatbot, powered by GPT-4o and Retrieval-Augmented Generation (RAG), for conducting depression screening using the Patient Health Questionnaire-9 (PHQ-9). The PHQ-9 is a validated self-report instrument widely used to screen, diagnose, and monitor the severity of depression. It consists of nine questions that correspond to the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) criteria for major depressive disorder. Respondents rate the frequency of symptoms experienced over the past two weeks on a scale from 0 ("not at all") to 3 ("nearly every day"). The total score (ranging from 0 to 27) indicates the severity of depressive symptoms, categorized into minimal, mild, moderate, moderately severe, or severe depression. The PHQ-9 is also used to assess functional impairment and guide treatment decisions in clinical and research settings.
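The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the function names are hypothetical, and the severity cutoffs (0-4, 5-9, 10-14, 15-19, 20-27) are the commonly published PHQ-9 bands, which this record does not state explicitly.

```python
from typing import Sequence

# Commonly published PHQ-9 severity bands (upper bound inclusive).
# Assumption: the study record names the categories but not the cutoffs.
SEVERITY_BANDS = [
    (4, "minimal"),
    (9, "mild"),
    (14, "moderate"),
    (19, "moderately severe"),
    (27, "severe"),
]


def phq9_total(item_scores: Sequence[int]) -> int:
    """Sum nine item scores, each an integer from 0 to 3."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 requires exactly nine item scores")
    if any(not isinstance(s, int) or s < 0 or s > 3 for s in item_scores):
        raise ValueError("each item score must be an integer from 0 to 3")
    return sum(item_scores)


def phq9_severity(total: int) -> str:
    """Map a total score (0 to 27) to a severity label."""
    if not 0 <= total <= 27:
        raise ValueError("total score must be between 0 and 27")
    for upper, label in SEVERITY_BANDS:
        if total <= upper:
            return label
    raise AssertionError("unreachable: total already validated")
```

For example, item scores of 1, 2, 0, 3, 1, 0, 2, 1, 0 sum to 10, which falls in the "moderate" band under these assumed cutoffs.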

The voice-based chatbot integrates GPT-4o with RAG to enhance its ability to provide informed and contextualized responses during interactions. GPT-4o serves as the conversational engine, capable of generating human-like, empathetic, and contextually appropriate dialogue. RAG enables the chatbot to retrieve and incorporate external, up-to-date knowledge from a curated database or knowledge repository, which is intended to improve the accuracy and reliability of its responses.
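The retrieve-then-generate pattern can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the record does not describe the retriever, the knowledge base, or the prompt format, so the passages, the word-overlap ranking, and the function names below are hypothetical stand-ins, and no model is actually called.

```python
import re
from typing import List

# Hypothetical stand-in for the study's curated knowledge repository.
KNOWLEDGE_BASE = [
    "The PHQ-9 asks about symptoms over the past two weeks.",
    "Each PHQ-9 item is scored from 0 (not at all) to 3 (nearly every day).",
    "The PHQ-9 total score ranges from 0 to 27.",
]


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Return up to k passages that share the most tokens with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(p)), p) for p in passages]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for overlap, p in ranked[:k] if overlap > 0]


def build_prompt(query: str, passages: List[str]) -> str:
    """Place the retrieved passages ahead of the user's question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nUser question: {query}"
```

In a real system the word-overlap ranking would typically be replaced by embedding similarity, and the assembled prompt would be passed to the language model; the sketch only shows how retrieved text ends up in front of the question.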

Study Overview

Status

Enrolling by invitation

Conditions

Intervention / Treatment

Procedure: GPT-4o and RAG Voice Chatbot for PHQ-9 Screening

Detailed Description

Depression is a prevalent mental health challenge with significant personal, social, and economic costs. Traditional mental health resources face barriers such as stigma, limited availability, and long wait times. Technology, particularly AI-powered tools, provides an opportunity to bridge these gaps. This study utilizes GPT-4o and RAG to create a voice-interactive chatbot capable of conversational engagement, administering the PHQ-9 questionnaire, and delivering personalized feedback.

Before interacting with the chatbot, participants will complete the PHQ-9 as a self-test. The self-test results will not be made public; they will be used only to assess the accuracy of the results produced by the chatbot.

The chatbot interaction comprises three phases:

Warm-up conversations for rapport-building and general support.
- The chatbot initiates casual, empathetic dialogues to build rapport with users, helping them feel comfortable and at ease before transitioning to the PHQ-9 screening.
- Users can ask general questions related to mental health, and the chatbot provides informed and supportive responses.
Administration of the PHQ-9 questionnaire for depression screening.
- The chatbot introduces the PHQ-9 questionnaire, explaining its purpose and how the results will help assess the user's mental health.
- Through voice interaction, users respond to the nine PHQ-9 questions, and the chatbot records their responses. The chatbot can clarify questions or provide additional context if users have difficulty understanding specific items.
Analysis of results and delivery of tailored recommendations.
- After the user completes the PHQ-9, the chatbot analyzes the responses, calculates the total score, and categorizes the results into severity levels (e.g., mild, moderate).
- Based on the score, the chatbot provides personalized recommendations, such as self-help strategies for mild symptoms or suggesting professional mental health services for more severe cases.
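The step in which the chatbot records a spoken answer to a PHQ-9 item can be sketched as follows. This is a simplified illustration under stated assumptions: the record implies that the language model interprets the user's speech, whereas this sketch substitutes plain phrase matching on a transcript, and the function name is hypothetical. The four phrases are the standard PHQ-9 response options.

```python
from typing import Optional

# Standard PHQ-9 response options and their item scores.
RESPONSE_SCORES = {
    "not at all": 0,
    "several days": 1,
    "more than half the days": 2,
    "nearly every day": 3,
}


def score_utterance(transcript: str) -> Optional[int]:
    """Return the item score if exactly one response option appears in the
    transcript; return None if none or several appear, so the caller can
    ask the user to clarify."""
    text = " ".join(transcript.lower().split())  # normalize case and spacing
    matches = [score for phrase, score in RESPONSE_SCORES.items() if phrase in text]
    return matches[0] if len(matches) == 1 else None
```

Returning None for ambiguous or unrecognized answers mirrors the clarification behavior described above. Phrase matching cannot handle negation or paraphrase ("not nearly every day", "most days"), which is one reason a language model would be used in practice.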

Participants will interact with the chatbot and then participate in a 1-hour semi-structured interview to provide feedback on their experience. The study focuses on evaluating the acceptability and feasibility of using such LLM-based chatbots in mental health screening and identifying potential improvements and risks.

Study Objectives

Primary Objectives

To evaluate the acceptability, feasibility, and accuracy of a GPT-4o and RAG-based voice chatbot (HopeBot) for depression screening using PHQ-9.
Hypothesis: Participants will show high acceptance of HopeBot (higher than 65%) and high willingness to use such LLM-based chatbots for mental health screening in the future (higher than 65%), indicating that they recognize the credibility of LLMs as a supportive tool in mental health screening (higher than 65%). Participants' depression screening results from HopeBot will match their self-test PHQ-9 results in 100% of cases.
To analyze the chatbot's effectiveness in identifying depressive symptoms and delivering actionable recommendations.
Hypothesis: HopeBot will help users take the PHQ-9 test in a friendly way, categorize their answers accurately, and give accurate test results. Its advice will be based on the official PHQ-9 guidelines, and more than 70% of users will say that its responses are very effective and helpful.

Secondary Objectives

To assess the feasibility and performance of integrating RAG with an LLM in creating a voice-interactive chatbot for mental health.
Hypothesis: Over 65% of participants will recognize that responses using RAG are more helpful and effective.
To explore the strengths, limitations, and risks of deploying LLMs in the mental health domain.
Hypothesis: More than 65% of users will say that HopeBot is very convenient, more accessible, and provides non-judgmental advice at no cost. However, 50% will still express concerns about its privacy and data security.

Study Type

Observational

Enrollment (Estimated)

100

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

United Kingdom
- London, United Kingdom, NW1 2DA
    - UCL Institute of Health Informatics

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

Adult
Older Adult

Accepts Healthy Volunteers

Yes

Sampling Method

Non-Probability Sample

Study Population

The study population consists of healthy individuals aged 18 to 65 years who do not have severe mental illnesses. We welcome participants who have an interest in artificial intelligence technologies. Recruitment will be conducted both online and offline.

Participants will engage with a sophisticated AI chatbot designed to simulate a consultation with a mental health professional. This interaction enables them to complete a psychometric evaluation, such as the PHQ-9 test, in a structured yet conversational manner. Unlike conventional online versions of the PHQ-9 test, the chatbot offers enhanced interactivity. For individuals who may experience difficulties in articulating or summarizing their responses, the chatbot provides clarifications and assistance using a comprehensive language model.

After the assessment, participants will receive an interpretation of their test results along with tailored advice, delivered in accordance with established mental health guidelines.

Description

Inclusion Criteria:

Adults aged 18-65 years.
Fluent in English.
Access to a device capable of voice interaction and stable internet connection.
Willing to participate in chatbot interaction and a follow-up interview.

Exclusion Criteria:

Current severe psychiatric diagnoses (e.g., psychosis, bipolar disorder).
Participants undergoing active treatment for depression with a psychiatrist.
Discomfort with voice-based technology or inability to provide informed consent.

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

What is the study measuring?

Primary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Feasibility and Acceptability of the GPT-4o and RAG Voice Chatbot	Participants' perceptions of the chatbot's feasibility, acceptability, and effectiveness are assessed through semi-structured interviews conducted after the interaction session. These interviews explore themes such as the chatbot's empathy, usability, and the overall user experience.	Interviews are conducted immediately following the chatbot interaction.

Secondary Outcome Measures

Outcome Measure	Measure Description	Time Frame
Accuracy of PHQ-9 Scoring by the Chatbot	The accuracy of the PHQ-9 scores generated by the chatbot is measured by comparing the chatbot's results to participants' self-reported PHQ-9 scores collected prior to the interaction. Agreement will be assessed using statistical methods such as Cohen's kappa or intraclass correlation coefficient (ICC).	Measured immediately after the interaction session, once the chatbot has generated PHQ-9 scores.
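As an illustration of the agreement measure named above, the following sketch computes unweighted Cohen's kappa between self-reported and chatbot-assigned severity categories. It is a minimal example under assumptions: the record does not say whether agreement will be computed on categories or raw scores, nor whether a weighted variant will be used, and the function name and sample labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both sources give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each source's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

With hypothetical self-test labels of mild, mild, moderate, minimal and chatbot labels of mild, moderate, moderate, minimal, observed agreement is 0.75, chance agreement is 0.3125, and kappa is 7/11, about 0.64. Because severity categories are ordered, a weighted kappa or the ICC on total scores may be a better fit; this sketch covers only the unweighted case.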

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Sponsor

University College, London

Investigators

Study Director: Kezhi Li, University College, London

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Estimated)

February 1, 2025

Primary Completion (Estimated)

March 31, 2025

Study Completion (Estimated)

May 31, 2025

Study Registration Dates

First Submitted

January 24, 2025

First Submitted That Met QC Criteria

January 24, 2025

First Posted (Actual)

March 25, 2025

Study Record Updates

Last Update Posted (Actual)

March 25, 2025

Last Update Submitted That Met QC Criteria

January 24, 2025

Last Verified

January 1, 2025

More Information

Terms related to this study

Keywords

Additional Relevant MeSH Terms

Other Study ID Numbers

26133.001 (Other Identifier: UCL Research Ethics Committee)

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

IPD Plan Description

IPD will not be shared to ensure participant confidentiality and comply with ethical guidelines related to mental health research. Additionally, participants did not provide explicit consent for data sharing with external researchers.

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

Studies a U.S. FDA-regulated device product


