The Effectiveness of Artificial Intelligence Conversational Agents in Health Care: Systematic Review

Madison Milne-Ives, Caroline de Cock, Ernest Lim, Melissa Harper Shehadeh, Nick de Pennington, Guy Mole, Eduardo Normando, Edward Meinert

Abstract

Background: The high demand for health care services and the growing capability of artificial intelligence have led to the development of conversational agents designed to support a variety of health-related activities, including behavior change, treatment support, health monitoring, training, triage, and screening support. Automation of these tasks could free clinicians to focus on more complex work and increase public access to health care services. An overarching assessment of the acceptability, usability, and effectiveness of these agents in health care is needed to collate the evidence so that future development can target areas for improvement and the potential for sustainable adoption.

Objective: This systematic review aims to assess the effectiveness and usability of conversational agents in health care and identify the elements that users like and dislike to inform future research and development of these agents.

Methods: PubMed, Medline (Ovid), EMBASE (Excerpta Medica dataBASE), CINAHL (Cumulative Index to Nursing and Allied Health Literature), Web of Science, and the Association for Computing Machinery Digital Library were systematically searched for articles published since 2008 that evaluated unconstrained natural language processing conversational agents used in health care. EndNote (version X9, Clarivate Analytics) reference management software was used for initial screening, and full-text screening was conducted by 1 reviewer. Data were extracted, and the risk of bias was assessed by one reviewer and validated by another.

Results: A total of 31 studies were selected, covering a variety of conversational agents: 14 chatbots (2 of which were voice chatbots), 6 embodied conversational agents, 3 each of interactive voice response calls, virtual patients, and speech recognition screening systems, 1 contextual question-answering agent, and 1 voice recognition triage system. Overall, the evidence reported was mostly positive or mixed. Usability and satisfaction performed well (27/30 and 26/31, respectively), and positive or mixed effectiveness was found in about three-quarters of the studies (23/30). However, specific qualitative feedback highlighted several limitations of the agents.
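The proportions quoted in the Results can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the review's own analysis: the dictionary keys and the helper name `as_percent` are illustrative assumptions, and only the counts themselves come from the abstract.

```python
# Counts as reported in the Results paragraph: (numerator, denominator).
# The labels are illustrative; only the numbers come from the abstract.
reported = {
    "usability": (27, 30),
    "satisfaction": (26, 31),
    "effectiveness": (23, 30),
}


def as_percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Convert every reported count into a percentage.
percentages = {name: as_percent(n, d) for name, (n, d) in reported.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Under these assumptions, 23/30 works out to roughly 77%, which is consistent with the "three-quarters" wording in the abstract.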

Conclusions: The studies generally reported positive or mixed evidence for the effectiveness, usability, and satisfactoriness of the conversational agents investigated, but qualitative user perceptions were more mixed. The quality of many of the studies was limited, and improved study design and reporting are necessary to more accurately evaluate the usefulness of the agents in health care and identify key areas for improvement. Further research should also analyze the cost-effectiveness, privacy, and security of the agents.

International Registered Report Identifier (IRRID): RR2-10.2196/16934.

Keywords: artificial intelligence; avatar; chatbot; conversational agent; digital health; intelligent assistant; speech recognition software; virtual assistant; virtual coach; virtual health care; virtual nursing; voice recognition software.

Conflict of interest statement

Conflicts of Interest: EL, NP, and GM are all employees of Ufonia Limited, a voice AI company. However, the paper was funded by the Sir David Cooksey Fellowship in Healthcare Translation at the University of Oxford, and Ufonia had no editorial influence on the final drafting. Their contribution was limited to feedback, given their applied voice AI expertise; therefore, no conflict of interest is identified.

©Madison Milne-Ives, Caroline de Cock, Ernest Lim, Melissa Harper Shehadeh, Nick de Pennington, Guy Mole, Eduardo Normando, Edward Meinert. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 22.10.2020.

Figures

Figure 1
Preferred Reporting Items for Systematic Review and Meta-Analyses flow diagram. NLP: natural language processing.
Figure 2
Risk of bias summary: review authors' judgements about each risk of bias item for each included study.
Figure 3
Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Source: PubMed
