Reporting guidelines for clinical trials of artificial intelligence interventions: the SPIRIT-AI and CONSORT-AI guidelines

Hussein Ibrahim, Xiaoxuan Liu, Samantha Cruz Rivera, David Moher, An-Wen Chan, Matthew R Sydes, Melanie J Calvert, Alastair K Denniston

Abstract

Background: The application of artificial intelligence (AI) in healthcare is an area of immense interest. The high profile of 'AI in health' means that there are unusually strong drivers to accelerate the introduction and implementation of innovative AI interventions, which may not be supported by the available evidence, and for which the usual systems of appraisal may not yet be sufficient.

Main text: We are beginning to see the emergence of randomised clinical trials evaluating AI interventions in real-world settings. It is imperative that these studies are conducted and reported to the highest standards to enable effective evaluation, because they will potentially be a key part of the evidence used when deciding whether an AI intervention is sufficiently safe and effective to be approved and commissioned. Minimum reporting guidelines for clinical trial protocols and reports have been instrumental in improving the quality of clinical trials and promoting completeness and transparency of reporting for the evaluation of new health interventions. The current guidelines, SPIRIT and CONSORT, are suited to traditional health interventions, but research has revealed that they do not adequately address potential sources of bias specific to AI systems. Examples of elements that require specific reporting include the algorithm version and the procedure for acquiring input data. In response, the SPIRIT-AI and CONSORT-AI guidelines were developed by a multidisciplinary group of international experts using a consensus-building methodological process. The extensions include a number of new items that should be reported in addition to the core items. Each item, where possible, was informed by challenges identified in existing studies of AI systems in health settings.

Conclusion: The SPIRIT-AI and CONSORT-AI guidelines provide the first international standards for clinical trials of AI systems. The guidelines are designed to ensure complete and transparent reporting of clinical trial protocols and reports involving AI interventions and have the potential to improve the quality of these clinical trials through improvements in their design and delivery. Their use will help to efficiently identify the safest and most effective AI interventions and commission them with confidence for the benefit of patients and the public.

Keywords: Artificial intelligence; Checklist; Clinical trials; Guidelines; Machine learning; Randomised controlled trials; Research design; Research report.

Conflict of interest statement

MJC is a National Institute for Health Research (NIHR) Senior Investigator and receives funding from the NIHR Birmingham Biomedical Research Centre, the NIHR Surgical Reconstruction and Microbiology Research Centre and NIHR ARC West Midlands at the University of Birmingham and University Hospitals Birmingham NHS Foundation Trust, Health Data Research UK, Innovate UK (part of UK Research and Innovation), Macmillan Cancer Support, and UCB Pharma. The views expressed in this article are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. MJC has also received personal fees from Astellas, Takeda, Merck, Daiichi Sankyo, Glaukos, GSK, and the Patient-Centered Outcomes Research Institute (PCORI) outside the submitted work. DM is funded by a University Research Chair (uOttawa). All other authors declare that they have no competing interests.

Source: PubMed
