- Clinical Trial NCT04991987
Multicenter Validation Study of an Artificial Intelligence Tool for Automatic Classification of Chest X-rays
Study Overview
Status
Detailed Description
A current problem in Radiology Departments is the constant increase in the number of studies performed. This ever-increasing volume of information implies an increase in the time that medical specialists must dedicate to report these studies. The methodology carried out for reporting varies according to the imaging modality, which in high complexity centers includes radiology, computed tomography, magnetic resonance imaging and ultrasound, among others. Currently the largest volume of studies belongs to plain x-rays. At Hospital Italiano de Buenos Aires (HIBA) more than 220,000 x-rays were performed during 2019, and within this group more than 50% of the practices are chest x-rays, which are performed as a method of initial detection of potentially serious pathologies (pulmonary nodule, pneumonia, pneumothorax).
This imaging modality holds little appeal for the new generations of imaging specialists, who prefer to move towards more modern and complex methods such as computed tomography or magnetic resonance imaging. Therefore, the problem of the increasing volume of plain x-rays to be analyzed is compounded by the shortage of specialists with dedication and experience in their interpretation.
In the field of computer science, an area of study called Artificial Intelligence (AI) has emerged, which deals with computer systems that learn to perform specific routine tasks and can complement or imitate human work. The developer must tell the AI system what response is desired for a given stimulus. An example of this is the spell checker in a word processor.
The field of AI encompasses a wide variety of sub-fields and specific techniques, such as Machine Learning (ML) or Deep Learning (DL). ML encompasses any tool in which computerized data is used to fit a model that draws conclusions from this input data. Algorithms are trained to learn given tasks based on a set of previously classified information. This also includes traditional techniques for creating predictive models or classification models. E-mail spam filtering is an example of ML. Neural networks are one of the tools included in ML.
Finally, DL is a type of ML that came into wide use around 2015. It consists of adding layers to a traditional neural network, creating a nonlinear model with a higher degree of complexity because the number of parameters to be adjusted increases. This network is exposed to a training dataset, which consists of already labeled information, and "learns" to label new information by mimicking the labeling criteria of the dataset. This learning is in fact an iterative adjustment of the model parameters, which are modified according to the error between the original labeling and the labeling suggested by the network. Once the model is trained, its parameters are fixed and it can be used to infer labels for new information whose labeling is unknown. DL methods have been found to outperform traditional methods in many data analysis tasks. DL already has applications in everyday life, such as voice assistants in smartphones, or automatic face recognition and labeling in social networks.
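The iterative parameter adjustment described above can be illustrated with a deliberately small sketch. It is not a deep network and it is not the TRx training code: it fits a one-feature logistic model by gradient descent, which follows the same loop of predicting, measuring the error against the original label, and updating the parameters. The function names, learning rate, epoch count, and toy data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Map a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, lr=0.5, epochs=200):
    """Fit weight w and bias b of a one-feature logistic model by gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(samples, labels):
            # Error between the label suggested by the model and the original label.
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        # Move the parameters against the average gradient of the error.
        w -= lr * grad_w / len(samples)
        b -= lr * grad_b / len(samples)
    return w, b


def predict(w, b, x):
    """After training the parameters are fixed and only used for inference."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Hypothetical labeled training data: one feature, binary label.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train(xs, ys)
print(predict(w, b, -1.5), predict(w, b, 1.5))
```

A deep network repeats the same update over many layers and far more parameters, but the principle of adjusting parameters according to the labeling error is unchanged.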
DL applied to image processing is based on a method called convolutional neural networks. Its application has been investigated in the field of medical imaging, finding improvements in performance, from object detection (anatomical or pathological structures in radiological images) to segmentation tasks.
Since 2018, Hospital Italiano de Buenos Aires has been running the TRx program, which consists of the development of an AI-based tool to detect pathological findings in chest x-rays. The project is part of the Artificial Intelligence in Healthcare program of Hospital Italiano de Buenos Aires, and is carried out by a multidisciplinary team of professionals, including biomedical engineers, data scientists, radiologists, clinical informaticians, methodologists, and software engineers. TRx is a DL model, developed and validated at HIBA, which detects four types of radiological findings on chest x-rays: pulmonary opacities (nodules, masses, pneumonia, consolidations, ground glass, or atelectasis), pneumothorax, pleural effusions, and rib fractures. This detection is performed through four independent modules that are integrated into a single system. When processing an x-ray, TRx reports different types of results. First, the unified TRx system indicates dichotomously whether the image is suspicious for a pathological finding or is likely a normal chest x-ray. Second, each of the four modules indicates whether a finding of pulmonary opacity, pneumothorax, pleural effusion, or rib fracture was detected, respectively. Finally, TRx enables the visualization of a heat map over the image indicating in color the region of the thorax where a suspected finding was detected.
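As a rough illustration of how four independent module outputs might be combined into one dichotomous result, the sketch below flags an image as abnormal when at least one module reports a finding. This combination rule, the score and threshold fields, and all names are assumptions for illustration; the record does not describe how TRx integrates its modules internally.

```python
from dataclasses import dataclass

# The four finding types named in the study description.
FINDINGS = ("pulmonary_opacity", "pneumothorax", "pleural_effusion", "rib_fracture")


@dataclass
class ModuleResult:
    finding: str
    score: float      # hypothetical module output in [0, 1]
    threshold: float  # hypothetical per-module operating point

    @property
    def detected(self) -> bool:
        return self.score >= self.threshold


def unified_result(results):
    """Combine per-module results: abnormal if any module detects its finding."""
    detected = [r.finding for r in results if r.detected]
    return {"abnormal": bool(detected), "findings": detected}


# Hypothetical scores for a single chest x-ray.
example = [
    ModuleResult("pulmonary_opacity", 0.82, 0.5),
    ModuleResult("pneumothorax", 0.10, 0.5),
    ModuleResult("pleural_effusion", 0.35, 0.5),
    ModuleResult("rib_fracture", 0.05, 0.5),
]
print(unified_result(example))
```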
The intended use of this tool is to assist non-imaging physicians in the diagnosis of chest x-rays by automatically detecting radiological findings. TRx version 1.0 (TRx v1) evaluates frontal chest x-rays of patients older than 14 years of age for four types of findings: pulmonary opacities, pleural effusion, fractures, and pneumothorax. The objective of this tool is to enhance the diagnostic performance of non-imaging physicians by providing assistance or a "preliminary report".
One fact that is stressed in AI is that models must be replicable: a model must give the same result whenever it is given the same input. Although this seems obvious, it contrasts with humans, who commonly exhibit both inter- and intra-observer variability. The performance of an AI model should at least match the human performance it will assist. Replicability depends on the problem, and the amount of variability depends on the specific task at hand.
Some authors report that an AI model may have difficulty providing accurate predictions when applied to new situations or populations (i.e., those to which it was not exposed during training). Whereas radiologists can adapt to differences in images (whether due to slice thickness, scanner marking, field strength, gradient intensity or contrast time) without affecting their interpretation, AI generally lacks that ability. For example, if an AI agent was trained only with images from a 3 Tesla MRI scanner, it cannot be guaranteed a priori that it will achieve the same results on scans performed at 1.5 Tesla. One solution is to develop mathematical processes to recognize, normalize and transform the data to minimize drift. Another approach to mitigate this phenomenon is to perform training and validation with "full" data sets, representing each type of image data acquisition and reconstruction.
In order to evaluate the diagnostic performance of an AI tool in a comprehensive manner and thus ensure its intended use, it is recommended to perform multicenter studies, which allow this performance to be measured in different patient populations and under different image acquisition protocols. The present multicenter study seeks to externally validate the performance of an AI tool (TRx v1) as a diagnostic assistance tool for chest x-rays.
Study Type
Enrollment (Anticipated)
Contacts and Locations
Study Locations
Buenos Aires, Argentina, 1199
- Hospital Italiano de Buenos Aires
Participation Criteria
Eligibility Criteria
Ages Eligible for Study
Accepts Healthy Volunteers
Genders Eligible for Study
Sampling Method
Study Population
Description
Inclusion Criteria:
X-rays that meet the following requirements will be included:
- Chest X-ray
- Belonging to patients over 18 years of age
- Frontal projection and digital acquisition
- Study conducted in the aforementioned institutions and stored in their respective Picture Archiving and Communication System
Exclusion Criteria:
X-rays that are excluded:
- Poor technique (low contrast, veiled, off-center)
- Abnormal patient positioning during acquisition
Study Plan
How is the study designed?
Design Details
What is the study measuring?
Primary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Concordance between AI tool and reference standard | The concordance between the category assigned by the professionals and that assigned by the algorithm will be analyzed. For this purpose, a diagnostic test will be evaluated for the detection of abnormality (i.e., the test is positive when at least one of the four types of findings is observed). Considering the specialists' diagnosis as a reference standard, the confusion matrix will be constructed and the diagnostic metrics of the AI tool (sensitivity, specificity and predictive values) will be calculated. The 95% confidence intervals will be calculated using exact binomial distribution. | 5 months |
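The primary analysis can be sketched as follows, under stated assumptions. The sketch builds the confusion matrix counts from paired binary labels and attaches an exact binomial (Clopper-Pearson) 95% interval to each metric. The record only says "exact binomial", so reading that as Clopper-Pearson is an assumption, as are the function names and the toy data. The interval bounds are found by bisection on the binomial distribution function using the standard library only, which is adequate for modest sample sizes.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); suitable for modest n only."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes in n trials, found by bisection."""

    def solve(f, target):
        # f is decreasing in p on [0, 1]; locate p where f(p) equals target.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


def diagnostic_metrics(reference, predicted):
    """Return {metric: (estimate, (ci_low, ci_high))} from paired 0/1 labels."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    fp = sum(1 for r, p in pairs if not r and p)
    tn = sum(1 for r, p in pairs if not r and not p)
    counts = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    # Metrics with an empty denominator are skipped rather than guessed.
    return {name: (k / n, exact_ci(k, n)) for name, (k, n) in counts.items() if n > 0}


# Hypothetical data: 1 = abnormal, 0 = normal.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # specialists' diagnosis
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # AI tool output
for name, (value, ci) in diagnostic_metrics(reference, predicted).items():
    print(f"{name}: {value:.3f} (95% CI {ci[0]:.3f} to {ci[1]:.3f})")
```

With real study volumes, a statistics library with a beta quantile function would be a more practical way to obtain the same interval.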
Secondary Outcome Measures
| Outcome Measure | Measure Description | Time Frame |
|---|---|---|
| Receiver Operating Characteristic curves | Receiver Operating Characteristic curves will be constructed for the global category of abnormality and for each of the individual radiological findings, calculating in each case the Area Under the Curve (value between 0 and 1). A model whose predictions are 100% incorrect has an area under the curve of 0.0; another whose predictions are 100% correct has an area under the curve of 1.0. The categorization made by the expert radiologists will be taken as the reference standard. It will be evaluated whether there is a significant difference between the area under the curve of the AI tool and the reference value estimated for non-imaging physicians (i.e., emergency room physicians or residents). The DeLong test with a significance level of 0.01 will be used. | 5 months |
| Qualitative analysis | The images with erroneous diagnoses (false negatives and false positives) and the corresponding heat maps generated by the algorithm will be studied individually. | 5 months |
| Inter-observer concordance index | The inter-observer concordance between the participating specialists will be analyzed. In cases where the image in question is categorized differently by each of the observers, they will be asked to review the images together to define a category. | 5 months |
| Analysis by institution | The variables of items 1 and 2 will be calculated separately for the images of each participating institution. We will evaluate whether there is a significant difference in the area under the curve values across institutions using the DeLong test. A significance level of 0.01 will be used. | 5 months |
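Two of the secondary measures can be illustrated with a short sketch. The area under the Receiver Operating Characteristic curve is computed here through its rank interpretation: the fraction of abnormal/normal pairs in which the abnormal image receives the higher score, with ties counted as half. For inter-observer concordance the record does not name an index, so Cohen's kappa for two readers is used purely as an assumed example. The DeLong comparison is not implemented. All names and data are hypothetical.

```python
from collections import Counter


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Each positive/negative pair scores 1 if ranked correctly, 0.5 if tied, 0 otherwise.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two readers over the same images."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if both readers labeled independently at their own rates.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical reference labels and AI scores for four images.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))

# Hypothetical categories assigned by two specialists.
reader_1 = [1, 1, 0, 0]
reader_2 = [1, 0, 0, 0]
print(cohen_kappa(reader_1, reader_2))
```

The same `roc_auc` function could be applied per finding and per institution by filtering the inputs before the call.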
Collaborators and Investigators
Investigators
- Principal Investigator: Sonia E Benitez, MD, MSc, Hospital Italiano de Buenos Aires
Publications and helpful links
General Publications
- Calvert JS, Price DA, Chettipally UK, Barton CW, Feldman MD, Hoffman JL, Jay M, Das R. A computational approach to early sepsis detection. Comput Biol Med. 2016 Jul 1;74:69-73. doi: 10.1016/j.compbiomed.2016.05.003. Epub 2016 May 12.
- Kesselman A, Soroosh G, Mollura DJ; RAD-AID Conference Writing Group. 2015 RAD-AID Conference on International Radiology for Developing Countries: The Evolving Global Radiology Landscape. J Am Coll Radiol. 2016 Sep;13(9):1139-1144. doi: 10.1016/j.jacr.2016.03.028. Epub 2016 May 25.
- Chartrand G, Cheng PM, Vorontsov E, Drozdzal M, Turcotte S, Pal CJ, Kadoury S, Tang A. Deep Learning: A Primer for Radiologists. Radiographics. 2017 Nov-Dec;37(7):2113-2131. doi: 10.1148/rg.2017170077.
- Erickson BJ, Korfiatis P, Akkus Z, Kline TL. Machine Learning for Medical Imaging. Radiographics. 2017 Mar-Apr;37(2):505-515. doi: 10.1148/rg.2017160130. Epub 2017 Feb 17.
- Balthazar P, Harri P, Prater A, Safdar NM. Protecting Your Patients' Interests in the Era of Big Data, Artificial Intelligence, and Predictive Analytics. J Am Coll Radiol. 2018 Mar;15(3 Pt B):580-586. doi: 10.1016/j.jacr.2017.11.035. Epub 2018 Feb 6.
- Mosquera C, Diaz FN, Binder F, Rabellino JM, Benitez SE, Beresnak AD, Seehaus A, Ducrey G, Ocantos JA, Luna DR. Chest x-ray automated triage: A semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures. Comput Methods Programs Biomed. 2021 Jul;206:106130. doi: 10.1016/j.cmpb.2021.106130. Epub 2021 May 2.
Helpful Links
- Weakly Supervised Learning of Deep Convolutional Neural Networks [Internet]. 2016 Institute of Electrical and Electronics Engineers, Conference on Computer Vision and Pattern Recognition. 2016.
- Guest Editorial Deep Learning in Medical Imaging: Overview and Future Promise of an Exciting New Technique [Internet]. Vol. 35, Institute of Electrical and Electronics Engineers, Transactions on Medical Imaging. 2016. p. 1153-9.
- Dataset shift in machine learning. Neural Information Processing. 2008.
Other Study ID Numbers
- 6025