These challenges adversely impact our progress with AI and prevent us from moving forward.
Four Barriers to Integrating AI into Radiology Practice
Machine learning methods for biomedical image analysis that outperform humans on specific tasks have been around for nearly two decades [1]. However, they are rarely used in current radiology practice. In fact, the practice of visually inspecting and subjectively describing an imaging study has not fundamentally changed since the very first scientific report of findings on an X-ray in 1896 [2].
Despite the prowess of newer deep learning methods for recognizing patterns in images and other complex forms of data, the same fundamental barriers from the past two decades still stand in the way.
Here’s my take on four major barriers that are slowing the integration of AI into daily clinical radiology practice and, with it, advances in radiology.
Barrier 1: Workflow integration
For any AI tool to be useful in a radiologist’s daily practice, it should provide a seamless experience. Integration into a hospital system’s PACS and dictation software is a necessary first step. Fortunately, commercial entities understand this, and tools are starting to be deployed centrally through AI marketplaces affiliated with various PACS. However, standardization and integration of these tools are still in early stages.
Clinical deployment of advanced biomedical image analysis research tools is also slowed because image preprocessing pipelines often require lengthy processing times and multiple quality control (QC) steps. These include the detection and correction of motion artifacts and misregistration that can potentially degrade downstream results.
While deep learning methods allow for rapid analysis of new data, the development of robust clinical systems requires that preprocessing and QC steps be automated — so that few or no manual QC steps are required. This becomes a larger problem when dealing with more complex 4D (e.g., multi-sequence MRI) or 5D data (e.g., multi-timepoint, multi-sequence MRI).
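As a rough illustration of what an automated QC gate could look like, the sketch below routes each study based on metrics computed earlier in the pipeline. Everything in it is an assumption made for illustration: the `QCMetrics` fields, the two thresholds, and the routing labels are hypothetical and would need validation against a real protocol.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QCMetrics:
    """Per-study quality metrics from earlier pipeline steps (hypothetical fields)."""
    study_id: str
    max_motion_mm: float         # largest estimated displacement during the scan
    registration_overlap: float  # overlap with a reference template, from 0 to 1


# Illustrative thresholds only; real values would have to be validated.
MAX_MOTION_MM = 2.0
MIN_REGISTRATION_OVERLAP = 0.90


def qc_failures(m: QCMetrics) -> List[str]:
    """Return the names of the failed checks; an empty list means the study passes."""
    failures = []
    if m.max_motion_mm > MAX_MOTION_MM:
        failures.append("motion")
    if m.registration_overlap < MIN_REGISTRATION_OVERLAP:
        failures.append("misregistration")
    return failures


def route(m: QCMetrics) -> str:
    """Send passing studies straight to analysis; only failures are flagged."""
    return "analyze" if not qc_failures(m) else "flag"


if __name__ == "__main__":
    studies = [
        QCMetrics("A", max_motion_mm=0.6, registration_overlap=0.95),
        QCMetrics("B", max_motion_mm=3.1, registration_overlap=0.93),
        QCMetrics("C", max_motion_mm=1.2, registration_overlap=0.81),
    ]
    for s in studies:
        print(s.study_id, route(s), qc_failures(s))
```

The point of the design is that a human only sees the flagged studies, so the manual QC burden scales with the failure rate rather than with total volume.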
Barrier 2: Data variability and ability to generalize
Research in biomedical image processing has historically focused on clean, homogeneous datasets, such as a 3D MPRAGE or FLAIR sequence used in a research protocol to investigate a particular disease. However, performance measured on these datasets, which are typically acquired at a single institution, has been shown to overestimate performance in the real world [3]. This is particularly true when the methods are applied to data from other institutions [4].
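One simple way to make this gap visible is to report the same metrics separately for an internal test set and for data from another institution. The sketch below does that for sensitivity and specificity. The label and prediction lists are made-up toy values, not results from any study, and the function name is my own.

```python
from typing import List, Tuple


def sensitivity_specificity(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Sensitivity and specificity for binary labels (1 = finding present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    internal_predictions = [1, 1, 1, 1, 0, 0, 0, 1]  # same institution as training
    external_predictions = [1, 1, 0, 0, 0, 0, 1, 1]  # a different institution

    for name, preds in [("internal", internal_predictions),
                        ("external", external_predictions)]:
        sens, spec = sensitivity_specificity(truth, preds)
        print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f}")
```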
Deep learning methods have the potential to overcome issues of lower-quality clinical data and variability across imaging parameters and institutions, but they require larger sample sizes and more diverse, unbiased training data. Acquiring this data is no small task, especially given data-sharing privacy concerns and the time required for data labeling and annotation. Supplying large, diverse, labeled training data will take continued standardization efforts for data anonymization, data sharing, and data annotation. It will also require collaboration among academic centers, industry partners, and radiology advocacy groups.
Barrier 3: Disease diversity within and between individuals
To be successful, the majority of image processing tools and commercial radiology AI algorithms have been tailored to tackle a specific disease or abnormality, such as measuring calcium scores [5], ejection fraction [6], or multiple sclerosis plaques [7]. This “narrow” perspective of defining the specific tasks (or use cases) that are solvable by AI algorithms is a critical first step that is embraced by the Data Science Institute (DSI). General AI is still a long way off, and there is debate over whether computers will ever be able to excel in general artificial intelligence, given the complexity of the human brain in analyzing problems.
While the narrow tasks being solved by AI provide decision support, they are far different from the reality of a radiologist’s overall job. To be useful, narrow AI solutions will have to be integrated into more comprehensive solutions. This must be done for every body part, multiplied by every modality. Ignoring the sheer diversity of human diseases leads to what is known as spectrum bias [8]. Take a brain MRI, for example: there are several hundred possible diseases or pathologies that could be present in any given scan. Typically, data-driven machine learning methods require large datasets (>100 examples) for each disease or task. This makes the task of identifying all of the rare diseases incredibly challenging, if not impossible, using data-driven methods that rely on highly annotated data.
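A back-of-the-envelope calculation shows how quickly these requirements add up. The sketch below takes the figures from the paragraph above as inputs, reading "several hundred" as 300 diseases and using 100 examples per disease as the minimum. The prevalence of 1 in 10,000 scans for a rare disease is a purely hypothetical number chosen for illustration.

```python
def labeled_examples_needed(n_diseases: int, examples_per_disease: int) -> int:
    """Minimum labeled examples to cover every disease for one body part and modality."""
    return n_diseases * examples_per_disease


def scans_to_screen(examples_needed: int, one_in_n_scans: int) -> int:
    """Expected number of scans to search to find enough cases of one disease
    that appears, on average, once in every `one_in_n_scans` scans."""
    return examples_needed * one_in_n_scans


if __name__ == "__main__":
    # Assumption: "several hundred" diseases read as 300; 100 examples each.
    print(labeled_examples_needed(300, 100))  # 30000
    # Assumption: a rare disease seen once per 10,000 scans (hypothetical).
    print(scans_to_screen(100, 10_000))       # 1000000
```

Under these assumptions, a single body part and modality already calls for 30,000 labeled examples, and collecting 100 cases of one rare disease means searching on the order of a million scans.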
Further complicating narrow AI approaches is the problem that more than one disease or abnormality is often present in the same patient, such as when small vessel ischemic disease and chronic infarcts/insults are adjacent to a brain metastasis. If an AI tool can only address one (or even a few) types of abnormalities within a single patient, it will be of limited use for most imaging studies.
One potential solution to overcoming disease diversity and developing a more comprehensive solution is combining data-driven and domain-expert approaches. Ignoring the body of knowledge that we have already developed about the appearance of different diseases seems foolish, particularly for rare diseases in which adequate novel training examples will be hard to come by. Instead, we should try to incorporate the a priori knowledge we have about human disease in order to guide AI systems towards success.
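One common way to combine prior knowledge with data-driven evidence is Bayes' rule, and the sketch below uses it as an example of the idea rather than as the specific method proposed here. An expert-supplied prior over candidate diseases is multiplied by how likely the observed imaging features are under each disease, then normalized. The disease names and all probabilities are hypothetical placeholders.

```python
from typing import Dict


def posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule over a fixed set of candidate diseases.

    prior:      expert-supplied probability of each disease before seeing the image
    likelihood: probability of the observed imaging features given each disease
    """
    unnormalized = {d: prior[d] * likelihood[d] for d in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed features are impossible under every candidate")
    return {d: v / total for d, v in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical numbers: the rare disease starts out unlikely, but the
    # observed features are far more typical of it than of the common one.
    prior = {"common_disease": 0.9, "rare_disease": 0.1}
    likelihood = {"common_disease": 0.2, "rare_disease": 0.9}
    for disease, p in posterior(prior, likelihood).items():
        print(f"{disease}: {p:.2f}")
```

In a setup like this, the likelihoods for a rare disease could come from textbook descriptions of its appearance rather than from hundreds of annotated training cases.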
Barrier 4: Clinical utility
Even if we overcome the first three barriers by developing an AI tool that is both robust to data variation and fully integrated into clinical workflow, it must fundamentally add value in order to be adopted into clinical practice and be worth paying for. Added value is necessary in three categories:
1. Speed: Does the tool reduce the time in which critical findings are reported and acted upon? Many early initiatives in AI, such as detecting hemorrhage in head CTs or large vessel occlusion on CTAs [9], are based on this premise. However, only a relatively small percentage of studies require such rapid triage, which limits how much value these tools can add.
2. Accuracy: Does the tool make the radiologist more diagnostically accurate? Even if a tool is precise in what it quantifies, it might be irrelevant diagnostically (i.e., addressing a problem that doesn’t need to be solved). For example, when all that matters for treatment is assessing whether the disease burden is worse, stable, or better, quantifying the change in disease burden down to the exact percentage point is unimportant. To add value here, an AI tool needs to enhance the radiologist’s sensitivity or overall diagnostic accuracy in a way that can clearly affect patient outcomes. This is very challenging to prove.
3. Efficiency: Does the tool reduce radiologists’ time spent reading studies? Many tools that add information will tend to slow down a radiologist. This is especially true if there is a specific tool for each task or disease that is encountered. In addition, many AI algorithms still result in false positives (think of traditional CAD for mammography) that slow radiologists down. However, some perceptual tasks, including those that require longitudinal comparisons such as comparing the size of lesions over time, could be made more efficient through AI tools with automated measurements/comparisons. Still, improving the speed at which radiologists can interpret the majority of imaging studies would require more comprehensive tools that are able to overcome Barrier 3. Such comprehensive tools could more easily increase efficiency with draft reports, given that full preliminary reports generated by residents and fellows have been shown to improve the efficiency of academic neuroradiologists interpreting brain MRIs by 25% [10].
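The false-positive problem raised under Efficiency can be made concrete with positive predictive value (PPV): the fraction of flagged studies that truly contain the finding. The sketch below applies the standard formula. The sensitivity, specificity, and prevalence values are hypothetical and do not describe any particular product.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Fraction of positive flags that are true positives."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    flagged = true_positive_rate + false_positive_rate
    if flagged == 0:
        raise ValueError("the tool never flags a study under these inputs")
    return true_positive_rate / flagged


if __name__ == "__main__":
    # Hypothetical tool: 90% sensitive and 90% specific, with the finding
    # present in 1% of studies.
    ppv = positive_predictive_value(0.90, 0.90, 0.01)
    print(f"PPV = {ppv:.3f}")
```

With these assumed inputs, only about 8% of flags are real, so the radiologist dismisses roughly 11 false alarms for every true finding. That is the mechanism by which an apparently accurate tool can still slow reading down.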
To be integrated into radiology practice, any AI tool must, at a minimum, overcome challenges to workflow integration and cope with data variability well enough to generalize. It will also need to deliver on at least one aspect of clinical utility, with many early tools addressing speed. However, these initial AI tools are still quite narrow, covering a small portion of the diagnostic role of a radiologist, which also has to take into account disease diversity within and between patients. In order for AI tools to supplement human radiologists in a more comprehensive fashion and be fully embraced, they will also need to be implemented across institutions and shown to increase accuracy and efficiency.
Although we are starting to remove the barriers to integrating biomedical image analysis and machine learning into radiology practice, many hurdles that have prevented clinical translation over the past 20 years still remain. Overcoming them will require significant effort and collaboration among academia, industry partners, and professional societies.
Jeff Rudie, MD, PhD | Fourth Year Research Track Radiology Resident and Informatics Fellow at the Hospital of the University of Pennsylvania | Incoming Neuroradiology Fellow at the University of California, San Francisco
1. C. Davatzikos, Machine learning in neuroimaging: Progress and challenges, Neuroimage (United States, 2018).
2. W. Robert Nitske, The Life of Wilhelm Conrad Rontgen, Discoverer of the X Ray (Tucson: University of Arizona Press, 1971).
3. S. Do, R.G. Gonzalez, H. Lee, M.H. Lev, M. Mansouri, S.R. Pomeranz, and S. Yune, Real-World Performance of Deep-Learning-based Automated Detection System for Intracranial Hemorrhage. Educational Course presented at RSNA 2019, Chicago, IL.
4. J.R. Zech, M.A. Badgeley, M. Liu, A.B. Costa, J.J. Titano, and E.K. Oermann, Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study, PLOS Medicine (United States, 2018) 15(11).
5. Zebra Medical Vision 510k Clearance of Calcium Scoring on CT, https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm?ID=K172983
6. Arterys 510k Clearance of CardioAI for Ventricular Segmentation and Ejection Fraction Calculation, https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm?ID=K163253
7. Icometrix 510k Clearance of Icobrain for Multiple Sclerosis Lesion Segmentation, https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm?ID=K181939
8. S.H. Park and K. Han, Methodologic Guide for Evaluating Clinical Performance and Effect of Artificial Intelligence Technology for Medical Diagnosis and Prediction, Radiology (United States, 2018) 286(3):800-809.
9. J. Petrone, FDA approves stroke-detecting AI software, Nature Biotechnology (United States, 2018) 36(4):290.
10. A. Al Yassin, M. Salehi Sadaghiani, S. Mohan, R.N. Bryan, and I. Nasrallah, It Is About "Time": Academic Neuroradiologist Time Distribution for Interpreting Brain MRIs, Academic Radiology (United States, 2018) 25(12):1521-1525.