Data Access in Healthcare — Implications for the AI Ecosystem

Don’t be misled by the headlines. Data privacy, access, and liquidity still present many challenges for healthcare AI development.

 

New computing technologies and the vast amounts of medical data generated now make machine learning (ML) and artificial intelligence (AI) algorithms feasible in ways that have been unachievable in the past. Lately, new AI algorithms are announced almost daily, which may lead us to conclude that access to the enormous data sets powering these innovations is no longer an issue in healthcare. Unfortunately, diverse data sets for medical algorithm training are still hard for developers to find.

Why data remains an issue
In medicine, the use of homogeneous data during the training of algorithms is known to overestimate performance of the algorithm in clinical practice settings. To overcome this limitation — and improve algorithm performance and optimize patient outcomes — ML algorithms require access to large, diverse, and unbiased training data sets ideally from multiple institutions.

Making those medical data sets more readily available will require efforts to curate, anonymize, annotate, and share data in a manner that both complies with HIPAA and respects patient concerns for data privacy. All of this must be accomplished in a manner that is ethical, compliant with institutional and governmental policies and regulations, and free of unintended bias.

There are many challenges related to data privacy, data access, and data liquidity:

• Ethical and legal issues regarding data ownership and data privacy
• Differing international standards for privacy and data sharing
• Managing conflicts of interest
• Limited interoperability for image data exchange
• Normalizing the data
• De-identifying the data
• Understanding intellectual property rights
• Addressing patient concerns about who has access to the data

Successful AI tools will require collaboration among academic medical centers, radiology practices, industry partners, patient advocacy groups, and others. This collaborative approach creates a need to respect data ownership and privacy for patients and to protect intellectual property rights of the parties involved in co-creating new AI tools.

The ongoing need for healthcare data challenges radiologists
The data contained in imaging exams and reports is increasingly considered valuable from a financial perspective. Many groups are approached with offers to buy patient data for the purpose of creating commercial AI algorithms. The majority of medical equipment and devices now contain software/firmware embedded in the products themselves — which can collect patient data and annotations applied by radiologists directly to the images. Radiologists must recognize the value of the data they generate when working with industry partners to develop algorithms, or even when simply negotiating contracts for software and information systems in this emerging environment.

Radiologists must also be aware of (and prepared to address) patient concerns about data sharing. Patients are often willing to allow the use of their de-identified data for research purposes in hopes that it will improve patient care and provide potential benefits to others like themselves. At the same time, patients might be unwilling to give consent to share their data if they believe it will benefit a third-party, for-profit venture. As a result, public education programs aimed at increasing awareness of the potential benefits of sharing de-identified data for this purpose might be beneficial.

These challenges create the need for new processes to manage patient consent. Governmental rules and regulations for protecting patient data (such as HIPAA in the U.S. and the GDPR in the European Union) vary greatly from country to country. Individual hospitals and medical centers might also have their own rules regarding what data must be kept on premises, what is and is not allowed to be shared with external parties, and with what types of external parties data can be shared.

In the U.S., HIPAA mandates that certain safeguards be put in place for protecting patient data. De-identification of imaging data presents unique challenges. In some cases, protected health information (PHI) might be burned directly into the images themselves. Additionally, image files contain not only the visible image pixel data but also a DICOM header. The DICOM standard defines hundreds of tags, including public and private proprietary tags — both of which may contain PHI or allow re-identification of patients. Awareness of these challenges and access to appropriate tools for scrubbing this PHI will be necessary before this data can be shared.
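To make the header problem concrete, here is a minimal sketch of tag scrubbing. It assumes the DICOM header has already been read into a plain Python dictionary keyed by (group, element) pairs. The PHI_TAGS list and the scrub_header function are hypothetical names chosen for illustration, and the tag list is far from complete. The sketch relies on one rule from the DICOM standard: private tags have an odd group number. It does not address PHI burned into the pixel data, and it is not a substitute for a vetted de-identification tool.

```python
# Illustrative sketch only: a production pipeline would use a DICOM library
# and a reviewed de-identification profile, not this hand-picked tag list.

# Standard tags that commonly carry PHI, keyed by (group, element).
# Assumption for this example: the list is deliberately short and incomplete.
PHI_TAGS = {
    (0x0010, 0x0010),  # Patient's Name
    (0x0010, 0x0020),  # Patient ID
    (0x0010, 0x0030),  # Patient's Birth Date
    (0x0008, 0x0050),  # Accession Number
    (0x0008, 0x0080),  # Institution Name
    (0x0008, 0x0090),  # Referring Physician's Name
}


def is_private(tag):
    """Private (vendor-specific) tags have an odd group number."""
    group, _element = tag
    return group % 2 == 1


def scrub_header(header):
    """Return a copy of the header without listed PHI tags or private tags."""
    return {
        tag: value
        for tag, value in header.items()
        if tag not in PHI_TAGS and not is_private(tag)
    }


# Hypothetical header with two PHI tags, two technical tags, one private tag.
example_header = {
    (0x0010, 0x0010): "DOE^JANE",
    (0x0010, 0x0020): "MRN-0001",
    (0x0008, 0x0060): "CT",                   # Modality
    (0x0028, 0x0010): 512,                    # Rows
    (0x0009, 0x1001): "vendor note: J. Doe",  # private tag
}

scrubbed = scrub_header(example_header)
print(scrubbed)
```

Dropping every private tag is the conservative choice in this sketch, because the contents of proprietary tags cannot be known in advance. A real workflow might instead keep an allow-list of private tags known to be safe.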

The path forward
“Garbage in, garbage out” applies. How do we ensure the high quality of data used to develop AI algorithms? Normalizing data for use in algorithm creation and validation, including the use of data standards and common data elements, will be an important part of this process. ACR Common™ and ACR Assist™ provide components that will facilitate an exchange of data. ACR Common is a collection of common radiology terms and semantic structures that leverages existing ontologies and can facilitate data normalization for data sharing and algorithm creation. ACR Assist is a clinical decision support framework that also utilizes structured classification and reporting taxonomies (such as the ACR “RADS” tools: BI-RADS, Lung-RADS, TI-RADS, etc.).

As a specialty and within our individual practices, we need to overcome these barriers in order to create meaningful, patient-centered AI with a positive impact on outcomes. Enabling thoughtful data sharing is central to the development and implementation of patient-centered AI within radiology. To help radiologists and industry partners meet these challenges, our 2019 ACR Data Science Institute Summit, Data Access in Healthcare – Implications for the Artificial Intelligence Ecosystem, will center on these key data access issues. It will also describe how the ACR AI-LAB™ platform enables radiology professionals to participate in the use of healthcare AI at their own facilities while keeping patient data on premises. The ACR DSI will host the summit as a preconference event before the Society for Imaging Informatics in Medicine Annual Meeting on June 25th in Denver. I encourage you to register today and look forward to seeing you there.

 

Amy Kotsenas, MD, FACR, Council Steering Committee Liaison to the DSI | Associate Professor, Mayo Clinic, Rochester, MN