Privacy and protection remain the priorities where data is concerned. Speakers debated the topic at the recent AIMed Breakfast Briefing – Experience the Future of AI in Radiology, hosted in partnership with SOPHiA GENETICS in Boston. Sara Gerke, Research Fellow in Medicine, Artificial Intelligence, and Law at the Petrie-Flom Center for Health Law Policy, Biotechnology, and Bioethics at Harvard Law School, said there is currently no universal data protection regulation.

In the European Union, the GDPR (General Data Protection Regulation) safeguards all personal data, including data concerning health. In the US, HIPAA (the Health Insurance Portability and Accountability Act of 1996) covers most health information generated by a covered entity. However, as Gerke pointed out, HIPAA does not protect data coming from wearables and businesses that fall outside the act. As such, Gerke believes it makes sense to create more ways to protect data, since we now have more avenues to generate and access it.

The hidden risks

Neil Teneholtz, Director of Machine Learning at the MGH & BWH Center for Clinical Data Science, agreed. From a medical research institute's perspective, he said, reliance on the IRB (Institutional Review Board) may no longer be adequate, as its scalability is uncertain. Most research studies claim to anonymize participants' data, but re-identification remains a tricky subject.

It is challenging to prove that de-identified data cannot be re-identified, especially with medical images. So the minimum one can do is to ensure the data will not be used beyond the dedicated research project itself. Furthermore, in radiology, the formats in which medical images are stored carry a great deal of metadata. For example, can a plain film be considered PHI (Protected Health Information)?
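To illustrate the metadata problem the speakers raised: even when the image itself is "anonymized", header fields stored alongside it can still identify a patient. The sketch below is plain Python with field names loosely modeled on DICOM tags, not a real de-identification pipeline; the record and field list are hypothetical.

```python
# Hypothetical header fields that would count as PHI (names loosely
# modeled on DICOM tags; for illustration only).
PHI_FIELDS = {"PatientName", "PatientID", "PatientBirthDate",
              "InstitutionName", "ReferringPhysicianName"}

def redact_phi(metadata: dict) -> dict:
    """Return a copy of the metadata with known PHI fields removed.

    Caveat: stripping obvious identifiers does not prove the data
    cannot be re-identified -- rare findings, acquisition dates, and
    even the pixel data itself can still narrow down a patient.
    """
    return {k: v for k, v in metadata.items() if k not in PHI_FIELDS}

record = {
    "PatientName": "DOE^JANE",
    "PatientID": "12345",
    "Modality": "CR",            # plain film (computed radiography)
    "StudyDate": "20190214",
    "BodyPartExamined": "CHEST",
}

cleaned = redact_phi(record)
print(sorted(cleaned))  # identifiers gone; clinical fields remain
```

The caveat in the docstring is the speakers' point: redaction of listed fields is necessary but not sufficient.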

Under the facilitation of Esteban Rubens, Global Enterprise Imaging Principal at PureStorage, the two guest speakers went on to highlight other hidden risks, such as the inconsistency of machine learning (ML). Gerke commented on ML's vulnerability to manipulation: often, a minute change in the input will lead the algorithm to an entirely different conclusion. As a result, Gerke's concern is not so much the AI black-box challenge but the availability of diverse data for algorithm building and clinical trials, to ensure an algorithm works for different populations of patients.
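The brittleness Gerke described can be shown with a toy model (not any specific medical system): a tiny, targeted nudge to each input feature flips a linear classifier's decision. The weights, features, and step size below are all made up for the illustration.

```python
# Toy linear classifier: decision flips under a tiny input change.
def predict(weights, x):
    score = sum(w * xi for w, xi in zip(weights, x))
    return "abnormal" if score > 0 else "normal"

weights = [0.5, -1.2, 2.0]
x = [1.0, 1.0, 0.34]           # score = -0.02  -> "normal"

# Nudge each feature by 0.02 in the direction of its weight
# (a crude, FGSM-style perturbation).
eps = 0.02
x_adv = [xi + eps * (1 if w > 0 else -1) for w, xi in zip(weights, x)]

print(predict(weights, x))      # normal
print(predict(weights, x_adv))  # abnormal
```

A 2% nudge per feature is imperceptible in many settings, yet the label changes, which is exactly why input manipulation is listed among the hidden risks.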

Medical professionals as the first line of defense

Teneholtz said many ML algorithms are developed with local datasets. It does not make sense to generate an algorithm in Boston and deploy it in Asia. The need for a broad patient cohort is inescapable. To overcome it, the best practice is to share not the data but the model updates. He cited a tech giant that is experimenting with the approach at the moment, and suggested medical institutions may adopt it soon.
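The "share model updates, not data" idea can be sketched as simplified federated averaging: each site takes a gradient step on its own private data, and only the updated weights leave the institution. Everything below (the two hospital datasets, learning rate, least-squares objective) is a hypothetical illustration, not the approach Teneholtz cited.

```python
# Simplified federated averaging: raw data never leaves a site.
def local_update(weights, data, lr=0.1):
    """One gradient step of least-squares on a site's private data."""
    grad = [0.0] * len(weights)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi
    n = len(data)
    return [w - lr * g / n for w, g in zip(weights, grad)]

def federated_round(global_weights, sites):
    """Average each site's locally updated weights."""
    updates = [local_update(global_weights, site) for site in sites]
    return [sum(ws) / len(updates) for ws in zip(*updates)]

# Two hypothetical hospitals with private (features, label) samples.
boston = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.5)]
asia   = [([1.0, 1.0], 1.6), ([0.5, 0.5], 0.8)]

w = [0.0, 0.0]
for _ in range(50):
    w = federated_round(w, [boston, asia])
print([round(wi, 2) for wi in w])  # a shared model, trained without pooling data
```

The central server only ever sees weight vectors; the broad-cohort benefit comes from averaging across sites.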

Teneholtz added that even if an algorithm is able to make an accurate prediction, say of a stroke or a tumor, medical professionals will still be there to provide recommendations and to generate the reports. They should be the first line of defense should there be any discrepancy or data security breach. A strong infrastructure should be in place to support medical and healthcare staff.

At the same time, institutions should always start by asking what benefits patients the most. Ultimately, the most ethical approach to treating data should not start or stop at a particular entity; it is a group effort.

Author Bio

Hazel Tang

A science writer with a data background and an interest in current affairs, culture, and the arts; a non-med from an (almost) all-med family. Follow on Twitter.