At the recent AIMed UK 2020 virtual summit, Dr. Jorge Cardoso, Chief Technology Officer at the London Medical Imaging and AI Centre for Value-based Healthcare and Senior Lecturer in Artificial Medical Intelligence at King’s College London, noted that an algorithm’s performance is intrinsically correlated with the amount and complexity of the data it is trained on.

It may be unethical not to share the data that makes AI tools safe

With access to more information, an AI model can recognize patterns it could not detect before. This can actually make AI safer: even if several of the fields are wrong, the model can still use the other fields to make credible predictions. From an ethical point of view, withholding access to the data, or to that volume of data, would itself be unethical, since it limits the performance of an algorithm that is supposed to help someone.

The comment was made after Dawn Monaghan, Head of Information Governance Policy at NHSx, described data minimization (i.e., deciding what data is required and asking whether personal data is needed at all) and purpose limitation (i.e., defining the purpose before embarking on data collection) as the greatest barriers in information governance. Dr. Cardoso, however, believes data minimization will not work for machine learning and large-scale analytics.

He also pointed out that one can easily write an algorithm that memorizes a dataset and regenerates it. Known as synthetic data, this information can still be used to train the intended algorithm. How, then, should regulators govern such complex machinery, which most people will not have the technical skills to understand?
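To make the idea concrete, the sketch below fits a simple generative model to stand-in numeric records and then samples fresh "synthetic" records from it. The data, the choice of a Gaussian-mixture model, and all parameters here are illustrative assumptions for the sake of the example, not the approach Dr. Cardoso described.

```python
# Minimal sketch of the idea behind synthetic data: fit a generative
# model to real records, then sample new records that mimic their
# statistics. The data and model below are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in for real patient records: 500 rows of 4 numeric features
# (e.g., systolic BP, diastolic BP, temperature, heart rate).
real_data = rng.normal(loc=[120, 80, 37.0, 72],
                       scale=[15, 10, 0.5, 12],
                       size=(500, 4))

# Fit a simple generative model to the real data.
model = GaussianMixture(n_components=3, random_state=0).fit(real_data)

# Sample "synthetic" records from the fitted model.
synthetic_data, _ = model.sample(500)

# The synthetic rows can now be used to train a downstream algorithm,
# even though no original record is shared directly.
print(synthetic_data[:3])
```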

There are technologies making data-sharing obsolete

He cited an ongoing project with one of the Oxford AI Centres to further illustrate his point. “We are trying to diagnose someone who has a disease. We need to know what they don’t have, which is everything else. So, the ‘everything else’ question is really interesting in the AI world. If you have a system that can only distinguish between a healthy person and a stroke patient, for example, it will not be able to handle the diagnosis of a patient whose pathology indicates a traumatic brain injury. We need all the data so that the algorithm can handle all these ‘negative’ cases as well.”

Dr. Cardoso added that technology advances very quickly and regulations are not catching up. For example, the federated learning approach adopted by his institution ensures data never leave the healthcare system. Nobody, including those developing the algorithms, sees the data, or they see only a private version of it. One might consider such data fully anonymized and assume the regulations no longer apply; yet that may not necessarily be the case, and the law will still need to interpret it.
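As a rough illustration of the principle, the sketch below runs a FedAvg-style loop in which three simulated “hospitals” train a small logistic-regression model on their own private data and share only model parameters, which a coordinator averages. The data, model, and update rule are assumptions for the sake of the example, not the system used at Dr. Cardoso’s institution.

```python
# Minimal sketch of federated learning: each site trains locally and
# shares only model parameters, never patient records. All details
# here (data, model, hyperparameters) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)

def local_update(weights, X, y, lr=0.1, epochs=20):
    """One site's training: logistic-regression gradient steps on local data."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-X @ w))          # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)      # gradient descent step
    return w

# Three hospitals, each holding private data that never leaves the site.
sites = []
for _ in range(3):
    X = rng.normal(size=(100, 5))
    y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]) > 0).astype(float)
    sites.append((X, y))

global_w = np.zeros(5)
for _ in range(10):
    # Each site computes an update from its local data only...
    local_ws = [local_update(global_w, X, y) for X, y in sites]
    # ...and the coordinator averages the parameters, not the data.
    global_w = np.mean(local_ws, axis=0)

print("Aggregated model weights:", np.round(global_w, 2))
```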

In another example, Dr. Cardoso said he and his team have done a substantial amount of research to improve hospital operations, such as predicting when patients might not attend their appointments and how many medical staff are required based on the volume of patients on a given day. Many of these are small decisions, but with big impact. Even so, the work cannot escape data governance and regulatory scrutiny, because it requires the use of clinical data. The only difference is probably the certification, since the algorithms driving these decisions are not considered medical devices.

Minimal attention paid to the “where” aspect of data

Gerry Reilly, Chief Technology Officer at Health Data Research (HDR) UK, added that people seldom touch on the “where” aspect of data: the infrastructure or environment that holds the data securely, allows scientists to conduct safe and ethical research, and supports building AI models efficiently so they can be released for testing or deployment thereafter. Reilly said HDR UK is trying to push forward a federated model of Trusted Research Environments (TREs) as the government calls for more collaborative studies at the national level. Reilly discussed the details with AIMed in a recent interview.

Dr. Darren Gates, PICU Clinical Fellow and Innovation Consultant and Clinical Lead AI HQ at the Alder Hey Children’s NHS Foundation Trust, echoed Reilly’s point. He said TREs facilitate the connection of different datasets, help determine which pieces of information will be most valuable for answering the targeted clinical problems, and achieve data minimization where possible.

Overall, Dr. Cardoso believes there are many technical solutions to the governance issues we face right now, but these solutions raise further issues that have not yet been addressed, because many of them have not been widely tested and explored. Hence, it will be challenging for regulators to assess the next step, and they will need time to do so. Meanwhile, technology continues to progress, and the gap between development and regulation remains.

More information about the AIMed UK 2020 virtual summit can be found here. You may now revisit the summit on demand here.

*

Author Bio

Hazel Tang A science writer with a data background and an interest in current affairs, culture, and the arts; a no-med from an (almost) all-med family. Follow on Twitter.