The Association of American Medical Colleges, the Multi-Regional Clinical Trials Center at Brigham and Women’s Hospital and Harvard Medical School, and The New England Journal of Medicine have proposed a new data-sharing system, one that links individuals to the data they generated so that they can be credited when those data are re-used. The proposed system was mentioned again in Nature last week, and its creators hope it will influence funding or promotion decisions in the near future.

The suggested data-sharing system begins when a researcher publishes a set of data. Both the researcher and the data are given unique identifiers to create a link between them, something that has so far been absent from academia. Other researchers, funders or academic institutions will then be able to locate these published data and cite them in future papers. Just as academic papers are cited with their distinctive DOI, datasets will be cited with their own PID, so that credit is rightfully given to the individuals who generated the data whenever the data are re-used.
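To make the idea concrete, the linking step can be pictured as a small registry that ties a dataset's identifier to the identifiers of the people who generated it, and counts a credit for them each time the dataset is cited. The Python sketch below is only an illustration under assumptions: the class names, the identifier formats and the "one credit per citation" rule are all hypothetical and are not taken from the proposal itself.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Dataset:
    pid: str                   # hypothetical dataset identifier (the "PID")
    generator_ids: list[str]   # hypothetical identifiers of the people who generated the data
    cited_by: list[str] = field(default_factory=list)  # DOIs of papers that cite the dataset


class DataRegistry:
    """Toy registry linking datasets to their generators (illustrative only)."""

    def __init__(self) -> None:
        self._datasets: dict[str, Dataset] = {}

    def publish(self, pid: str, generator_ids: list[str]) -> None:
        # Publishing creates the link between the data and the individuals.
        self._datasets[pid] = Dataset(pid, list(generator_ids))

    def cite(self, pid: str, paper_doi: str) -> None:
        # A later paper cites the dataset by its PID.
        self._datasets[pid].cited_by.append(paper_doi)

    def credits(self) -> Counter:
        # Assumed rule: every generator earns one credit per citation.
        tally: Counter = Counter()
        for dataset in self._datasets.values():
            for person in dataset.generator_ids:
                tally[person] += len(dataset.cited_by)
        return tally


registry = DataRegistry()
registry.publish("pid:example-001", ["researcher:A", "researcher:B"])
registry.cite("pid:example-001", "doi:10.0000/example.1")
registry.cite("pid:example-001", "doi:10.0000/example.2")
print(registry.credits()["researcher:A"])  # prints 2
```

Even in this toy version, the list of generators has to be decided at the moment of publication, which is exactly where the question of who deserves credit arises.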

Views from the healthcare community 

The quest for a systematic data-sharing protocol or platform is not restricted to academia. The healthcare community has a similar need. Speakers at the recent AIMed Breakfast Briefing believe the release of unstructured data that has not been curated could pose a risk to patients’ privacy. Should these unstructured data be used in the development of artificial intelligence (AI) related solutions, the credibility of the end product would also be in question. With a formal data-sharing platform in place, researchers could not only establish the credibility of the solutions they generate; other researchers could also validate them using the shared data. In the long run, perhaps some sort of ground truth could be built.

Speakers therefore urged more investment in better infrastructure. At present, different medical institutions employ different systems to collect and store their data. Communication between these systems and institutions remains limited or virtually non-existent, which poses a huge challenge for data sharing. Most of the time, healthcare professionals are not able to achieve the kind of collaboration that they wish to see.

The liability issues 

On the other hand, there is the question of how far sharing should extend. When an AI model is developed, data is not the only component; does that mean the creators have to share the methods and tools too? If so, who should be liable if the shared data, methods and tools have passed through several stages of development by separate research groups and are now causing problems or concerns?

The new data-sharing system proposed at the beginning of the article also faces its own liability issue. Often, researchers are not the ones who handle the data, and those who handle the data may not be involved in other aspects of the research project or in producing the published work. As such, should the individual who handled the data be credited, or the researcher, or both? What if the number of personnel involved in handling the data is astronomical and most of them are temporary staff, vendors or contractors? Do all of them have to be credited too?

Data is valuable, but data citation remains new. Unless there is a stronger push to treat data as an output in its own right, a new sharing system may not be adequate to address all of these questions.

Author Bio

Hazel Tang

A science writer with a data background and an interest in current affairs, culture, and arts; a non-med from an (almost) all-med family.