Two years ago, Daniel Acuña, a computer scientist and assistant professor at Syracuse University, created an algorithm that can spot highly similar images across a large pool of research papers, including images that have been rotated, flipped, or resized.
A controversial algorithm
The software automates the detection of duplicated images, a task usually performed manually by research-integrity experts searching for honest mistakes or possible misconduct. Although the tool is still experimental and is currently being trialed by several journals and research institutions, Acuña decided to run it on COVID-19 preprints.
This June, he downloaded 3,500 preprints from two major coronavirus repositories, bioRxiv and medRxiv, and used his algorithm to extract and compare close to 21,000 images. Within four hours, the algorithm had flagged approximately 400 potential image duplicates. From these, Acuña chose 24 papers he thought were worth looking into and posted them on his website as well as on PubPeer, a public website for reviewing and discussing scientific research.
The act ignited a commotion, especially among the authors concerned. Some said they would correct their mistakes; others argued that the algorithm should not work alone and that substantial human supervision is required. Acuña agreed, as he recounted to Nature recently. He believes his algorithm is useful mainly for uncovering details that human eyes are likely to miss. More importantly, some researchers pointed out in subsequent open letters that the algorithm had flagged images that were merely similar, not matching.
Acuña has since removed one-third of the research papers he questioned from his website. He also set the remaining ones to private access, so anyone interested in the findings must seek his permission first. Image-integrity analysts say automated image checking is still in its infancy: most tools can find duplicates but cannot detect other forms of manipulation.
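The article does not describe the internals of Acuña's algorithm, but the general duplicate-finding technique such tools rely on can be illustrated with perceptual hashing: each image is reduced to a compact fingerprint of bits, and two images whose fingerprints differ in only a few bits are flagged as likely duplicates. The sketch below is a minimal, hypothetical illustration of that idea using an average hash, with a check against the horizontally flipped version to mimic flip-invariance; real tools also downscale images to a fixed size first, which is what makes matching robust to resizing.

```python
# Minimal sketch of duplicate-image detection via average hashing.
# Illustrative only: this is NOT Acuña's actual algorithm, whose
# implementation details are not described in the article.

def average_hash(pixels):
    """Hash a grayscale image (2D list of 0-255 ints) into a bit tuple.
    Each bit records whether a pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return sum(a != b for a, b in zip(h1, h2))

def flipped(pixels):
    """Horizontally mirror the image."""
    return [list(reversed(row)) for row in pixels]

def likely_duplicate(img_a, img_b, threshold=2):
    """Flag a pair if either orientation of img_b hashes close to img_a."""
    ha = average_hash(img_a)
    return min(hamming(ha, average_hash(img_b)),
               hamming(ha, average_hash(flipped(img_b)))) <= threshold

# A tiny 4x4 "image" and its mirrored copy are flagged as duplicates.
img = [[10, 200, 30, 220],
       [15, 190, 25, 210],
       [12, 195, 35, 215],
       [11, 205, 28, 225]]
print(likely_duplicate(img, flipped(img)))  # True
```

In practice, libraries such as Pillow and imagehash handle the decoding, downscaling, and hashing; the threshold on the Hamming distance is what separates "matching" from merely "similar" images, which is exactly the distinction critics said the algorithm blurred.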
More improvement needed in both the technology and COVID-19 research
One major shortcoming of these algorithms is that they struggle with images saved or released in PDF format. Nature previously reported that scientists using similar software to spot errors in DNA and RNA sequences made the same complaint. Nevertheless, none of this has put Acuña off; he is still analyzing COVID-19 preprints. He said he would first alert the authors if he spotted any concerns and would make the findings public if the authors refused to respond. He told Nature he would like the authors “to be aware that someone is doing this”.
Indeed, as AIMed mentioned earlier, the pandemic continues to affect us in an unprecedented way, with hundreds of related studies published daily, both in peer-reviewed journals and as digital preprints. It probably takes researchers more time to sift through and digest this information than to generate effective remedies against COVID-19. Some researchers are also worried about the quality of the research, especially since more than 20 COVID-19 studies have been withdrawn or retracted so far.
As Dr. Leo Anthony Celi, Associate Professor of Medicine (part-time) at Harvard Medical School and Principal Research Scientist at the Massachusetts Institute of Technology, remarked in the recent AIMed webinar, Key trends in critical care AI: “we were drowning in small studies that are not well-controlled and observational studies that are clearly confounded with so many factors had made it to the media”.
Nature noted there is no evidence that COVID-19 research is being retracted at higher rates than literature in other domains. However, COVID-19 papers tend to attract greater scrutiny, meaning they are examined more closely and more frequently; this alone signals considerable room for improvement.