There is much discussion and debate about the overall clinical utility of machine learning and deep learning studies in COVID-19 pandemic-related work. This manuscript, a systematic review of all papers published from January to October of 2020, examines the clinical utility of machine learning applied to chest radiographs and chest CT scans.

The heterogeneous sources ranged from bioRxiv, medRxiv, and arXiv to EMBASE and MEDLINE, and 62 studies were included (out of 2,212 studies found in the initial query). These authors concluded that none of the models in these studies are of clinical utility due to methodological flaws and/or underlying biases (in at least one of four domains: participants, predictors, outcomes, and analysis). These issues include: 1) bias in small datasets; 2) variability of internationally based datasets; 3) poor integration of imaging data (“Frankenstein datasets”); 4) prognostication difficulty; and 5) lack of clinician-data scientist collaboration.

In addition, most papers neither included external validation to assess generalizability nor provided code for reproducing their results. In short, the papers seem to suffer from universal deficiencies: poor-quality data, poor machine learning methodology, low reproducibility, and biases in design.

Finally, these authors offer a set of good recommendations in five areas as a guide to higher-quality model development: 1) data; 2) evaluations; 3) reproducibility; 4) authors; and 5) review process. The elements needed for future higher-quality work include higher-quality datasets, sufficient documentation for reproducibility and external validation, and models for independent technical and clinical validation as well as cost-effectiveness.

While this is a laudable project with exhaustive research in assessing the data science and machine learning aspects of the work on COVID-19 to date, an essential part of this intellectual consortium is largely missing (at least amongst the main authors): senior clinicians and radiologists.

Therefore, even if the machine learning is valid and the model is deemed to be very good, we are still at high risk of type III error (the right answers or science applied to the wrong questions or approach to COVID-19, which can change with time) unless more senior clinicians of various subspecialties are directly involved in works like this.

The understandable rush to utilize both chest radiographs and CT was only a temporary solution to diagnosis during the early months of the pandemic and is not very pragmatic currently, as the reverse transcription polymerase chain reaction (RT-PCR) test is the test of choice. In other words, in the face of a positive RT-PCR and characteristic symptoms and signs of COVID-19, radiologic imaging is much less relevant, or simply not useful, for clinicians' immediate decision making (a perfect model with zero impact).

These studies usually do not have longer-term follow-up (to determine if some patients are early in the disease process), and perhaps the focus should be more on longer-term follow-up and prognosis of these COVID-19 patients. These authors espouse clinician-data scientist collaboration, so an even stronger clinician contingent would make such a monumental meta-analysis of works in this domain even more impactful.
