“Everyone has a plan until they get punched in the mouth.”

Mike Tyson, champion boxer

For this week’s article, there is a very good read from Lancet Digital Health on artificial intelligence systems for healthcare and how they can be assessed. The medical algorithmic audit, as the title states, helps the reader understand the elements that can contribute to the failure of these tools. Although myriad manuscripts delineate the issues facing medical AI projects, this one is relatively comprehensive and perhaps more relatable.

The paper does an outstanding job of both identifying these blind spots in clinical AI projects and educating readers about them. These system deficiencies include a “tendency to learn spurious correlates in training data; poor generalizability to new deployment settings; and a paucity of reliable explainability mechanisms.”

The authors propose an audit for clinical AI tools that maps out the components that can contribute to errors and even describes their potential consequences. The audit’s stages include scoping, mapping, artifact collection, testing, reflection, and post-audit activities. The authors go on to suggest several approaches for testing algorithmic errors, such as exploratory error analysis, subgroup testing, and adversarial testing (all of these concepts, by the way, may be totally unfamiliar to clinicians); a brief illustrative sketch of subgroup testing follows below. The tables and panels are worth studying for clinicians interested in clinical AI.
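For readers less familiar with these terms, here is a minimal, hypothetical sketch of what subgroup testing might look like in practice: comparing a model’s error rates across patient subgroups on held-out data. The variable names and numbers are purely illustrative and are not taken from the paper or its audit procedure.

```python
# Illustrative sketch of "subgroup testing": compare a model's error rates
# across patient subgroups. All names and data are hypothetical.
import pandas as pd

# Hypothetical held-out predictions: true label, model prediction, and a
# patient attribute (here, sex) used to define subgroups.
results = pd.DataFrame({
    "y_true": [1, 0, 1, 1, 0, 0, 1, 0, 1, 0],
    "y_pred": [1, 0, 0, 1, 0, 1, 1, 0, 0, 0],
    "sex":    ["F", "F", "F", "M", "M", "M", "F", "M", "M", "F"],
})

def error_rates(group: pd.DataFrame) -> pd.Series:
    """Per-subgroup false-negative and false-positive rates."""
    pos = group[group["y_true"] == 1]
    neg = group[group["y_true"] == 0]
    return pd.Series({
        "n": len(group),
        "false_negative_rate": (pos["y_pred"] == 0).mean() if len(pos) else float("nan"),
        "false_positive_rate": (neg["y_pred"] == 1).mean() if len(neg) else float("nan"),
    })

# A large gap between subgroups would flag a potential failure mode to
# investigate further, for example with exploratory error analysis.
print(results.groupby("sex").apply(error_rates))
```

In a real audit this comparison would be run on properly powered clinical datasets and across clinically meaningful subgroups, but the underlying idea is as simple as the sketch above: look for performance gaps that an aggregate metric would hide.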

A few constructive comments on this paper:

The concept of a medical algorithmic audit is a good one, but the name is neither particularly attractive nor relatable to busy clinicians with little or no knowledge of AI/ML. Ideally, this type of tool needs to become part of clinical practice rather than remain limited to a publication. This publication-to-practice chasm exists not only for assessment tools but for clinical AI projects in general, where the prevailing mindset is project-to-publication. In addition, for any assessment tool to be adopted rather than just published, it needs strong clinician participation even at the ideation stage. In this manuscript, I notice a paucity of senior practicing clinicians amongst the august group of AI academicians in healthcare centers, which is perhaps a relative shortcoming of the paper, as it lacks the seasoned clinicians’ frontline perspective that is so valuable. This lack of a practitioner perspective is akin to engineers and data scientists planning a Formula One race without much input from the drivers. And we all know that the engineer-driver dyad and its intimate communication is what wins races consistently.

Any AI tool, clinical or evaluative, must fit this “real clinical world”, with its fast pace and scant time for anything else, if it is to be adopted. In fairness to the authors, they do propose that the audit is a joint responsibility of users and developers and encourage feedback mechanisms between the two groups.

Overall, this is a very good read for clinicians and data scientists and belongs in one’s portfolio of clinical AI papers. Perhaps someday these assessment tools can be automated safely and predictably, so that they actually reduce rather than increase clinicians’ burden and accountability.

Read the full paper here: https://www.thelancet.com/journals/landig/article/PIIS2589-7500(22)00003-6/fulltext