I am a pediatric cardiologist and have cared for children with heart disease for the past three decades. I also have an educational background in business and finance, healthcare administration, and global health; I earned a master's degree in public health from UCLA and taught global health there after completing the program.
“You could have many explanations for what a complex model is doing. Do you just pick the one you ‘want’ to be correct?”
Cynthia Rudin, computer scientist
This commentary in The Lancet Digital Health discusses the explainability of artificial intelligence in medicine and its myriad nuances. I agree with the authors that expecting artificial intelligence tools to be explainable, at least at a superficial level, is somewhat unrealistic and does in fact represent false hope. We might also surmise that it would be just as difficult to fit the thinking process of astute, seasoned clinicians into an explainable framework (the “pink bag” vs. the “black box”).
The major takeaway of this viewpoint is that, rather than relying on explainability as the metric for building trust in artificial intelligence, we should focus on rigorous validation as an indirect measure of its trustworthiness. Furthermore, we need to distinguish inherent explainability (as in a relatively simple linear regression model) from post-hoc explainability (needed for complex models built on high-dimensional data). The latter is partly “explained” with heat maps (saliency maps) in imaging, as well as with other methods such as feature visualization, prototypical comparisons, and the better-known local interpretable model-agnostic explanations (LIME) and Shapley additive explanations (SHAP). The authors remind us that these methods carry no performance guarantees but are nevertheless useful for troubleshooting models and auditing systems.
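To make the inherent vs. post-hoc distinction concrete, here is a minimal Python sketch that computes exact Shapley attributions for a single prediction by brute force. It is not the SHAP library's API and not the authors' method; the function name `shapley_values`, the two toy models, and the all-zero baseline are illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction, by enumerating every coalition.

    Features outside a coalition are set to a single reference ("baseline")
    value. This is a simplifying assumption: practical SHAP tooling usually
    averages over a background dataset and uses approximations, because exact
    enumeration grows as 2**n in the number of features.
    """
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` keep their real values.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition S.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical, inherently interpretable linear score: the weights are the explanation.
weights = [0.5, -1.2, 2.0]
intercept = 0.1


def linear(z):
    return intercept + sum(w * v for w, v in zip(weights, z))


x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
# For a linear model each value reduces to weight * (x - baseline).
linear_phis = shapley_values(linear, x, baseline)

# Hypothetical model with a pure interaction: the post-hoc attribution has to
# split the joint effect, here equally between the two features.
interaction_phis = shapley_values(lambda z: z[0] * z[1], [2.0, 3.0], [0.0, 0.0])
```

For the linear model, the post-hoc attribution merely restates what the coefficients already show. For the interaction model, the equal split comes from the Shapley axioms themselves, a convention of the method rather than something the model reveals, which is one reason such explanations are better suited to troubleshooting and audit than to justifying an individual prediction.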
This viewpoint somewhat underestimates clinicians’ willingness to forgo complete explainability in biomedical tools, such as those used for diagnostic (MRI machines) and therapeutic (pacemakers) purposes. The main difference between these tools and artificial intelligence is that, while the average clinician may not be able to “explain” how an MRI machine or a pacemaker works in full engineering detail, someone in the world can. With artificial intelligence, there are instances (especially with deep learning methods) where no one, not even AI experts, can explain the inner workings of these tools.
The issue of explainability is really centered on trust: trust not only in artificial intelligence tools but also in the clinicians and data scientists who espouse and advocate their use. Perhaps we will all eventually settle into a narrow sweet spot that balances interpretability and explainability against accuracy, and a new paradigm of what is trustworthy and acceptable will emerge, sufficiently aligned with validation and equity.
Lead author Mohammad Ghassemi was a keynote speaker at AIMed’s Clinician Series in 2021. You can read more about his work and watch his talk here.