“Trust is earned when actions meet words.” Chris Butler

I had an appointment for a COVID-19 test at a local hospital test site today. As I drove toward the destination, the GPS in my car surprisingly directed me to take a long detour rather than the much more direct route. I hesitated over the dilemma, as I saw no obvious smoke indicating a fire in the canyon, and decided to take the shorter route anyway. Just five minutes later I realized it was a mistake: my GPS, with its live-updating capability, had detected a traffic blockage several miles ahead that I had not known about. I lacked full trust in the GPS, and that lack of trust led to my suboptimal judgment.

The two terms explainability and interpretability are often used interchangeably in discussions of transparency and trust, but the difference between them is worth examining.

Explainability refers to the degree to which the intrinsic workings of a device or system can be elucidated. The observation that deep learning is not very explainable has consequently created the perception that deep learning, and artificial intelligence more broadly, operate in a “black box”. Interpretability, on the other hand, concerns whether the actions and effects of a device or system are predictable.

Cardiologists like myself care for patients with pacemakers: while most of us do not understand the biomedical engineering details of a pacemaker, we trust its functionality, because we program it to certain pacing modes and expect it to pace as directed. Another real-world analogy is the microwave oven: while most of us do not know the physics of the device, we are comfortable that it will cook or warm food as expected.

Understanding the rationale behind machine and deep learning decisions is a prerequisite for trust among healthcare professionals. The data science and artificial intelligence communities continue to work on explainability under the broad effort known as explainable artificial intelligence (XAI), with methodologies such as Local Interpretable Model-Agnostic Explanations (LIME), Deep Learning Important FeaTures (DeepLIFT), and SHapley Additive exPlanations (SHAP).
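Since SHAP is grounded in Shapley values from cooperative game theory, a small self-contained sketch may make the underlying idea concrete: each feature's attribution is its average marginal contribution to the prediction across all coalitions of the other features, with absent features set to a baseline. The `shapley_values` function and the toy linear "risk score" below are illustrative assumptions for this sketch, not code from the SHAP library (which approximates this computation efficiently rather than enumerating subsets):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for model f at input x, relative to a baseline.

    Features absent from a coalition are replaced by their baseline value --
    the same 'reference input' idea that SHAP approximates at scale.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                # Shapley weight for a coalition of this size
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

# Toy linear "risk score" -- purely hypothetical weights for illustration
w = [0.5, -1.2, 2.0]
def risk(v):
    return sum(wi * vi for wi, vi in zip(w, v))

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(risk, x, baseline)
# For a linear model, phi_i reduces to w_i * (x_i - baseline_i),
# and the attributions sum to risk(x) - risk(baseline).
```

For a clinician, the appeal of this formulation is the efficiency property visible at the end: the per-feature attributions always add up exactly to the difference between the model's prediction and the baseline prediction, so every point of a risk score is accounted for.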

We, as advocates of AI in medicine, no doubt need to continue working on the accountability and transparency of AI tools, but perhaps we should also be a bit forgiving of deep learning's black-box label. The machines could easily counter that our human brains are “pink bags” that are often not entirely explainable either.

We clinicians have certainly encountered situations in which we could not readily explain our thinking process and yet made the correct diagnostic or treatment decision. Trust will need to be built by data scientists and clinicians together around interpretability, since full explainability may not arrive anytime soon.