The Vector Institute’s Laleh Seyyed-Kalantari on the serious ethical concerns around racial bias in AI models.

Artificial intelligence (AI)-based medical image diagnostic tools are trained to yield diagnostic labels from medical images. Recently, these tools have demonstrated radiologist-level performance in disease diagnosis, which makes them a clear candidate for deployment. In training these tools, a common practice is to optimize for, and report, performance on the general population. However, when trained this way, state-of-the-art chest X-ray diagnostic tools built on large public datasets do not turn out to be fair on their own [1, 2]. Here, we define unfairness as a difference in performance against, or in favor of, a subpopulation on a predictive task (e.g. higher performance in disease diagnosis for white patients than for Black patients).
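As a rough illustration of what such a performance difference looks like in practice, the short Python sketch below computes the true positive rate of a hypothetical binary disease classifier separately for two subpopulations and reports the gap between them. The arrays, threshold, and group labels are illustrative assumptions, not data or code from our studies.

```python
import numpy as np

def tpr(y_true, y_score, threshold=0.5):
    """True positive rate: the fraction of truly diseased patients the model flags."""
    y_pred = (y_score >= threshold).astype(int)
    diseased = y_true == 1
    return y_pred[diseased].mean() if diseased.any() else float("nan")

def tpr_gap(y_true, y_score, group, group_a, group_b):
    """Gap in TPR between two subpopulations; a positive value means the model
    detects disease more reliably in group_a than in group_b."""
    a, b = group == group_a, group == group_b
    return tpr(y_true[a], y_score[a]) - tpr(y_true[b], y_score[b])

# Toy example: true labels, model scores, and self-reported race for eight patients.
y_true  = np.array([1, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.3, 0.7, 0.6])
race    = np.array(["White", "White", "Black", "White", "Black", "Black", "White", "Black"])

print(tpr_gap(y_true, y_score, race, "White", "Black"))  # 1.0 in this toy example
```

A per-group gap of this kind, computed for each protected attribute, is the kind of quantity behind the fairness gaps reported in [2].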

In our recent Nature Medicine paper [1], we trained four AI models on three large public chest X-ray datasets, as well as on a multi-source dataset that aggregates those datasets over their shared labels. We demonstrated that these models reach state-of-the-art performance in disease diagnosis when averaged across all patients. That is the point at which AI developers often stop and report their results. However, we added one further step to our machine learning pipeline: a “fairness check”. We checked these AI models for fairness with respect to patients’ age and sex, as well as race/ethnicity and insurance type where that data was available. We used insurance type as a proxy for patients’ socioeconomic status (e.g. patients with Medicaid insurance are often low income). Our analysis shows that, when trained following this common practice, the AI models underdiagnosed historically underserved patients – such as female patients, patients younger than 20, Black or Hispanic patients, and patients with Medicaid insurance – at a higher rate than the general population; that is, these patients were more likely to be falsely reported as healthy. Such underdiagnosis unfairness is even greater when a patient belongs to more than one historically underserved subpopulation (e.g. Hispanic female patients). These disparities are not mere noise in the system, but rather an amplification of existing unfairness in the healthcare system.
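To make the “fairness check” step concrete, the sketch below audits a hypothetical model’s underdiagnosis rate – the rate at which patients who do have a finding are nevertheless predicted as having “No Finding” – per subgroup, and compares it with the rate over the whole test population. The data frame, column names, and threshold are assumptions for illustration; this is not the code used in [1].

```python
import pandas as pd

# Hypothetical audit table: one row per test-set patient.
# no_finding_true  = 1 if the ground-truth labels report no finding,
# no_finding_score = the model's predicted probability of "No Finding".
df = pd.DataFrame({
    "no_finding_true":  [0, 0, 1, 0, 0, 1, 0, 0],
    "no_finding_score": [0.7, 0.2, 0.9, 0.8, 0.1, 0.6, 0.75, 0.3],
    "sex":  ["F", "M", "F", "F", "M", "M", "F", "M"],
    "race": ["Black", "White", "White", "Hispanic", "White", "Black", "Hispanic", "White"],
})

THRESHOLD = 0.5  # assumed operating point
df["pred_no_finding"] = (df["no_finding_score"] >= THRESHOLD).astype(int)

def underdiagnosis_rate(patients: pd.DataFrame) -> float:
    """Rate at which patients who do have a finding are predicted 'No Finding'."""
    sick = patients[patients["no_finding_true"] == 0]
    return float("nan") if sick.empty else sick["pred_no_finding"].mean()

overall = underdiagnosis_rate(df)
for attr in ["sex", "race"]:
    for value, subgroup in df.groupby(attr):
        rate = underdiagnosis_rate(subgroup)
        print(f"{attr}={value}: underdiagnosis rate {rate:.2f} (overall {overall:.2f})")
```

A subgroup whose rate sits well above the overall rate is being falsely reported healthy more often than the average patient, which is the pattern we observed for historically underserved groups.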

In another study [3], we showed that AI models can detect patients’ self-reported race/ethnicity from their chest X-rays. This high-performance race/ethnicity detection from medical images was consistent across many datasets and imaging modalities. Reading the outcomes of these two studies together is potentially troubling: AI models can not only detect a patient’s race from medical images alone, but can also work against historically underserved (e.g. Black) patients by misreporting them as healthy at a higher rate. Upon deployment, this could lead to those patients being denied care at a higher rate.

Mitigating such unfairness in AI medical image diagnostic tools, so that they deliver high performance for all patients – regardless of race/ethnicity, sex, age, and/or socioeconomic status – is not merely desirable but should be expected before such models are deployed in practice. Policymakers are encouraged to audit models for fairness before approving them for deployment, and fairness checks should become part of the pipeline for standardizing AI models in healthcare operations.

This is a serious ethical concern, and ignoring it remains a critical barrier to the effective deployment of these models in the clinic.

Dr Laleh Seyyed-Kalantari is an associate scientist at the Lunenfeld-Tanenbaum Research Institute, Sinai Health System. She was a postdoctoral fellow in the Computer Science Department at the University of Toronto and at the Vector Institute for Artificial Intelligence, focusing on developing AI-based medical image diagnostic methods. She received a PhD in electrical engineering from McMaster University in 2017. Her research interests span machine learning in healthcare, AI in medical imaging, optimization, and numerical modeling. She has received several highly competitive national, provincial, and institutional awards, including the NSERC Postdoctoral Fellowship (2018).


[1] L. Seyyed-Kalantari, H. Zhang, M. McDermott, I. Y. Chen, and M. Ghassemi, ‘Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations’, Nature Medicine, Dec. 2021.

[2] L. Seyyed-Kalantari, G. Liu, M. McDermott, I. Y. Chen, and M. Ghassemi, ‘CheXclusion: Fairness gaps in deep chest X-ray classifiers’, Pacific Symposium on Biocomputing (PSB), Virtual, Jan. 2021.

[3] I. Banerjee, A. R. Bhimireddy, J. L. Burns, L. A. Celi, L. Chen, R. Correa, N. Dullerud, M. Ghassemi, S. Huang, P. Kuo, M. Lungren, L. Palmer, B. Price, S. Purkayastha, A. Pyrros, L. Oakden-Rayner, C. Okechukwu, L. Seyyed-Kalantari, H. Trivedi, R. Wang, Z. Zaiman, H. Zhang, and J. W. Gichoya, ‘Reading race: AI recognizes patient’s racial identity in medical images’, Under review.