Three Stanford University researchers found that many medical artificial intelligence (AI) models were trained on patient data drawn mostly from three US states: California, New York and Massachusetts. They reached this conclusion after analyzing the geographic distribution of the training datasets described in peer-reviewed papers published between 2015 and 2019. These datasets were used to develop deep learning algorithms for a range of medical sub-specialties, including radiology, ophthalmology, dermatology, cardiology and gastroenterology.
The findings were recently published in the Journal of the American Medical Association (JAMA). Of the 74 studies examined, 76% (56 studies) included at least one geographically identifiable patient cohort. Among these, 71% (40 studies) used cohorts from California, New York or Massachusetts. The remaining 24% (18 studies) used patient cohorts that were geographically heterogeneous or ambiguous, meaning large clinical trials spanning five or more states or major studies conducted by the US National Institutes of Health (NIH). In total, 34 US states were not represented in any of the studies, and another 13 contributed only limited data.
AI algorithms should mirror the community
“California, Massachusetts, and New York may have economic, educational, social, behavioral, ethnic, and cultural features that are not representative of the entire nation; algorithms trained primarily on patient data from these states may generalize poorly, which is an established risk when implementing diagnostic algorithms in new geographies,” the authors wrote.
This is not the first time a lack of geographic diversity has been identified in clinical AI models. In 2018, a separate study published in PLOS Medicine found that machine learning models trained on patient data from the NIH Clinical Center in Bethesda, Maryland, Mount Sinai Hospital in New York and the Indiana University Network for Patient Care in Indianapolis performed significantly worse at detecting pneumonia on chest X-rays when applied to patients from sites other than those they were trained on.
Dr. Amit Kaushal, one of the Stanford study investigators and an Adjunct Professor of Bioengineering, believes AI algorithms should mirror the community. If AI-based tools are built for patients across the US, they should not be trained on data from the same handful of places. The research community should therefore act to change this, both to uphold the technical performance of AI and for fundamental reasons of equity and justice.
A question of validity rather than accuracy
The study did not find that these clinical AI models were doing anything wrong, but the researchers raised questions about their validity, especially when they are applied to patients from other areas. Dr. Russ Altman, one of the study’s investigators and a Professor of Bioengineering, stressed the importance of geography, which correlates with many aspects of health, such as diet and exposure to chemicals.
A geographically myopic algorithm may cause harm, and it will not help patients living in other areas. Nor will it foster trust among the clinicians who have to deploy these AI models in their clinics. The researchers said the key takeaway from the study is the need to make larger and more diverse datasets available for developing and training AI models. Because data sharing is expensive, particularly as a standalone effort, they urged stakeholders across the country to contribute more so that AI can make real progress.