A new study has revealed that a deep learning tool used to analyze knee x-rays may be able to dramatically reduce pain disparities in underserved populations.

The study, supervised by Ziad Obermeyer, Associate Professor of Health Policy and Management at the University of California, Berkeley, trained algorithms to read knee x-rays for arthritis using patients’ own pain reports, rather than doctors’ assessments, as the ground truth. The results suggest that radiologists may have blind spots when it comes to reading Black patients’ x-rays.

The algorithms trained on patients’ reports did a better job than doctors at accounting for the pain experienced by Black patients, apparently by discovering patterns of disease in the images that humans usually overlook.

Researchers noted that among people with knee osteoarthritis, people of color scored much higher on knee pain scales than white individuals. There are two possible explanations for these disparities: underserved patients may have more severe osteoarthritis within the knee, or external factors, such as life stress or social isolation, may cause or exacerbate knee pain among underserved populations.

But history may explain why radiologists aren’t as proficient in assessing knee pain in Black patients. The severity of osteoarthritis is typically classified using a standard called the Kellgren-Lawrence grade (KLG), a decades-old method developed in white British populations. Researchers claimed that this standard may miss physical causes of pain in people of color. They also noted that there are known racial and socioeconomic biases in how a patient’s pain is perceived by observers.

“If the pain experienced by underserved populations is caused by objective factors missing from current measures, a range of painful, treatable knee ailments would be misattributed to factors external to the knee,” the team stated.

Obermeyer and researchers from Stanford, Harvard, and the University of Chicago built computer vision software using NIH data to investigate what human doctors might be missing. They trained algorithms to predict a patient’s pain level from an x-ray. Across tens of thousands of images, the software discovered patterns of pixels that correlate with pain.

When given an x-ray it hasn’t seen before, the software uses those patterns to predict the pain a patient would report experiencing. Those predictions correlated more closely with patients’ pain than the scores radiologists assigned to knee x-rays, particularly for Black patients. That suggests the algorithms had learned to detect evidence of disease that radiologists didn’t. “The algorithm was seeing things over and above what the radiologists were seeing, things that are more commonly causes of pain in Black patients,” Obermeyer said.
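To make the comparison concrete, here is a minimal sketch of how one might check which score tracks patients’ reported pain more closely. It is not the study’s code or data: the numbers are made up for illustration, and the names `reported_pain`, `radiologist_grade`, and `model_prediction` are hypothetical. It assumes a plain Pearson correlation as the yardstick, which may differ from the metric the researchers used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up illustrative values, NOT data from the study.
reported_pain = [10, 25, 30, 45, 60, 70, 85, 90]      # patient self-reports
radiologist_grade = [0, 1, 1, 1, 2, 2, 2, 4]          # hypothetical KLG-style grades
model_prediction = [12, 20, 35, 40, 55, 75, 80, 88]   # hypothetical model output

r_grade = pearson(radiologist_grade, reported_pain)
r_model = pearson(model_prediction, reported_pain)

print(f"grade vs. reported pain: r = {r_grade:.2f}")
print(f"model vs. reported pain: r = {r_model:.2f}")
```

In this toy example the coarse grade lumps several patients with different pain levels into the same category, so it tracks reported pain less closely than the finer-grained prediction does.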

Obermeyer and his team found that the deep learning model is more accurate than KLG in predicting self-reported pain levels for both Black and white patients, and that it reduced racial disparity at each pain level by close to half. The research team’s goal is not to replace KLG with the deep learning model, but they highlight that the current method of measuring pain is possibly erroneous and biased against Black patients. The findings could also serve as a wake-up call to the medical community to study more closely the radiographic markers that the deep learning model relies on and to update its pain score methodology.
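One simplified way to think about how much of a pain gap a severity measure “accounts for” is to adjust pain for that measure and see how much of the gap between two groups remains. The sketch below is an assumption-laden illustration, not the researchers’ method: it uses a single pooled straight-line fit, invented numbers, and hypothetical names such as `gap_accounted_for`.

```python
def mean(values):
    return sum(values) / len(values)

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    mean_x, mean_y = mean(xs), mean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def gap_accounted_for(severity_a, pain_a, severity_b, pain_b):
    """Share of the raw mean pain gap (group A minus group B) that disappears
    after adjusting pain for a severity measure with one pooled linear fit."""
    raw_gap = mean(pain_a) - mean(pain_b)
    slope, intercept = fit_line(severity_a + severity_b, pain_a + pain_b)
    resid_a = [p - (slope * s + intercept) for s, p in zip(severity_a, pain_a)]
    resid_b = [p - (slope * s + intercept) for s, p in zip(severity_b, pain_b)]
    adjusted_gap = mean(resid_a) - mean(resid_b)
    return 1 - adjusted_gap / raw_gap

# Made-up illustrative values, NOT data from the study.
pain_a = [60, 70, 80, 90]          # group reporting more pain
pain_b = [40, 50, 60, 70]
grade_a, grade_b = [2, 2, 3, 3], [1, 2, 3, 3]          # hypothetical coarse grades
score_a, score_b = [55, 66, 72, 80], [44, 52, 62, 69]  # hypothetical model scores

frac_grade = gap_accounted_for(grade_a, pain_a, grade_b, pain_b)
frac_model = gap_accounted_for(score_a, pain_a, score_b, pain_b)

print(f"share of gap accounted for by grade: {frac_grade:.0%}")
print(f"share of gap accounted for by model score: {frac_model:.0%}")
```

With these invented numbers, the two groups receive nearly the same grades despite different pain levels, so the grade accounts for only a small share of the gap, while the finer-grained score accounts for more of it.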

More importantly, the research team wanted to highlight that AI algorithms do not always have to be trained on well-established expert knowledge; they can also be trained on patients’ self-assessments. “[The study] actually highlights a really exciting part of where these kinds of algorithms can fit into the process of medical discovery,” Obermeyer said. “It tells us if there’s something here that’s worth looking at that we don’t understand. It sets the stage for humans to then step in and, using these algorithms as tools, try to figure out what’s going on.”

Obermeyer added that if algorithms were only ever trained to match expert performance, inequities and gaps would continue to exist. “This study is a glimpse of a more general pipeline that we are increasingly able to use in medicine for generating new knowledge.”

The full study was published in Nature Medicine.