Regina Barzilay didn’t set out to study breast cancer. As a professor at the Massachusetts Institute of Technology (MIT) in Cambridge, Barzilay was renowned for her work in artificial intelligence, developing machine learning models to process and understand human language and unstructured text.

But that was before her breast cancer diagnosis in 2014. Aged 43 at the time and with no history of breast cancer in the family, she had never considered herself at risk. Equally shocking to Barzilay was witnessing firsthand how little data doctors had to work with when making their clinical decisions.

“When you receive a diagnosis like that, you start asking all sorts of questions,” she says. “Am I going to survive? What’s going to happen to my son? But I also started asking other questions and was really surprised when my physicians couldn’t give me answers.”

Barzilay reasoned that algorithmic models like the ones she’d been working on at MIT could extract more from the clinical records. If so, perhaps machine learning could detect tumors like hers at an earlier stage and offer personalized treatment recommendations.

So Barzilay and her team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Massachusetts General Hospital (MGH) set to work, adjusting her protocols to parse patient medical reports and creating a new deep-learning model that can predict from a mammogram if a patient is likely to develop breast cancer as much as five years in the future. Trained on mammograms and known outcomes from over 60,000 MGH patients, the model learned the subtle patterns in breast tissue that are precursors to malignant tumors.

Barzilay subsequently applied the technology to her own mammograms. “I discovered that my cancer was in the breast two years before I was diagnosed,” she says. Barzilay’s hope is for systems like hers to enable doctors to customize screening and prevention programs at the individual level, making late diagnosis a relic of the past. Late diagnosis means more aggressive treatments, uncertain outcomes, and higher medical expenses. As a result, identifying at-risk patients has been a central pillar of breast cancer research and effective early detection. “By reducing uncertainty and truly personalizing patient care, machine learning can transform this area,” says Barzilay.

Although mammography has been shown to reduce breast cancer mortality, there is continued debate on how often to screen and when to start. While the American Cancer Society recommends annual screening starting at age 45, the U.S. Preventive Services Task Force recommends screening every two years starting at age 50.

“Rather than taking a one-size-fits-all approach, we can personalize screening around a woman’s risk of developing cancer,” says Barzilay. “For example, a doctor might recommend that one group of women get a mammogram every other year, while another higher-risk group might get supplemental MRI screening.”

The team’s model was significantly better at predicting risk than existing approaches: It accurately placed 31% of all cancer patients in its highest-risk category, compared to only 18% for traditional models.
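To make that comparison concrete, here is a minimal sketch of how such a figure could be computed: rank patients by predicted risk, take the top slice as the "highest-risk category," and ask what share of the patients who later developed cancer fall inside it. The function name, the 20% cutoff, and the toy scores and outcomes are all illustrative assumptions, not the team's code, threshold, or data.

```python
def share_of_cases_in_top_category(scores, developed_cancer, top_fraction=0.10):
    """Return the fraction of eventual cancer cases whose risk score
    lies in the top `top_fraction` of all scores.

    `top_fraction` is a hypothetical cutoff chosen for illustration.
    """
    if len(scores) != len(developed_cancer):
        raise ValueError("scores and outcomes must have the same length")
    total_cases = sum(1 for outcome in developed_cancer if outcome)
    if total_cases == 0:
        raise ValueError("no cancer cases in the data")

    # Size of the highest-risk category (at least one patient).
    n_top = max(1, round(len(scores) * top_fraction))

    # Patient indices ordered from highest to lowest predicted risk.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_category = ranked[:n_top]

    cases_in_top = sum(1 for i in top_category if developed_cancer[i])
    return cases_in_top / total_cases


# Toy data (made up): ten patients, three of whom later developed cancer.
toy_scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_outcomes = [True, False, True, False, False, True, False, False, False, False]

share = share_of_cases_in_top_category(toy_scores, toy_outcomes, top_fraction=0.20)
print(f"{share:.0%} of cancer cases fall in the highest-risk category")
```

A higher share means the model concentrates more of the eventual cancer cases in the group that would be flagged for closer screening.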

Harvard Professor Constance Lehman says that there’s previously been minimal support in the medical community for screening strategies that are risk-based rather than age-based. “This is because before we did not have accurate risk assessment tools that worked for individual women,” says Lehman, a professor of radiology at Harvard Medical School and division chief of breast imaging at MGH. “Our work is the first to show that it’s possible.”

Barzilay and Lehman co-wrote the paper with lead author Adam Yala, a CSAIL PhD student. Other MIT co-authors include PhD student Tal Schuster and former master’s student Tally Portnoi.

How it works

Since the first breast-cancer risk model from 1989, development has largely been driven by human knowledge and intuition of what major risk factors might be, such as age, family history of breast and ovarian cancer, hormonal and reproductive factors, and breast density.

However, most of these markers are only weakly correlated with breast cancer. As a result, such models still aren’t very accurate at the individual level, and many organizations continue to feel risk-based screening programs are not possible, given those limitations.

Rather than manually identifying the patterns in a mammogram that drive future cancer, the MIT/MGH team trained a deep-learning model to deduce the patterns directly from the data. Using information from more than 90,000 mammograms, the model detected patterns too subtle for the human eye to detect.

“Since the 1960s radiologists have noticed that women have unique and widely variable patterns of breast tissue visible on the mammogram,” says Lehman. “These patterns can represent the influence of genetics, hormones, pregnancy, lactation, diet, weight loss, and weight gain. We can now leverage this detailed information to be more precise in our risk assessment at the individual level.”

Making cancer detection more equitable

The project also aims to make risk assessment more accurate for racial minorities, in particular. Many early models were developed on white populations, and were much less accurate for other races. The MIT/MGH model, meanwhile, is equally accurate for white and black women. This is especially important given that black women have been shown to be 42% more likely to die from breast cancer due to a wide range of factors that may include differences in detection and access to health care.

“It’s particularly striking that the model performs equally as well for white and black people, which has not been the case with prior tools,” says Allison Kurian, an associate professor of medicine and health research/policy at Stanford University School of Medicine. “If validated and made available for widespread use, this could really improve on our current strategies to estimate risk.”

The model has already been implemented at Massachusetts General Hospital, and the team is in talks with other hospitals across the country and internationally. Barzilay says their system could also one day enable doctors to use mammograms to see if patients are at a greater risk for other health problems, like cardiovascular disease or other cancers. The researchers are eager to apply the models to other diseases and ailments, especially those with less effective risk models, like pancreatic cancer.

“Our goal is to make these advancements a part of the standard of care,” says Yala. “By predicting who will develop cancer in the future, we can hopefully save lives and catch cancer before symptoms ever arise.”

Regina Barzilay is the Delta Electronics Professor at CSAIL and the Department of Electrical Engineering and Computer Science at MIT and a member of the Koch Institute for Integrative Cancer Research at MIT.