MIT’s Dr. Leo Anthony Celi on creating the powerful dataset, MIMIC – Multi-parameter Intelligent Monitoring for Intensive Care



As anyone who’s worked in intensive care knows all too well, patients are connected to an array of equipment and monitors that continuously stream, in real time, staggering amounts of information, including vital signs, waveform data, fluid administration, imaging and lab results, and medication records. Despite the richness of this data, in the early 2000s, much of the information was never systematically captured or recorded. Often, it would simply disappear as quickly as it appeared.

Roger Mark, Distinguished Professor of Health Sciences and Technology and of Electrical Engineering and Computer Science at Massachusetts Institute of Technology (MIT), knew that more could be done. So he and his colleagues at the Laboratory of Computational Physiology set out to design a system to track the physiologic state of each ICU patient, from admission to the end of their medical journey, and to use it to support the development of knowledge-based clinical reasoning, predictive modeling, pattern recognition and signal processing.

Mark and his colleagues hoped that the system would fuel early detection of complex problems, provide relevant guidance for fellow clinicians on therapeutic interventions and, ultimately, lead to improved patient outcomes. In 2003, that vision became a reality. With funding from the National Institutes of Health, three partners (MIT, Beth Israel Deaconess Medical Center (BIDMC) and Philips Medical Systems) launched the ‘Integrating Signals, Models and Reasoning in Critical Care’ project with the aim of building a massive critical care research database.

The three partners set out to collect comprehensive clinical and physiological data from all patients admitted to the adult medical and surgical ICUs of BIDMC. Three categories of data were collected: clinical data aggregated from ICU information systems and hospital archives; high-resolution physiological data (such as waveforms and time series of vital signs and alarms coming from bedside monitors); and death records from the Social Security Administration Death Master Files.

Today, the project is popularly known as MIMIC, or ‘Multi-parameter Intelligent Monitoring for Intensive Care’, and it is publicly available. MIMIC 3, released in 2016, contains data from approximately 50,000 patients admitted to BIDMC between 2002 and 2012. It has produced more than 1,000 publications and conference proceedings. MIMIC 4 debuted last summer and comes with data compiled through 2018.

Even though MIMIC is a huge dataset, its generalizability has always been questionable since all information originates from a single institution. Nonetheless, the Laboratory of Computational Physiology has never given up on the idea of incorporating effort and expertise from multiple institutions and across disciplines. The MIMIC team is now harnessing data outside of BIDMC and coming from new modalities including x-ray images and echocardiograms to better support ICU care research.

To maximize the impact of data-sharing and collaborative learning, Dr. Leo Anthony Celi, Principal Research Scientist at MIT, introduced the Datathon in 2014. A Datathon brings together participants with diverse skillsets and backgrounds to solve practical healthcare challenges using publicly available databases. In one of the latest Datathons, held in spring 2020, 300 data scientists and healthcare professionals from around the world met virtually to uncover insights into the COVID-19 pandemic.

Participants were free to use 20 open data sources to examine the epidemiology of COVID-19, policy impacts, disparate health outcomes, the pandemic response in New York City, and misinformation. Of the 47 teams that presented their final projects, a handful went on to turn their ideas into peer-reviewed publications.

“The amulet to protect us from the next pandemic is not a cure or a vaccine, it’s our ability to learn from the data and understand our advantages over the virus,” said Dr. Celi. “In the words of Yuval Noah Harari, ‘a coronavirus in China and a coronavirus in the US cannot swap tips about how to infect humans’. But as humans, we can compare notes, we can learn how the virus spreads in the community and how to prevent it.”

Dr. Celi highlights the absence of patient-level COVID-19 data. Most of the publicly available data are at the county or country level, and those figures depend heavily on local testing. In places where testing has been limited, the number of reported cases will naturally remain low. Case counts therefore ought to be adjusted for the number of tests performed and for how randomly the tested population was selected.
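To illustrate the point about adjusting case counts for testing volume, here is a minimal sketch in Python. It is not a method described by Dr. Celi or used at the Datathon; the `RegionReport` type, the `positivity_rate` function and all of the numbers are hypothetical, chosen only to show how a region with fewer reported cases can still have a higher share of positive tests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionReport:
    """Hypothetical aggregate report for one county or country."""
    name: str
    confirmed_cases: int
    tests_performed: int


def positivity_rate(report: RegionReport) -> Optional[float]:
    """Share of tests that were positive, or None if no tests were run."""
    if report.tests_performed == 0:
        return None
    return report.confirmed_cases / report.tests_performed


# Made-up numbers for illustration only.
reports = [
    RegionReport("Region A", confirmed_cases=500, tests_performed=10_000),
    RegionReport("Region B", confirmed_cases=100, tests_performed=500),
]

for report in reports:
    rate = positivity_rate(report)
    label = "n/a" if rate is None else f"{rate:.1%}"
    print(f"{report.name}: {report.confirmed_cases} cases, positivity {label}")
```

In this made-up example, Region B reports a fifth of Region A's cases but has four times its positivity rate. Dividing by test volume does not correct for who gets tested: if tests go mostly to symptomatic people, the rate will still overstate prevalence, which is why the randomness of testing matters as well.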

Dr. Celi also warned of the short-sightedness of the healthcare and academic communities when it comes to data-sharing, which he attributes to regulatory concerns, data monetization, a lack of funding initiatives, and a general publish-or-perish attitude. He believes MIMIC and other publicly available datasets are proof that data-sharing is possible. “The main obstacle is to change the culture on the ground,” he added. “We need to remove the silos between different disciplines who need to work together to leverage the real value of data.”