Background: Severity of illness scores are used for risk adjustment when comparing cohorts of critically ill patients in intensive care units (ICUs). Although these models have good discrimination, they are typically poorly calibrated: they over-predict mortality for low-risk patients and under-predict mortality for high-risk patients. Clinicians are therefore skeptical of their accuracy for real-time patient prognostication. We propose a sequential modeling approach to improve these prediction models. We hypothesized that by first stratifying patients into high-risk (mortality prediction ≥ 10%) and low-risk cohorts, then applying four standard machine learning tools to a much larger set of candidate variables in the high-risk cohort only, we could improve the discrimination and calibration of mortality risk prediction in critically ill patients.
We used the Philips-MIT eICU Collaborative Research Database, a de-identified database of more than 1.8 million ICU admissions from 364 distinct ICUs across the USA, from 2003 to 2016. For the first stage of our sequential modeling approach, we selected the 523,214 patients who had mortality risk predictions ≥ 10% based on APACHE-IV scoring to form our high-risk cohort. We randomly split them into two groups (80% training set, 20% testing set).
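The first stage described above (thresholding on the APACHE-IV prediction, then a random 80/20 split) can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the field name `apache_iv_risk`, the function name and the fixed seed are all assumptions made for the example.

```python
import random


def stratify_and_split(patients, threshold=0.10, train_frac=0.80, seed=0):
    """Keep high-risk patients and split them into training and testing sets.

    `patients` is a list of dicts; the key 'apache_iv_risk' is a hypothetical
    field holding the APACHE-IV predicted mortality as a probability in [0, 1].
    """
    # Stage one: keep only patients at or above the risk threshold.
    high_risk = [p for p in patients if p["apache_iv_risk"] >= threshold]

    # Shuffle a copy with a seeded generator so the split is reproducible.
    rng = random.Random(seed)
    shuffled = high_risk[:]
    rng.shuffle(shuffled)

    # The first train_frac of the shuffled cohort is the training set.
    n_train = int(round(train_frac * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]
```

The split here is made at the level of individual records. If one patient can contribute several ICU stays, splitting by patient identifier instead would avoid the same patient appearing in both sets.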
For the second stage, we applied the original APACHE-IV model and created a new model, APACHE-HR, which uses the same 142 variables as APACHE-IV but with weights recalibrated on our cohort. In addition, we developed four machine learning models to predict mortality: multivariable logistic regression (Logit), random forests (RF), AdaBoosted decision trees (ADT), and multilayer perceptrons (MLP). The machine learning models were trained on 230 variables, including the original APACHE-IV variables as well as an extended set of candidate features from the first 24 hours of ICU admission.
Model performance was assessed by discrimination (area under the receiver operating characteristic curve, AUROC) and calibration (Hosmer-Lemeshow test [HL] and standardized mortality ratio [SMR]). Gains or losses in AUROC were tested using the DeLong method.
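For readers unfamiliar with these metrics, the sketch below shows one common way to compute each of them from observed outcomes (1 = death, 0 = survival) and predicted probabilities. It is an illustration under assumptions, not the authors' implementation: the function names are made up for the example, the Hosmer-Lemeshow grouping uses equal-sized groups of sorted predictions (deciles by default), and the DeLong comparison of AUROCs is not covered.

```python
def auroc(y_true, y_prob):
    """AUROC as the probability that a randomly chosen death receives a
    higher predicted risk than a randomly chosen survivor (ties count 0.5).
    Pairwise O(n^2) version, for illustration only."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def smr(y_true, y_prob):
    """Standardized mortality ratio: observed deaths / expected deaths."""
    return sum(y_true) / sum(y_prob)


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow statistic over equal-sized groups of sorted risk.

    Returns only the statistic; it is usually compared against a
    chi-square distribution with (groups - 2) degrees of freedom.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)  # expected deaths in the group
        observed = sum(y for _, y in chunk)  # observed deaths in the group
        mean_p = expected / len(chunk)
        denom = len(chunk) * mean_p * (1 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat
```

An SMR close to 1 and a small HL statistic (large p-value) both indicate that predicted and observed mortality agree, which is how the calibration results reported here should be read.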
When applied to the testing set, Logit, MLP and ADT had superior discrimination, with AUROCs of 0.84, 0.85 and 0.86 respectively, compared with 0.74, 0.77 and 0.77 for APACHE-IV, APACHE-HR and RF respectively (all models differed from each other with p<0.001). When applied to even higher-risk subsets of the high-risk cohort, the AUROCs of APACHE-IV and APACHE-HR attenuated sharply, while Logit maintained good discrimination even in very high-risk cohorts (e.g. mortality risk prediction ≥ 70%). In addition, Logit achieved the best calibration (HL = 14.75, p>0.06; SMR = 0.995) among all models.
Given that Logit had better calibration than MLP and ADT with similar discrimination, and that non-linear models are harder to interpret, we consider Logit the best-performing second-stage model overall for patients identified by APACHE-IV as being at high risk of death.
Our two-stage sequential modeling approach improved on the state of the art in mortality prediction in the high-risk subgroup of a large ICU cohort. This suggests that sequential modeling approaches have the potential to improve individual patient prognostication to a level of performance that is acceptable to clinicians, so that it can inform real-time decision making and resource allocation.


Author: Rodrigo Deliberato

Coauthor(s): Rodrigo Octavio Deliberato MD PhD1,2,3, Stephanie Ko MBBS MPH3,4, Tejas Sundaresan MEng3, Aaron Russell Kaufman3,4, Leo Anthony Celi MD MS MPH3,7
1. Big Data Analytics Department, Hospital Israelita Albert Einstein, São Paulo, Brazil
2. Laboratory for Critical Care Research, Critical Care Department, Hospital Israelita Albert Einstein, São Paulo, Brazil
3. MIT Critical Data, Harvard-MIT Division of Health Sciences & Technology, Cambridge, MA, USA
4. Department of Medicine, National University Health System, Singapore
5. Department of Government, Harvard University, Cambridge, MA, USA
6. Beth Israel Deaconess Medical Center, Boston, MA, USA

Status: Completed Work