Severe sepsis and septic shock affect millions of patients and carry mortality rates approaching 50%. A key goal in critical care medicine is the early identification and treatment of infected patients with early-stage sepsis. The most recent guidelines for the management of severe sepsis and septic shock call for early recognition and management of these conditions as medical emergencies, immediate administration of resuscitative fluids, frequent reassessment, and empiric antibiotics as soon as possible after recognition. Each hour of delay in administering recommended therapy is associated with a linear increase in mortality risk, driving the need to automate early sepsis recognition.

Early recognition of infections that can lead to sepsis, defined as life-threatening organ dysfunction caused by a dysregulated host response to infection, is challenging for several reasons: 1) sepsis can develop quickly from any common infection (bacterial, viral, or fungal), whether localized or generalized; 2) culture-based diagnosis of infection is slow and often unreliable; 3) the vital sign abnormalities traditionally associated with infection (e.g., hyperthermia) are frequently the result of noninfectious disease processes and therefore lack the specificity needed for automated recognition of infection; and 4) infection risk factors (e.g., signs/symptoms of infection, recent bloodstream exposure due to indwelling catheters, etc.) can strengthen otherwise non-specific evidence; however, in many cases these risk factors are captured only in free-text clinical nursing notes.
In this study, we developed a method for automatically monitoring nursing notes for signs and symptoms of infection. Our study data came from the Medical Information Mart for Intensive Care (MIMIC-III) database, a large, freely available collection of de-identified health-related data for over 40,000 patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. The database contains over 2 million free-text clinical notes. We focused only on nursing notes for adult patients, yielding a study dataset of 634,369 nursing notes.
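The note-selection step above can be sketched as a simple filter over the MIMIC-III NOTEEVENTS and PATIENTS tables. This is a minimal illustration, not the study's actual extraction code: the note categories used ("Nursing", "Nursing/other") and a precomputed AGE column (MIMIC-III stores date of birth, from which age at admission must be derived) are assumptions.

```python
import pandas as pd

def select_adult_nursing_notes(notes: pd.DataFrame, patients: pd.DataFrame) -> pd.DataFrame:
    """Keep only nursing-category notes belonging to adult patients.

    Assumes `notes` has SUBJECT_ID/CATEGORY/TEXT columns (as in MIMIC-III
    NOTEEVENTS) and `patients` has SUBJECT_ID plus a precomputed AGE column.
    """
    nursing = notes[notes["CATEGORY"].isin(["Nursing", "Nursing/other"])]
    merged = nursing.merge(patients[["SUBJECT_ID", "AGE"]], on="SUBJECT_ID")
    return merged[merged["AGE"] >= 18]

# Tiny synthetic demo (invented rows, not MIMIC-III data):
notes = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2, 3],
    "CATEGORY": ["Nursing", "Radiology", "Nursing/other", "Nursing"],
    "TEXT": ["pt febrile overnight", "CXR report", "afebrile, stable", "pediatric note"],
})
patients = pd.DataFrame({"SUBJECT_ID": [1, 2, 3], "AGE": [67, 45, 4]})
adult_nursing = select_adult_nursing_notes(notes, patients)
```

In the demo, the radiology note and the pediatric patient's note are dropped, leaving two adult nursing notes.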
We used a novel approach to automatically generate an annotated dataset, combining an antibiotic ontology with rules to label notes for machine learning. Individual nursing notes were represented as bags of words (1-grams). 70% of the automatically generated dataset was used for training and the remaining 30% for testing. The training dataset was used to build a linear-kernel Support Vector Machine (SVM) model, which achieved F1 scores ranging from 79% to 96%.
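The modeling pipeline described above (bag-of-words 1-gram features, a 70/30 train/test split, and a linear SVM evaluated with F1 score) can be sketched with scikit-learn as follows. The note texts and labels below are invented stand-ins for the auto-labeled nursing notes (1 = signs/symptoms of infection), not MIMIC-III data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented example notes: first ten suggest infection, last ten do not.
texts = [
    "pt febrile overnight purulent drainage from wound",
    "started on vancomycin for presumed cellulitis",
    "temp spike blood cultures sent",
    "elevated wbc concern for uti",
    "thick yellow sputum possible pneumonia",
    "erythema around picc site noted",
    "rigors noted broad spectrum antibiotics begun",
    "foul smelling urine urinalysis sent",
    "incision red and warm to touch",
    "new fever infectious workup initiated",
    "pt ambulating in halls pain well controlled",
    "tolerating diet no acute events overnight",
    "family at bedside pt resting comfortably",
    "pain managed with prn tylenol",
    "pt alert and oriented x3",
    "discharge planning discussed with family",
    "pt slept well vital signs stable",
    "ambulated with pt twice this shift",
    "no complaints voiced call light in reach",
    "physical therapy session completed",
]
labels = [1] * 10 + [0] * 10

# 70/30 split, stratified so both classes appear in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=0, stratify=labels)

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 1)),  # bag-of-words, 1-grams only
    LinearSVC(),                          # linear-kernel SVM
)
model.fit(X_train, y_train)
preds = model.predict(X_test)
f1 = f1_score(y_test, preds)
```

On a toy corpus this small the resulting F1 is not meaningful; the sketch only shows the shape of the pipeline, not the reported 79–96% results.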