“Hectic fever at its inception is difficult to recognize but easy to treat; left untended, it becomes easy to recognize but difficult to treat.”

Niccolo Machiavelli, Italian Renaissance philosopher and writer


This manuscript describes the use of an artificial intelligence algorithm, the sepsis early risk assessment (SERA) algorithm, to predict and diagnose sepsis. The algorithm uses both structured data and unstructured clinical notes to achieve relatively high predictive accuracy 12 hours before the onset of sepsis, with an AUC of 0.94, a sensitivity of 0.87, and a specificity of 0.87. Use of the SERA algorithm increased physicians’ early prediction of sepsis by 32% and decreased false positives by 17%.
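For readers less familiar with these metrics, the following is a minimal sketch of how sensitivity, specificity, and AUC are computed from a model’s outputs. The function names, the 0.5 threshold, and the toy labels and scores are illustrative assumptions; none of them come from the SERA study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = sepsis."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sepsis, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sepsis, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # no sepsis, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # no sepsis, false alarm
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the chance that a random positive case outscores a random
    negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy data, purely for illustration (not from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.4f}")
```

Sensitivity and specificity depend on the chosen cut-off, whereas AUC summarizes ranking quality across all cut-offs, which is why papers commonly report all three.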

The underlying methodology for the unstructured clinical notes is a topic-based, natural language processing (NLP)-enabled algorithm that accommodates clinicians’ free-form clinical notes. Because the medical corpus is large and specialized, topic extraction is challenging. The authors therefore applied NLP with a topic modeling algorithm known as latent Dirichlet allocation (LDA).

LDA is an unsupervised machine learning model that uses a probabilistic method for topic modeling (thus rendering it Bayesian): it takes documents as its input and returns topics as its output. The LDA model also estimates the proportion of each document devoted to each topic. The main settings of an LDA model are the number of topics and two Dirichlet priors that govern, respectively, how many topics a document tends to mix and how many words a topic tends to spread over. LDA demands careful data cleaning and preparation, but it is relatively fast and intuitive. The authors used a total of 100 topics in 7 categories for this model.
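To make the documents-in, topics-out idea concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, a standard textbook approach. It is not the authors’ implementation, which the commentary does not describe. The function name `lda_gibbs`, the prior values `alpha` and `beta`, the iteration count, and the toy notes are all assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Returns (doc_topic, topic_word, vocab), where
    doc_topic[d][k] is the estimated share of document d devoted to topic k
    and topic_word[k][v] is the estimated probability of word v in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic_n = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word_n = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_n = [0] * n_topics                                 # tokens per topic
    assign = []                                              # topic of every token

    # Start from a random topic assignment for each token.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(n_topics)
            row.append(k)
            doc_topic_n[d][k] += 1
            topic_word_n[k][word_id[w]] += 1
            topic_n[k] += 1
        assign.append(row)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic_n[d][k] -= 1
                topic_word_n[k][v] -= 1
                topic_n[k] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic_n[d][t] + alpha)
                    * (topic_word_n[t][v] + beta)
                    / (topic_n[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic_n[d][k] += 1
                topic_word_n[k][v] += 1
                topic_n[k] += 1

    # Smoothed estimates; every row sums to 1.
    doc_topic = [
        [(doc_topic_n[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(topic_word_n[t][v] + beta) / (topic_n[t] + n_words * beta)
         for v in range(n_words)]
        for t in range(n_topics)
    ]
    return doc_topic, topic_word, vocab


# Made-up, already-tokenized "notes" for illustration only.
docs = [
    ["fever", "hypotension", "lactate", "fever", "tachycardia"],
    ["lactate", "hypotension", "fever", "antibiotics"],
    ["fracture", "cast", "xray", "fracture", "pain"],
    ["xray", "cast", "pain", "physiotherapy"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, n_topics=2)
for d, shares in enumerate(doc_topic):
    print(d, [round(s, 2) for s in shares])
```

Each printed row gives one document’s estimated topic proportions. In practice a maintained library would be preferable to a hand-rolled sampler, and real clinical notes would need substantial cleaning and tokenization before this step.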

This work therefore combines structured clinical data with unstructured clinical notes (the latter usually much higher in volume) to build a hybrid model that draws on the best of both worlds of data and information. With the advent of more advanced NLP tools such as GPT-3 (generative pre-trained transformer 3), which uses deep learning to generate human-like text, the information yield from clinicians’ unstructured notes should increase even further.

The GPT-3 capacity of 175 billion machine learning parameters is mind-boggling (the prior leader in capacity was Microsoft’s Turing NLG at about 17 billion parameters). The future of clinical projects deploying artificial intelligence will be this type of combined structured-unstructured dyad, with appropriate tools for both.

Of course, the ultimate vision is that this sepsis prediction model will be real-time and dynamic in nature, based on data from all ICU patients in the world, a vision that would have been enormously helpful during this COVID-19 pandemic.

The full article can be read here