Abstract Winner – Cloud Computing & Big Data


Author: Tom Velez

Coauthor(s): Emilia Apostolova, PhD; Helen White, MBA; Patty Morris; David Eliason, MD; and Tom Velez, PhD

Status: Completed Work

Funding Acknowledgment: This work was prepared by contractors supporting DoD/VA VCE under award number GS00Q09BGD0031 and government employees as part of their official duties. Title 17, USC, §105 provides that “Copyright protection under this title is not available for any work of the U.S. Government.” Title 17, USC, §101 defines a U.S. Government work as a work prepared by a military service member or employee of the U.S. Government as part of that person’s official duties. The views expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the Department of the Navy, Department of Defense, or the U.S. Government.


The aim of this study is to use the Defense and Veterans Eye Injury and Vision Registry (DVEIVR), a DoD initiative that provides researchers with de-identified ocular care encounter data abstracted from military medical systems used by war zone clinicians in Afghanistan and Iraq, to develop methods for comprehensive and reliable Open Globe Injury (OGI) cohort identification in these registry records.

OGIs are considered emergencies requiring immediate surgical care and are a major cause of blindness and/or enucleation. Given the possibility of such catastrophic outcomes, early OGI treatment delivered in theater, its associated outcomes, and its risk factors are an active field of research. Ophthalmic war zone research poses challenges because early management of wounds to the eye is typically complicated by concurrent life-threatening brain/systemic injuries and/or loss of limb. Additionally, OGI wounds (globe lacerations, ruptures) following exposure to blasts can be subtle and/or occult, requiring exploratory surgery or imaging for definitive diagnosis. The initial surgical management of OGI is typically performed by deployed ophthalmologists working in concert with other surgeons in mass casualty/emergent care centers, such as fielded combat support hospitals (CSHs), with variable resources.

In this context, our study focuses on the use of natural language processing (NLP) to identify OGI cases in free-text surgical “eye notes”. These notes are manually abstracted into DVEIVR by government analysts from either original handwritten/scanned or manually entered free-form text encounter information found in electronic medical data repositories used in theater CSHs (e.g., the Theater Medical Data Store). The dataset used for our study contained 76,809 encounters for 26,131 patients. Although DVEIVR comprises both structured and unstructured data describing diagnoses, treatments, procedures, and exam findings, structured encounter data such as diagnosis or procedure codes that might otherwise be useful for OGI identification are frequently missing from emergent CSH records for practical reasons.

The challenges of the NLP task include a low incidence rate (few positive examples), idiosyncratic military ophthalmology vocabulary, extreme brevity of notes, specialized abbreviations, typos, and misspellings. We modeled the problem as a binary classification of free-form clinical notes. We combined supervised learning, using a linear-kernel support vector machine (SVM), with features derived from word embeddings learned in an unsupervised manner using neural networks. Word embeddings proved to be a powerful means of identifying semantically related abnormalities and procedures, variant spellings, abbreviations, and typos. Given the rare occurrence of OGIs, the intuition was that identifying all possible tokens expressing key OGI concepts would improve model performance. Using all available DVEIVR free-form text, we generated word vectors for all words in the vocabulary. The words in the training set were then clustered, using the average cosine distance between word vectors as the measure of similarity between clusters. Using the clustered vectors as features, we achieved a precision of 92.50%, a recall of 89.83%, and an overall F1-score of 91.14%, measured using 10-fold cross-validation.
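The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the toolkit, the vector dimensionality, or the stopping criterion, so the three-dimensional toy vectors, the `max_distance` threshold, and all function names below are hypothetical. The sketch merges words by average-linkage cosine distance and then turns a note into per-cluster token counts, the kind of feature vector that could be passed to a linear SVM (not shown).

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(cluster_a, cluster_b, vectors):
    """Mean pairwise cosine distance between the members of two clusters."""
    total = sum(
        cosine_distance(vectors[a], vectors[b])
        for a in cluster_a
        for b in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def cluster_words(vectors, max_distance):
    """Agglomerative clustering: repeatedly merge the closest pair of
    clusters until no pair is within max_distance (an assumed cutoff)."""
    clusters = [[word] for word in sorted(vectors)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], vectors)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def note_features(note, word_to_cluster, n_clusters):
    """Count how many tokens of a note fall into each word cluster."""
    tokens = note.lower().split()
    counts = Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)
    return [counts.get(c, 0) for c in range(n_clusters)]


# Hypothetical toy embeddings; real vectors would be learned from the notes.
toy_vectors = {
    "rupture": [0.90, 0.10, 0.00],
    "ruptured": [0.85, 0.15, 0.05],
    "laceration": [0.80, 0.30, 0.00],
    "lac": [0.75, 0.35, 0.05],
    "cataract": [0.00, 0.10, 0.90],
    "iol": [0.05, 0.20, 0.85],
}

clusters = cluster_words(toy_vectors, max_distance=0.2)
word_to_cluster = {w: i for i, members in enumerate(clusters) for w in members}
features = note_features(
    "Ruptured globe OD with corneal lac", word_to_cluster, len(clusters)
)
print(clusters)
print(features)  # -> [0, 2]
```

In this toy setting the variant and abbreviated forms ("ruptured", "lac") land in the same cluster as "rupture" and "laceration", so a brief note mentioning either variant produces the same feature as one using the canonical term. That is the intuition behind using clusters rather than raw tokens when positive examples are rare.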

In summary, we demonstrated an NLP approach to identifying rare patient phenotypes (OGIs) utilizing free-form text in a challenging domain: military war zone ophthalmology.