“The best way to move forward is to draw the interconnections between many data points and create possible scenarios instead of extrapolating past data to predict the future.”

Sukant Ratnakar, business and innovation guru

This week’s article looks at a very timely topic: a synthetic health record, described in a study from MITRE, a not-for-profit organization that operates federally funded R&D centers and uses public-private partnerships to solve problems. These problems span not only healthcare but also homeland security, cybersecurity, civil agency modernization, aviation, and defense and intelligence.

The proposed “coherent data set” (I am not sure about this name for the data set) is a novel synthetic data set that builds on structured data from Synthea™ to yield a longitudinal, coherent, patient-level electronic health record. The result is a publicly available health data set that can obviate concerns about the privacy of patient records. For readers who are not familiar with Synthea™, it is an open-source synthetic patient population simulator made available by MITRE. Synthea™ therefore provides clinicians with artificial but realistic patient data for data science. The Coherent Data Set includes a myriad of data: genomic information, MRI images, clinical notes, and physiological data (which leverages the Systems Biology Markup Language, or SBML, for non-linear changes). In addition, HL7 Fast Healthcare Interoperability Resources (FHIR) links these data sources together as a FHIR bundle. The MITRE authors claim that this data set is the first of its kind to bring all these data types from disparate sources (genetic markers, MRI images, clinical notes, and physiological waveforms) together into a complete and cohesive profile, with interoperability for data science and AI purposes (a digital public good).
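
For readers curious what "linked together as a FHIR bundle" can look like in practice, here is a minimal sketch in Python using plain dictionaries. It is an illustration only and is not taken from the Coherent Data Set itself: the function names, the resource ids, and the choice of resources (a heart rate Observation, an ImagingStudy, a DocumentReference for a clinical note) are all assumptions, and the records are heavily simplified rather than validated FHIR. The idea it shows is that each record points back to the same patient through a `subject` reference, so the different data types can be pulled together again.

```python
def make_bundle(patient_id):
    """Build a simplified, FHIR-style Bundle for one synthetic patient.

    All ids and field values are hypothetical placeholders.
    """
    patient_ref = {"reference": f"Patient/{patient_id}"}
    resources = [
        {"resourceType": "Patient", "id": patient_id},
        # A physiological measurement tied to the patient.
        {"resourceType": "Observation", "id": "obs-1", "status": "final",
         "code": {"text": "Heart rate"}, "subject": patient_ref},
        # An imaging study (for example, an MRI) tied to the patient.
        {"resourceType": "ImagingStudy", "id": "img-1", "status": "available",
         "subject": patient_ref},
        # A clinical note tied to the patient.
        {"resourceType": "DocumentReference", "id": "note-1", "status": "current",
         "subject": patient_ref},
    ]
    return {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [{"resource": r} for r in resources],
    }


def resources_for_patient(bundle, patient_id):
    """Return every resource in the bundle whose subject is the given patient."""
    target = f"Patient/{patient_id}"
    return [
        entry["resource"]
        for entry in bundle["entry"]
        if entry["resource"].get("subject", {}).get("reference") == target
    ]


if __name__ == "__main__":
    bundle = make_bundle("example-patient")
    for resource in resources_for_patient(bundle, "example-patient"):
        print(resource["resourceType"], resource["id"])
```

Because every record carries the same patient reference, a data scientist can start from one patient and collect the measurements, images, and notes that belong together, which is the kind of coherence the authors are aiming for.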

While this is a laudable effort to bundle the various types of health data so that data science can be leveraged, a sizable missing piece is real-world data from outside the health system. In addition, this study focuses on cardiovascular disease, but some if not most patients have multi-system diseases, so this relative simplification is perhaps less than realistic. Even with these limitations, this coherent data set is a contribution to the medical community as a concept that lets clinicians adopt AI without having to access patient data directly.

Read the full paper here: https://www.researchgate.net/publication/359859530_The_Coherent_Data_Set_Combining_Patient_Data_and_Imaging_in_a_Comprehensive_Synthetic_Health_Record