This manuscript presents a hybrid data- and knowledge-driven approach to biomedical data that can be useful for clinical research. The work, from Germany, recognizes the challenges posed by the complexity and high dimensionality of biomedical data in healthcare, and proposes combining prior knowledge of known biological interactions with patient data. The framework, which generates new patient representations by leveraging prior knowledge alongside patient data, is called CLEP (Clinical Embedding of Patients).

This framework takes two inputs (a patient-level dataset and a knowledge graph with relations between features in the dataset) and incorporates, or embeds, the patients into the knowledge graph. New patient representations are then derived from both data- and knowledge-driven features via knowledge graph embedding models (KGEMs). These representations can then be used for downstream tasks such as patient classification and stratification.
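To make the idea of embedding patients into a knowledge graph more concrete, here is a minimal sketch in plain Python. It is not CLEP's actual API. The function name, the relation labels ("elevated", "reduced"), the z-score cutoff, and the toy gene names are all illustrative assumptions; the text above does not specify the rule used to link patients to features. The sketch only shows the general shape of the step: patient measurements become new edges that attach patient nodes to existing feature nodes.

```python
from statistics import mean, stdev


def patient_triples(patients, controls, z_cutoff=1.5):
    """Return (head, relation, tail) triples linking patients to features.

    Hypothetical rule (an assumption, not taken from the paper): a patient is
    linked to a feature when its value lies at least `z_cutoff` standard
    deviations away from the mean of the control group.
    """
    features = sorted({f for values in controls.values() for f in values})
    stats = {}
    for f in features:
        vals = [values[f] for values in controls.values()]
        stats[f] = (mean(vals), stdev(vals))

    triples = []
    for pid, values in patients.items():
        for f, x in values.items():
            mu, sd = stats[f]
            if sd == 0:
                continue  # no spread among controls, so no z-score is defined
            z = (x - mu) / sd
            if z >= z_cutoff:
                triples.append((pid, "elevated", f))
            elif z <= -z_cutoff:
                triples.append((pid, "reduced", f))
    return triples


# Toy prior knowledge: one known interaction between two features.
knowledge_graph = [("GENE_A", "increases", "GENE_B")]

# Toy measurements (made-up numbers, for illustration only).
controls = {
    "ctrl_1": {"GENE_A": 1.0, "GENE_B": 5.0},
    "ctrl_2": {"GENE_A": 1.2, "GENE_B": 5.5},
    "ctrl_3": {"GENE_A": 0.8, "GENE_B": 4.5},
}
patients = {
    "patient_1": {"GENE_A": 2.0, "GENE_B": 5.1},
    "patient_2": {"GENE_A": 1.0, "GENE_B": 3.0},
}

# Patients now sit in the same graph as the prior knowledge.
combined_graph = knowledge_graph + patient_triples(patients, controls)
```

A knowledge graph embedding model would then be trained on `combined_graph`, and the learned vectors of the patient nodes would serve as the new patient representations.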

The manuscript then presents case scenarios with two patient datasets (the Alzheimer’s Disease Neuroimaging Initiative dataset and a transcriptomics dataset covering three psychiatric disorders).

After a knowledge graph was combined with the datasets, new representations of the patients were generated with KGEMs. These representations were then used to distinguish normal from cognitively impaired patients with various statistical modeling and ML methods (logistic regression, support vector machines, random forest, and XGBoost).
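The classification step can be illustrated with a small, self-contained sketch of one of the listed methods, logistic regression, trained on patient embedding vectors. Everything here is an assumption for illustration: the two-dimensional "embeddings" are made-up numbers, the labels are toy labels, and the gradient-descent implementation stands in for whatever library the authors used. In practice, an established library implementation would be the natural choice.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n_samples for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


def predict(w, b, x):
    """Return 1 (impaired) or 0 (normal) for one embedding vector."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical patient embeddings (toy values) and labels:
# 0 = normal, 1 = cognitively impaired.
embeddings = [
    [0.1, 0.2], [0.0, 0.3], [0.2, 0.1],
    [0.9, 0.8], [1.0, 0.7], [0.8, 0.9],
]
labels = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression(embeddings, labels)
predictions = [predict(w, b, x) for x in embeddings]
```

The point of the sketch is the data flow: once patients are represented as fixed-length vectors, any standard classifier can be trained on them, which is what allows the same representations to be compared across several methods.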

The results show that the new representations generated by the CLEP framework substantially increased prediction performance. The promise of this framework is an integration platform for multimodal datasets (including imaging, clinical, genomic, and other datasets) that yields more comprehensive patient representations and supports innovation towards precision diagnosis and therapy.

Of note, the authors released CLEP as an open source Python package with examples and documentation to facilitate collaboration and peer review.

The full paper can be read here