Vidal Arroyo

Background and Problem: Currently, 400 million people worldwide have a rare disease. Advances in data collection and processing have produced a big data revolution in medicine, but these advances have largely failed to improve the diagnosis and treatment of rare diseases, for which large datasets are scarce. If unaddressed, this gap will create a health disparity in which treatment improves drastically for common diseases while rare diseases are left behind. Novel solutions are therefore needed that can overcome the lack of large rare-disease datasets and bridge this data disparity.

Solution: In this work, we propose that a deep learning strategy known as transfer learning can be used to eliminate this data disparity. Specifically, we propose a use case in which models pre-trained on clinical data from common diseases are transferred, with their learned weights, to the analysis of a rare disease, so that only fine-tuning on the small rare-disease dataset is necessary. We focus on how these methods can be employed to analyze a) text and b) imaging data for rare disease patients.
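The pre-train-then-fine-tune workflow described above can be illustrated with a minimal sketch. It is not the proposed system: the data are synthetic stand-ins for clinical records, the model is a tiny two-layer network written in plain Python, and every name (`make_data`, `train`, `freeze_features`, the dataset sizes and learning rate) is an assumption made for illustration. The sketch only shows the mechanics: weights are learned on a large "common disease" set, the feature layer is then frozen, and only the output head is fine-tuned on a small "rare disease" set.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def features(W, x):
    # Shared feature extractor: the part of the model that is transferred.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def predict(W, head, x):
    h = features(W, x)
    return sigmoid(sum(v * hj for v, hj in zip(head["v"], h)) + head["b"])

def mean_loss(W, head, data):
    # Mean binary cross-entropy; eps guards against log(0).
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(W, head, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def train(W, head, data, epochs, lr, freeze_features):
    # Full-batch gradient descent. W is never modified when freeze_features is True.
    n = len(data)
    for _ in range(epochs):
        grad_v = [0.0] * len(head["v"])
        grad_b = 0.0
        grad_W = [[0.0] * len(row) for row in W]
        for x, y in data:
            h = features(W, x)
            err = predict(W, head, x) - y
            grad_b += err
            for j, hj in enumerate(h):
                grad_v[j] += err * hj
                if not freeze_features:
                    back = err * head["v"][j] * (1.0 - hj * hj)
                    for k, xk in enumerate(x):
                        grad_W[j][k] += back * xk
        head["b"] -= lr * grad_b / n
        for j in range(len(head["v"])):
            head["v"][j] -= lr * grad_v[j] / n
        if not freeze_features:
            for j in range(len(W)):
                for k in range(len(W[j])):
                    W[j][k] -= lr * grad_W[j][k] / n

def make_data(n, shift, rng):
    # Synthetic stand-in for clinical data; `shift` moves the decision boundary.
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        data.append((x, 1 if x[0] + x[1] + shift > 0 else 0))
    return data

rng = random.Random(0)
common_data = make_data(500, 0.0, rng)  # large "common disease" dataset
rare_data = make_data(20, 0.5, rng)     # small "rare disease" dataset

# Step 1: pre-train the whole model on the large dataset.
W = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
head = {"v": [rng.uniform(-0.5, 0.5) for _ in range(3)], "b": 0.0}
train(W, head, common_data, epochs=200, lr=0.5, freeze_features=False)

# Step 2: keep the transferred feature weights fixed and fine-tune only the head.
W_before = [row[:] for row in W]
loss_before = mean_loss(W, head, rare_data)
train(W, head, rare_data, epochs=50, lr=0.5, freeze_features=True)
loss_after = mean_loss(W, head, rare_data)

print(f"rare-disease loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Freezing the feature layer is only one fine-tuning choice. In practice one might instead unfreeze some or all transferred layers with a smaller learning rate, depending on how much rare-disease data is available.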

Methods: Although transfer learning is promising, limitations may arise when training and testing datasets are written in different styles or even different languages. Borrowing from the field of philosophy, we propose a novel structural representation known as a logic map, in which data is represented in a propositional fashion (ideas and meaning) rather than a syntactic fashion (structure and form). By teaching deep natural language processing algorithms to learn meaning rather than structure, we will be able to aggregate datasets across languages and cultures, ultimately leading to increased algorithmic performance and thus improved health outcomes for patients with rare diseases.
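To make the idea of a propositional representation concrete, the following toy sketch shows one possible reading of a logic map. The abstract does not specify its form, so everything here is an assumption: the `Proposition` triple, the `CONCEPTS` and `RELATIONS` lexicons, and the `to_logic_map` function are hypothetical, and a hand-written word lookup stands in for the deep natural language processing model that would actually be required. The point is only that two sentences with different languages and word order can reduce to the same set of propositions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Proposition:
    # Hypothetical unit of a logic map: who, what relation, to what.
    subject: str
    relation: str
    obj: str

# Hypothetical lexicons mapping surface words to language-independent concepts.
CONCEPTS = {
    "patient": "PATIENT", "paciente": "PATIENT",
    "fever": "FEVER", "fiebre": "FEVER",
    "seizures": "SEIZURE", "convulsiones": "SEIZURE",
}
RELATIONS = {
    "has": "HAS_FINDING", "reports": "HAS_FINDING",
    "tiene": "HAS_FINDING", "presenta": "HAS_FINDING",
}

def to_logic_map(sentence):
    # Toy extractor: the first concept before a relation word is the subject;
    # every concept after a relation word becomes an object of that relation.
    tokens = sentence.lower().replace(".", " ").replace(",", " ").split()
    subject = None
    relation = None
    propositions = set()
    for token in tokens:
        if token in RELATIONS:
            relation = RELATIONS[token]
        elif token in CONCEPTS:
            if relation is None:
                subject = CONCEPTS[token]
            elif subject is not None:
                propositions.add(Proposition(subject, relation, CONCEPTS[token]))
    return frozenset(propositions)

english = to_logic_map("The patient has fever and seizures.")
spanish = to_logic_map("El paciente presenta convulsiones, y tiene fiebre.")

# Different language and word order, same propositional content.
print(english == spanish)
```

Because both notes map to the same set of propositions, records written in either language could be pooled into a single training set under this representation.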