Background and method
Distributed Learning (DL) is a paradigm for training Predictive Models (PMs) that leaves patient data within the institutions that hold them; in this way it can overcome patient-privacy issues in big data analysis. Furthermore, under some circumstances, it converges to the same results as classical, centralized approaches. Our aim is to propose a Preliminary Investigation (PI) of the data and to train PMs using real clinical data and an existing Rapid Learning Infrastructure (RLI): a cloud service called Varian Learning Portal (VLP). We also implemented a Distributed Ecosystem in which training sets, validation sets and PMs are constantly updated (and made available) according to the contributions of day-by-day clinical practice, following the so-called “Rapid Learning” paradigm.
The VLP simulator was used to test the distributed algorithms. The service consists of two parts: a master (a cloud service) and sites (where the patient data are stored). The dataset used for the experiments comes from an institutional database (EHR) and concerns patients affected by choroidal dome-shaped melanoma, treated with brachytherapy from December 2006 to 2014, with distance to fovea (DF) > 1.5 mm, tumor thickness > 2 mm and follow-up > 4 months. Presence of diabetes, tumor volume and DF were chosen as factors, with the occurrence of maculopathy as the outcome.
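The inclusion criteria above can be written as a simple filter. The following is a minimal sketch only: the field names (`distance_to_fovea_mm`, `tumor_thickness_mm`, `follow_up_months`) and the toy records are hypothetical and do not reflect the actual EHR schema or any real patient.

```python
# Hypothetical record layout; the real EHR schema is not described in the abstract.
def meets_inclusion_criteria(patient):
    """Return True if a record satisfies the three numeric criteria in the text."""
    return (
        patient["distance_to_fovea_mm"] > 1.5
        and patient["tumor_thickness_mm"] > 2.0
        and patient["follow_up_months"] > 4
    )

# Made-up toy records, for illustration only.
patients = [
    {"id": "p1", "distance_to_fovea_mm": 3.0, "tumor_thickness_mm": 4.1, "follow_up_months": 60},
    {"id": "p2", "distance_to_fovea_mm": 1.0, "tumor_thickness_mm": 5.0, "follow_up_months": 48},
    {"id": "p3", "distance_to_fovea_mm": 2.2, "tumor_thickness_mm": 2.5, "follow_up_months": 3},
    {"id": "p4", "distance_to_fovea_mm": 1.6, "tumor_thickness_mm": 2.1, "follow_up_months": 5},
]

cohort = [p for p in patients if meets_inclusion_criteria(p)]
print([p["id"] for p in cohort])
```

In a distributed setting, a filter of this kind would run at each site, so that only eligible records contribute to the statistics sent to the master.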
The EHR dataset was then randomly split into 3 parts to simulate 3 different sites. First, a distributed chi-square test was executed to check covariate distributions across sites. Afterwards, a Cox proportional hazards model was learned using the RLI. Under both algorithms, intermediate statistical results, rather than clinical data, were exchanged between each site and the master. The chi-square test and the Cox proportional hazards model were also run with the classical approach (merging all the data), and the results of the distributed and centralized approaches were compared. Results of the distributed approach were presented through a dashboard called Web-based dIstributed statistics REsult (WIRE). Finally, the architecture was completed so that new clinical data can trigger the automatic update of the models, and a Decision Support Aid (DSA) was built to offer a comfortable interaction between clinicians and the system.
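One way such a distributed chi-square test could work is sketched below. This is an illustration under stated assumptions, not the VLP implementation, which the abstract does not describe: each site is assumed to send only per-category counts, and the master is assumed to run a Pearson chi-square test of homogeneity on the resulting site-by-category table. All function names and the toy diabetes flags are hypothetical. The p-value helper uses a closed form that is valid only for even degrees of freedom; a general implementation could use `scipy.stats.chi2.sf` instead.

```python
import math

def site_counts(values, categories):
    """Runs at a site: count patients per category. Only these counts leave the site."""
    return [sum(1 for v in values if v == c) for c in categories]

def master_chi_square(count_rows):
    """Runs at the master: Pearson chi-square on the site-by-category count table.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in count_rows]
    col_totals = [sum(col) for col in zip(*count_rows)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(count_rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(count_rows) - 1) * (len(col_totals) - 1)
    return stat, dof

def chi_square_sf_even_dof(stat, dof):
    """Upper-tail probability of the chi-square distribution, even dof only."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form implemented for positive even dof only")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))

# Made-up diabetes flags (1 = diabetic) held at three simulated sites.
categories = [0, 1]
site_a = [1, 0, 0, 0, 1, 0]
site_b = [0, 0, 1, 0, 0, 0]
site_c = [1, 1, 0, 0, 0, 1]

table = [site_counts(s, categories) for s in (site_a, site_b, site_c)]
stat, dof = master_chi_square(table)
p_value = chi_square_sf_even_dof(stat, dof)
print(table, stat, dof, p_value)
```

Because the contingency table built from the site counts is the same table one would build from the merged data, a scheme of this kind reproduces the centralized statistic up to floating-point error.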
A total of 197 patients were considered for this analysis. The median follow-up was 51 months. Maculopathy was recorded in 21% of patients after treatment. The PI did not show different covariate distributions for volume, DF and diabetes across sites (chi-square p-value > 0.05). WIRE allows clinicians to visualize covariate distribution results before training a model, in order to obtain robust PMs. The difference between the distributed and centralized p-values was less than 10^-8. In the Cox model, all covariates analyzed were statistically significant (p < 0.05). P-values from the distributed (0.004, 0.0006, 9.86 × 10^-6) and centralized (0.004, 0.0006, 8.91 × 10^-6) approaches showed percentage differences ranging from 0% to 10%. Using the DSA, clinicians enter the requested patient characteristics and directly obtain the maculopathy-free survival curve associated with them. The DL ecosystem, which combines the existing cloud service with the developed interfaces, enables Really Rapid Learning and application of PMs, and allows training them on geographically distributed big data without sharing patient-related data.
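The comparison between distributed and centralized Cox fits reported above can be illustrated with a small sketch. It is not the algorithm used by the VLP, which the abstract does not specify. It assumes a single covariate, Breslow handling of ties, a Newton-Raphson update at the master, and sites that share their distinct event times plus aggregated risk-set sums. All names and the toy data are hypothetical.

```python
import math

def site_event_times(rows):
    """Runs at a site: distinct event times (assumed to be shareable)."""
    return {time for time, event, _ in rows if event}

def site_statistics(rows, beta, event_times):
    """Runs at a site: aggregated sums per event time for the current beta."""
    stats = {}
    for t in event_times:
        s0 = s1 = s2 = sum_x_events = 0.0
        n_events = 0
        for time, event, x in rows:
            if time >= t:  # subject is still at risk at time t
                w = math.exp(beta * x)
                s0 += w
                s1 += x * w
                s2 += x * x * w
            if event and time == t:
                n_events += 1
                sum_x_events += x
        stats[t] = (s0, s1, s2, n_events, sum_x_events)
    return stats

def fit_cox(sites, max_iter=25, tol=1e-10):
    """Runs at the master: Newton-Raphson on the Breslow partial likelihood."""
    event_times = sorted(set().union(*[site_event_times(s) for s in sites]))
    beta = 0.0
    for _ in range(max_iter):
        per_site = [site_statistics(s, beta, event_times) for s in sites]
        gradient = information = 0.0
        for t in event_times:
            # Sum the five aggregated quantities over all sites.
            s0, s1, s2, d, sx = (sum(p[t][k] for p in per_site) for k in range(5))
            mean = s1 / s0
            gradient += sx - d * mean
            information += d * (s2 / s0 - mean ** 2)
        step = gradient / information
        beta += step
        if abs(step) < tol:
            break
    return beta

# Made-up rows: (follow-up in months, event indicator, binary covariate).
site_a = [(5, 1, 1), (8, 1, 0), (12, 0, 1), (3, 1, 1)]
site_b = [(6, 1, 0), (9, 1, 1), (15, 0, 0), (4, 0, 1)]
site_c = [(7, 1, 1), (10, 1, 0), (11, 1, 0), (2, 1, 1)]

beta_distributed = fit_cox([site_a, site_b, site_c])
beta_pooled = fit_cox([site_a + site_b + site_c])
print(beta_distributed, beta_pooled)
```

In this sketch the distributed and pooled fits differ only in the order of floating-point summation, so they agree to numerical precision. Larger discrepancies, such as those reported for the smallest p-value, would have to come from implementation details not covered here.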
CLOUD COMPUTING & BIG DATA
Author: Andrea Damiani, Vincenzo Valentini
Coauthor(s): Carlotta Masciocchi[a], Nicola Dinapoli[b], Giuditta Chiloiro[b], Luca Boldrini[a], Jacopo Lenkowicz[a], Maria Antonietta Gambacorta[b], Luca Tagliaferri[b], Rosa Autorino[b], Maria Monica Pagliara[c], Maria Antonietta Blasi[c], Roberto Gatta[a], Roberto Negro[d], Johan van Soest[e], Andre Dekker[e]
a. Istituto di Radiologia, Università Cattolica del Sacro Cuore, Largo F. Vito 1, 00168 Roma
b. Polo Scienze Oncologiche ed Ematologiche, Fondazione Policlinico Universitario Agostino Gemelli, Largo A. Gemelli, 00168 Roma
c. Polo scienze dell’invecchiamento, neurologiche, ortopediche e della testa-collo, Fondazione Policlinico Universitario Agostino Gemelli, Largo A. Gemelli, 00168 Roma
d. KBMS srl
e. Maastricht University Medical Center, Radiation Oncology MAASTRO-GROW School for Oncology and Development Biology, Maastricht, Netherlands
Status: Work In Progress