We analyzed a set of metabolites obtained from blood, serum, and urine to extract metabolic profiles for four types of lung cancer: adenocarcinoma (AD), squamous cell carcinoma (SQ), large cell lung cancer (LC), and small cell lung cancer (SC). Using the MetaboAnalyst program, we elucidated the metabolic pathways specific to each type of lung cancer. The metabolites from the significant pathways were then analyzed with the Ingenuity Pathway Analysis® (IPA) program. We elucidated the genes associated with these metabolites. The identified genes and the expanded metabolite sets were analyzed for each cancer type studied. The results yielded activated pathways for each cancer, based on the initial metabolite sets from experimental data. We defined fifteen pathways common to all studied lung cancers, five selective for AD, and fourteen selective for SC. No pathways distinguished LC from SQ, but together they formed a separate, distinct group.
Metabolites from the significant integrated pathways were selected for machine learning. For this purpose, we used the concentrations of metabolites for specific types of cancer and the characteristics of the metabolite compounds. The set contained 51 metabolites. Using the MOE software, we calculated a set of descriptors related to their structural, physical, and chemical parameters. The descriptor set contained nine descriptors, including SlogP_VSA, h_emd_C, and vsa_acc. With the WEKA software, we developed a machine-learning model that predicts the specific type of lung cancer from these descriptors. The metabolite set was divided into two parts, a training set and a testing set, each containing metabolites specific to all four types of cancer. The WEKA decision tree algorithm was applied to the molecular descriptors of the metabolites in the training set, with ten-fold cross-validation. High accuracy was achieved with the first data set used as the training set and a completely new data set used as the testing set: 83% using the J48 decision tree and 6 data value points. The resulting model is currently used for metabolite analysis in different specimens to associate them with different types of lung cancer.
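The classification step described above (a decision tree evaluated with ten-fold cross-validation over molecular descriptors) can be illustrated with a minimal sketch. This is not the WEKA J48 workflow itself: J48 implements C4.5 with gain-ratio splits and pruning, whereas the stand-in below is a small unpruned tree using Gini impurity, written in plain Python. All function names (`build_tree`, `cross_validate`, and so on) are hypothetical, and the two-column "descriptor" table is synthetic toy data, not values from the study.

```python
from collections import Counter
import random

def gini(labels):
    """Gini impurity of a list of class labels (1.0 for an empty list)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (impurity, feature, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=4):
    """Grow an unpruned tree; leaves are class labels, nodes are tuples."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def stratified_folds(labels, k, seed=0):
    """Deal the indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:
            folds[pos % k].append(i)
            pos += 1
    return folds

def cross_validate(rows, labels, k=10, seed=0):
    """Mean accuracy over k folds, each held out once for testing."""
    correct = 0
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(rows)) if i not in held]
        tree = build_tree([rows[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(tree, rows[i]) == labels[i] for i in held_out)
    return correct / len(rows)

# Synthetic, well-separated toy data: two made-up descriptor columns,
# ten rows per cancer type. These are NOT values from the study.
centers = {"AD": (0.0, 0.0), "SQ": (0.0, 10.0), "LC": (10.0, 0.0), "SC": (10.0, 10.0)}
rows, labels = [], []
for name, (cx, cy) in centers.items():
    for j in range(10):
        rows.append((cx + (j % 5) * 0.2, cy + (j // 5) * 0.2))
        labels.append(name)

accuracy = cross_validate(rows, labels, k=10)
print(f"10-fold cross-validated accuracy on toy data: {accuracy:.2f}")
```

In practice the rows would hold the nine MOE descriptors per metabolite, and the held-out testing set would be scored separately from the cross-validation folds.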


Author: Igor Tsigelny

Coauthor(s): Valentina L. Kouznetsova, Katherine Zhuo

Status: Work In Progress