Abstract:
A genome-wide analysis of genetic associations with lipid metabolism indicators was carried out using Bayesian networks (BN), with the aim of diagnosing polygenic hypercholesterolemia from genetic data on patients from the Russian population. Data from 1,200 patients were analyzed. For each patient, 196,725 SNPs were obtained, together with clinical data and lipid profile indicators (different types of cholesterol). Genome-wide association analysis (GWAS) and Pearson's chi-squared test were used for the initial selection of the most significant parameters. Two patient states related to lipid metabolism were studied: the level of LDL-C (low-density lipoprotein cholesterol) and the level of HDL-C (high-density lipoprotein cholesterol). Bayesian networks with the simplest topology, the naive one, were used to predict lipoprotein levels. Prediction quality (reliability) was assessed by constructing ROC curves and calculating the area under the curve (AUC). The AUC increased from 0.5 for the initial BN to 0.9 after significant parameters were selected using either GWAS or Pearson's test. Optimizing the Bayesian network with respect to the number of parameter nodes, with AUC as the objective function, raised the AUC further to 0.99 and reduced the number of prognostic parameters to 150. The resulting set of prognostic parameters is shown to be ambiguous: it depends on whether the initial reduction of network nodes is performed with GWAS or with Pearson's test. Despite the very good prediction quality on the training set, low AUC values were obtained for an independent control group of patients. Further application of the proposed methodology is possible after a substantial reduction in the number of SNPs based on analysis of the relevant molecular mechanisms.
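For readers unfamiliar with the two quantities named above, the following is a minimal sketch of Pearson's chi-squared statistic for a single SNP and of the AUC for a set of predicted scores. It is not the authors' implementation. The function names `chi2_statistic` and `auc`, the genotype coding, and the toy inputs are assumptions made for illustration only.

```python
from collections import Counter


def chi2_statistic(genotypes, labels):
    """Pearson chi-squared statistic for a genotype-by-class contingency table.

    genotypes: per-patient genotype codes for one SNP (hypothetical coding).
    labels: per-patient class, e.g. 1 = elevated LDL-C, 0 = normal.
    """
    n = len(genotypes)
    observed = Counter(zip(genotypes, labels))
    row_totals = Counter(genotypes)
    col_totals = Counter(labels)
    stat = 0.0
    for g, row_count in row_totals.items():
        for c, col_count in col_totals.items():
            # Expected cell count under independence of genotype and class.
            expected = row_count * col_count / n
            stat += (observed[(g, c)] - expected) ** 2 / expected
    return stat


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only, not from the study).
perfectly_associated = chi2_statistic([0, 0, 1, 1], [0, 0, 1, 1])
unassociated = chi2_statistic([0, 0, 1, 1], [0, 1, 0, 1])
perfect_ranking = auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
uninformative = auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0])
```

A larger statistic indicates a stronger association between the SNP and the class, which is the basis for ranking SNPs in the initial selection. An AUC of 0.5 corresponds to a predictor that ranks cases no better than chance. If parameters are ranked on the same patients later used to compute the AUC, the resulting estimate can be optimistic, so an independent control group is the more reliable check.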