The genetic component of many common traits is associated with the gene expression and several variants act as expression quantitative loci, regulating the gene expression in a tissue-specific manner. In this work, we applied tissue-specific cis-eQTL gene expression prediction models on the genotype of 808 samples including controls, patients with mild cognitive impairment, and subjects with Alzheimer Disease. We then dissected the imputed transcriptomic profiles by means of different unsupervised and supervised machine learning approaches to identify potential biological associations (all code is available at https://github.com/imerelli/DeepNeuro). Our analysis suggests that unsupervised and supervised methods can provide complementary information, which can be integrated for a better characterization of the underlying biological system. In particular, a variational autoencoder representation of the transcriptomic profiles, followed by a support vector machine classification, has been used for tissue-specific gene prioritizations. Interestingly, the achieved gene prioritizations can be efficiently integrated as a feature selection step for improving the accuracy of deep learning classifier networks. The identified gene-tissue information suggests a potential role for inflammatory and regulatory processes in gut-brain axis related tissues. In line with the expected low heritability that can be apportioned to eQTL variants, we were able to achieve only relatively low prediction capability with deep learning classification models. However, our analysis revealed that the classification power strongly depends on the network structure, with recurrent neural networks being the best Provisio performing network class. Interestingly, cross-tissue analysis suggests a potentially greater role of models trained in brain tissues also by considering dementia-related endophenotypes. Overall, the present analysis suggests that the combination of supervised and unsupervised machine learning techniques can be used for the evaluation of high dimensional omics data.

Maj, C., Azevedo, T., Giansanti, V., Borisov, O., Dimitri, G.M., Spasov, S., et al. (2019). Integration of machine learning methods to dissect genetically imputed transcriptomic profiles in Alzheimer’s Disease. FRONTIERS IN GENETICS, 10(JUL) [10.3389/fgene.2019.00726].

Integration of machine learning methods to dissect genetically imputed transcriptomic profiles in Alzheimer’s Disease

Dimitri G. M.;
2019-01-01

Abstract

The genetic component of many common traits is associated with the gene expression and several variants act as expression quantitative loci, regulating the gene expression in a tissue-specific manner. In this work, we applied tissue-specific cis-eQTL gene expression prediction models on the genotype of 808 samples including controls, patients with mild cognitive impairment, and subjects with Alzheimer Disease. We then dissected the imputed transcriptomic profiles by means of different unsupervised and supervised machine learning approaches to identify potential biological associations (all code is available at https://github.com/imerelli/DeepNeuro). Our analysis suggests that unsupervised and supervised methods can provide complementary information, which can be integrated for a better characterization of the underlying biological system. In particular, a variational autoencoder representation of the transcriptomic profiles, followed by a support vector machine classification, has been used for tissue-specific gene prioritizations. Interestingly, the achieved gene prioritizations can be efficiently integrated as a feature selection step for improving the accuracy of deep learning classifier networks. The identified gene-tissue information suggests a potential role for inflammatory and regulatory processes in gut-brain axis related tissues. In line with the expected low heritability that can be apportioned to eQTL variants, we were able to achieve only relatively low prediction capability with deep learning classification models. However, our analysis revealed that the classification power strongly depends on the network structure, with recurrent neural networks being the best Provisio performing network class. Interestingly, cross-tissue analysis suggests a potentially greater role of models trained in brain tissues also by considering dementia-related endophenotypes. Overall, the present analysis suggests that the combination of supervised and unsupervised machine learning techniques can be used for the evaluation of high dimensional omics data.
2019
Maj, C., Azevedo, T., Giansanti, V., Borisov, O., Dimitri, G.M., Spasov, S., et al. (2019). Integration of machine learning methods to dissect genetically imputed transcriptomic profiles in Alzheimer’s Disease. FRONTIERS IN GENETICS, 10(JUL) [10.3389/fgene.2019.00726].
File in questo prodotto:
File Dimensione Formato  
12.pdf

accesso aperto

Tipologia: PDF editoriale
Licenza: Creative commons
Dimensione 2.97 MB
Formato Adobe PDF
2.97 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11365/1189271