The combined impact of common and rare exonic variants in COVID-19 host genetics remains poorly understood. Here, common and rare variants from whole-exome sequencing data of about 4000 SARS-CoV-2-positive individuals were used to define an interpretable machine-learning model for predicting COVID-19 severity. First, variants were converted into separate sets of Boolean features, depending on the absence or presence of variants in each gene. An ensemble of LASSO logistic regression models was then used to identify the Boolean features most informative about the genetic basis of severity. The Boolean features selected by these logistic models were combined into an Integrated PolyGenic Score, a synthetic and interpretable index of the contribution of host genetics to COVID-19 severity, as demonstrated through testing in several independent cohorts. Selected features belong to ultra-rare, rare, low-frequency, and common variants, including those in linkage disequilibrium with known GWAS loci. Notably, around one quarter of the selected genes are sex-specific. Pathway analysis of the selected genes associated with COVID-19 severity reflected the multi-organ nature of the disease. The proposed model might provide useful information for developing diagnostics and therapeutics, and could also help guide bedside disease management. © 2021, The Author(s).
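
The Boolean encoding and score described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the gene names, the frequency-class labels, the `GENE|class` feature naming, and the two selected feature sets are hypothetical, and the unweighted difference computed by `signed_score` is a simplification, not the published definition of the Integrated PolyGenic Score. The LASSO feature-selection step is not shown; the selected features are simply given.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input format: each variant call is (gene, frequency_class).
Variant = Tuple[str, str]


def boolean_features(samples: Dict[str, List[Variant]]) -> Dict[str, Set[str]]:
    """Map each sample to the set of Boolean features that are True for it.

    A feature "GENE|class" is True when the sample carries at least one
    variant of that frequency class in that gene; absent features are False.
    """
    return {
        sample_id: {f"{gene}|{freq_class}" for gene, freq_class in variants}
        for sample_id, variants in samples.items()
    }


def signed_score(features: Set[str], severity: Set[str], protective: Set[str]) -> int:
    """Count selected severity features minus selected protective features.

    Illustrative simplification only: no per-class weighting is applied.
    """
    return len(features & severity) - len(features & protective)


# Toy data with made-up gene names.
samples = {
    "S1": [("GENE_A", "rare"), ("GENE_A", "rare"), ("GENE_B", "common")],
    "S2": [("GENE_C", "ultra_rare")],
    "S3": [],
}
# Assumed output of a feature-selection step (hypothetical).
severity_features = {"GENE_A|rare", "GENE_C|ultra_rare"}
protective_features = {"GENE_B|common"}

features = boolean_features(samples)
scores = {
    sample_id: signed_score(feats, severity_features, protective_features)
    for sample_id, feats in features.items()
}
print(scores)
```

Two rare variants in the same gene collapse into a single True feature, which is the sense in which the encoding records only absence or presence per gene.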

Fallerini, C., Picchiotti, N., Baldassarri, M., Zguro, K., Daga, S., Fava, F., et al. (2022). Common, low-frequency, rare, and ultra-rare coding variants contribute to COVID-19 severity. HUMAN GENETICS, 141(1), 147-173 [10.1007/s00439-021-02397-7].

Common, low-frequency, rare, and ultra-rare coding variants contribute to COVID-19 severity

Chiara Fallerini; Margherita Baldassarri; Kristina Zguro; Sergio Daga; Francesca Fava; Elisa Benetti; Sara Amitrano; Mirella Bruttini; Maria Palmieri; Susanna Croci; Mirjam Lista; Giada Beligni; Ilaria Meloni; Marco Tanfoni; Elisa Frullanti; Marco Gori; Francesca Mari; Alessandra Renieri; Francesca Montagnani*; Mario Tumbarello*; Massimiliano Fabbiani*; Laura Bergantini*; Miriana D’Alessandro*; Paolo Cameli*; Federico Anedda*; Simona Marcantonio; Sabino Scolletta*; Federico Franchi*; Maria Antonietta Mazzei*; Susanna Guerrini*; Edoardo Conticini*; Luca Cantarini*; Bruno Frediani*; Annarita Giliberti*; Maria Antonietta Mencarelli*; Caterina Lo Rizzo; Anna Maria Pinto*; Francesca Ariani*; Miriam Lucia Carriero; Elena Bargagli*; Marco Mandalà*; Alessia Giorli; Lorenzo Salerni*

* Member of the Collaboration Group
2022-01-01

Files in this record:

File: 21) Common, low-frequency, rare and ultra-rare coding variants contribute to COVID-19 severity.pdf
Access: open access
Description: Article
Type: Publisher's PDF
License: Creative Commons
Size: 3.93 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1264395