Development of models to understand the complexity in cardiovascular research and diagnostic

Suraci, Samuele

doi:10.25434/samuele-suraci_phd2022

One person dies every 40 seconds in the United States from cardiovascular disease (CVD). About 80% of deaths from cardiovascular events occur as a result of a stroke or a heart attack. Atherothrombosis occurs when a thrombus forms over an unstable atherosclerotic plaque. It is a diffuse process that starts early in childhood and progresses asymptomatically through adult life, affecting multiple vascular beds. Later in life, it manifests clinically as coronary artery disease (CAD), stroke, transient ischemic attack (TIA), and renal and peripheral arterial disease. Dipeptidyl peptidase 3 (DPP3) is a cytosolic enzyme that degrades bioactive peptides, including angiotensin, involved in biological processes of the cardiovascular system. Circulating DPP3 is emerging as a biomarker of clinical outcome in patients suffering from acute and chronic cardiovascular diseases (i.e., acute heart failure, cardiogenic shock and aneurysmal subarachnoid haemorrhage). High-throughput technologies for genomics and transcriptomics, the related computational methods, and important clinical/molecular datasets have been developed and are readily available. Developing integrated competences and research/diagnostic strategies based on this broad and complex information is needed, and represents a present and future challenge. The aim of this PhD project was to gain insight into the complexity of atherothrombotic diseases by taking advantage of the opportunities offered by OMICS technologies. The first part of this work concerns the development and application of bioinformatics pipelines for the mutational analysis of data from Illumina high-throughput sequencing, and the optimization of the workflow for diagnostic purposes. The mutational analysis covered gene panels implicated in Marfan Syndrome and related disorders (97 genes), Von Willebrand Disease (10 genes) and Familial Dyslipidemia (57 genes).
Of the 945 variants identified by NGS and selected for Sanger validation, 942 (99.7%) were confirmed. The mean coverage of the experimental sessions was 173× for the SureSelect approach and 1100× for the Haloplex approach, with at least 98% of target bases analyzable; all variants met the Phred-scaled quality score threshold of Q ≥ 30. Three of the 945 variants (0.3%) showed a discrepancy between the NGS result and the subsequent validation. Two variants were in the LTBP2 gene, while the third involved the TGFB1 gene. All discrepancies concerned the heterozygous/homozygous state of the variant. The depth of coverage at the three loci ranged from 173× to 199×. All three variants were called as heterozygous by NGS, with balanced numbers of reads carrying the wild-type and mutant alleles (the mutant allele fraction ranged from 45% to 54%). In all three cases, an in-depth evaluation of the discrepant variant and of the methodological approach confirmed the NGS result. Our results demonstrate the robustness of the custom pipeline and extend literature data in which almost 100% of “high quality” NGS variants are confirmed by Sanger sequencing; moreover, they show that when a high-quality NGS variant and its Sanger validation disagree, the NGS call, provided it was obtained with an accurate pipeline, should not be assumed a priori to be the source of the error. The second part concerns the development and application of a bioinformatics pipeline to evaluate global RNA expression profiles from cerebral thrombi obtained during thrombectomy and from peripheral venous blood in patients with acute ischemic stroke (AIS). Using the Affymetrix GeneChip Human Transcriptome Array 2.0, we assessed the gene expression profiles of 40 RNA samples obtained from AIS patients’ thrombi and 37 RNA samples from their peripheral venous blood, for a total of 52 patients. Nineteen subjects had both thrombus and peripheral venous blood profiles.
Data analysis was performed in the R environment with dedicated pipelines. After data processing and application of the filtering criteria, the average number of analyzable probe sets was 440,085 in thrombi and 602,874 in peripheral venous blood samples. In thrombi and peripheral blood samples, respectively, 378,476 and 515,048 of these probe sets had an associated identifier symbol, corresponding to 20,343 and 20,902 unique symbols. Considering their intersection, 20,341 symbols were common to RNA from the two types of specimen, whereas 3 symbols were unique to thrombi and 562 were unique to peripheral venous blood. Gene Ontology (GO) enrichment analysis identified common and specific features and biological processes in thrombi and peripheral blood, indicating that peripheral and central humoral/cellular mechanisms of damage and of response to damage are present in both. In the Significance Analysis of Microarrays (SAM) and GO analysis of thrombi, we observed 221 significant biological processes associated with genes differentially expressed in patients with poor outcome as evaluated by the modified Rankin Scale (mRS). Among the significant terms, those associated with regulation of neutrophil-mediated immunity and neutrophil activation play a crucial role. In thrombi, SAM analysis did not identify any significant difference according to the other hard endpoints: primary endpoint, 24-hour edema and any intracranial haemorrhage (ICH). In peripheral venous blood, SAM analysis identified: (i) differential expression, according to the primary endpoint, of RNF165, a gene expressed specifically in the nervous system; (ii) 298 probe sets differentially expressed according to 24-hour edema, for which, among the significantly enriched GO biological process terms, those associated with regulation and activation of the transcriptome of cells in general play a crucial role; and (iii) reduced expression, in AIS patients who died, of AADACL3, a gene encoding a lipolytic enzyme about which little information is available.
In the profiles obtained from peripheral blood, SAM analysis did not identify any significant difference according to the hard endpoints mRS and any symptomatic intracranial haemorrhage (ICH). The third part concerns the reverse-genomic and functional characterization of single nucleotide variants in the DPP3 gene. Databases collecting data from thousands of genome-wide association studies were explored to search for variants in the DPP3 gene associated with atherothrombotic phenotypes. For the variants showing a significant association, a model of the mutated protein was built with the molecular modelling software PyMOL. Based on the PyMOL models, the results of in silico pathogenicity predictors and the available literature, 5 variants (rs2305535, rs11550299, rs139251036, rs12421620, and rs747171479) were selected for heterologous expression in E. coli, using site-directed mutagenesis and the T7 expression system in the BL21(DE3) E. coli strain. The enzymatic activity of the wild-type protein and of the 5 mutated proteins was tested. From these results, the Michaelis constant (KM) and the maximum velocity (Vmax) were computed with a custom script in R. Our results suggest the potential utility of an integrated approach combining in silico and in vitro methods to evaluate the effect of variants in genes of interest. Because of its relatively low cost and short experimental time, this approach could help focus research on functional variants that merit study with more expensive, time-consuming and/or ethically sensitive (e.g., animal models) experimental approaches.
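
As an illustration of how KM and Vmax can be estimated from activity measurements, the following is a minimal sketch in Python. It is not the custom R script used in this work: it assumes Michaelis-Menten kinetics, uses the Hanes-Woolf linearization ([S]/v = [S]/Vmax + KM/Vmax) with ordinary least squares rather than a nonlinear fit, and runs on synthetic, noise-free data. The function name `fit_michaelis_menten`, the concentrations and the kinetic constants are hypothetical.

```python
def fit_michaelis_menten(substrate, velocity):
    """Estimate (Km, Vmax) via the Hanes-Woolf linearization.

    Regresses S/v on S: the slope is 1/Vmax and the intercept is Km/Vmax.
    Illustrative only; not the R script referred to in the abstract.
    """
    if len(substrate) != len(velocity) or len(substrate) < 2:
        raise ValueError("need at least two paired (S, v) measurements")

    x = list(substrate)
    y = [s / v for s, v in zip(substrate, velocity)]  # transformed response S/v

    # Ordinary least-squares line through (x, y).
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    vmax = 1.0 / slope
    km = intercept / slope
    return km, vmax


# Synthetic example: hypothetical constants, arbitrary units, no noise.
true_km, true_vmax = 2.0, 10.0
conc = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
rates = [true_vmax * s / (true_km + s) for s in conc]

km, vmax = fit_michaelis_menten(conc, rates)
print(f"Km = {km:.3f}, Vmax = {vmax:.3f}")
```

With real, noisy measurements a direct nonlinear least-squares fit of v = Vmax·[S]/(KM + [S]) is generally preferable, since linearizations distort the error structure; the linear form is used here only to keep the sketch dependency-free.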

Suraci, S. (2022). Development of models to understand the complexity in cardiovascular research and diagnostic [10.25434/samuele-suraci_phd2022].