Computational Methods for Tumor Neoantigen Profiling from Transcriptomic Data

Tatoni, Danilo

Cancer cells accumulate somatic mutations that can generate neoantigens recognizable by the immune system, providing the foundation for immunotherapies such as checkpoint inhibitors and personalized cancer vaccines. While high-throughput sequencing enables computational neoantigen profiling, established methods typically require matched tumor-normal DNA paired with tumor RNA-seq, a setup that is often impractical in clinical settings. Somatic variant detection from RNA-seq alone offers a feasible alternative, yet the absence of a matched normal and the error-prone nature of RNA-seq make it difficult to distinguish true somatic variants from germline variants and technical artifacts. Furthermore, existing methods for estimating the clonal prevalence of somatic variants are designed for DNA and cannot accommodate the additional variability in RNA-derived allele frequencies, which limits the use of RNA-seq for exploring clonal neoantigen architecture and its relationship with patient outcome. This thesis addresses these gaps through two main contributions. First, it presents an end-to-end computational workflow that profiles tumor neoantigens using tumor RNA-seq alone. A probabilistic germline error model, which exploits somatic alteration rates and population genetics, is formulated to prioritize cancer alterations. This model is integrated into a Snakemake-based workflow for scalable and reproducible neoantigen identification. Validation on the TESLA benchmark dataset demonstrated consistent results across patients: the workflow correctly identified most validated immunogenic neoantigens and showed overall superior performance compared to teams employing both DNA and RNA sequencing. Additional validation on two cohorts treated with adoptive cell therapies confirmed successful identification of immunogenic neoantigens even in patients with low mutational burden.
Second, this thesis investigates how the burden of clonal neoantigens predicted from RNA-seq correlates with response to immune checkpoint inhibition. A probabilistic model based on Beta-Binomial mixture distributions was developed to characterize clonal variant distributions and assign cluster membership likelihoods. A weighted score integrating neoantigen count, cluster assignment probability, and transcript expression was evaluated across six public immune checkpoint blockade (ICB) cohorts and an unpublished Merkel Cell Carcinoma (MCC) cohort. Clonal neoantigen scores were consistently higher among responders, though non-responders also exhibited elevated scores, indicating that absolute clonal burden alone does not discriminate response. When patients were stratified by tumor microenvironment, significant associations emerged in immune-enriched non-fibrotic and immune-depleted subtypes but not in fibrotic subtypes, suggesting that stromal fibrosis may impede neoantigen-directed immunity. For MCC specifically, a dedicated methodology was developed to identify neoantigens arising from chimeric viral-human coding sequences, revealing candidates with high predicted binding affinity, which represent ideal immunogenic targets owing to their foreignness. Finally, a preliminary exploration integrating ribosome profiling (Ribo-seq) with RNA-seq from healthy tissues revealed that transcription does not guarantee mutated protein output. While most missense variants showed concordant allele frequencies across assays, frameshift variants largely lacked ribosomal engagement downstream of the variant position, consistent with translational surveillance mechanisms. These findings suggest that Ribo-seq could serve as a complementary layer to refine neoantigen prioritization, warranting future investigation in tumor samples to improve the specificity of neoantigen-based immunotherapies.
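
As an illustration of the Beta-Binomial mixture step summarized above, the following minimal sketch computes cluster membership probabilities for a variant from its alternate-allele read count and depth, then combines them with expression into a weighted score. It is not the thesis implementation: the function names, the fixed mixture parameters, and the `log2(1 + expression)` weighting are all assumptions made for illustration, and a real model would fit the mixture parameters to the data (for example by expectation-maximization).

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(k, n, alpha, beta):
    """Log-probability of k alternate reads out of n under Beta-Binomial(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def cluster_responsibilities(k, n, components):
    """Posterior membership probability for each mixture component.

    components: list of (weight, alpha, beta) tuples (hypothetical, fixed here).
    """
    log_terms = [math.log(w) + betabinom_logpmf(k, n, a, b) for w, a, b in components]
    top = max(log_terms)  # log-sum-exp trick for numerical stability
    log_norm = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return [math.exp(t - log_norm) for t in log_terms]


def weighted_clonal_score(variants, components, clonal_index=0):
    """Sum over neoantigens of P(clonal) times a log-scaled expression weight.

    variants: list of (alt_reads, depth, expression) tuples.
    The weighting scheme is an assumption, not the formula used in the thesis.
    """
    score = 0.0
    for alt_reads, depth, expression in variants:
        p_clonal = cluster_responsibilities(alt_reads, depth, components)[clonal_index]
        score += p_clonal * math.log2(1.0 + expression)
    return score


# Hypothetical two-cluster mixture: clonal (mean VAF 0.5) and subclonal (mean VAF 0.1).
COMPONENTS = [(0.6, 20.0, 20.0), (0.4, 4.0, 36.0)]

if __name__ == "__main__":
    print(cluster_responsibilities(25, 50, COMPONENTS))
    print(weighted_clonal_score([(25, 50, 30.0), (5, 50, 12.0)], COMPONENTS))
```

Working in log space keeps the computation stable at high read depths, and the Beta-Binomial (rather than a plain Binomial) leaves room for the extra dispersion that RNA-derived allele frequencies show.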

Tatoni, D. (2026). Computational Methods for Tumor Neoantigen Profiling from Transcriptomic Data.