Abstract
Chimeric RNAs, noncanonical transcripts composed of exons from two or more genes, represent an important but understudied aspect of post-transcriptional regulation. Although originally attributed to chromosomal rearrangements, many chimeric RNAs arise from RNA-level mechanisms such as cis-splicing of adjacent genes (cis-SAGe) or trans-splicing of distinct pre-mRNAs. This dissertation integrates large-scale RNA sequencing with novel computational and experimental approaches to characterize the diversity, mechanisms, and biological significance of chimeric RNA formation across physiological and pathological contexts.
A comprehensive survey of chimeric RNA biogenesis, including genomic rearrangements, transcriptional readthrough, and intergenic or inter-allelic trans-splicing, was performed alongside an evaluation of current RNA-Seq–based detection pipelines (e.g., STAR-Fusion, EricScript, Pizzly). These analyses highlight both the opportunities and limitations of short-read sequencing for distinguishing biologically authentic chimeras from technical artifacts. Chimeric RNA analysis of COVID-19 patient whole blood revealed stress-associated cis-SAGe and trans-spliced transcripts enriched for hematopoietic and interferon-response genes, suggesting that chimeric RNA generation may act as part of the innate antiviral response via nucleocytoplasmic relocalization of splice-associated RNA-binding proteins.
To quantify chimeric RNA activity, a novel metric, the Relative Index of Chimeric Expression (RICE), was developed to normalize chimeric transcript abundance to parental gene expression. Application to GTEx and TCGA datasets uncovered tissue- and disease-specific patterns, including recurrent cis-SAGe transcripts associated with oncogenic and immune pathways.
To test directly for trans-splicing, an allele-specific F1 hybrid mouse model (Mus musculus × Mus spretus) was developed to trace transcript origins through single-nucleotide polymorphism (SNP) phasing. This system enabled allele-resolved RNA-Seq analysis and provided direct evidence for inter-allelic and intergenic trans-splicing events in mammalian tissues. The accompanying computational pipeline, which combines STAR, SNPsplit, and multi-step filtering, distinguished true trans-SAGe from readthrough cis-SAGe transcripts and identified thousands of high-confidence trans-splicing candidates.
The final component introduces a transcriptome-wide splice site promiscuity index that quantifies the diversity of splice donor and acceptor site usage within and across genes. Applied to simulated, GTEx, and TCGA datasets, this metric captured biologically relevant variation in splicing complexity, with significantly higher promiscuity in testis, myeloid-enriched populations, and cancer tissues. Disease pathway and KEGG enrichment analyses revealed associations with leukemia and with gastric, colorectal, and lung cancers, as well as upregulation of spliceosome-related pathways in high-promiscuity samples. Application to biopsies from non-small cell lung cancer (NSCLC) patients treated with immune checkpoint blockade further showed that elevated splice site promiscuity correlates with increased tumor mutational burden and improved therapeutic response. Collectively, these results indicate that splice site promiscuity reflects transcriptomic instability and may serve as a predictive biomarker for oncogenic potential and treatment sensitivity.
Together, these findings establish a comprehensive framework for the identification, quantification, and mechanistic interpretation of chimeric RNAs, providing new insights into RNA splicing dynamics, disease mechanisms, and precision therapeutic applications.