Exons are the segments of eukaryotic genes that are retained in mature RNA and carry the protein-coding sequence. After a gene is transcribed, introns are removed from the precursor mRNA and the exons are joined by RNA splicing to form mature mRNA, which is then translated into protein, completing gene expression. The exome is the collection of all exon regions within the genome. The transcriptome is the full set of RNA molecules (or, in a narrower sense, mRNA molecules) present in a particular tissue or cell type; it reflects gene expression levels and the structural diversity of transcripts.
The human genome contains approximately 20,000 protein-coding genes, which collectively comprise around 180,000 exons with a total length of approximately 30 megabases (Mb). These exons are the genetic units that encode proteins in human cells. Although the human genome also includes numerous non-coding RNAs and other gene types, protein-coding genes are among its most critical components. Exons constitute only 1-2% of the genome, yet approximately 85% of pathogenic gene variants are located within exon regions.
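The exome share quoted above can be checked with simple arithmetic. The sketch below is illustrative only: the 30 Mb exome size comes from the text, while the genome size of roughly 3,100 Mb is an outside approximation assumed here, and the function name is hypothetical.

```python
# Exome size is taken from the text; the genome size is an assumed
# approximation (~3.1 Gb haploid human genome) and is not from the text.
EXOME_MB = 30.0
GENOME_MB = 3100.0


def exome_fraction(exome_mb: float, genome_mb: float) -> float:
    """Return the exome's share of the genome as a fraction between 0 and 1."""
    if genome_mb <= 0:
        raise ValueError("genome_mb must be positive")
    return exome_mb / genome_mb


fraction = exome_fraction(EXOME_MB, GENOME_MB)
print(f"Exome share of genome: {fraction:.2%}")
```

Under these assumptions the share comes out at about 1%, the lower end of the 1-2% range; the exact figure depends on which exon annotation and genome assembly are used.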
Figure 1. Exon-Intron Structure (D. Kulp et al., 2003)
Whole Exome Sequencing (WES) involves the enrichment of DNA from the exonic regions of the genome using sequence capture or targeted techniques, followed by high-throughput sequencing. Transcriptome sequencing (RNA-Seq) typically entails high-throughput sequencing of mRNA molecules from specific organisms or tissues. Compared to whole genome sequencing, WES provides greater specificity, reducing sequencing time and cost while achieving higher sequencing depth for the protein-coding regions of the genome. This increased depth enhances the accuracy of genotyping and facilitates the identification of rare mutations associated with diseases, making WES suitable for large-scale studies of complex genetic disorders and cancer cohorts.
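The depth advantage described above follows from dividing the same amount of sequence over a much smaller target. The sketch below is a rough illustration, not a planning tool: the run yield, the on-target fraction, and the genome size are hypothetical values chosen for the example, and `mean_depth` is a name introduced here.

```python
def mean_depth(yield_bases: float, target_bases: float, on_target: float = 1.0) -> float:
    """Estimate mean depth as usable sequenced bases divided by target size."""
    if target_bases <= 0:
        raise ValueError("target_bases must be positive")
    if not 0 < on_target <= 1:
        raise ValueError("on_target must be in (0, 1]")
    return yield_bases * on_target / target_bases


GENOME_BASES = 3.1e9  # assumed approximate human genome size, not from the text
EXOME_BASES = 30e6    # exome size stated in the text (about 30 Mb)
RUN_YIELD = 10e9      # hypothetical sequencing yield in bases
ON_TARGET = 0.6       # hypothetical fraction of reads that land on captured exons

wgs_depth = mean_depth(RUN_YIELD, GENOME_BASES)
wes_depth = mean_depth(RUN_YIELD, EXOME_BASES, ON_TARGET)
print(f"WGS mean depth: {wgs_depth:.1f}x")
print(f"WES mean depth: {wes_depth:.1f}x")
```

With these assumed inputs the same yield gives only a few-fold coverage of the whole genome but roughly 200x over the exome, even after discounting off-target reads. Real capture efficiency and coverage uniformity vary by kit and protocol.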
In contrast to genome-targeted sequencing techniques, RNA-Seq is extensively employed to assess gene expression levels, identify alternative splicing events, investigate gene regulatory networks, and discover differentially expressed genes.
Figure 2. Coverage, identified variants, and diagnostic rate of whole-exome sequencing and RNA sequencing (Fatemeh Peymani et al., 2022)
While WES and RNA-Seq individually offer valuable insights, their standalone use presents limitations in elucidating the pathogenic mechanisms underlying diseases such as cancer. WES provides comprehensive data on protein-coding gene variants, yet it lacks information on gene expression levels and regulatory mechanisms. Conversely, RNA-Seq excels in profiling gene expression and identifying regulatory networks but does not capture the full extent of genetic variation.
The integrative analysis of WES and RNA-Seq overcomes these limitations by combining their strengths, thereby providing a more holistic and accurate understanding of biological systems. This approach facilitates the investigation of the intricate relationships between the genome and the transcriptome, elucidating the complex mechanisms of gene function and regulation. Consequently, such integrative analyses are invaluable for advancing research in life sciences, enhancing medical diagnostics, informing clinical treatments, and guiding pharmaceutical development.
WES and RNA-Seq are pivotal tools for investigating protein-coding genes within an organism, focusing respectively on identifying coding region variants and analyzing changes in gene expression levels. The integrative analysis of WES and RNA-Seq data offers several advantages for scientific research and clinical applications, as outlined below:
Figure 3. Methodologies of Integrative Analysis of Whole Exome Sequencing and Transcriptome Sequencing
Integrated mutational landscape analysis of uterine leiomyosarcomas
Research Findings
Somatic Mutations: WES revealed 4,827 SNVs and 489 small InDels in uterine leiomyosarcomas.
Copy Number Variations: The study identified CNVs in genes related to epithelial-mesenchymal transition (EMT) and the telomerase reverse transcriptase (TERT) gene.
Gene Fusions: RNA-Seq analysis identified 833 gene fusions across 37 uterine leiomyosarcoma samples, an average of 22.5 fusions per sample. ACTG2-ALK and KAT6B-KANSL1 fusions were prominent among them. The ACTG2-ALK fusion joins the ACTG2 promoter, which is highly active in smooth muscle cells, to the protein tyrosine kinase domain of ALK, potentially driving elevated tyrosine kinase expression in the myometrium.
Advantages of Integrative Analysis
Exome sequencing on its own identifies SNVs, InDels, simple structural variants, and CNVs in driver genes. RNA-Seq adds the detection of gene fusion events in tumor driver genes, complementing the data obtained from exome and whole genome sequencing. Combining the two provides a more comprehensive view of the mutational landscape in cancer, improving the understanding of oncogenic mechanisms and aiding the identification of potential therapeutic targets.
Genomic profiling of Sézary syndrome identifies alterations of key T cell signaling and differentiation genes
Research Findings
Somatic Mutations: In the discovery cohort (n = 37), WES of tumors and paired fibroblasts identified 4,738 somatic mutations. Cross-validation revealed that 1,090 of these mutations were also present in the mRNA detected by RNA-Seq.
Gene Expression and Copy Number Variations: Expression of ZEB1 was proportional to its copy number, with a significant difference between samples carrying a deletion and wild-type samples with two copies. Additionally, downregulation of ARID1A and RPS6KA1 expression was significantly associated with copy number loss.
Differential Expression Analysis: Differential expression analysis identified 345 upregulated genes in the diseased cohort. Gene enrichment analysis indicated that these genes were primarily involved in cell cycle regulation and immune system pathways.
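The relationship between copy number and expression reported for ZEB1 above can be illustrated with a simple correlation. This is a minimal sketch, not the study's method: the per-sample values are invented for illustration, and Pearson correlation is used here only as one plausible way to quantify the trend.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: gene copy number from WES (1 = one copy deleted,
# 2 = wild type) and expression of the same gene from RNA-Seq, per sample.
copy_number = [1, 1, 1, 2, 2, 2]
expression = [4.1, 3.8, 4.4, 8.2, 7.9, 8.5]

r = pearson(copy_number, expression)
print(f"Copy number vs expression: r = {r:.2f}")
```

A strongly positive coefficient is what a dosage effect would look like in such data. A real analysis would also need a significance test and correction for tumor purity and batch effects, which this sketch omits.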
Advantages of Integrative Analysis
The integration of genomic variation with gene expression levels, along with the cross-validation of genetic mutations using RNA-Seq data, enables a comprehensive elucidation of the genetic and signaling pathways associated with the disease. This approach provides critical molecular insights, supporting the development of targeted therapeutic strategies.
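Cross-validating WES calls against RNA-Seq amounts to asking which DNA-level variants are also observed in transcripts. The sketch below assumes variants are matched on chromosome, position, reference allele, and alternate allele; the `Variant` type, the function names, and the example coordinates are all hypothetical, and only the totals 4,738 and 1,090 come from the study summary above.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """A variant keyed by position and alleles (hypothetical minimal record)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def rna_supported(wes: Iterable[Variant], rna: Iterable[Variant]) -> set[Variant]:
    """Return the WES variants that are also present in the RNA-Seq call set."""
    return set(wes) & set(rna)


def support_rate(n_supported: int, n_total: int) -> float:
    """Fraction of WES variants confirmed at the RNA level."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_supported / n_total


# Invented example calls, for illustration only.
wes_calls = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 2000, "C", "T"),
    Variant("chr3", 3000, "G", "A"),
]
rna_calls = [
    Variant("chr2", 2000, "C", "T"),
    Variant("chr4", 4000, "T", "C"),
]

supported = rna_supported(wes_calls, rna_calls)
print(f"RNA-supported WES variants: {len(supported)} of {len(wes_calls)}")

# Applying the same ratio to the totals reported in the study summary.
print(f"Reported cohort: {support_rate(1090, 4738):.1%} of somatic mutations seen in mRNA")
```

For the reported cohort this works out to roughly 23% of somatic mutations being detectable in mRNA. A variant that is absent from RNA-Seq is not necessarily a false positive, since genes with low or no expression yield too few reads to confirm it.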
Transcriptome-based molecular subtypes and differentiation hierarchies improve the classification framework of acute myeloid leukemia
Research Findings
Identification of AML Molecular Subtypes: Using transcriptomic data from the largest multi-center acute myeloid leukemia (AML) cohort in China, the study established eight gene-expression-based molecular subtypes of AML (G1-G8) with machine learning algorithms. These include two previously unrecognized subtypes and three redefined subtypes.
Genomic Alterations in AML: Multi-omics data analysis elucidated the frequency of gene mutations and gene fusions within AML.
Predictive Model Development: The predictive models showed that the eight gene expression subtypes remained stable in external cohorts and that the subtypes differed in prognostic outcomes and drug sensitivities.
Advantages of Integrative Analysis
The clinical course and therapeutic responses of many cancers are highly heterogeneous. Current disease classification relies predominantly on genetic variation at the genomic level. Transcriptomic data supplement and enrich the existing genome-based classification frameworks, enabling more precise identification of clinically and biologically relevant molecular subtypes. This integrative approach may provide novel insights into the pathogenesis of the disease.