Multi-Omics: Integrative Analysis of Exome and Transcriptome Sequencing

Introduction to Exons

Exons constitute the primary protein-coding sequences within eukaryotic genes. After transcription, RNA splicing removes the introns and joins the exons to form mature mRNA, which is subsequently translated into protein. The exome is the collection of all exon regions within the genome. The transcriptome is the full set of RNA molecules (or, in a narrower sense, the mRNA molecules) present in a particular tissue or cell type; it reflects gene expression levels and the structural diversity of transcripts.

The human genome contains approximately 20,000 protein-coding genes, which collectively comprise around 180,000 exons with a total length of approximately 30 megabases (Mb). These exons are the genetic units that encode proteins in human cells. The genome also includes numerous non-coding RNAs and other gene types, but protein-coding genes remain among its most critical components: although exons constitute only 1-2% of the genome, approximately 85% of known pathogenic variants are located within exon regions.
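The proportions quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: the exome size and exon count come from the paragraph above, while the genome size of about 3.1 Gb is an assumption added here for illustration.

```python
# Figures quoted in the text
exome_bp = 30_000_000         # ~30 Mb of exonic sequence
exon_count = 180_000          # ~180,000 exons

# Assumption (not from the text): haploid human genome of roughly 3.1 Gb
genome_bp = 3_100_000_000

exome_fraction = exome_bp / genome_bp
mean_exon_length = exome_bp / exon_count

print(f"Exome share of genome: {exome_fraction:.2%}")    # 0.97%
print(f"Mean exon length: {mean_exon_length:.0f} bp")    # 167 bp
```

Under these assumptions the exome is close to 1% of the genome, which is consistent with the 1-2% range given above.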

Figure 1. Exon-Intron Structure of a Gene (D. Kulp et al., 2003)

Whole Exome Sequencing and Transcriptome Sequencing

Whole Exome Sequencing (WES) involves the enrichment of DNA from the exonic regions of the genome using sequence capture or targeted techniques, followed by high-throughput sequencing. Transcriptome sequencing (RNA-Seq) typically entails high-throughput sequencing of mRNA molecules from specific organisms or tissues. Compared to whole genome sequencing, WES provides greater specificity, reducing sequencing time and cost while achieving higher sequencing depth for the protein-coding regions of the genome. This increased depth enhances the accuracy of genotyping and facilitates the identification of rare mutations associated with diseases, making WES suitable for large-scale studies of complex genetic disorders and cancer cohorts.
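To illustrate why restricting sequencing to the exome raises depth, the sketch below compares mean depth for the same sequencing yield spread over the whole genome versus the exome. The yield, the on-target rate, and the genome size are hypothetical values chosen for illustration; real capture kits and platforms will differ.

```python
def mean_depth(total_bases: float, target_bp: float, on_target_rate: float = 1.0) -> float:
    """Mean depth = usable sequenced bases / size of the targeted region."""
    return total_bases * on_target_rate / target_bp

GENOME_BP = 3.1e9   # assumption: approximate haploid human genome size
EXOME_BP = 30e6     # ~30 Mb exome, as quoted earlier in this article
YIELD_BP = 10e9     # hypothetical: 10 Gb of sequence per sample
ON_TARGET = 0.7     # hypothetical fraction of reads that land on captured exons

wgs_depth = mean_depth(YIELD_BP, GENOME_BP)
wes_depth = mean_depth(YIELD_BP, EXOME_BP, ON_TARGET)

print(f"WGS mean depth: {wgs_depth:.1f}x")   # 3.2x
print(f"WES mean depth: {wes_depth:.0f}x")   # 233x
```

With these illustrative numbers, the same 10 Gb of data gives roughly 3x coverage of the genome but more than 200x coverage of the exome, which is the depth advantage the paragraph describes.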

In contrast to genome-targeted sequencing techniques, RNA-Seq is extensively employed to assess gene expression levels, identify alternative splicing events, investigate gene regulatory networks, and discover differentially expressed genes.

Figure 2. Coverage, identified variants, and diagnostic rate of whole-exome sequencing and RNA sequencing (Fatemeh Peymani et al., 2022)

Integration of Whole Exome Sequencing and Transcriptome Sequencing

While WES and RNA-Seq individually offer valuable insights, their standalone use presents limitations in elucidating the pathogenic mechanisms underlying diseases such as cancer. WES provides comprehensive data on protein-coding gene variants, yet it lacks information on gene expression levels and regulatory mechanisms. Conversely, RNA-Seq excels in profiling gene expression and identifying regulatory networks but does not capture the full extent of genetic variation.

The integrative analysis of WES and RNA-Seq overcomes these limitations by combining their strengths, thereby providing a more holistic and accurate understanding of biological systems. This approach facilitates the investigation of the intricate relationships between the genome and the transcriptome, elucidating the complex mechanisms of gene function and regulation. Consequently, such integrative analyses are invaluable for advancing research in life sciences, enhancing medical diagnostics, informing clinical treatments, and guiding pharmaceutical development.

Advantages and Methodologies of Integrative Analysis of Whole Exome Sequencing and Transcriptome Sequencing

WES and RNA-Seq are pivotal tools for investigating protein-coding genes within an organism, focusing respectively on identifying coding region variants and analyzing changes in gene expression levels. The integrative analysis of WES and RNA-Seq data offers several advantages for scientific research and clinical applications, as outlined below:

  1. Expanded Detection of Genomic Variants: Integrative analysis extends the scope of genomic variant detection, facilitating the discovery of additional mutation sites and enabling cross-validation of variant accuracy.
  2. Identification of RNA-Editing Variants: This approach allows for the detection of RNA-editing variants, which aids in characterizing RNA single-base substitution (SBS) mutational signatures.
  3. Comprehensive Mutation Landscape: By combining the variant types identified through WES (such as single nucleotide variants [SNVs], insertions and deletions [InDels], structural variants [SVs], and copy number variations [CNVs]) with gene fusion events detected via RNA-Seq, a more comprehensive mutation landscape is revealed.
  4. Allele-Specific Expression and Copy Number Variation Association: Integrative analysis facilitates the detection of allele-specific expression and its correlation with copy number variations.
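The fourth point can be made concrete with a small sketch. One common way to test for allele-specific expression is an exact binomial test on RNA-Seq read counts at a heterozygous site, where the expected allele fraction is set by the DNA copy number. This is an assumed approach for illustration, not necessarily the method used in any study cited here; the read counts and copy-number states are hypothetical.

```python
from math import comb

def binom_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))

def expected_alt_fraction(alt_copies: int, total_copies: int) -> float:
    """Expected RNA allele fraction if every DNA copy is expressed equally."""
    return alt_copies / total_copies

# Hypothetical heterozygous SNV: 30 of 40 RNA-Seq reads carry the alternate allele
alt_reads, depth = 30, 40

# Null 1: diploid locus, one alternate copy out of two
p_diploid = binom_two_sided_p(alt_reads, depth, expected_alt_fraction(1, 2))
# Null 2: copy number gain, two alternate copies out of three
p_gain = binom_two_sided_p(alt_reads, depth, expected_alt_fraction(2, 3))

print(f"p (diploid null): {p_diploid:.4f}")
print(f"p (copy-gain null): {p_gain:.4f}")
```

In this example the imbalance looks significant against a diploid expectation (p is roughly 0.002) but not once a copy number gain of the alternate allele is taken into account. That is why allele-specific expression is best interpreted alongside CNV calls from WES.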

Figure 3. Methodologies of Integrative Analysis of Whole Exome Sequencing and Transcriptome Sequencing

Case Studies of Integrated Analysis of Whole Exome Sequencing and Transcriptome Sequencing

Case Study (I)

Integrated mutational landscape analysis of uterine leiomyosarcomas

Research Findings

Somatic Mutations: WES revealed 4,827 SNVs and 489 small InDels in uterine leiomyosarcomas.

Copy Number Variations: The study identified CNVs in genes related to epithelial-mesenchymal transition (EMT) and the telomerase reverse transcriptase (TERT) gene.

Gene Fusions: RNA-Seq analysis identified 833 gene fusions across 37 uterine leiomyosarcoma samples, an average of 22.5 fusions per sample. ACTG2-ALK and KAT6B-KANSL1 fusions were prominent. The ACTG2-ALK fusion joins the promoter of ACTG2, a gene highly expressed in smooth muscle cells, to the protein tyrosine kinase domain of ALK, potentially driving elevated tyrosine kinase expression in the myometrium.
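Summaries like these come from tallying fusion calls per sample and per gene pair. The sketch below shows one way to do that. The call list is made up for illustration (GENE_A and GENE_B are placeholder names) and is not data from the study; only the 833 and 37 totals are taken from the text.

```python
from collections import Counter

# Hypothetical fusion calls as (sample_id, 5' gene, 3' gene); not data from the study
fusion_calls = [
    ("S1", "ACTG2", "ALK"),
    ("S1", "KAT6B", "KANSL1"),
    ("S2", "KAT6B", "KANSL1"),
    ("S3", "ACTG2", "ALK"),
    ("S3", "GENE_A", "GENE_B"),
]

# Number of fusion calls in each sample
per_sample = Counter(sample for sample, _, _ in fusion_calls)

# Count each fusion once per sample, so recurrence means "seen in N samples"
unique_calls = {(sample, f"{five}-{three}") for sample, five, three in fusion_calls}
recurrence = Counter(name for _, name in unique_calls)
recurrent = sorted(name for name, n in recurrence.items() if n >= 2)

print(per_sample)
print("Recurrent fusions:", recurrent)

# The average reported in the text: 833 fusions across 37 samples
study_mean = 833 / 37
print(f"Mean fusions per sample: {study_mean:.1f}")   # 22.5
```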

Advantages of Integrative Analysis

Exome sequencing alone identifies SNVs, InDels, simple structural variants, and CNVs in driver genes. RNA-Seq adds the detection of gene fusion events in tumor driver genes, complementing the data obtained from exome and whole genome sequencing. Together, the two provide a more comprehensive view of the mutational landscape in cancer, enhancing the understanding of oncogenic mechanisms and aiding in the identification of potential therapeutic targets.

Case Study (II)

Genomic profiling of Sézary syndrome identifies alterations of key T cell signaling and differentiation genes

Research Findings

Somatic Mutations: In the discovery cohort (n = 37), WES of tumors and paired fibroblasts identified 4,738 somatic mutations. Cross-validation revealed that 1,090 of these mutations were also present in the mRNA detected by RNA-Seq.
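Cross-validation of this kind amounts to intersecting the DNA and RNA variant sets. The following is a minimal sketch, assuming variants are keyed by chromosome, position, reference allele, and alternate allele. The variant lists are hypothetical; only the 1,090 and 4,738 counts come from the text. Treating RNA-only A>G (or T>C) calls as RNA-editing candidates reflects the usual signature of A-to-I editing and is included as an illustrative heuristic, not as the study's method.

```python
# A variant is keyed by (chromosome, position, reference allele, alternate allele)
Variant = tuple[str, int, str, str]

def cross_validate(dna: set[Variant], rna: set[Variant]) -> dict[str, set[Variant]]:
    """Split variant calls by whether DNA (WES) and RNA (RNA-Seq) agree."""
    rna_only = rna - dna
    return {
        "validated": dna & rna,   # called in both assays
        "dna_only": dna - rna,    # e.g. gene not expressed or low RNA coverage
        "rna_only": rna_only,     # absent from DNA: artifacts or RNA-level changes
        # A>G (T>C on the opposite strand) is the typical A-to-I editing signature
        "editing_candidates": {
            v for v in rna_only if (v[2], v[3]) in {("A", "G"), ("T", "C")}
        },
    }

# Hypothetical calls, for illustration only
wes_calls = {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A"), ("chr3", 300, "A", "G")}
rna_calls = {("chr1", 100, "C", "T"), ("chr3", 300, "A", "G"),
             ("chr4", 400, "A", "G"), ("chr5", 500, "C", "A")}

result = cross_validate(wes_calls, rna_calls)
for name, variants in result.items():
    print(name, sorted(variants))

# Fraction of WES mutations confirmed in RNA, using the counts reported in the text
validated_fraction = 1090 / 4738
print(f"Validated in RNA-Seq: {validated_fraction:.1%}")   # 23.0%
```

A DNA-only call is not necessarily a false positive: the mutated gene may simply not be expressed in the sampled tissue, so the validated fraction is a lower bound on accuracy rather than a direct estimate of it.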

Gene Expression and Copy Number Variations: Expression of ZEB1 was proportional to its copy number, with significant differences between samples carrying a ZEB1 deletion and samples with the wild-type two copies. Additionally, downregulation of ARID1A and RPS6KA1 expression was significantly associated with copy number loss.
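A dosage relationship like this can be examined by correlating per-sample copy number with expression and comparing group means. The sketch below uses a plain Pearson correlation on invented values; the study's actual statistical test is not stated in this article, so this is an illustrative stand-in.

```python
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def group_mean(values: list[float], groups: list[int], group: int) -> float:
    """Mean of the values whose group label equals `group`."""
    selected = [v for v, g in zip(values, groups) if g == group]
    return sum(selected) / len(selected)

# Hypothetical per-sample data: gene copy number (1 = deletion, 2 = wild type)
# and normalized expression of the same gene
copy_number = [1, 1, 1, 2, 2, 2, 2]
expression = [4.1, 3.8, 4.4, 8.2, 7.9, 8.5, 8.0]

r = pearson(copy_number, expression)
deletion_mean = group_mean(expression, copy_number, 1)
wild_type_mean = group_mean(expression, copy_number, 2)

print(f"Pearson r: {r:.2f}")
print(f"Mean expression, deletion: {deletion_mean:.2f}; wild type: {wild_type_mean:.2f}")
```

In this made-up data set, samples with one copy express the gene at about half the wild-type level, which is the kind of proportionality described for ZEB1.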

Differential Expression Analysis: Differential expression analysis identified 345 upregulated genes in the diseased cohort. Gene enrichment analysis indicated that these genes were primarily involved in cell cycle regulation and immune system pathways.
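Gene enrichment analysis is often done as an over-representation test based on the hypergeometric distribution. The article does not say which method the study used, so the sketch below is only one plausible form. The gene universe of 20,000 and the 345 upregulated genes follow figures given in this article, while the pathway size and overlap are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(universe: int, pathway: int, selected: int, overlap: int) -> float:
    """P(X >= overlap) when `selected` genes are drawn without replacement from
    `universe` genes, of which `pathway` belong to the gene set of interest."""
    upper = min(pathway, selected)
    favorable = sum(
        comb(pathway, i) * comb(universe - pathway, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(universe, selected)

UNIVERSE = 20_000     # approximate number of protein-coding genes (from this article)
UPREGULATED = 345     # upregulated genes reported in the study
PATHWAY_SIZE = 200    # hypothetical size of a cell cycle gene set
OVERLAP = 15          # hypothetical number of upregulated genes in that set

expected_overlap = UPREGULATED * PATHWAY_SIZE / UNIVERSE
p_value = hypergeom_enrichment_p(UNIVERSE, PATHWAY_SIZE, UPREGULATED, OVERLAP)

print(f"Expected overlap by chance: {expected_overlap:.2f}")   # 3.45
print(f"Enrichment p-value: {p_value:.2e}")
```

In practice the test is repeated for many gene sets, so the resulting p-values would also need a multiple-testing correction.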

Advantages of Integrative Analysis

The integration of genomic variation with gene expression levels, along with the cross-validation of genetic mutations using RNA-Seq data, enables a comprehensive elucidation of the genetic and signaling pathways associated with the disease. This approach provides critical molecular insights, supporting the development of targeted therapeutic strategies.

Case Study (III)

Transcriptome-based molecular subtypes and differentiation hierarchies improve the classification framework of acute myeloid leukemia

Research Findings

Identification of AML Molecular Subtypes: Using transcriptomic data from the largest multi-center acute myeloid leukemia (AML) cohort in China, the study established eight gene expression-based molecular subtypes of AML (G1-G8) with machine learning algorithms. These include two previously unrecognized subtypes and three redefined subtypes.

Genomic Alterations in AML: Multi-omics data analysis elucidated the frequency of gene mutations and gene fusions within AML.

Predictive Model Development: The predictive models showed that the eight gene expression subtypes remained stable in external cohorts and were associated with distinct prognostic outcomes and drug sensitivities.
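The article does not name the algorithms behind these models. As a deliberately simple stand-in, the sketch below uses a nearest-centroid classifier to show the general idea of assigning samples from an external cohort to expression-defined subtypes. The expression values, the three-gene signature, and the restriction to two subtypes are all hypothetical.

```python
from math import dist

def fit_centroids(profiles: list[list[float]], labels: list[str]) -> dict[str, list[float]]:
    """Average the expression profiles of each labelled subtype."""
    grouped: dict[str, list[list[float]]] = {}
    for profile, label in zip(profiles, labels):
        grouped.setdefault(label, []).append(profile)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }

def predict(centroids: dict[str, list[float]], profile: list[float]) -> str:
    """Assign a sample to the subtype with the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], profile))

# Hypothetical training cohort: expression of a three-gene signature per sample
train_profiles = [[9.0, 1.0, 5.0], [8.0, 2.0, 5.5], [1.0, 9.0, 4.5], [2.0, 8.0, 5.0]]
train_labels = ["G1", "G1", "G2", "G2"]

# Hypothetical external cohort to classify
external_profiles = [[8.5, 1.5, 5.2], [1.5, 8.5, 4.8]]

centroids = fit_centroids(train_profiles, train_labels)
predictions = [predict(centroids, profile) for profile in external_profiles]
print(predictions)   # ['G1', 'G2']
```

A real subtype classifier would use many more genes, normalize expression across cohorts before prediction, and be validated against clinical outcomes, as the study did with its external cohorts.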

Advantages of Integrative Analysis

The clinical course and therapeutic responses of many cancers exhibit significant heterogeneity. Current disease classification predominantly relies on genomic-level genetic variations. Transcriptomic data supplements and enriches the existing genome-based classification frameworks, facilitating more precise identification of clinically and biologically relevant molecular subtypes. This integrative approach may provide novel insights into the pathogenesis of the disease.

* For Research Use Only. Not for use in diagnostic procedures.