Built on next-generation sequencing (NGS) technology, RNA sequencing (RNA-seq) is a powerful tool for mapping and quantifying transcriptomes. The transcriptome is the complete set of transcripts in a cell, and it provides information on transcript levels at a particular developmental stage or under a particular physiological condition. However, PCR amplification is biased, so different cDNA fragments are amplified unevenly. During the sequencing process, fragments that amplify easily become strongly over-represented, while low-abundance fragments or fragments with extreme base bias may be lost entirely, which reduces the accuracy of the sequencing results. As a result, conventional RNA-seq reveals only the overall trend of gene expression; it cannot quantify the original gene expression level in absolute terms.
Before library construction, each molecule can be labeled with a unique molecular identifier (UMI), so that each molecule carries its own tag sequence. Ideally, each molecule in the library can be identified by the combination of its UMI and its template sequence. After PCR amplification, PCR copies can be identified and removed from the dataset, virtually eliminating the effects of uneven amplification and of artifacts generated during PCR. Quantification based on UMIs is therefore more accurate. UMI-based methods combine NGS with high-precision sequencing and quantification to study gene expression in a particular species and tissue at a specific point in space and time. This is particularly important for diagnostics and for applications involving small amounts of starting material.
Figure 1. A scheme of digital RNA-Seq. (Shiroguchi, 2012)
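The labeling and counting idea described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the gene names, molecule counts, and per-cycle duplication probabilities are hypothetical, every amplified copy is assumed to be sequenced without error, and UMIs are forced to be unique within a gene. It shows that raw read counts follow the amplification bias, while counting distinct UMIs recovers the original number of molecules.

```python
import random
from collections import Counter

BASES = "ACGT"


def random_umi(rng, length=8):
    """Return a random UMI of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def simulate_library(molecules, efficiency, cycles=10, seed=0):
    """Tag each molecule with a UMI, then amplify it unevenly.

    molecules:  gene -> number of original molecules (hypothetical input)
    efficiency: gene -> per-cycle duplication probability (the PCR bias)
    Returns one (gene, umi) tuple per sequenced read.
    """
    rng = random.Random(seed)
    reads = []
    for gene, n_molecules in molecules.items():
        used = set()
        for _ in range(n_molecules):
            umi = random_umi(rng)
            while umi in used:  # assumption: UMIs are unique within a gene
                umi = random_umi(rng)
            used.add(umi)
            copies = 1
            for _ in range(cycles):
                # each existing copy is duplicated with the gene's efficiency
                copies += sum(rng.random() < efficiency[gene] for _ in range(copies))
            reads.extend([(gene, umi)] * copies)
    return reads


def count_reads(reads):
    """Conventional quantification: count every read per gene."""
    return Counter(gene for gene, _ in reads)


def count_molecules(reads):
    """Digital quantification: count distinct (gene, UMI) pairs per gene."""
    return Counter(gene for gene, _ in set(reads))


# Two genes start with the same number of molecules but amplify differently.
molecules = {"geneA": 20, "geneB": 20}
efficiency = {"geneA": 0.9, "geneB": 0.5}
reads = simulate_library(molecules, efficiency)
print("read counts:", dict(count_reads(reads)))
print("UMI counts: ", dict(count_molecules(reads)))
```

In real data, reads are sampled from the amplified library, so some molecules are never observed and UMI collisions can occur; the sketch ignores both effects to keep the contrast between read counting and UMI counting visible.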
The use of RNA-seq technology to measure the transcriptome is known as digital expression (DE). Across conditions and replicates, the number of reads mapped to each transcript or gene varies. To evaluate these read counts and identify transcripts or genes with significant expression differences, three statistical methods (edgeR, DESeq, and baySeq) are currently available as R packages. Until now, users have had to install and run each R package manually. Users are also often interested in comparing the results of the different approaches.
Digital RNA sequencing, also known as digital RNA-seq or UMI-RNA-Seq, is an absolute quantitative transcriptome sequencing technique in which a unique molecular identifier (UMI) is added to each cDNA fragment before library amplification. The UMI stays attached to its fragment throughout amplification, sequencing, and analysis. After sequencing, the UMI is used to determine the origin of each fragment and to merge fragments from the same source (those with the same sequence and UMI). This precisely removes PCR amplification duplicates and restores the sample's original state before amplification. Errors introduced during PCR amplification and sequencing can also be corrected in this way.
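The merging step can be sketched as follows. This is a minimal illustration, not the method of any particular tool: the `Read` record and its `position` field are hypothetical, reads are grouped by mapping position and UMI (rather than by exact sequence, so that reads containing errors still fall into the same group), and all reads in a group are assumed to have the same length. Each group is collapsed to a single consensus read by majority vote at every base, which is one simple way to correct isolated PCR or sequencing errors.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    position: str  # hypothetical mapping key, e.g. "chr1:100"
    umi: str       # UMI attached before amplification
    sequence: str  # read sequence


def consensus(sequences):
    """Majority base at each position (assumes equal-length sequences)."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


def collapse_duplicates(reads):
    """Merge reads that share a position and UMI into one consensus read."""
    groups = defaultdict(list)
    for read in reads:
        groups[(read.position, read.umi)].append(read.sequence)
    return [
        Read(position, umi, consensus(sequences))
        for (position, umi), sequences in groups.items()
    ]


reads = [
    Read("chr1:100", "AACG", "ACGTAC"),
    Read("chr1:100", "AACG", "ACGTAC"),
    Read("chr1:100", "AACG", "ACGAAC"),  # PCR copy carrying one error
    Read("chr1:100", "GGTA", "ACGTAC"),  # same position, different molecule
    Read("chr2:50", "AACG", "TTGCA"),    # same UMI, different position
]
for read in collapse_duplicates(reads):
    print(read)
```

Grouping on both position and UMI keeps distinct molecules apart even when they happen to share a UMI or a sequence. Production tools typically also tolerate errors within the UMI itself, which this sketch does not attempt.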
Some of the advantages of digital RNA sequencing:
1. Low starting amounts: A 100 ng sample can achieve the same sequencing results as a 1 µg sample, which makes the method better suited to rare and valuable samples.
2. UMI technology: A UMI is added to each cDNA fragment to remove PCR amplification bias, so transcript abundance is quantified accurately and without bias.
3. Higher-quality transcriptome sequencing: Expression levels and differential expression are analyzed precisely, and RNA editing, alternative splicing, and SNP sites can be pinpointed.
4. Applicable to multiple types of RNA-Seq.
Digital RNA-seq still has drawbacks: some amplification bias remains, and barcodes may miss their targets during ligation, among other limitations.