Nanopore Direct RNA Sequencing Analysis

CD Genomics is a bioinformatics data analysis provider. Our team is experienced in Nanopore direct RNA sequencing, and our data analysis platform delivers high-quality results with a fast turnaround.

Introduction

Nanopore technology is the only available sequencing technology that can sequence RNA directly, without reverse transcription or PCR. This approach has several potential advantages over other RNA-seq strategies: it allows direct detection of base modifications such as methylation, the discovery and characterization of poly(A) RNA molecules, and the study of splice variants (Garalde et al., 2018).

Fig.1 Direct RNA-seq (Garalde et al., 2018).

Application Field

  • Cancer Research
  • Vaccine and Therapeutic Drug Development
  • Clinical Research

Bioinformatics Analysis Content

  • Clean Data Quality Control
  • Mapping to Reference Genome
  • Quantification of transcript expression
  • Analysis of gene transcript structure
  • Differential gene/transcript isoform quantification
  • Functional Annotation, Enrichment Analysis and Protein Interaction Network
  • Methylation Analysis
    Methylation site identification
    Methylation distribution analysis
    Methylation motif analysis
    Methylated gene region annotation
    Analysis of differentially methylated sites (m5C/m6A)
  • Poly(A) tail length estimation
    Poly(A) length analysis
    Differential Poly(A) length analysis
    Correlation analysis of Poly(A) length and transcript expression

How It Works

CD Genomics is a professional bioinformatics service provider with years of experience in NGS and long-read sequencing (Oxford Nanopore platforms) data analysis, integrated analysis services, database construction, and other bioinformatics solutions.

Table 1 Partial software and database list

Software or database  Version    Use                            Link
NanoFilt              2.8.0      TGS data filtering             https://github.com/wdecoster/nanofilt
minimap2              2.17       Mapping                        https://github.com/lh3/minimap2
samtools              1.11       Sorting                        https://github.com/samtools/samtools
SeqKit                0.12.0     FASTA/Q tool                   https://github.com/shenwei356/seqkit
StringTie             2.1.4      Transcript reconstruction      http://ccb.jhu.edu/software/stringtie/
gffcompare            0.12.1     Discovery of new transcripts   http://ccb.jhu.edu/software/stringtie/gffcompare.shtml
SUPPA2                2.3        Alternative splicing analysis  https://github.com/comprna/SUPPA/
TAPAS                 2018.5.26  APA analysis                   https://github.com/arefeen/TAPAS
Nanopolish            0.13.2     Poly(A) tail analysis          https://github.com/jts/nanopolish
Tombo                 1.5.1      Methylation (m5C) analysis     https://github.com/nanoporetech/tombo
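
As a rough illustration of how the first tools in the list chain together, the sketch below only builds the shell command lines for filtering, mapping, and sorting; it does not run them. The NanoFilt thresholds (-q 7 -l 50) are the ones quoted on this page. The minimap2 options (-ax splice -uf -k14) are the preset that the minimap2 documentation suggests for Nanopore direct RNA reads, and their use here is an assumption, as are the function name and all file names.

```python
import shlex


def build_commands(raw_fastq, reference_fasta, prefix):
    """Return shell command strings for filter -> map -> sort -> index.

    Hypothetical helper: file naming and the minimap2 preset are
    assumptions, not a description of the provider's pipeline.
    """
    q = shlex.quote
    clean = f"{prefix}.clean.fastq"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # Keep reads with mean quality >= 7 and length >= 50.
        f"NanoFilt -q 7 -l 50 < {q(raw_fastq)} > {q(clean)}",
        # Spliced alignment of forward-strand direct RNA reads (assumed preset).
        f"minimap2 -ax splice -uf -k14 {q(reference_fasta)} {q(clean)} > {q(sam)}",
        # Coordinate-sort and index the alignments.
        f"samtools sort -o {q(bam)} {q(sam)}",
        f"samtools index {q(bam)}",
    ]


for command in build_commands("raw.fastq", "reference.fa", "sample1"):
    print(command)
```

Keeping the commands as plain strings makes them easy to log or paste into a shell before any wrapper script is trusted to execute them.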

1. What is the minimum starting amount of RNA required for Direct RNA library construction?

Total RNA that has passed quality inspection: 40-80 μg at a concentration of ≥180 ng/μL.

2. What is the approximate data yield of Direct RNA sequencing per flow cell?

Because Direct RNA library construction and sequencing involve no PCR amplification, the yield of full-length transcriptome data is relatively low compared with PCR-cDNA sequencing. With high-quality total RNA, the data yield is not less than 1 Gb.

Data Control

Quality control was performed on the raw ONT sequencing data. Reads were classified by their sequencing quality value (default threshold: 7): reads above the threshold are labeled PASS and reads below it are labeled FAIL. The following distribution was then drawn according to read length:

Fig.1 Distribution of read length and number of reads in the sequencing data.
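
The PASS/FAIL split described above can be sketched in a few lines of Python. This is a minimal illustration, not the basecaller's or NanoFilt's actual code. It assumes Phred+33 quality strings, assumes the read-level quality is obtained by averaging per-base error probabilities rather than averaging Q scores, and assumes a read exactly at the threshold counts as PASS. The function names are hypothetical.

```python
import math


def mean_quality(quality_string, offset=33):
    """Read-level Phred quality from averaged per-base error probabilities."""
    if not quality_string:
        return 0.0
    error_probs = [10 ** (-(ord(char) - offset) / 10) for char in quality_string]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def classify_read(quality_string, threshold=7.0):
    """Label a read PASS or FAIL against the quality threshold (default 7)."""
    return "PASS" if mean_quality(quality_string) >= threshold else "FAIL"


# '+' encodes Q10 and '#' encodes Q2 in Phred+33.
print(classify_read("+" * 20))  # uniform Q10 read
print(classify_read("#" * 20))  # uniform Q2 read
```

Averaging error probabilities penalizes reads with a few very poor bases more strongly than a plain arithmetic mean of Q scores would, which is why a read mixing Q20 and Q2 bases can still fall below 7.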

Raw FASTQ data were filtered with NanoFilt (version 2.8.0; parameters: -q 7 -l 50) to obtain clean data for subsequent analysis, and summary statistics were computed with SeqKit (version 0.12.0; default parameters). Statistics for the third-generation sequencing (TGS) data are shown in the table below:

Table 1 Summary statistics of the TGS data

Sample Type TotalBase TotalReads MaxLen AvgLen N50 L50 N90 L90 meanQ
CD19-CRE1 raw 3,336,362,417 3,285,762 72,523 1,015.39 1,359 809,978 546 2,342,366 9.45
CD19-CRE1 clean 3,204,364,677 2,996,246 13,266 1,069.45 1,372 768,523 560 2,206,451 9.84
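
For readers unfamiliar with the N50/L50 and N90/L90 columns, the sketch below shows one common way to compute them from a list of read lengths. It is an illustration under stated assumptions, not SeqKit's implementation: Nx is taken to be the length of the shortest read in the smallest set of longest reads that together cover at least x% of all bases, and Lx is the number of reads in that set, which is consistent with L50 and L90 in the table being read counts. The function name and the example lengths are hypothetical.

```python
def nx_lx(lengths, fraction):
    """Return (Nx, Lx) for the given fraction, e.g. 0.5 for N50/L50."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * fraction
    running_total = 0
    for read_count, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= target:
            return length, read_count
    # Only reachable through floating-point edge cases with fraction > 1.
    return ordered[-1], len(ordered)


read_lengths = [10, 8, 6, 4, 2]  # toy data, 30 bases in total
n50, l50 = nx_lx(read_lengths, 0.5)
n90, l90 = nx_lx(read_lengths, 0.9)
print(f"TotalBase={sum(read_lengths)} TotalReads={len(read_lengths)}")
print(f"N50={n50} L50={l50} N90={n90} L90={l90}")
```

With the toy data, the two longest reads (10 and 8) already cover half of the 30 bases, so N50 is 8 and L50 is 2.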

Transcript Expression Level Analysis

Transcriptome data allow sensitive detection of gene expression. Boxplots and density plots of all transcripts are used to compare expression across samples.

Fig.2 Boxplot and density plot of all transcripts.
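
The numbers behind a boxplot like the one above can be sketched as a five-number summary per sample. This is a minimal example under assumptions the page does not state: expression values are log2-transformed with a pseudocount of 1 before plotting, and quartiles use the inclusive method of Python's statistics module. The function name and the input values are hypothetical.

```python
import math
import statistics


def boxplot_summary(expression_values, pseudocount=1.0):
    """Five-number summary of log2(expression + pseudocount) for one sample."""
    logged = sorted(math.log2(value + pseudocount) for value in expression_values)
    q1, median, q3 = statistics.quantiles(logged, n=4, method="inclusive")
    return {
        "min": logged[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": logged[-1],
    }


# Hypothetical expression values for the transcripts of one sample.
summary = boxplot_summary([0, 1, 3, 7, 15])
print(summary)
```

Computing the same summary for each sample gives a quick numerical check that the samples have comparable expression distributions before differential analysis.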

Methylation Analysis

Methylation is an important modification of nucleic acids and proteins. It regulates gene activation and silencing, is closely associated with many diseases such as cancer, and is a major topic in epigenetics. Tombo (version 1.5.1) is a suite of tools from Oxford Nanopore Technologies for identifying nucleotide modifications from raw nanopore sequencing data. Using Tombo, m5C modification sites in RNA sequences can be predicted. The MINES pipeline (https://github.com/YeoLab/MINES) can predict m6A modification sites in RNA sequences.

Table 2 m5C results (CD19-WT1)

Trans Pos Depth Sample Fraction
ENSMUSG00000000028.t1 739 12 CD19-WT1 0.1667
ENSMUSG00000000028.t1 740 16 CD19-WT1 0.25
ENSMUSG00000000028.t1 761 13 CD19-WT1 0.0769
ENSMUSG00000000028.t1 766 19 CD19-WT1 0.3158
ENSMUSG00000000028.t1 770 12 CD19-WT1 0.0833
ENSMUSG00000000028.t1 776 11 CD19-WT1 0
ENSMUSG00000000028.t1 785 17 CD19-WT1 0
ENSMUSG00000000028.t1 802 11 CD19-WT1 0.2727
ENSMUSG00000000028.t1 806 18 CD19-WT1 0.0556
ENSMUSG00000000028.t1 814 15 CD19-WT1 0.2667

Note: Trans is the ID of the reference transcript; Pos is the position on that transcript; Depth is the effective coverage depth at the methylation site; Sample is the sample name; Fraction is the score of the methylation site.
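
A table in this layout is easy to post-process. The sketch below parses the rows shown above and keeps sites that meet a minimum depth and a minimum Fraction. Both cutoffs (depth of at least 15, Fraction of at least 0.25) are arbitrary values chosen for illustration and are not thresholds used by the service; the class and function names are hypothetical.

```python
from typing import NamedTuple

M5C_ROWS = """\
ENSMUSG00000000028.t1 739 12 CD19-WT1 0.1667
ENSMUSG00000000028.t1 740 16 CD19-WT1 0.25
ENSMUSG00000000028.t1 761 13 CD19-WT1 0.0769
ENSMUSG00000000028.t1 766 19 CD19-WT1 0.3158
ENSMUSG00000000028.t1 770 12 CD19-WT1 0.0833
ENSMUSG00000000028.t1 776 11 CD19-WT1 0
ENSMUSG00000000028.t1 785 17 CD19-WT1 0
ENSMUSG00000000028.t1 802 11 CD19-WT1 0.2727
ENSMUSG00000000028.t1 806 18 CD19-WT1 0.0556
ENSMUSG00000000028.t1 814 15 CD19-WT1 0.2667
"""


class Site(NamedTuple):
    transcript: str
    position: int
    depth: int
    sample: str
    fraction: float


def parse_sites(text):
    """Parse whitespace-separated rows: Trans Pos Depth Sample Fraction."""
    sites = []
    for line in text.strip().splitlines():
        trans, pos, depth, sample, fraction = line.split()
        sites.append(Site(trans, int(pos), int(depth), sample, float(fraction)))
    return sites


def candidate_sites(sites, min_depth=15, min_fraction=0.25):
    """Keep sites passing both illustrative cutoffs (not the service's)."""
    return [s for s in sites if s.depth >= min_depth and s.fraction >= min_fraction]


for site in candidate_sites(parse_sites(M5C_ROWS)):
    print(site.transcript, site.position, site.depth, site.fraction)
```

Requiring a minimum depth alongside the Fraction cutoff avoids giving weight to sites where the score rests on only a handful of reads.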

Fig.3 Chromosome distribution of m5C loci (CD19-WT1).

Poly(A) Analysis

Transcript poly(A) tails are thought to play a role in post-transcriptional regulation, including mRNA stability and translational efficiency. Poly(A) tail lengths were estimated from the raw data using Nanopolish (version 0.13.2).

Fig.4 Distribution of poly(A) lengths of different samples.
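
Per-read poly(A) estimates are usually summarized per transcript before plotting. The sketch below averages the estimated tail length over reads that passed quality control, grouped by transcript. It assumes a tab-separated input with the column names contig, polya_length, and qc_tag, which is how the Nanopolish poly(A) output is commonly laid out; treat those names as an assumption and check them against your own file. The function name and the demo rows are made up, and a real file carries more columns than shown here.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_polya_by_transcript(tsv_text):
    """Mean poly(A) length per transcript, using only reads tagged PASS."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lengths = defaultdict(list)
    for row in reader:
        if row["qc_tag"] == "PASS":
            lengths[row["contig"]].append(float(row["polya_length"]))
    return {transcript: mean(values) for transcript, values in lengths.items()}


# Made-up demo rows; only the columns used above are included.
DEMO_TSV = "\n".join(
    "\t".join(fields)
    for fields in [
        ("readname", "contig", "polya_length", "qc_tag"),
        ("read1", "tx1", "100.0", "PASS"),
        ("read2", "tx1", "50.0", "PASS"),
        ("read3", "tx1", "500.0", "ADAPTER"),
        ("read4", "tx2", "80.0", "PASS"),
    ]
)

print(mean_polya_by_transcript(DEMO_TSV))
```

Dropping reads that did not pass quality control keeps a single unreliable estimate, such as read3 in the demo, from distorting a transcript's mean.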

Table 3 Poly(A) length and expression of transcripts

Trans_id PolyA Expression
ENSMUSG00000000085.t1 98.63 0.346703
ENSMUSG00000000131.t2 302.78 0.048693
ENSMUSG00000000131.t3 260.875 0.0939
ENSMUSG00000000244.t1 192.41 0
ENSMUSG00000000244.t2 402 0.157583
ENSMUSG00000000244.t4 344.25 0.278575
ENSMUSG00000000248.t1 102.93 0.070738
ENSMUSG00000000290.t1 234.53 0.058894
ENSMUSG00000000290.t2 102.59 0.161994
ENSMUSG00000000317.t1 278.21 0.319343

Fig.5 Distribution of poly(A) length versus expression in samples.
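
The relationship between poly(A) length and expression can be quantified with a correlation coefficient. The sketch below computes a Pearson correlation on the ten rows shown in the table above. The page does not say which coefficient the analysis uses, so Pearson is an assumption made for illustration, and ten transcripts are far too few to draw a biological conclusion. The function name is hypothetical.

```python
import math

# PolyA and Expression columns from the table above, in row order.
POLYA = [98.63, 302.78, 260.875, 192.41, 402, 344.25, 102.93, 234.53, 102.59, 278.21]
EXPRESSION = [0.346703, 0.048693, 0.0939, 0, 0.157583, 0.278575,
              0.070738, 0.058894, 0.161994, 0.319343]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)


print(f"Pearson r = {pearson(POLYA, EXPRESSION):.3f}")
```

A rank-based coefficient such as Spearman's may be a better fit when expression values are heavily skewed; the same function can be applied to ranks instead of raw values.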

References

  1. Garalde, D. R., Snell, E. A., Jachimowicz, D., Sipos, B., Lloyd, J. H., Bruce, M., . . . Turner, D. J. (2018). Highly parallel direct RNA sequencing on an array of nanopores. Nat Methods, 15(3), 201-206. doi:10.1038/nmeth.4577
  2. Workman, R. E., Tang, A. D., Tang, P. S., Jain, M., Tyson, J. R., Razaghi, R., . . . Timp, W. (2019). Nanopore native RNA sequencing of a human poly(A) transcriptome. Nat Methods, 16(12), 1297-1305. doi:10.1038/s41592-019-0617-2
* For Research Use Only. Not for use in diagnostic procedures.