CD Genomics is a bioinformatics data analysis provider. Our team is experienced in Nanopore direct RNA sequencing, and our data analysis platform delivers high-quality analysis results with a short turnaround time.

Introduction

Nanopore technology is the only available sequencing technology that can sequence RNA directly, without relying on reverse transcription or PCR. This approach has several potential advantages over other RNA-seq strategies: it allows direct detection of base modifications such as methylation, discovery and characterization of poly(A) RNA molecules, and the study of splice variants (Garalde et al., 2018).

Fig.1 Direct RNA-seq (Garalde et al., 2018).

Application Fields

Cancer Research
Vaccine and Therapeutic Drug Development
Clinical Research

Bioinformatics Analysis Content

Clean Data Quality Control
Mapping to Reference Genome
Quantification of transcript expression
Gene transcript structure analysis
Differential gene/transcript isoform quantification
Functional Annotation, Enrichment Analysis and Protein Interaction Network
Methylation Analysis
Methylation site identification
Methylation distribution analysis
Methylation motif analysis
Methylated gene region annotation
Analysis of differentially methylated sites (m5C/m6A)
Poly(A) tail length estimation
Poly(A) length analysis
Differential Poly(A) length analysis
Correlation analysis of Poly(A) length and transcript expression

How It Works

CD Genomics is a professional bioinformatics service provider with years of experience in NGS and long-read sequencing (Oxford Nanopore platforms) data analysis, integrated analysis services, database construction, and other bioinformatics solutions.

Table 1 Partial software and database list

Software or database	Version	Use	Link
NanoFilt	2.8.0	TGS data filtering	https://github.com/wdecoster/nanofilt
minimap2	2.17	Mapping	https://github.com/lh3/minimap2
samtools	1.11	Sorting	https://github.com/samtools/samtools
seqkit	0.12.0	FASTA/Q tool	https://github.com/shenwei356/seqkit
StringTie	2.1.4	Reconstruct transcripts	http://ccb.jhu.edu/software/stringtie/
gffcompare	0.12.1	Discovery of new transcripts	http://ccb.jhu.edu/software/stringtie/gffcompare.shtml
SUPPA2	2.3	Alternative splicing analysis	https://github.com/comprna/SUPPA/
TAPAS	2018.5.26	APA analysis	https://github.com/arefeen/TAPAS
Nanopolish	0.13.2	Poly(A) tail analysis	https://github.com/jts/nanopolish
Tombo	1.5.1	Methylation m5C analysis	https://github.com/nanoporetech/tombo
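
As a rough illustration of how the filtering, mapping, and sorting tools in the table above might be chained, the Python sketch below only assembles shell command strings and does not run them. The NanoFilt parameters (-q 7 -l 50) are the ones quoted on this page. The minimap2 preset (-ax splice -uf -k14) is the one suggested in the minimap2 README for Nanopore direct RNA reads and is an assumption here, as are the function name and all file names.

```python
import shlex


def build_commands(reads_fastq, reference_fasta, prefix):
    """Return shell command strings for filtering, then mapping and sorting.

    Hypothetical helper: nothing is executed, so no tools need to be installed.
    """
    filtered = f"{prefix}.clean.fastq"
    bam = f"{prefix}.sorted.bam"
    # NanoFilt reads FASTQ on stdin and writes the retained reads to stdout.
    filter_cmd = (
        f"NanoFilt -q 7 -l 50 < {shlex.quote(reads_fastq)} "
        f"> {shlex.quote(filtered)}"
    )
    # Assumed minimap2 preset for direct RNA reads; output piped to samtools sort.
    map_cmd = (
        f"minimap2 -ax splice -uf -k14 {shlex.quote(reference_fasta)} "
        f"{shlex.quote(filtered)} | samtools sort -o {shlex.quote(bam)} -"
    )
    return [filter_cmd, map_cmd]


commands = build_commands("raw.fastq", "ref.fa", "sample1")
for command in commands:
    print(command)
```

Keeping the commands as plain strings makes the parameter choices easy to review before anything is run on real data.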

1. What is the minimum starting amount of RNA required for Direct RNA library construction?

Quality-checked total RNA, 40-80 μg, at a concentration of ≥180 ng/μL.

2. What is the approximate yield of Direct RNA per flow cell?

Because Direct RNA library construction and sequencing involve no PCR amplification, the yield of full-length transcriptome data is relatively low compared with PCR-cDNA sequencing. With high-quality total RNA, the data yield is no less than 1 Gb.

Data Quality Control

Quality control was performed on the raw ONT sequencing data. Reads with a quality value greater than the threshold (7 by default) are marked PASS, and reads with a quality value less than the threshold are marked FAIL. The following distribution was then plotted according to read length:

Fig.1 Distribution of read lengths and read counts in the sequencing data.
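
The PASS/FAIL rule described above can be sketched in Python as follows. This is a minimal sketch that assumes Phred+33 quality strings and assumes the per-read quality value is a mean taken in error-probability space, a common convention for nanopore reads; this page does not say how the value is derived. Treating a read exactly at the threshold as FAIL is also an assumption, and the function names are hypothetical.

```python
import math


def mean_read_quality(quality_string, phred_offset=33):
    """Mean Phred quality of one read, averaged over error probabilities."""
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each Phred score to an error probability before averaging.
    error_probs = [
        10 ** (-(ord(char) - phred_offset) / 10) for char in quality_string
    ]
    mean_error = sum(error_probs) / len(error_probs)
    return -10 * math.log10(mean_error)


def classify_read(quality_string, threshold=7.0):
    """Label a read PASS if its mean quality exceeds the threshold, else FAIL."""
    return "PASS" if mean_read_quality(quality_string) > threshold else "FAIL"


print(classify_read("5555"))  # every base at Q20
print(classify_read("####"))  # every base at Q2
print(round(mean_read_quality("5#"), 2))  # one Q20 base and one Q2 base
```

Averaging error probabilities rather than Phred scores means a few low-quality bases pull the read quality down sharply: the mixed read in the last line has an arithmetic Phred mean of 11 but a quality value below 5 under this convention.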

Raw FASTQ data were filtered with NanoFilt (version: 2.8.0; parameters: -q 7 -l 50) to obtain valid data for subsequent analysis. Summary statistics were computed with SeqKit (version: 0.12.0; parameters: default). The statistics for the valid third-generation sequencing (TGS) data are shown in the table below:

Table 1 Summary statistics of the TGS data

Sample	Type	TotalBase	TotalReads	MaxLen	AvgLen	N50	L50	N90	L90	meanQ
CD19-CRE1	raw	3,336,362,417	3,285,762	72,523	1,015.39	1,359	809,978	546	2,342,366	9.45
CD19-CRE1	clean	3,204,364,677	2,996,246	13,266	1,069.45	1,372	768,523	560	2,206,451	9.84
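
For readers unfamiliar with the N50/L50 and N90/L90 columns, the sketch below computes them from a list of read lengths using the usual definition: reads are sorted from longest to shortest, and Nx is the length of the read at which the running total first reaches x% of all bases, while Lx is the number of reads needed to get there. The function name and the toy lengths are illustrative, and tie handling in SeqKit may differ in detail.

```python
def nx_lx(read_lengths, fraction=0.5):
    """Return (Nx, Lx) for the given fraction of total bases (0.5 gives N50/L50)."""
    if not read_lengths:
        raise ValueError("no read lengths given")
    ordered = sorted(read_lengths, reverse=True)
    target = sum(ordered) * fraction
    running_total = 0
    for read_count, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= target:
            return length, read_count
    # Only reachable if fraction > 1; fall back to the shortest read.
    return ordered[-1], len(ordered)


toy_lengths = [100, 200, 300, 400, 500]  # illustrative values, 1,500 bases total
n50, l50 = nx_lx(toy_lengths, 0.5)
n90, l90 = nx_lx(toy_lengths, 0.9)
print(n50, l50, n90, l90)
```

In the toy example, the two longest reads (500 + 400) already cover half of the 1,500 bases, so N50 is 400 and L50 is 2. The same reading applies to the table above, where L50 and L90 are read counts rather than lengths.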

Transcript Expression Level Analysis

Transcriptome data allow gene expression to be detected with high sensitivity. Boxplots and density plots of all transcripts are used to compare expression across samples.

Fig.2 Boxplot and density plot of all transcripts.

Methylation Analysis

Methylation is an important modification of nucleic acids and proteins. It regulates the switching on and off of gene expression, is closely related to many diseases such as cancer, and is a major research topic in epigenetics. Tombo (version: 1.5) is a suite of tools from Oxford Nanopore Technologies for identifying nucleotide modifications from raw nanopore sequencing data. Using Tombo, m5C modification sites in RNA sequences can be predicted. The MINES pipeline (https://github.com/YeoLab/MINES) can predict m6A modification sites in RNA sequences.

Table 2 m5C results (CD19-WT1)

Trans	Pos	Depth	Sample	Fraction
ENSMUSG00000000028.t1	739	12	CD19-WT1	0.1667
ENSMUSG00000000028.t1	740	16	CD19-WT1	0.25
ENSMUSG00000000028.t1	761	13	CD19-WT1	0.0769
ENSMUSG00000000028.t1	766	19	CD19-WT1	0.3158
ENSMUSG00000000028.t1	770	12	CD19-WT1	0.0833
ENSMUSG00000000028.t1	776	11	CD19-WT1	0
ENSMUSG00000000028.t1	785	17	CD19-WT1	0
ENSMUSG00000000028.t1	802	11	CD19-WT1	0.2727
ENSMUSG00000000028.t1	806	18	CD19-WT1	0.0556
ENSMUSG00000000028.t1	814	15	CD19-WT1	0.2667

Note: Trans is the reference transcript to which the reads were aligned; Pos is the position on the transcript; Depth is the effective coverage depth at the methylation site; Sample is the sample name; Fraction is the score of the methylation site.
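
The Fraction values in Table 2 are numerically consistent with a simple ratio of modified reads to Depth. That reading is an inference from the numbers, not something the note states, so the sketch below should be taken as a consistency check under that assumption. It recovers the implied number of modified reads per site and recomputes the fraction from it.

```python
# (Pos, Depth, Fraction) copied from Table 2 (CD19-WT1, ENSMUSG00000000028.t1).
TABLE2_ROWS = [
    (739, 12, 0.1667), (740, 16, 0.25), (761, 13, 0.0769), (766, 19, 0.3158),
    (770, 12, 0.0833), (776, 11, 0.0), (785, 17, 0.0), (802, 11, 0.2727),
    (806, 18, 0.0556), (814, 15, 0.2667),
]


def implied_modified_reads(depth, fraction):
    """Nearest whole number of reads that would give this fraction at this depth."""
    return round(depth * fraction)


counts = [implied_modified_reads(depth, frac) for _, depth, frac in TABLE2_ROWS]
for (pos, depth, frac), count in zip(TABLE2_ROWS, counts):
    # Recompute the fraction from the implied count for comparison with the table.
    print(pos, depth, count, round(count / depth, 4), frac)
```

Under this assumption, every row of the table is reproduced to four decimal places, which suggests that sites with low depth can only take a small number of distinct Fraction values.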

Fig.3 Chromosome distribution of m5C loci (CD19-WT1)

Poly(A) Analysis

Transcript poly(A) tails are thought to play a role in post-transcriptional regulation, including mRNA stability and translational efficiency. Poly(A) tail lengths were estimated from the raw data using Nanopolish (version 0.13.2).

Fig.4 Distribution of poly(A) lengths of different samples.
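
A per-transcript summary of poly(A) lengths could be derived along the following lines. This is a minimal sketch that assumes the tab-separated output of nanopolish polya, with contig, polya_length, and qc_tag columns, and keeps only rows tagged PASS. The function name and the example rows are hypothetical and are not results from this project.

```python
import csv
import io
import statistics
from collections import defaultdict


def median_polya_by_transcript(tsv_text):
    """Median poly(A) length per transcript, using only rows tagged PASS."""
    lengths = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["qc_tag"] == "PASS":
            lengths[row["contig"]].append(float(row["polya_length"]))
    return {contig: statistics.median(vals) for contig, vals in lengths.items()}


# Hypothetical rows; real output has more columns, which this function does not use.
example_rows = [
    ("readname", "contig", "polya_length", "qc_tag"),
    ("read1", "tx1", "50.0", "PASS"),
    ("read2", "tx1", "70.0", "PASS"),
    ("read3", "tx1", "10.0", "ADAPTER"),
    ("read4", "tx2", "100.5", "PASS"),
]
example_tsv = "\n".join("\t".join(row) for row in example_rows)
result = median_polya_by_transcript(example_tsv)
print(result)
```

The median is used here because poly(A) length distributions are often skewed; per-transcript medians of this kind could then be compared between samples or correlated with transcript expression.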