CD Genomics is a bioinformatics data analysis provider. Our team is experienced in Nanopore direct RNA sequencing, and our data analysis platform delivers high-quality results with a short analysis cycle.
Introduction
Nanopore technology is the only available sequencing technology that can sequence RNA directly, rather than depending on reverse transcription and PCR. This approach has several potential advantages over other RNA-seq strategies: it allows direct detection of base modifications such as methylation, discovery and characterization of poly(A) RNA molecules, and the study of splice variants (Garalde et al., 2018).
Fig.1 Direct RNA-seq (Garalde et al., 2018).
Application Field
- Cancer Research
- Vaccine and Therapeutic Drug Development
- Clinical Research
Bioinformatics Analysis Content
- Clean Data Quality Control
- Mapping to Reference Genome
- Quantification of transcript expression
- Transcript structure analysis of genes
- Differential gene/transcript isoform quantification
- Functional Annotation, Enrichment Analysis and Protein Interaction Network
- Methylation Analysis
  - Methylation site identification
  - Methylation distribution analysis
  - Methylation motif analysis
  - Methylated gene region annotation
  - Analysis of differentially methylated sites (m5C/m6A)
- Poly(A) tail length estimation
  - Poly(A) length analysis
  - Differential poly(A) length analysis
  - Correlation analysis of poly(A) length and transcript expression
How It Works
CD Genomics is a professional bioinformatics service provider with years of experience in NGS and long-read sequencing (Oxford Nanopore platforms) data analysis, integrated analysis services, database construction, and other bioinformatics solutions.
Table 1 Partial software and database list
Software or database | Version | Use | Link |
NanoFilt | 2.8.0 | TGS data filtering | https://github.com/wdecoster/nanofilt |
minimap2 | 2.17 | Mapping | https://github.com/lh3/minimap2 |
samtools | 1.11 | Sorting | https://github.com/samtools/samtools |
seqkit | 0.12.0 | FASTA/Q tool | https://github.com/shenwei356/seqkit |
StringTie | 2.1.4 | Reconstruct transcripts | http://ccb.jhu.edu/software/stringtie/ |
gffcompare | 0.12.1 | Discovery of new transcripts | http://ccb.jhu.edu/software/stringtie/gffcompare.shtml |
SUPPA2 | 2.3 | Alternative splicing analysis | https://github.com/comprna/SUPPA/ |
TAPAS | 2018.5.26 | APA analysis | https://github.com/arefeen/TAPAS |
Nanopolish | 0.13.2 | Poly(A) tail analysis | https://github.com/jts/nanopolish |
Tombo | 1.5.1 | Methylation m5C analysis | https://github.com/nanoporetech/tombo |
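As a rough illustration of how the mapping and sorting tools in the table fit together, the sketch below builds the command lines for one sample. The `-ax splice -uf -k14` options are the settings the minimap2 documentation suggests for Nanopore direct RNA reads; whether the production pipeline uses exactly these parameters is an assumption, and the file names and the `build_alignment_commands` helper are hypothetical.

```python
import shlex


def build_alignment_commands(reads_fastq, reference_fasta, out_prefix, threads=4):
    """Return the command lines for spliced alignment, sorting and indexing.

    The commands are only built here, not executed, so the sketch can be
    inspected without minimap2 or samtools being installed.
    """
    sam_path = f"{out_prefix}.sam"
    bam_path = f"{out_prefix}.sorted.bam"
    return [
        # -ax splice: spliced alignment with SAM output
        # -uf: consider only the forward transcript strand (direct RNA reads)
        # -k14: shorter k-mer, which helps with noisy reads
        ["minimap2", "-ax", "splice", "-uf", "-k14", "-t", str(threads),
         "-o", sam_path, reference_fasta, reads_fastq],
        # Coordinate-sort the alignments into a BAM file
        ["samtools", "sort", "-@", str(threads), "-o", bam_path, sam_path],
        # Index the sorted BAM for random access
        ["samtools", "index", bam_path],
    ]


# Hypothetical file names, for illustration only
commands = build_alignment_commands("sample.clean.fastq", "reference.fa", "sample")
for command in commands:
    print(shlex.join(command))
```

Each command could then be passed to `subprocess.run(command, check=True)` on a machine where the tools are installed.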
1. What is the minimum starting amount of RNA required for Direct RNA library construction?
Total RNA that has passed quality control, 40-80 μg, at a concentration of ≥180 ng/μL.
2. What is the approximate yield of Direct RNA per flow cell?
Because Direct RNA library construction and sequencing involve no PCR amplification, the amount of full-length transcriptome data is relatively low compared with PCR-cDNA sequencing; with high-quality total RNA, the data yield is not less than 1 Gb.
Data Control
Quality control was performed on the raw ONT sequencing data. Reads were classified by their sequencing quality value (default threshold: 7): reads above the threshold are labeled PASS and reads below it are labeled FAIL. The following distribution was then plotted according to read length:
Fig.1 Distribution of read length and read count in the sequencing data.
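The PASS/FAIL split described above can be sketched as follows. This is a minimal illustration, assuming Phred+33 encoded quality strings and assuming the read-level quality value is obtained by averaging per-base error probabilities rather than the Phred scores themselves; treating a read exactly at the threshold as PASS is also an assumption, since the text only covers values above and below it.

```python
import math


def mean_read_quality(quality_string, phred_offset=33):
    """Mean read quality, averaged in error-probability space."""
    if not quality_string:
        raise ValueError("quality_string must not be empty")
    # Convert each Phred score Q to its error probability 10^(-Q/10)
    error_probs = [10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string]
    mean_error = sum(error_probs) / len(error_probs)
    # Convert the mean error probability back to a Phred-scaled value
    return -10 * math.log10(mean_error)


def classify_read(quality_string, threshold=7.0):
    """Label a read PASS or FAIL against the quality threshold (default 7)."""
    return "PASS" if mean_read_quality(quality_string) >= threshold else "FAIL"


print(classify_read("IIIIIIII"))  # every base is Q40
print(classify_read("########"))  # every base is Q2
```

Averaging in probability space means a few very poor bases pull the read quality down sharply: a read with one Q40 base and one Q2 base scores about 5, not 21.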
Raw FASTQ data were filtered with NanoFilt (version: 2.8.0; parameters: -q 7 -l 50) to obtain valid data for subsequent analysis. Summary statistics were then computed with SeqKit (version: 0.12.0; parameters: default). Statistics for the valid third-generation sequencing (TGS) data are shown in the table below:
Table 1 Quantity information statistics of the TGS data
Sample | Type | TotalBase | TotalReads | MaxLen | AvgLen | N50 | L50 | N90 | L90 | meanQ |
CD19-CRE1 | raw | 3,336,362,417 | 3,285,762 | 72,523 | 1,015.39 | 1,359 | 809,978 | 546 | 2,342,366 | 9.45 |
CD19-CRE1 | clean | 3,204,364,677 | 2,996,246 | 13,266 | 1,069.45 | 1,372 | 768,523 | 560 | 2,206,451 | 9.84 |
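For readers unfamiliar with the N50/L50 and N90/L90 columns, the sketch below shows how these values are conventionally derived from a list of read lengths. The `nx_lx` helper and the toy lengths are illustrative and are not taken from the SeqKit implementation.

```python
def nx_lx(read_lengths, fraction=0.5):
    """Return (Nx, Lx) for a collection of read lengths.

    Reads are sorted longest first. Nx is the length of the read at which
    the cumulative base count first reaches `fraction` of all bases, and
    Lx is the number of reads needed to get there.
    """
    ordered = sorted(read_lengths, reverse=True)
    target = sum(ordered) * fraction
    running_total = 0
    for read_count, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= target:
            return length, read_count
    raise ValueError("read_lengths must not be empty")


toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # 54 bases in total
print(nx_lx(toy_lengths, 0.5))  # (8, 3): 10 + 9 + 8 = 27 bases, half of 54
print(nx_lx(toy_lengths, 0.9))  # (4, 7): seven longest reads cover 49 of 54 bases
```

This matches the pattern in the table, where N50 is larger than N90 while L50 is smaller than L90.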
Transcript Expression Level Analysis
Transcriptome data provide a highly sensitive measure of gene expression. Boxplots and density plots of all transcripts are used to compare expression across samples.
Fig.2 Boxplot and density plot of all transcripts.
Methylation Analysis
Methylation is an important modification of nucleic acids and proteins. It regulates the activation and silencing of gene expression, is closely associated with many diseases such as cancer, and is a major research topic in epigenetics. Tombo (version: 1.5.1) is a suite of tools available from the Oxford Nanopore website for identifying nucleotide modifications from raw nanopore sequencing data. Using Tombo, m5C modification sites in RNA sequences can be predicted. The MINES pipeline (https://github.com/YeoLab/MINES) can predict m6A-modified sites in RNA sequences.
Table 2 m5C results (CD19-WT1)
Trans | Pos | Depth | Sample | Fraction |
ENSMUSG00000000028.t1 | 739 | 12 | CD19-WT1 | 0.1667 |
ENSMUSG00000000028.t1 | 740 | 16 | CD19-WT1 | 0.25 |
ENSMUSG00000000028.t1 | 761 | 13 | CD19-WT1 | 0.0769 |
ENSMUSG00000000028.t1 | 766 | 19 | CD19-WT1 | 0.3158 |
ENSMUSG00000000028.t1 | 770 | 12 | CD19-WT1 | 0.0833 |
ENSMUSG00000000028.t1 | 776 | 11 | CD19-WT1 | 0 |
ENSMUSG00000000028.t1 | 785 | 17 | CD19-WT1 | 0 |
ENSMUSG00000000028.t1 | 802 | 11 | CD19-WT1 | 0.2727 |
ENSMUSG00000000028.t1 | 806 | 18 | CD19-WT1 | 0.0556 |
ENSMUSG00000000028.t1 | 814 | 15 | CD19-WT1 | 0.2667 |
Note: Trans is the reference transcript to which the reads were aligned; Pos is the position of the site; Depth is the effective coverage depth at the methylation site; Sample is the sample name; Fraction is the score of the methylation site.
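A results table like the one above is typically filtered before downstream analysis. The sketch below uses the ten rows shown and keeps sites that meet a minimum depth and a minimum Fraction. Both cutoffs are hypothetical values chosen for illustration, not thresholds used by the service. The second helper multiplies Depth by Fraction, which gives near-integer values for these rows; reading Fraction as the proportion of reads called modified is an interpretation, not something the note states.

```python
# (position, depth, fraction) for ENSMUSG00000000028.t1 in sample CD19-WT1
SITES = [
    (739, 12, 0.1667), (740, 16, 0.25), (761, 13, 0.0769), (766, 19, 0.3158),
    (770, 12, 0.0833), (776, 11, 0.0), (785, 17, 0.0), (802, 11, 0.2727),
    (806, 18, 0.0556), (814, 15, 0.2667),
]


def select_sites(sites, min_depth=10, min_fraction=0.2):
    """Return positions whose depth and fraction meet both cutoffs.

    The default cutoffs are illustrative assumptions.
    """
    return [pos for pos, depth, fraction in sites
            if depth >= min_depth and fraction >= min_fraction]


def estimated_modified_reads(sites):
    """Depth x Fraction per site, rounded to the nearest whole read."""
    return [round(depth * fraction) for _, depth, fraction in sites]


print(select_sites(SITES))              # [740, 766, 802, 814]
print(estimated_modified_reads(SITES))  # [2, 4, 1, 6, 1, 0, 0, 3, 1, 4]
```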
Fig.3 Chromosome distribution of m5C loci (CD19-WT1)
PolyA Analysis
Transcript poly(A) tails are thought to play a role in post-transcriptional regulation, including mRNA stability and translational efficiency. Poly(A) tail lengths were estimated from the raw data using Nanopolish (version 0.13.2).
Fig.4 Distribution of poly(A) lengths of different samples.
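Per-read poly(A) estimates are usually summarized per transcript before plotting. The sketch below is one way this could be done, assuming a tab-separated file with `contig`, `polya_length` and `qc_tag` columns as written by `nanopolish polya` (the real output has further columns), and assuming only reads tagged `PASS` are kept. The example rows are made up purely to exercise the code.

```python
import csv
import io
import statistics
from collections import defaultdict


def median_polya_by_transcript(tsv_handle):
    """Median poly(A) length per transcript, using reads with qc_tag PASS."""
    lengths = defaultdict(list)
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        if row["qc_tag"] == "PASS":
            lengths[row["contig"]].append(float(row["polya_length"]))
    return {name: statistics.median(values) for name, values in lengths.items()}


# Made-up rows for illustration; not real sequencing results
EXAMPLE_TSV = (
    "readname\tcontig\tpolya_length\tqc_tag\n"
    "read1\ttx1\t100.0\tPASS\n"
    "read2\ttx1\t120.0\tPASS\n"
    "read3\ttx1\t300.0\tSUFFCLIP\n"
    "read4\ttx2\t80.0\tPASS\n"
)

print(median_polya_by_transcript(io.StringIO(EXAMPLE_TSV)))
# {'tx1': 110.0, 'tx2': 80.0}
```

For a real file, `open(path, newline="")` would replace the `io.StringIO` wrapper.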
Table 3 Poly(A) length and transcript expression
Trans_id | PolyA | Expression |
ENSMUSG00000000085.t1 | 98.63 | 0.346703 |
ENSMUSG00000000131.t2 | 302.78 | 0.048693 |
ENSMUSG00000000131.t3 | 260.875 | 0.0939 |
ENSMUSG00000000244.t1 | 192.41 | 0 |
ENSMUSG00000000244.t2 | 402 | 0.157583 |
ENSMUSG00000000244.t4 | 344.25 | 0.278575 |
ENSMUSG00000000248.t1 | 102.93 | 0.070738 |
ENSMUSG00000000290.t1 | 234.53 | 0.058894 |
ENSMUSG00000000290.t2 | 102.59 | 0.161994 |
ENSMUSG00000000317.t1 | 278.21 | 0.319343 |
Fig.5 Distribution of polyA length versus expression in samples.
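The relationship between poly(A) length and expression can be quantified with a correlation coefficient. The sketch below computes a Pearson correlation over the ten transcripts listed in the table above. The choice of Pearson is an assumption, since the method used for the correlation analysis is not stated, and ten rows are far too few to draw conclusions from.

```python
import math

# (transcript, poly(A) length, expression) copied from the table above
ROWS = [
    ("ENSMUSG00000000085.t1", 98.63, 0.346703),
    ("ENSMUSG00000000131.t2", 302.78, 0.048693),
    ("ENSMUSG00000000131.t3", 260.875, 0.0939),
    ("ENSMUSG00000000244.t1", 192.41, 0.0),
    ("ENSMUSG00000000244.t2", 402.0, 0.157583),
    ("ENSMUSG00000000244.t4", 344.25, 0.278575),
    ("ENSMUSG00000000248.t1", 102.93, 0.070738),
    ("ENSMUSG00000000290.t1", 234.53, 0.058894),
    ("ENSMUSG00000000290.t2", 102.59, 0.161994),
    ("ENSMUSG00000000317.t1", 278.21, 0.319343),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)


polya_lengths = [row[1] for row in ROWS]
expression = [row[2] for row in ROWS]
r = pearson(polya_lengths, expression)
print(f"Pearson r = {r:.3f}")
```

A rank-based measure such as Spearman correlation may be preferable when expression values are heavily skewed.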