Ribosome profiling (Ribo-seq) Analysis

CD Genomics is a bioinformatics data analysis provider. Our team is experienced in Ribo-seq data analysis, and our analysis platform delivers high-quality results with a fast turnaround.

Introduction

Ribosome profiling sequencing (Ribo-seq) is a high-throughput sequencing technology for detecting RNA translation genome-wide (Ingolia, Ghaemmaghami, Newman, & Weissman, 2009). It is currently the mainstream method for studying the translation of RNA into protein. Ribosome-nascent peptide complexes are treated with a low concentration of RNase, which degrades the RNA that is not covered by ribosomes, and the ribosomes are then removed. Finally, the ribosome-protected fragments of translating RNA, about 30 nt long, are detected by next-generation sequencing. These protected fragments accurately mark the "footprints" of ribosomes during translation and are therefore also called ribosome footprints (RFs).

Fig. 1 Ribosome profiling sequencing (Ribo-seq) workflow

Application Field

Medicine: disease mechanism research, disease marker discovery, drug target screening

Plants: stress resistance mechanism, growth and development mechanism, breeding protection research, etc.

Animal husbandry: quality research, animal nutrition, breeding, etc.

Food environment: storage and processing conditions optimization, quality identification, food nutrition

CD Genomics Data Analysis Pipeline

Bioinformatics Analysis Content

  • Ribo-seq data analysis
    Reads filtering
    Reference genome alignment
    Three nucleotide periodicity analysis
    Alignment with Codon distribution analysis
    Pause sites analysis
    Quantification of gene abundance
  • Sample relationship analysis
    Correlation Analysis of Replicates
    Principal Component Analysis
  • Differentially translated genes (DTGs) analysis
  • GO Enrichment Analysis
  • ORF identification
  • Translation quantification for ORF
  • Differentially translated (DT) ORFs analysis
  • Evaluation of Coding potential of non-canonical ORFs
  • Sequence features analysis
  • The influence of the uORFs upon the mORFs
  • sORF annotation

How It Works

Table 1 Partial software and database list

Software or database   Uses
fastp                  Low quality Reads filtering
STAR                   Reference genome alignment
riboWaltz R package    Three nucleotide periodicity analysis
PausePred              Pause sites analysis
RSEM                   Quantification of gene abundance

1. What is an ORF?

The key to Ribo-seq analysis is finding ORFs (open reading frames), just as RNA-seq analysis identifies mRNA, lncRNA, and circRNA. An ORF is a continuous stretch of codons that begins with a start codon (usually AUG) and ends at a stop codon (usually UAA, UAG, or UGA).
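The definition above can be illustrated with a minimal sketch that scans the three forward reading frames of a single transcript for stretches running from AUG to the first in-frame stop codon. The function name `find_orfs` and its parameters are hypothetical and chosen for illustration; real ORF callers for Ribo-seq also use footprint evidence and may accept non-AUG start codons.

```python
# Minimal sketch: find AUG...stop ORFs on the forward strand of one transcript.
# All names here are illustrative, not part of any specific tool.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_orfs(rna, start_codon="AUG", min_codons=2):
    """Return (start, end, frame) tuples; coordinates are 0-based, end is
    exclusive and includes the stop codon. min_codons counts start and stop."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    orfs = []
    for frame in range(3):
        start = None  # position of the first start codon of the open ORF
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == start_codon:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs

print(find_orfs("GGAUGAAAUAGCC"))
```

Here the only ORF is `AUGAAAUAG`, which starts at position 2 in the third reading frame.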

2. What are the difficulties of ribosome profiling?

  • Residual rRNA in the experimental library is high
  • The short ribosome footprints make it difficult to identify the real ORFs

3. What scientific questions does Ribo-seq address?

  • First, it can be used to investigate translation mechanisms: which of the translated genes are translated efficiently, and which elements regulate translation efficiency.
  • Second, it can deepen transcriptome analysis. It shows which differentially expressed genes are actually being translated and how efficiently, so the list of differentially expressed genes can be narrowed to those that are being translated.
  • Third, it can explain inconsistencies between transcriptome and proteome results. When a transcript is not translated, or is translated inefficiently, the transcriptome may show a significant difference while the proteome shows little or none.
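Translation efficiency is commonly summarised as the ratio of ribosome footprint abundance to mRNA abundance for the same gene. The sketch below assumes that definition, along with hypothetical per-gene abundance dictionaries and a pseudocount of 1; the page does not state which formula the pipeline itself uses.

```python
import math

def translation_efficiency(ribo_abundance, rna_abundance, pseudocount=1.0):
    """Return log2(Ribo-seq abundance / RNA-seq abundance) per gene.

    Only genes present in both inputs are reported. The pseudocount is an
    assumption made here to avoid division by zero for unexpressed genes.
    """
    shared_genes = ribo_abundance.keys() & rna_abundance.keys()
    return {
        gene: math.log2(
            (ribo_abundance[gene] + pseudocount)
            / (rna_abundance[gene] + pseudocount)
        )
        for gene in shared_genes
    }

# Hypothetical abundances (for example TPM) for two genes.
ribo = {"geneA": 31.0, "geneB": 3.0}
rna = {"geneA": 7.0, "geneB": 15.0}
print(translation_efficiency(ribo, rna))
```

A positive value suggests more footprints than the mRNA level alone would predict, and a negative value suggests fewer, which is one way a transcriptome difference can fail to appear in the proteome.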

Quality Control

The raw sequencing data are filtered in several steps to obtain high-quality data for subsequent analysis.

Table 1 Reads filter information statistics table

Sample  Raw Reads Num  Clean Reads Num (%)  Read length      Adapter (%)     Low quality (%)  polyA (%)  N (%)
T1      49132379       48934325 (99.6%)     150/150+150/150  94166 (0.38%)   9617 (0.02%)     0 (0%)     106 (0.0%)
T2      51508605       51284781 (99.57%)    150/150+150/150  105050 (0.41%)  13652 (0.03%)    0 (0%)     71 (0.0%)

Table 2 Statistical table of filtered base information

Sample  Raw Data(bp)_before  Q20 (%)              Q30 (%)              N (%)         GC (%)
T1      7369856850           1451146305 (98.57%)  1414372834 (96.08%)  45774 (0.0%)  745364428 (50.63%)
T2      7726290750           1505703187 (98.46%)  1464924432 (95.8%)   43089 (0.0%)  746397174 (48.81%)

RFs distribution statistics

Based on the read length distribution, we retained only reads between 20 nt and 40 nt long and used these expected-length reads for subsequent analysis.
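The length filter can be sketched as follows. This is a minimal illustration that assumes both bounds are inclusive and that reads are plain sequence strings; the function names are hypothetical, and in practice the filter would typically be applied by the trimming or filtering tool itself.

```python
from collections import Counter

def filter_by_length(reads, min_len=20, max_len=40):
    """Keep reads whose length lies in [min_len, max_len] (inclusive bounds
    are an assumption made for this sketch)."""
    return [read for read in reads if min_len <= len(read) <= max_len]

def length_distribution(reads):
    """Count how many reads have each length, for plotting a histogram."""
    return Counter(len(read) for read in reads)

# Hypothetical reads of length 19, 20, 30, 40 and 41.
reads = ["A" * n for n in (19, 20, 30, 40, 41)]
kept = filter_by_length(reads)
print(sorted(length_distribution(kept).items()))
```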

Fig. 1 Plot of RFs length distribution

Based on the alignment position of RFs on coding genes, we classified RFs into four categories: CDS, 5'UTR, 3'UTR, and intron. In general, most RFs fall in the CDS region, with fewer in the UTR regions.

Fig. 2 Map of the location distribution of RFs coding genes

sORF identification

Identifying coding elements is an important task in genomic studies. Common genome annotation pipelines generally consider only proteins longer than 100 amino acids. The coding sequences of these genes are called consensus coding sequences (CCDS), and the corresponding open reading frames are called CCDS ORFs. These are generally the main protein-coding ORFs and are collectively referred to as mORFs in this report.

However, recent studies have shown that some RNA regions traditionally regarded as non-coding (including lncRNA, 5'UTR, and 3'UTR) can in fact be translated into peptides, usually shorter than 100 amino acids. These small peptides play diverse roles in organisms, including in ontogeny, muscle contraction, and DNA repair. The small ORFs encoding them are typically shorter than 300 nt and are collectively referred to as sORFs in this report. We classify sORFs by their source region as follows:

(a) uORF: an sORF derived from the 5'UTR region of a known coding gene;
(b) dORF: an sORF derived from the 3'UTR region of a known coding gene.
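The classification rule can be sketched in a few lines. This is a minimal illustration in transcript coordinates that assumes an sORF must lie entirely within one UTR and be shorter than 300 nt; how the pipeline treats ORFs that overlap the main CDS is not stated here, so this sketch simply leaves them unclassified. The function name and parameters are hypothetical.

```python
def classify_sorf(orf_start, orf_end, cds_start, cds_end, max_len_nt=300):
    """Classify an ORF on a coding transcript as 'uORF', 'dORF' or None.

    Coordinates are 0-based transcript positions with exclusive ends.
    Assumption: the ORF must sit entirely inside the 5'UTR or the 3'UTR.
    """
    if orf_end - orf_start >= max_len_nt:
        return None  # too long to count as a small ORF
    if orf_end <= cds_start:
        return "uORF"  # entirely upstream of the main CDS (5'UTR)
    if orf_start >= cds_end:
        return "dORF"  # entirely downstream of the main CDS (3'UTR)
    return None  # overlaps the main CDS; not classified in this sketch

# Hypothetical transcript whose main CDS spans positions 100 to 400.
print(classify_sorf(10, 55, 100, 400))
print(classify_sorf(420, 458, 100, 400))
```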

Table 3 sORF identification statistics

                uORF   dORF
Number          73557  190578
Average length  45     38

Fig. 3 Violin plot of sORF expression distribution

Screening of potentially translatable sORF

ORFscore and RRS were calculated from the abundance and position distribution of each sORF, and the Fickett score and Hexamer score were calculated from the sequence characteristics of each sORF. The four values were then combined to screen for sORFs that are likely to be translated.
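To make the idea behind ORFscore concrete, the sketch below uses one commonly described formulation: footprint counts in the three reading frames of an ORF are compared with a uniform expectation by a chi-square statistic, log-transformed, and given a negative sign when the annotated frame is not the dominant one. This is an assumption about the general approach, not a description of the exact implementation used in this pipeline, and the function name is hypothetical.

```python
import math

def orf_score(frame_counts):
    """Compute an ORFscore-style value from footprint counts per frame.

    frame_counts is (f1, f2, f3), where f1 is the count in the ORF's own
    reading frame. Returns 0.0 when the ORF has no footprints.
    """
    f1, f2, f3 = frame_counts
    total = f1 + f2 + f3
    if total == 0:
        return 0.0
    expected = total / 3  # uniform expectation across the three frames
    chi_square = sum((f - expected) ** 2 / expected for f in (f1, f2, f3))
    score = math.log2(chi_square + 1)
    # Negative sign when the ORF's own frame is not the most covered one.
    return -score if (f1 < f2 or f1 < f3) else score

# Hypothetical counts: strongly in-frame, uniform, and out-of-frame.
for counts in [(30, 0, 0), (10, 10, 10), (0, 30, 0)]:
    print(counts, round(orf_score(counts), 2))
```

A large positive value indicates footprints concentrated in the ORF's own frame, which is the three-nucleotide periodicity expected of active translation.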

Fig. 4 Distribution of score values for RRS and ORFscore

Fig. 5 Potentially translatable sORF Venn diagram

Reference

  1. Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. S., & Weissman, J. S. (2009). Genome-Wide Analysis in Vivo of Translation with Nucleotide Resolution Using Ribosome Profiling. Science, 324(5924), 218-223. doi:10.1126/science.1168978
* For Research Use Only. Not for use in diagnostic procedures.