A Detailed Protocol of LncRNA Data Analysis

Introduction to LncRNA Data Analysis

Long non-coding RNA (lncRNA) is a class of non-coding RNA that has attracted considerable attention in recent years because of its distinctive regulatory mechanisms. LncRNAs are longer than 200 nucleotides, lack protein-coding capacity, and are poorly conserved across species. Some lncRNAs have a sequence structure similar to that of mRNA (they carry poly(A) tails and undergo alternative splicing). LncRNAs can regulate gene expression at the epigenetic, transcriptional, and post-transcriptional levels by interacting with DNA, RNA, or protein.

Figure 1. A study methodological workflow for lncRNA analysis. (Affinito, 2020)

Protocol of LncRNA Data Analysis

Data pre-processing

    1. Quality assessment

    After the raw data (FASTQ files) are obtained, FastQC v0.11.3 is used to assess the quality of the raw reads, including the distribution of sequencing error rates and GC content.

    2. Data filtering

    Raw sequencing data contain low-quality reads and adapter sequences. To maintain the quality of the downstream analysis, raw reads must be processed into clean reads, and all subsequent analysis must be based on those clean reads. Data filtering includes removing adapter sequences from reads, removing reads with a high proportion of N (N denotes a base that could not be determined), and removing low-quality reads. Cutadapt and Trimmomatic are used for this step.
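The filtering criteria above can be illustrated with a minimal Python sketch. It is not a substitute for FastQC, Cutadapt, or Trimmomatic, and adapter removal is left out. The function names, the Phred+33 encoding, and the cutoffs (at most 10% N, mean quality of at least 20) are assumptions chosen for illustration and do not come from the protocol.

```python
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (read name, base sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield one (name, sequence, quality) tuple per 4-line FASTQ record."""
    for i in range(0, len(lines) - 3, 4):
        name = lines[i].strip()[1:]  # drop the leading '@'
        yield name, lines[i + 1].strip(), lines[i + 3].strip()


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a read."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def n_fraction(seq: str) -> float:
    """Fraction of undetermined (N) bases in a read."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a read, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in qual) / len(qual) if qual else 0.0


def filter_reads(
    reads: Iterable[Read],
    max_n_fraction: float = 0.1,     # assumed cutoff
    min_mean_quality: float = 20.0,  # assumed cutoff
) -> List[Read]:
    """Keep reads that pass both the N-content and the mean-quality check."""
    return [
        (name, seq, qual)
        for name, seq, qual in reads
        if n_fraction(seq) <= max_n_fraction
        and mean_quality(qual) >= min_mean_quality
    ]


# Three toy records: a good read, a read rich in N, and a low-quality read.
EXAMPLE_FASTQ = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@read2", "ACNNNNGTAC", "+", "IIIIIIIIII",
    "@read3", "GGCCGGCCAA", "+", "##########",
]

clean_reads = filter_reads(parse_fastq(EXAMPLE_FASTQ))
for name, seq, qual in clean_reads:
    print(name, f"GC={gc_content(seq):.2f}", f"meanQ={mean_quality(qual):.1f}")
```

In this toy input only `read1` survives: `read2` exceeds the assumed N cutoff and `read3` falls below the assumed quality cutoff.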

Overall quality assessment of RNA-sequencing

This step mainly consists of assessing inter-sample correlation (Pearson correlation coefficient) and evaluating the uniformity of the read distribution.
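As a rough illustration of the inter-sample correlation check, the sketch below computes the Pearson correlation coefficient for every pair of samples from per-gene expression values. The sample names and values are made up, and the function names are hypothetical. In practice the values would typically be normalized expression estimates for many thousands of genes.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlation(
    expression: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Correlation for every unordered pair of samples."""
    names = sorted(expression)
    return {
        (a, b): pearson(expression[a], expression[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Toy expression values for four genes in three hypothetical samples.
EXPRESSION = {
    "sample_A": [1.0, 2.0, 3.0, 4.0],
    "sample_B": [2.0, 4.0, 6.0, 8.0],
    "sample_C": [4.0, 3.0, 2.0, 1.0],
}

for pair, r in pairwise_correlation(EXPRESSION).items():
    print(pair, round(r, 3))
```

Replicates of the same condition are expected to correlate strongly. A pair with an unexpectedly low coefficient is a signal to inspect those samples more closely.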

Reads alignment to the Reference Genome

For read alignment, the STAR aligner and TopHat2 are frequently used. The mapping rate (total mapped reads or fragments) should usually exceed 70% if the reference genome is chosen correctly and the experiment is free of contamination.
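The 70% guideline can be checked with a few lines of Python once the total and mapped read counts are known for each sample. This is only a sketch: the counts below are invented, the function names are hypothetical, and extracting the counts from the aligner's own summary output is not shown.

```python
from typing import Dict, List, Tuple


def mapping_rate(mapped_reads: int, total_reads: int) -> float:
    """Percentage of reads (or fragments) that mapped to the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


def flag_low_mapping(
    stats: Dict[str, Tuple[int, int]], threshold: float = 70.0
) -> List[str]:
    """Return samples whose mapping rate falls below the threshold.

    `stats` maps a sample name to (mapped_reads, total_reads).
    """
    return sorted(
        sample
        for sample, (mapped, total) in stats.items()
        if mapping_rate(mapped, total) < threshold
    )


# Invented read counts for three hypothetical samples.
ALIGNMENT_STATS = {
    "sample_A": (850, 1000),
    "sample_B": (920, 1000),
    "sample_C": (550, 1000),
}

print(flag_low_mapping(ALIGNMENT_STATS))
```

A flagged sample suggests re-checking the choice of reference genome or looking for contamination before continuing.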

Data exploration

DESeq2 is used to explore the data after the files have been sorted. The resulting data can be used to perform cluster analysis and principal component analysis (PCA) on the RNA-seq samples, so that the relationships between samples can be examined and the experimental design validated. The distance between samples in the clustering or PCA result indicates how similar the samples are.
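The idea of a sample-to-sample distance can be sketched without DESeq2. The example below applies a plain log2(count + 1) transform and computes Euclidean distances between samples. This simple transform is a stand-in chosen for illustration, not the transformation DESeq2 applies, and the counts and function names are made up.

```python
import math
from typing import Dict, List, Tuple


def log_transform(counts: List[float]) -> List[float]:
    """log2(count + 1), a simple stand-in for a variance-stabilizing transform."""
    return [math.log2(c + 1) for c in counts]


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_distances(
    counts: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Distance for every unordered pair of samples; smaller means more similar."""
    names = sorted(counts)
    logged = {name: log_transform(counts[name]) for name in names}
    return {
        (a, b): euclidean(logged[a], logged[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Invented read counts for three genes in three hypothetical samples.
COUNTS = {
    "ctrl_1": [0, 3, 7],
    "ctrl_2": [0, 3, 7],
    "treat_1": [1, 15, 63],
}

for pair, dist in sample_distances(COUNTS).items():
    print(pair, round(dist, 3))
```

The resulting distances are the kind of input a hierarchical clustering step works on. Replicates should sit close together, and samples from different conditions further apart.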

Transcript assembly

Cufflinks or Scripture is used to assemble the transcripts. Cufflinks uses a probabilistic model to assemble the smallest possible set of isoforms and estimate their expression levels at the same time, giving the most likely explanation of the expression data at each locus. With the appropriate parameters for strand-specific libraries, it also reports strand information correctly. Scripture uses a statistical segmentation model to distinguish expressed regions from experimental background noise, reports all isoforms with statistically significant expression at each locus, and can be used to assemble long transcripts.

Candidate lncRNA screening

    1. Basic screening

    The basic screening involves three steps. First, transcripts longer than 200 bp with more than 2 exons are selected. Second, Cufflinks calculates the read coverage of each transcript, and transcripts with a minimum read coverage above 3 are selected. Third, non-lncRNA transcripts are filtered out by comparison with known non-lncRNAs, and the Cuffmerge results are used for positional screening (different class codes are selected for different types of lncRNA).

    2. Coding Potential Evaluation

    Coding potential is the most important criterion when evaluating a candidate lncRNA. Coding Potential Calculator (CPC) analysis, Coding-Non-Coding Index (CNCI) analysis, Pfam protein domain analysis, and PhyloCSF analysis are currently the most common methods for coding potential analysis.
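The two screening stages above can be sketched as a pair of filters. The thresholds follow the text (length above 200 bp, more than 2 exons, coverage above 3). Everything else is an assumption made for illustration: the `Transcript` record, the default class codes (`u`, `i`, `x`), and the idea of keeping only transcripts that every coding-potential tool calls non-coding. The per-tool result sets are hypothetical placeholders, not real tool output.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    length: int        # transcript length in bp
    exon_count: int
    coverage: float    # read coverage reported by the assembler
    class_code: str    # positional class code relative to the annotation


def basic_screen(
    transcripts: Iterable[Transcript],
    known_non_lncrna: Set[str],
    allowed_class_codes: FrozenSet[str] = frozenset({"u", "i", "x"}),  # assumed
    length_cutoff: int = 200,
    exon_cutoff: int = 2,
    coverage_cutoff: float = 3.0,
) -> List[Transcript]:
    """Apply the three basic screening steps (strict 'greater than' cutoffs)."""
    kept = []
    for t in transcripts:
        if t.length <= length_cutoff or t.exon_count <= exon_cutoff:
            continue  # step 1: length and exon number
        if t.coverage <= coverage_cutoff:
            continue  # step 2: read coverage
        if t.transcript_id in known_non_lncrna:
            continue  # step 3a: known non-lncRNA transcripts
        if t.class_code not in allowed_class_codes:
            continue  # step 3b: positional (class code) screening
        kept.append(t)
    return kept


def noncoding_consensus(
    candidates: Iterable[Transcript], noncoding_calls: Dict[str, Set[str]]
) -> List[str]:
    """IDs of candidates that every tool classified as non-coding."""
    ids = {t.transcript_id for t in candidates}
    for called_noncoding in noncoding_calls.values():
        ids &= called_noncoding
    return sorted(ids)


TRANSCRIPTS = [
    Transcript("T1", 1500, 3, 8.0, "u"),
    Transcript("T2", 180, 3, 10.0, "u"),   # too short
    Transcript("T3", 900, 2, 5.0, "i"),    # too few exons
    Transcript("T4", 2000, 4, 2.0, "x"),   # coverage too low
    Transcript("T5", 1200, 3, 6.0, "="),   # class code not selected
    Transcript("T6", 1100, 5, 9.0, "x"),
    Transcript("T7", 1300, 3, 7.0, "u"),   # known non-lncRNA
]

# Hypothetical per-tool sets of transcripts classified as non-coding.
NONCODING_CALLS = {
    "CPC": {"T1"},
    "CNCI": {"T1", "T6"},
    "Pfam": {"T1", "T6"},
    "PhyloCSF": {"T1", "T6"},
}

screened = basic_screen(TRANSCRIPTS, known_non_lncrna={"T7"})
print([t.transcript_id for t in screened])
print(noncoding_consensus(screened, NONCODING_CALLS))
```

Requiring agreement across all tools is the strictest way to combine them. A looser rule, such as a majority vote, would be an equally reasonable design choice, depending on how conservative the candidate set should be.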

Expression analysis

Common analyses include expression level comparison, differential expression analysis, screening of differentially expressed lncRNAs, lncRNA expression clustering, and tissue- or phenotype-specific expression analysis. DESeq or Cuffdiff is commonly used for these analyses.
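To make the screening step concrete, the sketch below computes a log2 fold change between two groups and keeps lncRNAs whose change exceeds a cutoff. It is a deliberate simplification: DESeq and Cuffdiff fit statistical models and report significance, which this example does not attempt. The pseudocount of 1, the cutoff of 1 (a two-fold change), the function names, and the expression values are all assumptions.

```python
import math
from typing import Dict, List


def log2_fold_change(
    control: List[float], treated: List[float], pseudocount: float = 1.0
) -> float:
    """log2 ratio of group means; the pseudocount avoids division by zero."""
    mean_control = sum(control) / len(control)
    mean_treated = sum(treated) / len(treated)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


def select_by_fold_change(
    expression: Dict[str, Dict[str, List[float]]], min_abs_log2fc: float = 1.0
) -> Dict[str, float]:
    """Return lncRNAs whose absolute log2 fold change reaches the cutoff."""
    selected = {}
    for lncrna, groups in expression.items():
        lfc = log2_fold_change(groups["control"], groups["treated"])
        if abs(lfc) >= min_abs_log2fc:
            selected[lncrna] = lfc
    return selected


# Invented normalized expression values for three hypothetical lncRNAs.
EXPRESSION = {
    "lncA": {"control": [7.0, 7.0], "treated": [31.0, 31.0]},
    "lncB": {"control": [15.0, 15.0], "treated": [15.0, 15.0]},
    "lncC": {"control": [31.0, 31.0], "treated": [7.0, 7.0]},
}

print(select_by_fold_change(EXPRESSION))
```

A fold-change filter like this is usually combined with an adjusted p-value from the statistical tool rather than used on its own.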

Advanced analysis

    1. LncRNA target gene prediction

    The function of a lncRNA is linked to the protein-coding genes it acts on. Protein-coding genes located near a lncRNA are identified and used for functional enrichment analysis, and the main function of the lncRNA can then be predicted with lncRNATargets.

    2. Functional enrichment analysis of specific lncRNA

    LncRNAs with differential expression, or with tissue- or phenotype-specific expression, are referred to as specific lncRNAs. These lncRNAs can be analyzed for functional enrichment using GO and KEGG.

    3. Interaction analysis

    LncRNAs and mRNAs can be linked through their targeting relationships, and mRNAs can be linked to one another through their proteins, resulting in a lncRNA-mRNA-protein interaction network. Cytoscape can be used to visualize this network.
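The three advanced steps can be tied together in a small sketch: find protein-coding genes near a lncRNA, test whether a functional term is over-represented among them, and write the lncRNA-gene pairs as tab-separated edges in the simple interaction (SIF) layout that Cytoscape can import. The 100 kb window, the `cis` edge label, the coordinates, and the function names are assumptions for illustration. The enrichment test is a generic one-sided hypergeometric test, not the specific procedure of any GO or KEGG tool.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Feature:
    name: str
    chrom: str
    start: int
    end: int


def gap(a: Feature, b: Feature) -> Optional[int]:
    """Distance in bp between two features; 0 if they overlap, None if on different chromosomes."""
    if a.chrom != b.chrom:
        return None
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def nearby_genes(lncrna: Feature, genes: Iterable[Feature], window: int = 100_000) -> List[str]:
    """Names of genes within `window` bp of the lncRNA (assumed window size)."""
    hits = []
    for gene in genes:
        distance = gap(lncrna, gene)
        if distance is not None and distance <= window:
            hits.append(gene.name)
    return sorted(hits)


def hypergeom_pvalue(population: int, annotated: int, drawn: int, observed: int) -> float:
    """P(X >= observed) when `drawn` genes are sampled from `population`,
    of which `annotated` carry the functional term."""
    upper = min(annotated, drawn)
    favourable = sum(
        math.comb(annotated, i) * math.comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return favourable / math.comb(population, drawn)


def to_sif(lncrna: str, targets: Iterable[str], relation: str = "cis") -> List[str]:
    """One 'source<TAB>relation<TAB>target' line per predicted target."""
    return [f"{lncrna}\t{relation}\t{target}" for target in targets]


# Invented coordinates for one lncRNA and three protein-coding genes.
LNC1 = Feature("lnc1", "chr1", 1_000, 2_000)
GENES = [
    Feature("GENE_A", "chr1", 50_000, 60_000),
    Feature("GENE_B", "chr1", 500_000, 510_000),
    Feature("GENE_C", "chr2", 1_500, 2_500),
]

targets = nearby_genes(LNC1, GENES)
print(targets)
print(to_sif("lnc1", targets))
# Toy enrichment: 10 genes in total, 5 annotated, 5 targets, all 5 annotated.
print(hypergeom_pvalue(population=10, annotated=5, drawn=5, observed=5))
```

Proximity-based prediction only captures potential cis-acting relationships. Targets that act in trans would need a different approach, such as co-expression analysis.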

About CD Genomics Bioinformatics Analysis

The bioinformatics analysis department of CD Genomics provides novel solutions for data-driven innovation aimed at discovering the hidden potential in biological data, tapping new insights related to life science research, and predicting new prospects.

References


  1. Affinito O, Pane K, Smaldone G, et al. lncRNAs–mRNAs Co–Expression Network Underlying Childhood B–Cell Acute Lymphoblastic Leukaemia: A Pilot Study. Cancers. 2020, 12(9).
  2. Fan CN, Ma L, Liu N. Systematic analysis of lncRNA–miRNA–mRNA competing endogenous RNA network identifies four-lncRNA signature as a prognostic biomarker for breast cancer. Journal of translational medicine. 2018, 16(1).
  3. Jiang Q, Wang J, Wu X, et al. LncRNA2Target: a database for differentially expressed genes after lncRNA knockdown or overexpression. Nucleic acids research. 2015, 43(D1).
  4. Song J, Ye A, Jiang E, et al. Reconstruction and analysis of the aberrant lncRNA‐miRNA‐mRNA network based on competitive endogenous RNA in CESC. Journal of cellular biochemistry. 2018, 119(8).