Bioinformatics 101: circRNA Sequencing Data Analysis

What is circular RNA?

Circular RNAs (circRNAs) are a distinct class of RNA molecules that have attracted significant attention in molecular biology. Unlike linear RNAs, circRNAs form covalently closed loops with no free 5' or 3' ends, which makes them resistant to degradation by exonucleases. Although initially considered byproducts of splicing errors, circRNAs have emerged as key players in gene regulation and various biological processes. They have been implicated in cellular functions such as RNA stability, transcriptional regulation, and interactions with proteins. Studying circRNAs and their expression patterns requires comprehensive analysis, which is where bioinformatics plays a central role.

circRNA sequencing data analysis provides a systematic way to extract meaningful information from large-scale sequencing datasets. It enables the identification of differentially expressed circRNAs, the exploration of their potential functions, and the elucidation of the regulatory networks in which they participate. Bioinformatics tools allow researchers to process, analyze, and interpret these data efficiently. The sections below walk through circRNA expression data analysis step by step.

Circular RNA Prediction

Bioinformatics and machine learning make it possible to build models and algorithms that predict circRNAs from genomic and transcriptomic data. These predictions draw on several lines of evidence, including back-splice site features, evolutionary conservation, and splicing signals.

One of the key features of circRNAs is the presence of back-splice sites, where the 5' end of an upstream exon is covalently linked to the 3' end of a downstream exon. Predictive approaches utilize sequence motifs and secondary structure properties to identify potential back-splice sites. Machine learning algorithms, such as support vector machines (SVMs) and random forests, can be trained on known circRNA datasets to recognize the characteristic patterns associated with back-splice sites.
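To make the feature idea concrete, here is a minimal Python sketch (standard library only) that turns the sequence around a candidate back-splice site into a normalized k-mer frequency vector, one common way of encoding sequence motifs for classifiers such as SVMs or random forests. The function names, k = 3, and the 50-base flank are illustrative assumptions; training the classifier itself (for example with scikit-learn) is not shown.

```python
from itertools import product

def kmer_vocabulary(k):
    """All 4**k DNA k-mers in a fixed lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]

def kmer_features(seq, k=3):
    """Normalized k-mer frequency vector for one sequence window.

    k-mers containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    seq = seq.upper()
    vocab = kmer_vocabulary(k)
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def junction_window(genome, site, flank=50):
    """Hypothetical helper: sequence within `flank` bases on either side of a site."""
    return genome[max(0, site - flank):site + flank]

# Each candidate site becomes one fixed-length vector (64 values for k = 3).
window = junction_window("ACGTTAGGTAAGTCCAG" * 10, site=85)
features = kmer_features(window)
print(len(features), round(sum(features), 6))
```

Feature vectors built this way from known circRNA and non-circRNA sites would form the training matrix for the classifier.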

Conservation analysis provides insights into the evolutionary history and functional significance of circRNAs. By comparing genomic sequences across species, researchers can identify conserved circRNAs that may have important roles across different organisms. Phylogenetic footprinting and analysis of evolutionary signatures enable the identification of conserved structural elements within circRNAs, further aiding in their prediction.
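A simple way to quantify conservation is pairwise percent identity against a reference species. The sketch below assumes the region around a candidate junction has already been extracted from a multiple alignment; the species labels, toy sequences, and 80% cutoff are hypothetical, and phylogenetic footprinting methods use richer evolutionary models than this.

```python
def percent_identity(ref_aln, other_aln):
    """Percent identity over alignment columns where both sequences have a base.

    Inputs are pre-aligned, equal-length strings with '-' for gaps.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(ref_aln.upper(), other_aln.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0

def conserved_in(alignment, reference, min_identity=80.0):
    """Species whose aligned region meets a (hypothetical) identity cutoff."""
    ref = alignment[reference]
    return sorted(sp for sp, seq in alignment.items()
                  if sp != reference and percent_identity(ref, seq) >= min_identity)

# Toy alignment of a junction-flanking region (not real genomic data).
toy = {"human": "AGGTAAGT-C", "mouse": "AGGTAAGTTC", "zebrafish": "ACCTTAG--C"}
print(conserved_in(toy, "human"))
```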

The circularization of RNA molecules involves specific splicing events. Analyzing splicing signals associated with circRNA formation can provide valuable information for prediction. Computational tools can be employed to identify canonical and non-canonical splicing signals that promote circularization. By considering the presence and distribution of these signals, predictive models can be trained to distinguish potential circRNA-forming exons from linear splicing events.

Linear vs. circular splicing (Dori et al., 2019).
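Checking splicing signals can be sketched as follows: for a candidate circRNA spanning a genomic interval, read the intronic dinucleotides next to the back-splice donor and acceptor and compare them with the canonical GT-AG pair and the non-canonical GC-AG and AT-AC pairs. Coordinates are assumed to be 0-based and end-exclusive, and the genome is a toy string; a real pipeline would read the flanks from a reference FASTA.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

# (donor, acceptor) intron dinucleotides as read on the transcript strand.
SPLICE_CLASSES = {
    ("GT", "AG"): "canonical GT-AG",
    ("GC", "AG"): "non-canonical GC-AG",
    ("AT", "AC"): "non-canonical AT-AC",
}

def backsplice_signals(genome, start, end, strand):
    """Donor/acceptor dinucleotides for a circRNA spanning genome[start:end].

    The back-splice donor lies just past the 3' end of the downstream exon,
    the acceptor just before the 5' end of the upstream exon.
    """
    genome = genome.upper()
    if start < 2 or end + 2 > len(genome):
        raise ValueError("need 2 bp of flanking sequence on each side")
    if strand == "+":
        return genome[end:end + 2], genome[start - 2:start]
    if strand == "-":
        # On the minus strand the donor is on the left and the acceptor on the right.
        return revcomp(genome[start - 2:start]), revcomp(genome[end:end + 2])
    raise ValueError("strand must be '+' or '-'")

def classify_backsplice(genome, start, end, strand):
    signals = backsplice_signals(genome, start, end, strand)
    return SPLICE_CLASSES.get(signals, "no recognized signal")

toy_genome = "CCAG" + "ATGCATGC" + "GTCC"   # circle occupies [4, 12)
print(classify_backsplice(toy_genome, 4, 12, "+"))
```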

circRNA Sequencing Data Analysis Pipeline

Quality Control of circRNA Expression Data

Before proceeding with circRNA analysis, it is essential to perform quality control on the sequencing data so that subsequent analyses rest on reliable information. Quality control involves assessing sequencing quality scores, removing low-quality bases, and eliminating adapter sequences introduced during library preparation. FastQC is commonly used to assess per-base quality and adapter content, and Trimmomatic to trim low-quality bases and adapters.
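The trimming steps can be illustrated with a simplified sketch: exact adapter removal followed by sliding-window quality trimming in the spirit of Trimmomatic's SLIDINGWINDOW step. This is not Trimmomatic's implementation; the adapter prefix, window size 4, quality threshold 20, and minimum length 20 are assumed example values, and qualities are assumed to be Phred+33 encoded.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(c) - offset for c in qual]

def adapter_cut_position(seq, adapter, min_overlap=3):
    """Index where the adapter starts in the read (len(seq) if absent).

    Finds a full exact match anywhere, or a partial adapter prefix at the 3' end.
    Mismatches are not tolerated in this sketch.
    """
    idx = seq.find(adapter)
    if idx != -1:
        return idx
    for k in range(min(len(adapter), len(seq)) - 1, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)

def sliding_window_cut(scores, window=4, threshold=20):
    """Cut at the start of the first window whose mean quality is below threshold."""
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window < threshold:
            return i
    return len(scores)

def clean_read(seq, qual, adapter="AGATCGGAAGAGC", window=4, threshold=20, min_len=20):
    """Adapter-trim, then quality-trim; return None if the read becomes too short."""
    cut = adapter_cut_position(seq, adapter)
    seq, qual = seq[:cut], qual[:cut]
    cut = sliding_window_cut(phred_scores(qual), window, threshold)
    seq, qual = seq[:cut], qual[:cut]
    return (seq, qual) if len(seq) >= min_len else None

print(clean_read("A" * 30, "I" * 25 + "#" * 5))
```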

Mapping circRNA Reads to the Reference Genome

Once quality control is complete, the next step is to map the sequencing reads to the reference genome. Conventional RNA-seq aligners assume that the parts of a spliced read map to the genome in collinear order, which does not hold for reads spanning a back-splice junction: the 3' portion of such a read maps upstream of its 5' portion. Split-read and chimeric-aware aligners such as BWA-MEM, STAR, and segemehl report these non-collinear alignments, and dedicated tools such as CIRI2 and CIRCexplorer2 then parse them to locate the back-splice junctions from which circRNA reads originate and to distinguish them from linear RNA molecules.
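The non-collinear pattern can be shown with a sketch that classifies the two aligned segments of a split read. It assumes both segments are listed in transcript 5'-to-3' order, with 0-based, end-exclusive coordinates and the transcript strand given; the Segment type and function are hypothetical and do not reflect the data model of any aligner named above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # transcript strand, '+' or '-'

def classify_split_read(first, second):
    """Classify two read segments given in transcript 5'-to-3' order.

    Returns (label, circle_span), where circle_span is (start, end) for a back-splice.
    """
    if first.chrom != second.chrom or first.strand != second.strand:
        return ("other chimeric", None)
    if first.strand == "+":
        if second.start >= first.end:
            return ("linear splice", None)      # second part lies downstream: collinear
        if second.start < first.start:
            return ("back-splice", (second.start, first.end))
    else:
        if second.end <= first.start:
            return ("linear splice", None)      # minus strand runs toward lower coordinates
        if second.end > first.end:
            return ("back-splice", (first.start, second.end))
    return ("ambiguous", None)

print(classify_split_read(Segment("chr1", 1900, 1950, "+"), Segment("chr1", 1000, 1050, "+")))
```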

Computational Tools for circRNA Identification

Identifying circRNAs from the aligned sequencing data is a critical step in circRNA expression analysis. Several computational tools have been developed to detect and annotate circRNAs accurately. These tools utilize unique features of circRNAs, such as back-splicing junction sites, to identify circRNAs in sequencing data. Examples of widely used circRNA identification tools include CIRI, find_circ, and CIRCexplorer2.
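Identification tools typically aggregate individual junction reads into candidate circRNAs and require a minimum amount of read support. A minimal sketch of that aggregation step, assuming per-read junction calls are already available as tuples and using an illustrative cutoff of two reads (not a recommended value for any specific tool):

```python
from collections import Counter

def call_circrnas(junction_reads, min_reads=2):
    """Collapse per-read back-splice calls into candidate circRNAs.

    junction_reads: iterable of (read_id, chrom, start, end, strand) tuples,
    one per read alignment supporting a back-splice junction. A read reported
    more than once for the same junction is counted once.
    """
    seen = set()
    support = Counter()
    for read_id, chrom, start, end, strand in junction_reads:
        key = (chrom, start, end, strand)
        if (read_id, key) in seen:
            continue
        seen.add((read_id, key))
        support[key] += 1
    return {key: n for key, n in support.items() if n >= min_reads}

toy_calls = [
    ("r1", "chr1", 1000, 1950, "+"),
    ("r2", "chr1", 1000, 1950, "+"),
    ("r2", "chr1", 1000, 1950, "+"),   # duplicate report of the same read
    ("r3", "chr2", 500, 900, "-"),
]
print(call_circrnas(toy_calls))
```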

Quantification of circRNA Expression

Quantifying circRNA expression is essential for understanding circRNA abundance and comparing expression across samples. Because reads from the body of a circRNA are usually indistinguishable from reads of its linear host transcript, quantification typically relies on reads that span the back-splice junction. Approaches differ in how strictly they count: some count all reads assigned to a circRNA, while others count only reads that map uniquely across the junction, which is more precise but less sensitive. Tools such as CIRIquant, CIRCexplorer2, and Sailfish-CIR provide circRNA quantification; DESeq2, by contrast, does not quantify reads but takes the resulting count matrix as input for differential expression analysis. Researchers can choose an approach based on their research objectives and the trade-offs of each method.
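Raw junction counts depend on sequencing depth, so they are usually scaled before samples are compared. The sketch below shows one simple normalization, back-spliced junction reads per million mapped reads, plus an illustrative circular-to-total ratio at a junction. The function names and the exact ratio definition are assumptions; individual tools may define these quantities differently.

```python
def bsj_cpm(bsj_counts, total_mapped_reads):
    """Back-spliced junction reads per million mapped reads, per circRNA."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return {circ: n * 1e6 / total_mapped_reads for circ, n in bsj_counts.items()}

def circular_fraction(bsj_reads, linear_junction_reads):
    """Illustrative ratio at one junction: BSJ reads / (BSJ + linear junction reads)."""
    total = bsj_reads + linear_junction_reads
    return bsj_reads / total if total else 0.0

toy_counts = {"circ_1": 12, "circ_2": 3}   # hypothetical circRNA IDs and counts
print(bsj_cpm(toy_counts, 3_000_000))
print(circular_fraction(5, 15))
```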

Identifying Differentially Expressed circRNAs

To identify circRNAs that exhibit significant changes in expression between conditions or sample groups, statistical methods for differential expression analysis are employed. Popular tools such as edgeR, DESeq2, and limma provide statistical frameworks to assess differential expression. These methods account for the inherent variability in circRNA expression data and generate statistical measures of significance. By comparing expression levels between conditions, researchers can uncover circRNAs that are potentially involved in specific biological processes or disease states.
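DESeq2, edgeR, and limma fit statistical models that are beyond a short example, but two quantities they report can be sketched directly: a log2 fold change and Benjamini-Hochberg (BH) adjusted p-values, which is the default adjustment in all three. The fold change below is a plain ratio of group means with an assumed pseudocount of 1, not DESeq2's model-based estimate, and the p-values are assumed to come from an upstream test.

```python
import math

def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """log2 of mean(group B) over mean(group A), with a pseudocount for zeros."""
    mean_a = sum(cpm_a) / len(cpm_a)
    mean_b = sum(cpm_b) / len(cpm_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):   # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(log2_fold_change([1.0, 1.0], [7.0, 7.0]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```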

Enrichment Analysis of Gene Ontology Terms

Understanding the functional implications of differentially expressed circRNAs often involves enrichment analysis of Gene Ontology (GO) terms. GO provides a structured vocabulary describing biological processes, molecular functions, and cellular components. Because GO annotations are assigned to genes, circRNAs are usually represented by their host genes in this analysis. Tools such as DAVID, g:Profiler, and Metascape can identify GO terms that are over-represented among these genes, giving clues to the biological functions and pathways in which the circRNAs may be involved.
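Over-representation of a GO term is commonly assessed with a hypergeometric (one-sided Fisher's exact) test. A minimal sketch of the upper-tail p-value using only Python's math.comb follows; the counts in the usage line are toy values, and web tools such as DAVID apply their own variants and multiple-testing corrections.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for the overlap X between a gene list and a GO term.

    N: annotated background genes, K: background genes annotated to the term,
    n: genes in the query list (e.g. host genes), k: query genes annotated to the term.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Sum the exact hypergeometric probabilities in integers, divide once at the end.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)

print(hypergeom_enrichment_p(k=8, n=40, K=200, N=18000))
```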

Pathway Analysis and miRNA Target Site Prediction

Pathway analysis complements GO analysis by indicating the signaling pathways and biological processes in which circRNAs may participate. Pathway databases such as KEGG and Reactome, and commercial software such as IPA, support this analysis. In addition, circRNAs can act as competitive endogenous RNAs (ceRNAs) that sequester microRNAs (miRNAs), relieving the repression those miRNAs would otherwise exert on their target mRNAs. Tools such as miRanda, TargetScan, and miRWalk can predict miRNA binding sites within circRNA sequences, helping to uncover these regulatory interactions.
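Seed matching is at the core of many miRNA target predictors. The sketch below finds 7mer-m8 sites (complementary to miRNA nucleotides 2-8) in a circRNA sequence and, because the molecule is circular, also searches across the back-splice junction by wrapping the sequence. It is a simplified illustration, not the miRanda or TargetScan algorithm; let-7a is used as an example miRNA and the circRNA sequence is a toy.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def revcomp_rna(seq):
    """Reverse complement as RNA (DNA input is converted T -> U first)."""
    return seq.upper().replace("T", "U").translate(RNA_COMPLEMENT)[::-1]

def seed_7mer_m8_site(mirna):
    """Target-site sequence complementary to miRNA nucleotides 2-8."""
    return revcomp_rna(mirna[1:8])

def find_sites_circular(circ_seq, mirna):
    """0-based starts of 7mer-m8 sites, including sites spanning the back-splice junction."""
    site = seed_7mer_m8_site(mirna)
    seq = circ_seq.upper().replace("T", "U")
    k = len(site)
    extended = seq + seq[:k - 1]   # wrap around so junction-spanning sites are visible
    return [i for i in range(len(seq)) if extended[i:i + k] == site]

let7a = "UGAGGUAGUAGGUUGUAUAGUU"
toy_circ = "CCUCAAAAAAAAGGGGCUA"   # the site is split across the junction
print(seed_7mer_m8_site(let7a), find_sites_circular(toy_circ, let7a))
```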

Tools for Functional Analysis

Bioinformatics resources for functional analysis help researchers explore the biological functions and regulatory networks of circRNAs. They integrate various databases, algorithms, and visualization methods to support functional interpretation. Widely used examples include circBase, circAtlas, and CircNet, which allow researchers to investigate circRNA-protein interactions, miRNA-mRNA interactions, and potential involvement in specific biological pathways or diseases.
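The ceRNA relationships described above are often organized as circRNA-miRNA-mRNA links. A minimal sketch of assembling such triplets from two edge lists follows; all identifiers are hypothetical, and in practice the edges would come from target predictions or database exports.

```python
from collections import defaultdict

def cerna_triplets(circ_mirna_edges, mirna_mrna_edges):
    """Join circRNA-miRNA and miRNA-mRNA edges into sorted (circRNA, miRNA, mRNA) triplets."""
    targets_of = defaultdict(set)
    for mirna, mrna in mirna_mrna_edges:
        targets_of[mirna].add(mrna)
    triplets = set()
    for circ, mirna in circ_mirna_edges:
        # .get avoids inserting empty entries for miRNAs with no known targets
        for mrna in targets_of.get(mirna, ()):
            triplets.add((circ, mirna, mrna))
    return sorted(triplets)

circ_edges = [("circA", "mirA"), ("circA", "mirB"), ("circB", "mirC")]
mirna_edges = [("mirA", "GENE1"), ("mirA", "GENE2"), ("mirC", "GENE3")]
for triplet in cerna_triplets(circ_edges, mirna_edges):
    print(*triplet, sep="\t")
```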

Common circRNA Databases

circRNA databases are curated collections of circRNA sequences, annotations, and associated information gathered from various experimental and computational sources. These databases offer an extensive range of features, including circRNA identification, classification, functional annotation, expression analysis, and visualization tools. By collating data from diverse studies, circRNA databases provide a comprehensive resource that facilitates efficient data exploration and analysis.

Database	Features	Applications
circBase	Comprehensive circRNA annotation; integration of various experimental data; miRNA binding site prediction	Exploration of circRNA expression profiles; functional analysis of circRNA-miRNA interactions; investigation of circRNA functions and regulatory mechanisms
CIRCpedia	Large collection of circRNA entries; expression profiles across multiple tissues; disease association information	Identification of differentially expressed circRNAs; functional characterization of circRNAs; exploration of circRNA-associated diseases
circRNADb	Genomic location and expression data; circularization junction sites; potential protein-coding analysis	Annotation of circRNA-protein interactions; prediction of circRNA secondary structures; investigation of circRNA functions and mechanisms
circ2Traits	Association of circRNAs with human traits and diseases; integration with GWAS data	Identification of circRNA biomarkers in human diseases; exploration of circRNA-disease associations
circAtlas	Extensive circRNA annotation and expression data; tissue-specific circRNA profiles; visualization tools for circRNA expression patterns	Comparative analysis of circRNA expression across species; exploration of circRNA evolution and conservation; functional analysis of circRNA-associated genes and pathways

CircAtlas (Wu et al., 2020)

Harnessing circRNA Databases for Sequencing Data Analysis

Sequencing data analysis plays a pivotal role in circRNA research: it enables researchers to identify differentially expressed circRNAs, investigate circRNA-miRNA interactions, discover novel circRNA isoforms, and elucidate their functional significance. circRNA databases support these analyses by supplying reference annotations against which newly detected circRNAs can be compared, along with expression profiles, predicted interactions, and search and visualization interfaces.

By using circRNA databases alongside their own data, researchers can speed up annotation, test hypotheses against published observations, and generate new insights into the roles and regulatory mechanisms of circRNAs in biological processes and diseases.
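One common use of a database is to check whether detected circRNAs are already annotated. The sketch below matches detected back-splice coordinates to database entries within a base-pair tolerance. The tuple layout and IDs are hypothetical, and both inputs are assumed to use the same genome build and coordinate convention, which should be verified because databases can differ on both.

```python
def annotate_with_database(detected, db_entries, tolerance=0):
    """Map detected circRNAs to database IDs whose junction coordinates agree.

    detected: iterable of (chrom, start, end, strand)
    db_entries: iterable of (db_id, chrom, start, end, strand)
    tolerance: allowed difference in bases at each junction end
    """
    by_locus = {}
    for db_id, chrom, start, end, strand in db_entries:
        by_locus.setdefault((chrom, strand), []).append((start, end, db_id))
    result = {}
    for chrom, start, end, strand in detected:
        hits = [db_id for s, e, db_id in by_locus.get((chrom, strand), [])
                if abs(s - start) <= tolerance and abs(e - end) <= tolerance]
        result[(chrom, start, end, strand)] = hits
    return result

detected_calls = [("chr1", 1000, 1950, "+"), ("chr2", 500, 900, "-")]
toy_db = [("db_circ_1", "chr1", 1000, 1950, "+"), ("db_circ_2", "chr1", 1001, 1950, "+")]
print(annotate_with_database(detected_calls, toy_db, tolerance=1))
```

Calls with an empty hit list are candidates for novel circRNAs, pending validation.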

References

Dori, Martina, and Silvio Bicciato. "Integration of bioinformatic predictions and experimental data to identify circRNA-miRNA associations." Genes 10.9 (2019): 642.
Wu, Wanying, Peifeng Ji, and Fangqing Zhao. "CircAtlas: an integrated resource of one million highly accurate circular RNAs from 1070 vertebrate transcriptomes." Genome Biology 21.1 (2020): 1-14.
