Pan-genomics Sequencing Data Analysis

CD Genomics provides a bioinformatics-based pan-genomics sequencing data analysis service to help you explore the full genetic repertoire of a species. Our data analysis expertise allows us to tailor comprehensive analyses to each customer's needs.

Introduction to the Pan-genome

In molecular biology and genetics, a pan-genome (or supragenome) is the entire set of genes across all strains within a clade. It comprises the core genome, containing genes present in all strains within the clade; the accessory genome, containing 'dispensable' genes present in a subset of the strains; and strain-specific genes. The study of the pan-genome is called pangenomics.
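The three categories above can be illustrated with a minimal Python sketch. It is not part of the CD Genomics pipeline. It assumes gene families have already been clustered and that each strain is represented by a set of gene family IDs. The function name `partition_pangenome`, the strain names, and the gene IDs are hypothetical. Here "accessory" means families shared by at least two strains but not all of them, which is one common reading of the definition.

```python
from collections import Counter


def partition_pangenome(gene_sets):
    """Split gene families into core, accessory, and strain-specific sets.

    gene_sets: dict mapping strain name -> set of gene family IDs
    (hypothetical input; real data would come from a clustering step).
    """
    n_strains = len(gene_sets)
    # Count in how many strains each gene family occurs.
    counts = Counter(g for genes in gene_sets.values() for g in genes)
    core = {g for g, c in counts.items() if c == n_strains}
    # With a single strain every family is core, so nothing is strain-specific.
    unique = {g for g, c in counts.items() if c == 1 and n_strains > 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Toy example with three hypothetical strains.
strains = {
    "A": {"g1", "g2", "g3"},
    "B": {"g1", "g2", "g4"},
    "C": {"g1", "g3", "g5"},
}
core, accessory, unique = partition_pangenome(strains)
print(sorted(core), sorted(accessory), sorted(unique))
```

In this toy data set, g1 occurs in every strain, g2 and g3 are shared by two strains each, and g4 and g5 occur in one strain only.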

What Is Pan-genomics Sequencing Data Analysis?

Growing interest in pan-genomes, together with the sequencing of the first plant pan-genome, has produced more sequencing data that require bioinformatic analysis. A number of software tools have since been developed to analyze such data. These tools cluster homologous genes, identify SNPs, plot pangenomic profiles, build phylogenetic relationships among orthologous genes or gene families across strains or isolates, and support function-based searching, annotation and/or curation, and visualization.

Fig 1. Different approaches to pangenome assembly. (Agnieszka A. G.; et al. 2016)

We Can Help Our Clients With

The analysis of pan-genome sequencing data can answer three important questions that help characterize a species:
(i) What is the size of the core genome? In other words, how many genes/gene families are present in all individuals?
(ii) What is the size of the pangenome? How many genes/gene families are present within the species?
(iii) With the addition of each new individual, how many genes/gene families will be added to the pangenome?

The analysis of the core genome size, the pangenome size, and the number of new genes added can be conducted on two levels: individual genes and whole gene families.
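The three questions above can be sketched as a simple accumulation calculation in Python. This is an illustrative sketch under stated assumptions, not the method used in the CD Genomics pipeline. Each individual is assumed to be represented by a set of gene (or gene family) IDs. The function name `accumulation_curve` and the toy data are hypothetical. The sketch uses a single fixed order of individuals, whereas published analyses typically average over many random orderings.

```python
def accumulation_curve(gene_sets, order=None):
    """Track pangenome size, core size, and new genes as individuals are added.

    gene_sets: dict mapping individual name -> set of gene or gene family IDs
    order: optional sequence of names; defaults to alphabetical order
    Returns a list of (name, pan_size, core_size, new_genes) tuples.
    """
    order = list(order) if order is not None else sorted(gene_sets)
    pan = set()   # union of all genes seen so far (question ii)
    core = None   # intersection of all genomes seen so far (question i)
    rows = []
    for name in order:
        genes = gene_sets[name]
        new = genes - pan  # genes contributed by this individual (question iii)
        pan |= genes
        core = set(genes) if core is None else core & genes
        rows.append((name, len(pan), len(core), len(new)))
    return rows


# Toy example with three hypothetical individuals.
individuals = {
    "A": {"g1", "g2", "g3"},
    "B": {"g1", "g2", "g4"},
    "C": {"g1", "g3", "g5"},
}
curve = accumulation_curve(individuals)
for row in curve:
    print(row)
```

The same function works on either level mentioned above: pass sets of individual gene IDs or sets of gene family IDs.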

In short, constructing a pan-genome map makes it possible to capture all the genes of a species. It also allows many varieties to be compared, so that variety-specific genes and variation sites can be identified. These variation sites and variety-specific genes are often related to specific biological functions.

CD Genomics Pan-genomics Data Analysis Pipeline

Pipeline of pan-genomics sequencing data analysis - CD Genomics

What We Offer

Standard analysis content

Genome assembly	Pan-genome build
	Assembly evaluation
Genome annotation	Repeat sequence annotation
	Gene structure annotation
	Gene function annotation
	Non-coding RNA annotation
Biological analysis	Comparative genomic analysis
	Gene family analysis
	Phylogenetic analysis
	Positive selection analysis
	Collinearity analysis

Advanced analysis

Biological analysis	Mining unique genes
	Variation detection and analysis based on assembly sequence

Customized Data Analysis

Customized analysis content can be discussed and agreed upon according to customer needs.

How It Works

CD Genomics is a high-tech company specializing in multi-omics data analysis. We provide services such as project design, data analysis, and database construction. With a focus on developing breakthrough products and services, we are a pioneer in the biotechnology industry, serving researchers and partners worldwide.

CD Genomics has over a decade of experience in pan-genomics sequencing data analysis. If you have any questions about how we can help you, please get in touch. We look forward to working with you!

Reference

Agnieszka A. G.; et al. Towards plant pangenomics. Plant Biotechnology Journal (2016) 14, pp. 1099–1105.

* For Research Use Only. Not for use in diagnostic procedures.