CD Genomics provides a bioinformatics service for pan-genomics sequencing data analysis to help you explore the genetic potential of a species. Our data analysis expertise lets us meet each customer's specific requirements and deliver a comprehensive analysis.

Introduction to the Pan-genome

In molecular biology and genetics, the pan-genome (or supragenome) is the entire set of genes across all strains within a clade. It comprises the core genome, which contains genes present in all strains within the clade; the accessory genome, which contains 'dispensable' genes present in a subset of the strains; and strain-specific genes. The study of the pan-genome is called pan-genomics.
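As an illustration of these definitions, the following minimal Python sketch splits a set of strains into core, accessory, and strain-specific genes. It assumes gene presence has already been determined for each strain; the function name partition_pangenome, the strain names, and the gene IDs are hypothetical and are not part of any CD Genomics pipeline.

```python
from collections import Counter


def partition_pangenome(strain_genes):
    """Split a pan-genome into core, accessory and strain-specific genes.

    strain_genes maps a strain name to the set of gene (or gene family)
    IDs found in that strain. All names here are illustrative.
    """
    n_strains = len(strain_genes)
    # Count in how many strains each gene occurs.
    counts = Counter(g for genes in strain_genes.values() for g in set(genes))
    pan = set(counts)
    # Core: present in every strain.
    core = {g for g, c in counts.items() if c == n_strains}
    # Strain-specific: present in exactly one strain (needs more than one strain).
    specific = {g for g, c in counts.items() if c == 1 and n_strains > 1}
    # Accessory: present in a subset of two or more strains, but not all.
    accessory = pan - core - specific
    return {
        "pan": pan,
        "core": core,
        "accessory": accessory,
        "strain_specific": specific,
    }


# Hypothetical toy data: three strains with made-up gene IDs.
example_strains = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g3", "g5"},
}
example_parts = partition_pangenome(example_strains)
```

In this toy data set, g1 is the only core gene, g2 and g3 are accessory genes, and g4 and g5 are strain-specific. In a real project, presence or absence would come from homology clustering rather than from exact ID matches.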

What Is Pan-genomics Sequencing Data Analysis?

The increased interest in pan-genomes and the sequencing of the first plant pan-genome have produced more sequencing data that requires bioinformatic analysis. A number of software tools have since been developed to analyze such data. These tools can cluster homologous genes, identify SNPs, plot pan-genomic profiles, infer phylogenetic relationships among strains or isolates from orthologous genes or gene families, and support function-based searching, annotation and/or curation, and visualization.

The Significance of Pan-genomics Sequencing Data Analysis

Analysis of pan-genome sequencing data can answer three important questions that help characterize a species:
(i) What is the size of the core genome? In other words, how many genes/gene families are present in all individuals?
(ii) What is the size of the pan-genome? How many genes/gene families are present within the species?
(iii) With the addition of each new individual, how many genes/gene families will be added to the pan-genome?

The core genome size, the pan-genome size, and the number of new genes added can each be analyzed at two levels: individual genes and whole gene families.
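The three questions above can be sketched as a simple accumulation curve: genomes are added one at a time, and the pan-genome size, the core genome size, and the number of newly added genes are recorded at each step. This is a minimal Python sketch under stated assumptions; the function name accumulation_curve and the toy data are hypothetical, and the same code works at either level because the IDs can be individual genes or gene families.

```python
def accumulation_curve(strain_genes, order=None):
    """Track pan-genome size, core size and new genes as genomes are added.

    strain_genes maps a strain name to the set of gene (or gene family)
    IDs found in it. order is the sequence in which strains are added;
    it defaults to alphabetical order. All names are illustrative.
    """
    order = list(order) if order is not None else sorted(strain_genes)
    pan = set()
    core = None
    rows = []
    for n, strain in enumerate(order, start=1):
        genes = set(strain_genes[strain])
        new_genes = genes - pan          # genes not seen in any earlier genome
        pan |= genes                     # pan-genome grows by union
        # Core genome shrinks (or stays equal) by intersection.
        core = set(genes) if core is None else core & genes
        rows.append({
            "genomes": n,
            "pan_size": len(pan),
            "core_size": len(core),
            "new_genes": len(new_genes),
        })
    return rows


# Hypothetical toy data: three strains with made-up gene IDs.
toy_strains = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g3", "g5"},
}
curve = accumulation_curve(toy_strains, order=["strainA", "strainB", "strainC"])
for row in curve:
    print(row)
```

The result depends on the order in which genomes are added, so a common refinement is to repeat the calculation over many random orders and average the values at each step.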

In short, constructing a pan-genome map captures all the genes of a species. It also allows many varieties to be compared, so that variety-specific genes and variant sites can be identified. These variant sites and variety-specific genes are often related to specific biological functions of the species.

CD Genomics Pan-genomics Data Analysis Pipeline

Pipeline of pan-genomics sequencing data analysis - CD Genomics

Bioinformatics Analysis Content

Standard analysis content

Genome assembly: pan-genome build, assembly evaluation

Genome annotation: repeat sequence annotation, gene structure annotation, gene function annotation, non-coding RNA annotation

Biological analysis: comparative genomic analysis, gene family analysis, phylogenetic analysis, positive selection analysis, collinear analysis

Advanced analysis

Biological analysis: mining unique genes, variation detection and analysis based on assembly sequence

Customized Data Analysis

Customized analysis content can be discussed and agreed according to the customer's needs.

Turnaround Time

About one to two weeks, depending on the content of the project and the complexity of the genome. For more information, please contact us.

CD Genomics has over a decade of experience in pan-genomics sequencing data analysis. If you have any questions about how we can help you, please get in touch. We look forward to working with you!


