CD Genomics provides a bioinformatics-based pan-genome sequencing data analysis service to help you explore the full genetic repertoire of a species. Our data analysis expertise lets us meet customers' individual requirements and deliver comprehensive results.
In molecular biology and genetics, the pan-genome (or supragenome) is the entire set of genes across all strains within a clade. It comprises the core genome, containing genes present in all strains within the clade; the accessory genome, containing 'dispensable' genes present in a subset of the strains; and strain-specific genes. The study of the pan-genome is called pangenomics.
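The three categories above can be illustrated with a minimal Python sketch. It assumes gene content is already available as one set of gene (or gene family) identifiers per strain; the function name `classify_genes` and the toy strain and gene names are hypothetical and serve only as illustration.

```python
from collections import Counter

def classify_genes(strain_genes):
    """Split a pan-genome into core, accessory and strain-specific genes.

    strain_genes: dict mapping strain name -> set of gene (family) IDs.
    """
    n_strains = len(strain_genes)
    # Count in how many strains each gene occurs.
    counts = Counter(g for genes in strain_genes.values() for g in genes)
    core = {g for g, c in counts.items() if c == n_strains}
    # Genes seen in exactly one strain (core takes priority if there is only one strain).
    specific = {g for g, c in counts.items() if c == 1} - core
    # Everything else is present in a subset of two or more strains.
    accessory = set(counts) - core - specific
    return core, accessory, specific

# Hypothetical toy data: three strains and five gene families.
strains = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g3", "g5"},
}

core, accessory, specific = classify_genes(strains)
print(sorted(core), sorted(accessory), sorted(specific))
```

The union of the three sets is the pan-genome of the toy data set.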
The growing interest in pan-genomes and the sequencing of the first plant pan-genome have produced more sequencing data that requires bioinformatic analysis. A number of software tools have since been developed to help analyze such data. For example, several tools were developed to analyze pangenomes by clustering homologous genes, identifying SNPs, plotting pangenomic profiles, building phylogenetic relationships of orthologous genes or gene families across strains and isolates, and supporting function-based searching, annotation and/or curation, and visualization.
Fig 1. Different approaches to pangenome assembly. (Agnieszka A. G; et al. 2016)
Analysis of pangenome sequencing data can answer three important questions that help characterize a species:
(i) What is the size of the core genome, in other words how many genes/gene families are present in all the individuals?
(ii) What is the size of the pangenome? How many genes/gene families are present within the species?
(iii) With the addition of each new individual, how many genes/gene families will be added to the pangenome?
The core genome size, the pangenome size and the number of new genes added can each be analyzed on two levels: individual genes and whole gene families.
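The three questions can be made concrete with a short Python sketch that tracks core genome size, pangenome size and the number of newly added genes as individuals are added one at a time. It assumes each individual is represented by a set of gene or gene family identifiers; the function name `pangenome_growth` and the toy data are hypothetical.

```python
def pangenome_growth(genomes):
    """Track core size, pangenome size and new genes as individuals are added.

    genomes: list of sets of gene (family) IDs, in order of addition.
    Returns one (core_size, pan_size, new_genes) tuple per added individual.
    """
    core = None   # genes shared by every individual seen so far
    pan = set()   # all genes seen so far
    rows = []
    for genes in genomes:
        new = genes - pan                      # question (iii): genes not seen before
        pan |= genes                           # question (ii): pangenome grows
        core = set(genes) if core is None else core & genes  # question (i): core shrinks
        rows.append((len(core), len(pan), len(new)))
    return rows

# Hypothetical toy data: three individuals and five gene families.
individuals = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g4"},
    {"g1", "g3", "g5"},
]

growth = pangenome_growth(individuals)
print(growth)  # [(3, 3, 3), (2, 4, 1), (1, 5, 1)]
```

The intermediate values depend on the order in which individuals are added, so such curves are commonly averaged over many random orderings. The same function works at either level of analysis: pass individual gene identifiers or gene family identifiers.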
In short, constructing a pan-genome map captures all the genes of a species. It also allows many varieties to be compared, so that variety-specific genes and variation sites can be identified. These variation sites and variety-specific genes are often related to specific biological functions.
Standard analysis content
Genome assembly: Pan-genome build; Assembly evaluation
Genome annotation: Repeat sequence annotation; Gene structure annotation; Gene function annotation; Non-coding RNA annotation
Biological analysis: Comparative genomic analysis; Gene family analysis; Phylogenetic analysis; Positive selection analysis; Collinear analysis
Advanced analysis
Biological analysis: Mining unique genes; Variation detection and analysis based on assembly sequence
Customized Data Analysis
The content of a customized analysis can be discussed and agreed on according to the customer's needs.
CD Genomics is a high-tech company specializing in multiomic data analysis. We provide services such as project design, data analysis, and database construction. With a focus on developing breakthrough products and services, we are a pioneer in the biotechnology industry, serving researchers and partners worldwide.
CD Genomics has over a decade of experience in pan-genomics sequencing data analysis. If you have any questions about how we can help you, please get in touch. We look forward to working with you!
Reference