What is Bacterial De Novo Sequencing Analysis?
Bacterial genome de novo sequencing obtains the whole-genome sequence of a bacterium through sequencing and assembly, without a reference genome, so that the genome's structure and function can then be studied. Once a strain has been sequenced, follow-up studies can be carried out to better understand and control its characteristics and physiological processes. Bacterial de novo sequencing has largely replaced traditional methods for studying bacterial evolution, its genetic mechanisms, and key functional genes.
Bioinformatic analysis of bacterial de novo sequencing data starts from the genome assembly and covers genome composition analysis, functional annotation, analysis of individual genes and gene interactions, expression regulation, and more. It can also predict important genes and proteins, helping to reveal bacterial functions and the mechanisms behind them.
Fig 1. Whole-genome phylogenetic tree of ExPEC E. coli isolates. (Salipante S J, et al. 2014)
Application of Bacterial De Novo Sequencing Analysis
Research on bacterial pathogenicity and on disease prevention and treatment.
Identification of disease-related genes.
Vaccine development and research.
Development of new antibiotics.
Evolutionary analysis.
Advantages of CD Genomics
Thoroughly planned analysis process, fast delivery, and short turnaround time
Employ meticulous planning to ensure an efficient analysis workflow, providing rapid delivery and reducing overall cycle time without compromising on quality.
Rich and in-depth analysis content
Offer comprehensive and detailed analysis, delving deeply into the data to uncover meaningful insights and provide robust interpretations of bacterial genome sequencing results.
Personalized analysis and chart making
Tailor the analysis and visualization to meet specific research needs, creating customized charts and graphs that clearly illustrate key findings and trends.
Rich experience in bacterial genome analysis
Leverage extensive expertise in bacterial genome sequencing and analysis to apply proven methodologies, ensuring accurate and insightful results in bacterial genome research.
Multi-omics combined analysis
At CD Genomics, our bacterial de novo sequencing service integrates multi-omics analysis, combining microbial sequencing with metabolomics and proteomics to provide comprehensive insights into bacterial functions and interactions.
Bacterial De Novo Sequencing Process
Fig 2. Pipeline of Bacterial De Novo Sequencing
Sample Submission Guidelines of Sequencing
CD Genomics Data Analysis Pipeline
Bioinformatics Analysis Content
CD Genomics processes bacterial genome data generated by next-generation (short-read) or third-generation (long-read) sequencing, at different sequencing depths and with different sequencing strategies. The bioinformatic analysis strategy is tailored to the type of raw data.
Analysis category | Analysis content
Data preprocessing | Sequencing data statistics; removal of adapter and low-quality sequences
Genome assembly | Genome assembly; evaluation of assembly results
Genome structure research | Coding/non-coding gene prediction; repeat sequence analysis; genomic island prediction; prophage prediction; CRISPR prediction
Gene function annotation | Functional database annotation (e.g., GO, COG, KEGG, Swiss-Prot); genomic methylation modification analysis
Pathogenicity-related analysis | Secretion system effector protein prediction; pathogen-host interaction analysis; secreted protein analysis; secondary metabolite gene cluster analysis; annotation of virulence factors and drug resistance genes of bacterial pathogens
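As an illustration of the data-preprocessing step listed above, the following Python sketch clips an adapter and trims low-quality 3' bases from a single read. It is a simplified sketch, not the pipeline actually used: it assumes Phred+33 quality encoding and exact adapter matching, and the function names, the Q20 and 30-base thresholds, and the example adapter prefix (AGATCGGAAGAGC, commonly seen in Illumina libraries) are illustrative assumptions. Dedicated trimmers also handle mismatches, paired reads, and sliding-window quality filters.

```python
from __future__ import annotations

# Hypothetical preprocessing sketch: adapter clipping + 3' quality trimming.
# Assumes Phred+33 qualities and exact (mismatch-free) adapter matching.


def phred33_scores(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def clip_adapter(seq: str, qual: str, adapter: str, min_overlap: int = 5) -> tuple[str, str]:
    """Cut the read where the adapter begins.

    Looks for a full, exact adapter copy first; otherwise checks whether the
    read ends with an adapter prefix of at least `min_overlap` bases.
    """
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]
    # Try the longest partial overlap first, down to min_overlap bases.
    for overlap in range(min(len(adapter) - 1, len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return seq[:-overlap], qual[:-overlap]
    return seq, qual


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> tuple[str, str]:
    """Drop trailing bases whose quality is below `min_q`."""
    scores = phred33_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def preprocess(seq: str, qual: str, adapter: str,
               min_q: int = 20, min_len: int = 30) -> tuple[str, str] | None:
    """Clip the adapter, trim the low-quality tail, and drop reads shorter than `min_len`."""
    seq, qual = clip_adapter(seq, qual, adapter)
    seq, qual = trim_3prime(seq, qual, min_q)
    return (seq, qual) if len(seq) >= min_len else None


if __name__ == "__main__":
    # Read ending in a partial adapter, with two Q2 ('#') bases before it.
    print(preprocess("ACGTACGTACAGATCGG", "IIIIIIII##IIIIIII", "AGATCGGAAGAGC", min_len=5))
    # prints ('ACGTACGT', 'IIIIIIII')
```

Clipping runs before quality trimming so that adapter bases, which can have high quality scores, are not kept just because they pass the quality threshold.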
If you need other bioinformatic analyses of bacterial de novo sequencing data, such as collinearity analysis between genomes, mutation detection between genomes, core/pan-genome analysis, or phylogenetic tree construction, we will design an appropriate bioinformatic analysis strategy for your needs. For questions about analysis content, pricing, or turnaround time, please submit an online inquiry.
Example Data Analysis Report
To demonstrate the quality and detail of our CD Genomics report for bacterial de novo sequencing data analysis, we are pleased to offer a sample report upon request. Please contact us to receive your copy. For further insights, you can also refer to a client-published article, "Complete Genome Sequence of the Lignocellulose-Degrading Actinomycete Streptomyces albus CAS922," which features some of the data we provided.
How It Works
CD Genomics is a high-tech company specializing in multi-omics data analysis. We provide services such as project design, data analysis, and database construction. With a focus on developing breakthrough products and services, we are a pioneer in the biotechnology industry, serving researchers and partners worldwide.
CD Genomics provides general and customized analyses of bacterial genome de novo sequencing data. With experienced teams of scientists, we provide fast turnaround, high-quality data reports at competitive prices for worldwide customers. Customers can contact our scientists directly to receive prompt responses. If you are interested in our services, please contact us for more detailed information.
Reference
- Salipante S J, et al. Large-scale genomic sequencing of extraintestinal pathogenic Escherichia coli strains. Genome Research, 2014, 25(1).
What Does Data Analysis of Bacterial De Novo Sequencing Show?
Quality Distribution
Fig 3. Distribution of base quality.
Distribution of Base Content
Fig 4. Distribution of base content. The x-axis represents base position along the reads.
SNP Detection in a Single Sample
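The per-position base composition plotted in Fig 4 can be computed as in this minimal Python sketch. It assumes reads are already parsed into uppercase strings (FASTQ parsing is omitted), and the function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def base_content_by_position(reads: list[str]) -> list[dict[str, float]]:
    """Return, for each 0-based read position, the fraction of A, C, G, T and N.

    Reads may differ in length; a read only contributes to positions it covers.
    """
    max_len = max((len(r) for r in reads), default=0)
    profile = []
    for pos in range(max_len):
        counts = Counter(r[pos] for r in reads if pos < len(r))
        total = sum(counts.values())  # >= 1, since some read reaches max_len
        profile.append({base: counts[base] / total for base in "ACGTN"})
    return profile
```

Plotting each base's fraction against position gives a chart like Fig 4; missing bases (`N`) are tracked separately so they do not distort the A/C/G/T fractions.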
Fig 5. SNP mutation type distribution.
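SNP mutation type distributions often merge each substitution with its reverse complement (for example, G>A on one strand is C>T on the other), leaving six classes. The Python sketch below shows that bookkeeping plus a transition/transversion test. It assumes biallelic, single-base REF/ALT pairs, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = {"A", "C", "G", "T"}
_PURINES = {"A", "G"}


def _check(ref: str, alt: str) -> tuple[str, str]:
    """Normalize case and reject anything that is not a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in _BASES or alt not in _BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return ref, alt


def mutation_class(ref: str, alt: str) -> str:
    """Collapse a substitution onto one of six strand-symmetric classes.

    Classes are reported with a pyrimidine (C or T) reference base:
    C>A, C>G, C>T, T>A, T>C, T>G.
    """
    ref, alt = _check(ref, alt)
    if ref in _PURINES:  # purine reference: describe it on the opposite strand
        ref, alt = ref.translate(_COMPLEMENT), alt.translate(_COMPLEMENT)
    return f"{ref}>{alt}"


def is_transition(ref: str, alt: str) -> bool:
    """Transitions are purine<->purine or pyrimidine<->pyrimidine changes."""
    ref, alt = _check(ref, alt)
    return (ref in _PURINES) == (alt in _PURINES)


def mutation_spectrum(snps: list[tuple[str, str]]) -> Counter:
    """Count SNPs per strand-collapsed mutation class."""
    return Counter(mutation_class(ref, alt) for ref, alt in snps)
```

For example, `mutation_spectrum([("G", "A"), ("C", "T"), ("A", "G"), ("T", "G")])` counts two C>T changes, one T>C, and one T>G.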
Small InDel Detection and Annotation
Fig 6. InDel length distribution in CDS regions.
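InDels in coding sequences are often summarized by length, because an InDel whose length is not a multiple of three shifts the reading frame. This Python sketch tallies signed InDel lengths from VCF-style REF/ALT alleles and counts potential frameshifts. It assumes the variants are simple, left-anchored, biallelic InDels already known to fall inside a CDS; the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def indel_length(ref: str, alt: str) -> int:
    """Signed length from VCF-style REF/ALT alleles (+ insertion, - deletion)."""
    return len(alt) - len(ref)


def cds_indel_summary(indels: list[tuple[str, str]]) -> tuple[Counter, int]:
    """Tally InDel lengths and count those that would shift the reading frame.

    A length that is not a multiple of 3 is counted as a frameshift.
    """
    lengths: Counter = Counter()
    frameshifts = 0
    for ref, alt in indels:
        n = indel_length(ref, alt)
        if n == 0:
            continue  # equal-length alleles are not InDels
        lengths[n] += 1
        if n % 3 != 0:  # Python's % keeps this correct for negative lengths
            frameshifts += 1
    return lengths, frameshifts
```

The `lengths` counter is the data behind a length-distribution chart such as Fig 6; in-frame InDels (lengths of ±3, ±6, ...) add or remove whole codons instead.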
InDels Annotation
Fig 7. Statistics pie of InDel annotations
Title: Comparison of De Novo Assembly Strategies for Bacterial Genomes
Published: International Journal of Molecular Sciences
Main Method: Bacterial De Novo Assembly, Illumina Sequencing, Pacific Biosciences (PacBio) SMRT Sequencing, Oxford Nanopore Technologies (ONT) Sequencing
Abstract: The accurate assembly of bacterial genomes is critical for understanding bacterial pathogenesis and developing effective treatment strategies. This study compared various de novo assembly strategies for the Haemophilus parasuis genome, the causative agent of Glässer's disease in swine. Utilizing Illumina sequencing for short reads and long-read sequencing platforms such as PacBio SMRT and ONT, the study evaluated the impact of different assembly methods on genomic accuracy, completeness, and protein prediction. Results indicate that the optimal assembly strategy involves long-read sequencing followed by short-read polishing, with ONT reads polished with Homopolish emerging as a promising method due to cost-effectiveness and high accuracy.
Main Research Results:
1. Data Sequencing:
Illumina Sequencing:
Paired-end reads of 2 × 150 bp with an average read quality of Q29.7, and 92.3% of the reads had a quality value greater than Q30.
PacBio SMRT Sequencing:
Reads with an average length of 9,598 bp, longest read 80,413 bp, and an average quality value of Q15.
ONT Sequencing:
Reads with an average length of 5,480 bp, longest read 125,711 bp, and an average quality value of Q13.2, with 85.4% of reads above Q10.
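The quality values above are Phred scores, Q = -10 log10(P), where P is the estimated probability that a base call is wrong: Q10 corresponds to 1 error in 10 calls, Q20 to 1 in 100, and Q30 to 1 in 1,000. The Python sketch below converts between the two scales and computes the share of scores at or above a threshold. Averages can be taken over Phred values or over error probabilities, and the two can differ considerably; the text does not state which convention was used for the figures above, so the error-space version is shown only as an illustration. Function names are hypothetical.

```python
from __future__ import annotations

import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def fraction_at_least(scores: list[float], threshold: float = 30) -> float:
    """Fraction of scores at or above `threshold` (e.g. the share of >= Q30 bases)."""
    return sum(q >= threshold for q in scores) / len(scores)


def mean_quality_error_space(scores: list[float]) -> float:
    """Average the error probabilities first, then convert back to a Phred score."""
    mean_p = sum(phred_to_error(q) for q in scores) / len(scores)
    return error_to_phred(mean_p)
```

For example, scores of Q10 and Q30 have an arithmetic mean of 20, but `mean_quality_error_space([10, 30])` returns about 13.0, because the low-quality base dominates the expected number of errors.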
2. De Novo Assembly:
Independent Assembly:
ONT Reads: Fully circular genome assembly achieved, highlighting the utility of long reads in resolving complex genomes.
PacBio Reads: Produced fewer contigs than the Illumina assembly, but not a single contiguous genome.
Illumina Reads: Resulted in a large number of fragmented contigs due to the limitations of short-read lengths.
Figure 8. Comparison of results of independent assembly strategies.
Hybrid Assembly:
Methodology: Short-read Illumina data scaffolded with long-read data from ONT or PacBio.
Preferred Tools: Unicycler and MaSuRCA outperformed SPAdes in generating complete and accurate genomes.
Results: Combining Illumina with ONT reads proved superior, benefiting from ONT's ultralong reads.
Long-Read Assembly with Polishing:
Strategy: Initial assembly with long reads, followed by polishing with tools such as Medaka (which uses ONT reads) and Pilon (which uses Illumina short reads).
Key Findings: Polished assemblies (ONT or PacBio) provided high accuracy, completeness, and better protein prediction.
Homopolish Effectiveness: Corrected systematic ONT errors without needing short-read Illumina data.
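Polishing tools work in quite different ways (Pilon uses short-read alignments, Medaka applies a neural network to nanopore read alignments, and Homopolish draws on homologous genomes), but the shared idea of replacing draft bases with better-supported ones can be illustrated by a toy majority-vote consensus in Python. It assumes reads are already aligned and summarized as pileup columns, one list of observed bases per draft position with `-` marking a deletion; insertions are ignored, and the function name and thresholds are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def polish(draft: str, pileup: list[list[str]],
           min_depth: int = 3, min_frac: float = 0.6) -> str:
    """Toy consensus polishing over a pre-computed pileup.

    pileup[i] holds the bases aligned reads show at draft position i
    ('-' = deletion). A draft base is changed only when coverage is at
    least min_depth and one allele reaches min_frac of the reads.
    """
    if len(pileup) != len(draft):
        raise ValueError("pileup must have one column per draft base")
    out = []
    for base, column in zip(draft, pileup):
        if len(column) >= min_depth:
            allele, count = Counter(column).most_common(1)[0]
            if count / len(column) >= min_frac:
                # Supported consensus: substitute it, or drop the base
                # entirely when most reads show a deletion.
                if allele != "-":
                    out.append(allele)
                continue
        out.append(base)  # low coverage or no clear consensus: keep draft base
    return "".join(out)
```

Requiring both a minimum depth and a minimum allele fraction keeps the sketch from "correcting" positions where the read evidence is thin or split.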
3. Quality and Completeness Assessment:
QUAST Analysis:
ONT reads polished with Illumina achieved high assembly quality with fewer misassemblies.
Hybrid assemblies generated with Unicycler and MaSuRCA were more contiguous.
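QUAST reports contiguity statistics such as N50: the contig length L such that contigs of length L or longer together make up at least half of the total assembly length. A minimal Python sketch of the general Nx statistic (the function name is hypothetical):

```python
from __future__ import annotations


def nx(contig_lengths: list[int], x: float = 50.0) -> int:
    """Return the Nx statistic (N50 by default).

    Sort contigs from longest to shortest and return the length of the
    contig at which the running total first reaches x% of the assembly size.
    """
    if not contig_lengths:
        return 0
    lengths = sorted(contig_lengths, reverse=True)
    target = sum(lengths) * x / 100
    running = 0
    for length in lengths:
        running += length
        if running >= target:
            return length
    return lengths[-1]  # only reached through floating-point rounding
```

A chromosome assembled as a single contig has an N50 equal to its length (ignoring plasmids), so fragmented assemblies such as the Illumina-only results described above score lower on this metric.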
BUSCO Assessment:
Significantly higher completeness post-polishing, with Homopolish providing near-complete genomes.
Figure 9. Evaluation of completeness of assembly results of different strategies.
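BUSCO measures completeness by searching an assembly for near-universal single-copy orthologs and classifying each as complete (single-copy or duplicated), fragmented, or missing. The following Python sketch formats such counts in the style of BUSCO's one-line summary; the function name is hypothetical, and the counts in the example are invented for illustration, not results from the study.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO-style category counts as a one-line completeness string."""
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("no BUSCO groups were searched")

    def pct(count: int) -> str:
        return f"{100 * count / n:.1f}%"

    # Complete (C) = single-copy (S) + duplicated (D).
    return (f"C:{pct(single + duplicated)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{n}")
```

For example, `busco_summary(120, 2, 2, 1)` returns `C:97.6%[S:96.0%,D:1.6%],F:1.6%,M:0.8%,n:125`. Fragmented genes in a draft often reflect frameshifting indels, which is why completeness tends to rise after polishing.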
16S rDNA Sequence Alignment:
Figure 10. 16S rDNA sequence alignment results of assembled genomes.
The primary errors in ONT assemblies were indels in homopolymer regions, effectively corrected by Homopolish.
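Because the main residual errors in ONT assemblies were indels in homopolymer regions, it can help to locate such runs before comparing assemblies. A minimal Python sketch using a regular expression; the 5-base minimum run length and the function name are assumptions, not values from the study.

```python
from __future__ import annotations

import re


def homopolymer_runs(seq: str, min_len: int = 5) -> list[tuple[int, int, str]]:
    """Return (start, length, base) for every single-base run of >= min_len.

    Coordinates are 0-based. min_len must be at least 2.
    """
    if min_len < 2:
        raise ValueError("min_len must be >= 2")
    # One base captured, then the same base repeated at least min_len - 1 more times.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), len(m.group(0)), m.group(1)) for m in pattern.finditer(seq.upper())]
```

For example, `homopolymer_runs("ACGTTTTTTGCAAAAACG")` returns `[(3, 6, 'T'), (11, 5, 'A')]`; comparing run lengths at these positions between an assembly and a reference shows where homopolymer indels occur.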
Figure 11. Evaluation of completeness of assembly results of 16 isolates.
Conclusion
This comparative study underscores the efficiency and effectiveness of long-read sequencing platforms (PacBio and ONT) for bacterial genome assembly. It recommends ONT for its cost-efficiency and ultralong read lengths, particularly when followed by error correction with Homopolish. The combined approach of long-read sequencing and short-read polishing provides the highest accuracy and genome completeness, essential for detailed bacterial genome studies.
1. What is the purpose of de novo sequencing?
De novo sequencing aims to determine the complete nucleotide sequence of a genome from scratch, without a reference. It enables the discovery of new genes and genetic variations and the construction of comprehensive genetic maps, which are essential for studying unsequenced organisms and complex genetic traits.
2. What does a de novo gene do?
A de novo gene, newly emerged without clear ancestry, can introduce novel functions into an organism. These genes may play roles in unique biological processes, adaptation, and evolution, contributing to the organism's phenotypic diversity and potentially offering advantages in specific environments.
3. What does de novo mean in microbiology?
In microbiology, de novo sequencing refers to determining the complete nucleotide sequence of a microorganism's genome from scratch, without using a reference genome. This technique is crucial for identifying novel genes and genetic variations and for constructing comprehensive genetic maps, especially for previously unsequenced organisms. De novo sequencing enables researchers to explore microbial diversity, understand evolutionary relationships, and discover unique genetic traits that can inform studies on microbial functions, pathogenicity, and resistance mechanisms.
4. What is the principle of de novo sequencing?
De novo sequencing involves fragmenting DNA, sequencing the fragments, and using computational algorithms to assemble the sequences into a complete genome. This method reconstructs the genome without a reference, allowing the discovery of novel genes and genetic structures in unsequenced organisms.
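Many short-read assemblers, including SPAdes and Velvet, are built on de Bruijn graphs: reads are broken into k-mers, each k-mer links its (k-1)-base prefix to its (k-1)-base suffix, and the genome is read off a path through the graph. The toy Python sketch below assumes error-free reads from a single strand and that no (k-1)-mer occurs twice in the genome, so the graph is one unbranched Eulerian path; real assemblers must also handle sequencing errors, both DNA strands, coverage, and repeats. Function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def debruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Each distinct k-mer becomes an edge from its (k-1)-mer prefix to its suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in sorted(kmers):  # sorting only makes the result deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    if not graph:
        return []
    in_deg: dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one (the path's first node), if any.
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1), next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads: list[str], k: int) -> str:
    """Spell the contig: first node, then the last base of each following node."""
    path = eulerian_path(debruijn_graph(reads, k))
    if not path:
        return ""
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, `assemble(["ATGGCG", "GGCGTG", "CGTGCA"], 4)` reconstructs `ATGGCGTGCA`. Using distinct k-mers means overlapping reads do not create duplicate edges.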
5. What tools are used for Bacterial De Novo Sequencing Data Analysis?
SPAdes: Assembles bacterial genomes from short reads.
Velvet: A de Bruijn graph assembler for short-read sequencing data.
ABySS: Efficient for large genome assemblies.
SOAPdenovo: Suitable for short-read assemblies.
QUAST: Evaluates and compares genome assemblies.
Pilon: Improves draft assemblies using short-read alignments, correcting bases and small indels and filling gaps.
Prokka: Annotates bacterial genomes.
6. Why are long-read sequencing technologies preferred for de novo bacterial genome assembly?
Long-read technologies (like PacBio and ONT) generate reads spanning several kilobases, which can traverse repetitive genomic regions and structurally complex areas, producing more contiguous and complete genome assemblies compared to short-read technologies.
7. What is the role of polishing in genome assembly?
Polishing involves correcting errors in the initial genome assembly using high-quality reads or homologous sequences. For example, Illumina reads or tools like Homopolish are used to rectify indels and mismatches, thereby enhancing assembly accuracy and completeness.
8. What challenges are associated with Illumina-only assemblies?
Illumina sequencing, limited to short reads, struggles with repetitive regions and structural complexities, resulting in fragmented assemblies. This limitation necessitates complementing with long reads to achieve complete and accurate genomes.
9. What advancements in ONT sequencing enhance its utility for microbial genome research?
Recent advancements in ONT include longer read lengths, improved sequencing accuracy, and more efficient base-calling algorithms. Continuous updates to sequencing chemistries and software are bridging the accuracy gap compared to other platforms, making ONT increasingly valuable for comprehensive microbial genomics.