As a bioinformatics data analysis provider, CD Genomics is experienced in Long-Read Whole Genome De Novo Sequencing data analysis. Our data analysis platform is used to deliver high-quality results with a short turnaround time.

Introduction

In de novo sequencing, no reference sequence is available for alignment. Sequence reads are assembled into contigs, and the quality of a de novo assembly depends on the size and continuity of those contigs (Amarasinghe, Ritchie, & Gouil, 2021). Long-read sequencing differs from NGS platforms, which generate relatively short reads (up to ~600 nt): it produces long reads with an average length of more than 10 kb. Long reads have drastically improved the quality of genome assembly and the analysis of genome structure. Based on size, heterozygosity, and repeat content, genomes can be classified as simple, complex, or ultra-complex.

	Simple Genome	Complex Genome	Ultra-complex Genome
Definition	Genome Size < 2 Gb; Heterozygosity < 0.5%; Repetitive Sequence Ratio < 50%	Genome Size > 2 Gb; Heterozygosity > 0.5%; 50% < Repetitive Sequence Ratio < 65%	Genome Size > 10 Gb; Repetitive Sequence Ratio > 65%
Library type	PacBio CLR Library; Nanopore Ligation 1D Library
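
The thresholds in the table can be turned into a small classifier. The sketch below is only an illustration: the function name and parameters are hypothetical, and it assumes that exceeding any one threshold is enough to move a genome into the higher class (the table does not say whether the criteria must all hold together). Values that fall exactly on a boundary are placed in the lower class.

```python
def classify_genome(size_gb, heterozygosity_pct, repeat_pct):
    """Classify a genome as 'simple', 'complex', or 'ultra-complex'.

    Assumption (not stated in the table): a single criterion above its
    threshold is sufficient to assign the higher complexity class.
    """
    # Ultra-complex: genome size > 10 Gb or repeat ratio > 65%
    if size_gb > 10 or repeat_pct > 65:
        return "ultra-complex"
    # Complex: genome size > 2 Gb, heterozygosity > 0.5%, or repeat ratio > 50%
    if size_gb > 2 or heterozygosity_pct > 0.5 or repeat_pct > 50:
        return "complex"
    # Otherwise every value is at or below the simple-genome thresholds
    return "simple"


if __name__ == "__main__":
    # Hypothetical survey results: (size in Gb, heterozygosity %, repeat %)
    for sample in [(0.4, 0.1, 30), (3.0, 0.3, 40), (12.0, 0.2, 70)]:
        print(sample, "->", classify_genome(*sample))
```

In practice these three inputs would come from a genome survey (for example a k-mer analysis of short reads) carried out before the de novo project is designed.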

Application Field

Species Research
Identifying Viruses
Assembling a polyploid genome

Bioinformatics Analysis Content

Data Processing
Genome Assembly
Genome Annotation
- Gene Function Annotation
- Repeat Annotation
- Structure Annotation
- Non-coding RNA Annotation
Comparative Genomics Analysis
Personalized Analysis

How It Works

CD Genomics is a high-tech company specializing in multiomic data analysis. We provide services such as project design, data analysis, and database construction. With a focus on developing breakthrough products and services, we are a pioneer in the biotechnology industry, serving researchers and partners worldwide.

Table 1. Partial list of software and databases

Software or database	Version	Use	Link
NanoFilt	2.8.0	Filtering of TGS (third-generation sequencing) data	https://github.com/wdecoster/nanofilt
minimap2	2.17	Mapping	https://github.com/lh3/minimap2
samtools	1.11	Sorting	https://github.com/samtools/samtools
StringTie	2.1.4	Reconstruct transcripts	http://ccb.jhu.edu/software/stringtie/
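
To show how the first three tools in Table 1 might be chained, here is a minimal dry-run sketch that only builds and prints the shell commands; it does not execute anything. The file names, quality and length thresholds, and thread count are hypothetical, the `map-ont` preset assumes Nanopore reads, and the "reference" passed to minimap2 is assumed to be a draft assembly that reads are mapped back to. This is not necessarily the pipeline CD Genomics runs.

```python
import shlex


def build_commands(reads_fq_gz, reference_fa, prefix,
                   min_quality=7, min_length=1000, threads=4):
    """Return shell command strings for filtering, mapping, and sorting.

    All thresholds are illustrative defaults, not recommended values.
    """
    q = shlex.quote
    filtered = f"{prefix}.filtered.fastq.gz"
    bam = f"{prefix}.sorted.bam"
    return [
        # NanoFilt reads FASTQ from stdin: -q = minimum mean quality, -l = minimum length
        f"gunzip -c {q(reads_fq_gz)} | NanoFilt -q {min_quality} -l {min_length} "
        f"| gzip > {q(filtered)}",
        # minimap2 emits SAM (-a) with the Nanopore preset; samtools sorts it into BAM
        f"minimap2 -ax map-ont -t {threads} {q(reference_fa)} {q(filtered)} "
        f"| samtools sort -@ {threads} -o {q(bam)} -",
        # Index the sorted BAM for downstream tools
        f"samtools index {q(bam)}",
    ]


if __name__ == "__main__":
    for cmd in build_commands("reads.fastq.gz", "draft_assembly.fa", "sample"):
        print(cmd)
```

Printing the commands first makes it easy to review them before running them in a shell or passing them to a workflow manager.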

1. How can the reliability of assembly results be ensured? What are the main methods for assessing assembly completeness and accuracy?

Beyond checking the two contiguity indicators, Contig N50 and Scaffold N50, the quality of an assembly needs to be evaluated directly. Available approaches include BUSCO, LAI, Merqury, CEGMA, consistency checks, and evaluation against EST sequences, RNA sequences, and BAC clone sequences. BUSCO evaluates the genome against a set of single-copy orthologous genes, LAI (LTR Assembly Index) uses long terminal repeat retrotransposons to evaluate assembly integrity, and Merqury estimates the consensus quality value (QV) of the genome. These three methods are the most commonly used at present.
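
Two of the numbers mentioned above are simple to compute by hand. The sketch below uses the standard definition of N50 (the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly length) and the usual Phred scaling for QV, QV = -10 · log10(error rate). The function names and the example lengths are hypothetical; tools such as Merqury estimate the error rate from k-mers, which is not reproduced here.

```python
import math


def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First contig at which the cumulative length covers half the assembly
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


def phred_qv(error_rate):
    """Convert a per-base error rate into a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    # Total length is 300, so half is 150; 100 + 80 = 180 crosses it at the 80 contig
    print(n50([100, 80, 60, 40, 20]))  # 80
    # One error per 10,000 bases corresponds to a QV of about 40
    print(round(phred_qv(1e-4), 2))
```

The same `n50` function applies to Contig N50 and Scaffold N50; only the input lengths differ.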

2. What is the recommended strategy for de novo assembly?

Typically, de novo projects require 50-100× TGS data and 100× Illumina NGS short-read data for assembly and error correction. In addition, 100× Bionano and 100× Hi-C data can be added to increase assembly continuity and improve assembly accuracy. The currently recommended assembly strategies are: (1) 30× ultra-long read library + 50-100× ONT library + 100× Bionano DLS + 100× Hi-C + 100× Illumina. (2) 30-60× PacBio CCS library + 100× Bionano DLS + 100× Hi-C + 50× Illumina (for survey and evaluation).
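
To translate these depths into a data volume, multiply the depth by the estimated genome size. The sketch below does this for strategy (2); the dictionary and function names are hypothetical, and the calculation assumes that depth simply means total data divided by genome size for every library type, which is a simplification.

```python
# Depth ranges (min, max) for strategy (2), taken from the text above
STRATEGY_2 = {
    "PacBio CCS": (30, 60),
    "Bionano DLS": (100, 100),
    "Hi-C": (100, 100),
    "Illumina": (50, 50),
}


def data_volume_gb(genome_size_gb, strategy):
    """Return the (min, max) data volume in Gb for each library type."""
    return {
        library: (low * genome_size_gb, high * genome_size_gb)
        for library, (low, high) in strategy.items()
    }


if __name__ == "__main__":
    # Hypothetical 2 Gb genome
    for library, (low, high) in data_volume_gb(2.0, STRATEGY_2).items():
        if low == high:
            print(f"{library}: {low:.0f} Gb")
        else:
            print(f"{library}: {low:.0f}-{high:.0f} Gb")
```

For a 2 Gb genome this gives 60-120 Gb of PacBio CCS data, which illustrates why an accurate genome size estimate from the survey matters for project planning.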

3. Do the genome survey and the de novo assembly need DNA from the same individual?

In principle, the DNA for both the survey and the de novo assembly should come from the same individual. If the amount of DNA is not sufficient for the entire de novo project, the DNA for the small-fragment libraries should come from the same individual, while the third-generation (long-read) large-fragment or ultra-long-fragment libraries can use another individual from the same population.
