Long-Read Whole Genome De Novo Sequencing Data Analysis

As a bioinformatics data analysis provider, CD Genomics is experienced in long-read whole genome de novo sequencing data analysis. Our data analysis platform is designed to deliver high-quality results within a short analysis cycle.

Introduction

De novo sequencing is used when no reference sequence is available for alignment. Sequence reads are assembled into contigs, and the quality of a de novo assembly depends on the size and continuity of those contigs (Amarasinghe, Ritchie, & Gouil, 2021). Unlike NGS platforms, which generate relatively short reads (up to ~600 nt), long-read sequencing produces reads with an average length of more than 10 kb. Long reads have drastically improved the quality of genome assembly and the analysis of genome structure. Genomes can be classified as simple, complex, or ultra-complex according to genome size, heterozygosity, and repetitive sequence ratio.

  • Simple genome: genome size < 2 Gb; heterozygosity < 0.5%; repetitive sequence ratio < 50%
  • Complex genome: genome size > 2 Gb; heterozygosity > 0.5%; repetitive sequence ratio between 50% and 65%
  • Ultra-complex genome: genome size > 10 Gb; repetitive sequence ratio > 65%

Library types: PacBio CLR library, Nanopore ligation 1D library
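
The thresholds above can be written as a small decision rule. The following Python sketch is illustrative only: the function name `classify_genome` is hypothetical, and the source does not say how the criteria combine or how boundary values are handled. The sketch assumes that a genome falls into the most complex class for which any single criterion is met, and that boundary values (for example exactly 2 Gb) count toward the more complex class.

```python
def classify_genome(size_gb, heterozygosity_pct, repeat_pct):
    """Classify a genome as 'simple', 'complex', or 'ultra-complex'.

    Assumption (not stated in the source): meeting ANY criterion of a
    class is enough to place the genome in that class.
    """
    # Ultra-complex: genome size > 10 Gb or repetitive sequence ratio > 65%
    if size_gb > 10 or repeat_pct > 65:
        return "ultra-complex"
    # Complex: size > 2 Gb, heterozygosity > 0.5%, or repeats of 50% to 65%
    # (boundary values are assigned here by assumption)
    if size_gb >= 2 or heterozygosity_pct >= 0.5 or repeat_pct >= 50:
        return "complex"
    # Simple: size < 2 Gb, heterozygosity < 0.5%, and repeats < 50%
    return "simple"
```

For example, `classify_genome(0.5, 0.2, 30)` falls in the simple class, while a 3 Gb genome with 0.8% heterozygosity and 55% repeats falls in the complex class.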

Application Field

  • Species Research
  • Identifying Viruses
  • Assembling a polyploid genome

Bioinformatics Analysis Content

  • Data Processing
  • Genome Assembly
  • Genome Annotation
    • Gene Function Annotation
    • Repeat annotation
    • Structure annotation
    • Non-coding RNA annotation
  • Comparative Genomics Analysis
  • Personalized Analysis

How It Works

CD Genomics is a high-tech company specializing in multiomic data analysis. We provide services such as project design, data analysis, and database construction. With a focus on developing breakthrough products and services, we are a pioneer in the biotechnology industry, serving researchers and partners worldwide.

Table 1 Partial software and database list

Software or database | Version | Use | Link
NanoFilt | 2.8.0 | TGS data filtering | https://github.com/wdecoster/nanofilt
minimap2 | 2.17 | Mapping | https://github.com/lh3/minimap2
samtools | 1.11 | Sorting | https://github.com/samtools/samtools
StringTie | 2.1.4 | Reconstruct transcripts | http://ccb.jhu.edu/software/stringtie/
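
As a rough illustration of how the first three tools in Table 1 can be chained, the Python sketch below builds a shell pipeline that filters reads with NanoFilt, maps them with minimap2, and sorts the alignments with samtools. It is not CD Genomics' actual workflow: the function name `build_pipeline`, the file names, and the default thresholds (minimum quality 7, minimum length 1000, 4 threads) are all assumptions chosen for the example, and the `map-ont` preset assumes Nanopore reads. The function only builds the command string and does not run it.

```python
import shlex


def build_pipeline(reads_fastq, target_fasta, out_bam,
                   min_quality=7, min_length=1000, threads=4):
    """Return a shell pipeline: NanoFilt -> minimap2 -> samtools sort.

    All defaults are illustrative assumptions, not recommended settings.
    """
    # NanoFilt reads FASTQ from stdin; -q is minimum average quality,
    # -l is minimum read length.
    filt = ["NanoFilt", "-q", str(min_quality), "-l", str(min_length)]
    # minimap2 emits SAM (-a) using the Nanopore preset (-x map-ont);
    # "-" tells it to read the filtered reads from stdin.
    mapper = ["minimap2", "-ax", "map-ont", "-t", str(threads),
              target_fasta, "-"]
    # samtools sort writes a coordinate-sorted BAM to out_bam.
    sorter = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]

    first = f"{shlex.join(filt)} < {shlex.quote(reads_fastq)}"
    return " | ".join([first, shlex.join(mapper), shlex.join(sorter)])
```

Calling `build_pipeline("reads.fastq", "draft.fa", "aligned.sorted.bam")` returns a single command line that can be inspected before it is passed to a shell.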

1. How can the reliability of the assembly results be ensured? What are the main methods for assessing assembly completeness and accuracy?

Beyond the two continuity metrics, Contig N50 and Scaffold N50, the quality of an assembly needs to be evaluated separately. Available approaches include BUSCO, LAI, Merqury, CEGMA, EST sequence and RNA sequence alignment, consistency checks, and BAC clone sequence evaluation. BUSCO evaluates the genome against a database of single-copy orthologous genes, LAI (LTR Assembly Index) uses long terminal repeats to evaluate genome integrity, and Merqury estimates the QV (quality value) of the genome. These three evaluation methods are commonly used at present.
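
To make two of these metrics concrete, the sketch below computes N50 from a list of contig (or scaffold) lengths and converts a per-base error rate into a Phred-scaled QV. The function names are hypothetical. The QV function shows only the standard Phred conversion; how Merqury estimates the error rate from k-mers is not covered here.

```python
import math


def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def phred_qv(error_rate):
    """Convert a per-base error rate into a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)
```

For contigs of lengths 80, 70, 50, 40, 30, and 20 (total 290), the two longest contigs already cover 150, which is more than half of the total, so the N50 is 70. An error rate of one error per 10,000 bases corresponds to a QV of 40.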

2. What is the recommended strategy for de novo assembly?

Typically, a de novo project requires 50-100× TGS data plus 100× Illumina NGS short-read data for assembly and error correction. In addition, 100× Bionano and 100× Hi-C data can be added to increase assembly continuity and improve assembly accuracy. The currently recommended assembly strategies are: (1) 30× ultra-long reads library + 50-100× ONT library + 100× Bionano DLS + 100× Hi-C + 100× Illumina; (2) 30-60× PacBio CCS library + 100× Bionano DLS + 100× Hi-C + 50× Illumina (for survey and evaluation).
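
A depth such as 50-100× translates into a data volume once the genome size is known: data yield is approximately genome size multiplied by depth. The sketch below applies that arithmetic to strategy (1). The names `STRATEGY_1` and `required_yield_gb` are hypothetical, and the sketch treats every entry as a simple depth, which is a simplification (Bionano coverage, for instance, refers to optical map data rather than sequenced bases).

```python
# Depth ranges (min, max) taken from strategy (1) in the answer above.
STRATEGY_1 = {
    "ultra-long reads": (30, 30),
    "ONT": (50, 100),
    "Bionano DLS": (100, 100),
    "Hi-C": (100, 100),
    "Illumina": (100, 100),
}


def required_yield_gb(genome_size_gb, strategy):
    """Return the (min, max) data yield in Gb for each library.

    Yield is approximated as genome size (Gb) multiplied by depth.
    """
    return {
        library: (genome_size_gb * low, genome_size_gb * high)
        for library, (low, high) in strategy.items()
    }
```

For a 2 Gb genome, `required_yield_gb(2.0, STRATEGY_1)["ONT"]` gives a range of 100 to 200 Gb of ONT data.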

3. Do we need the same DNA for the survey and for de novo sequencing?

In principle, the DNA for both the survey and de novo sequencing should come from the same individual. If the amount of DNA is not sufficient for the entire de novo project, the DNA for the small-fragment libraries should still come from the same individual, while the third-generation large-fragment or ultra-long-fragment libraries can use another individual from the same population.

Reference

  1. Amarasinghe, S. L., Ritchie, M. E., & Gouil, Q. (2021). long-read-tools.org: an interactive catalogue of analysis methods for long-read sequencing data. GigaScience, 10(2), giab003. doi:10.1093/gigascience/giab003
* For Research Use Only. Not for use in diagnostic procedures.