Genome Assembly: Introduction, Advantages and Methods

De novo genome sequencing and assembly is the method of choice for resolving the genetic makeup of an uncharacterized genome for which no reference or prior nucleotide sequence exists. Next-generation sequencing makes it possible to sequence the entire genome at high coverage thanks to its high throughput, efficiency, and speed. The genomic sequence is then reconstructed using a range of advanced assembly algorithms, revealing gene structure and position.

A resolved genome can offer a number of benefits. It helps to:
1. Contribute to research on bacterial and fungal pathogenesis and virulence.
2. Annotate the genome to create a genetic map.
3. Identify the various pathways that operate in an organism.
4. Provide insights into evolutionary mechanisms.
5. Validate experimental results.
6. Understand structural changes and complex rearrangements such as insertions, inversions, translocations, and copy number variations.
7. Investigate alternative splicing and gene expression levels.

Two Types of Genome Assembly

- Reference-based assembly aligns reads to an existing reference genome. It is used for re-sequencing, for example to evaluate sequencing efficiency and to classify and annotate novel features.
- De novo assembly builds a genome without a reference. It is used to create a reference, identify novel features, and annotate previously unannotated features.

Genome Assembly Techniques

Several genome sequencing methods are available, including:

  • Short-read next-generation sequencing: Illumina
  • Long-read next-generation sequencing: Pacific Biosciences and Oxford Nanopore

Each of these sequencing methods has advantages and disadvantages for genome assembly. Short reads are high-quality and low-cost and give deep sequencing coverage; however, they tend to show coverage bias in regions with high AT or GC content. Repeats and low-complexity regions make up the majority of such regions. Short read lengths and biased coverage in repetitive and low-complexity regions lead to fragmented genome assemblies, which give only a partial, though still critical, picture of an organism's genetic makeup. Most short-read assemblers use De Bruijn graph-based assembly.

Long reads have an average length of >10 kb but are of lower quality due to random errors. Long-read sequencing requires high-molecular-weight starting DNA, which can demand expertise in sample extraction. On average, long-read assemblies have better contiguity, larger N50 values, and greater genomic coverage than short-read assemblies. However, to fix random base-call errors, these long-read assemblies typically need to be polished with short reads. Long-read assemblers generally use the OLC (Overlap Layout Consensus) method to assemble the genome.

General Steps in Genome Assembly

1. Evaluating the quality of Illumina short reads
2. Pre-processing of Raw Data
3. Short Read Genome Assembly
4. Long Read Genome Assembly
5. Assembly Polishing
6. Scaffolding and Gap Filling
7. Assessing whether the assembly is ready for annotation
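
To make the De Bruijn graph idea behind short-read assembly more concrete, here is a minimal sketch in Python. It is a toy illustration, not how any particular assembler is implemented: the function names `build_de_bruijn` and `walk_unbranched`, the example reads, and the choice of k = 4 are all assumptions made for this example. It ignores sequencing errors, reverse complements, and incoming branches, all of which real assemblers must handle.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy De Bruijn graph: nodes are (k-1)-mers, each k-mer adds one edge."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unbranched(graph, start):
    """Extend a contig from `start` while the current node has exactly one distinct successor."""
    contig = start
    node = start
    visited = {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break  # dead end or branch point: stop extending
        node = next(iter(successors))
        if node in visited:
            break  # avoid looping forever on a cycle (e.g. a repeat)
        visited.add(node)
        contig += node[-1]
    return contig


# Hypothetical overlapping reads drawn from the sequence ATGGCGTGCAA.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unbranched(graph, "ATG")
print(contig)  # ATGGCGTGCAA
```

In this error-free example the walk recovers the full sequence. With repeats longer than k, a node gains more than one successor and the walk stops early, which is one way to see why short reads tend to produce fragmented assemblies.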

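N50, mentioned above as a measure of assembly contiguity, is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. The sketch below shows one way to compute it; the function name `n50` and the contig lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that brings the running total to half the assembly is the N50 contig.
        if running >= half_total:
            return length


# Two hypothetical assemblies with the same total size (290) but different contiguity.
contiguous = [80, 70, 50, 40, 30, 20]
fragmented = [10] * 29

print(n50(contiguous))  # 70
print(n50(fragmented))  # 10
```

Both example assemblies cover the same number of bases, but the more contiguous one has the larger N50. N50 alone does not measure correctness, so it is usually read alongside other checks when deciding whether an assembly is ready for annotation.
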
About CD Genomics Bioinformatics Analysis

The bioinformatics analysis department of CD Genomics provides novel solutions for data-driven innovation aimed at discovering the hidden potential in biological data, tapping new insights related to life science research, and predicting new prospects.


* For Research Use Only. Not for use in diagnostic procedures.