3 Methodological Initiatives for Metagenomics Bioinformatics Analysis

Introduction to Metagenomics

Metagenomics is the study of genetic material recovered directly from environmental samples rather than from cultured organisms. Since the term was first used in 1998, metagenomics has become one of the most important research tools in microbial ecology. By analyzing environmental DNA directly, without prior cultivation, metagenomic data enable a deeper understanding of the ecological roles, metabolism, and evolutionary history of the microbes in a given ecosystem.

Shotgun metagenomics recovers the full range of sequence in a sample by randomly breaking long DNA molecules into fragments, sequencing each fragment (with Sanger or next-generation sequencing), and using algorithms to assemble overlapping fragments into longer contiguous sequences (contigs). Data from shotgun metagenomics can be analyzed in several ways, and each method is best suited to a specific set of questions. There are three main methodological approaches: read-based, assembly-based, and detection-based.
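
As a rough illustration of the idea behind shotgun sequencing, reconstructing a longer sequence from overlapping fragments, here is a minimal Python sketch of greedy overlap merging. The fragments, the `min_len` threshold, and the function names are hypothetical, and the sketch assumes error-free fragments from a single strand; real assemblers use graph-based methods and must cope with sequencing errors and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)  # (overlap length, index of left, index of right)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps: leave the fragments as separate contigs
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags

# Hypothetical toy fragments of one short sequence (not real data).
fragments = ["ATGGCGTACG", "GTACGTTAGC", "TTAGCCGATAG"]
contigs = greedy_assemble(fragments)
print(contigs)
```

With these toy fragments the three pieces collapse into a single contig; fragments that share no overlap of at least `min_len` bases are returned unmerged.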

Approaches to Analyze Metagenomics Data

Read-Based Approach

Read-based metagenomics analyzes unassembled reads. It was one of the first techniques to be used, and it remains useful for quantitative analysis, particularly when relevant reference sequences are available.

Workflow for a Read-Based Strategy

  • Read QC
  • Read merging (BBMerge)
  • Mapping to the NCBI non-redundant (NR) database for taxonomic information (BLAST, DIAMOND, or LAST). K-mer-based methods are also used (Kraken, CLARK).
  • Mapping to functional databases (KEGG, Pfam). Cross-mapping is another option (MOCAT2).
  • Summarizing the taxonomic and functional assignments. If read IDs are retained, function-within-taxa analyses can also be run.
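
The k-mer matching and summary steps above can be sketched in a few lines of Python. This is a toy illustration only: the reference sequences, reads, taxon names, and the k-mer length `K` are made up, and the voting rule is a simplification rather than the algorithm used by Kraken or CLARK.

```python
from collections import Counter

K = 5  # toy k-mer length; real tools use much longer k-mers

def kmers(seq, k=K):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(references, k=K):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(taxon)
    return index

def classify(read, index, k=K):
    """Assign a read to the taxon with the most taxon-specific k-mer matches."""
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer, ())
        if len(taxa) == 1:  # ignore k-mers shared between taxa
            votes[next(iter(taxa))] += 1
    if not votes:
        return "unclassified"
    return votes.most_common(1)[0][0]

# Hypothetical toy references and reads (not real genomes).
references = {
    "taxon_A": "ATGGCGTACGTTAGCCGATAGGCTA",
    "taxon_B": "TTGACCGGTATCCGGAATTCAGGTC",
}
reads = ["GCGTACGTTAGC", "CCGGTATCCGGA", "AAAAAAAAAAAA"]

index = build_index(references)
assignments = {read: classify(read, index) for read in reads}

# Summary of taxonomic assignments: number of reads per taxon.
summary = Counter(assignments.values())
print(summary)
```

Because `assignments` is keyed by read, the same mapping could later be joined with per-read functional hits, which is why retaining read IDs makes function-within-taxa analyses possible.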

The Read-Based Approach Provides Answers to Common Questions

  • What is the overall taxonomic and functional makeup of these specimens?
  • What new types of a specific functional gene family can I discover?
  • In terms of taxonomic/functional structure, how do my sites or treatments vary?

Assembly-Based Approach

Assembly-based workflows try to assemble the reads from one or more samples, "bin" the resulting contigs into genomes, and then analyze the genes and contigs.

Workflow for an Assembly-Based Method

  • Read QC
  • Read merging
  • Read assembly (MEGAHIT, metaSPAdes). Samples are usually assembled individually. Memory usage is a concern; normalization and read-error correction can help.
  • For quantification and binning, reads from each sample are mapped back to the contigs from all samples (Bowtie2, BBMap).
  • Genome binning. Contigs are grouped into "genome bins" based on their sequence composition and mapping (coverage) data (MetaBAT, MaxBin, CONCOCT).
  • De novo gene calling (Prodigal, MetaGeneMark). Genes are then annotated in the same way as in read-based methods, but the computational burden is lower because the data to be annotated is roughly 100 times smaller.
  • Comparative genomics, functional gene phylogenetics, and genome bin evaluation.
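
To make the binning step more concrete, the following Python sketch groups contigs by composition and coverage. It is a deliberately simplified stand-in: GC fraction replaces the richer composition signals that real binners use, the greedy grouping rule and the `gc_tol` and `cov_ratio` thresholds are assumptions chosen for illustration, and the contigs and coverage values are invented.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def bin_contigs(contigs, coverage, gc_tol=0.10, cov_ratio=1.5):
    """Greedily place each contig in the first bin whose seed contig has
    similar GC content and similar coverage; otherwise start a new bin.
    Assumes every coverage value is greater than zero."""
    bins = []  # each bin stores the GC and coverage of its first (seed) contig
    for name, seq in contigs.items():
        gc, cov = gc_content(seq), coverage[name]
        for b in bins:
            similar_gc = abs(gc - b["gc"]) <= gc_tol
            similar_cov = max(cov, b["cov"]) / min(cov, b["cov"]) <= cov_ratio
            if similar_gc and similar_cov:
                b["members"].append(name)
                break
        else:
            bins.append({"gc": gc, "cov": cov, "members": [name]})
    return [b["members"] for b in bins]

# Hypothetical contigs and mean read coverage from mapping reads back.
contigs = {
    "contig_1": "GCGCGGCCGCATGCGC",
    "contig_2": "GGCCGCGCATGGCCGC",
    "contig_3": "ATATTAATGCATTAAT",
    "contig_4": "TTAATATAGCATATTA",
}
coverage = {"contig_1": 40.0, "contig_2": 44.0, "contig_3": 9.0, "contig_4": 10.0}

genome_bins = bin_contigs(contigs, coverage)
print(genome_bins)
```

The two GC-rich, high-coverage contigs end up in one bin and the two AT-rich, low-coverage contigs in another, mirroring the intuition that contigs from the same genome tend to share composition and abundance.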

Assembly-Based Approach Answers Typical Questions

  • What are particular microbes' functional and metabolic capabilities in my specimen?
  • What is the phylogeny of gene families in my samples?
  • Do the organisms that inhabit my samples differ?
  • Are there variants within taxa in my population?

Detection-Based Approach

Detection-based workflows aim to detect the presence of organisms of interest, usually pathogens, with high precision, typically at the cost of lower sensitivity (recall).
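
As a reminder of what this trade-off means, precision is the fraction of reported detections that are correct, while recall is the fraction of organisms truly present that are detected. The Python sketch below computes both from hypothetical counts; the numbers are invented for illustration.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from counts of detection outcomes."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Hypothetical outcome: 18 organisms correctly detected, no false alarms,
# 12 organisms that were present but missed.
precision, recall = precision_recall(true_pos=18, false_pos=0, false_neg=12)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case every reported detection is correct, yet a sizeable share of the organisms present goes unreported, which is the profile of a conservative, high-precision detector.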

Workflow for a Detection-Based Method

  • Read QC
  • Read merging (if paired)
  • Alignment or k-mer-based matching against a curated database
  • Heuristics for choosing the appropriate taxonomic level of classification and for making the classification itself (MEGAN-LCA, CLARK, SURPI, Centrifuge, proprietary methods).
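
One common heuristic for choosing the classification level is the lowest common ancestor (LCA): when a read matches several taxa, it is assigned to the deepest taxonomic node they all share. The Python sketch below shows the basic idea under simplifying assumptions. The `parent` map is a small hand-written taxonomy with intermediate ranks omitted, the function names are hypothetical, and the tools listed above apply additional scoring and filtering that is not shown here.

```python
def lineage(taxon, parent):
    """Return the path from taxon up to the root, inclusive."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path

def lowest_common_ancestor(taxa, parent):
    """Return the deepest node shared by the lineages of all given taxa."""
    paths = [lineage(t, parent) for t in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking leaf-to-root, the first shared node is the deepest one.
    for node in paths[0]:
        if node in shared:
            return node
    return None  # the taxa do not share a root in this taxonomy

# Simplified child -> parent taxonomy (intermediate ranks omitted).
parent = {
    "Escherichia coli": "Escherichia",
    "Escherichia fergusonii": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
}

# A read matching two species of one genus is reported at genus level.
genus_call = lowest_common_ancestor(
    ["Escherichia coli", "Escherichia fergusonii"], parent)

# A read matching species from two genera is reported at family level.
family_call = lowest_common_ancestor(
    ["Escherichia coli", "Salmonella enterica"], parent)

print(genus_call, family_call)
```

Assigning ambiguous reads to a higher rank rather than guessing a species is one way such workflows favor precision over recall.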

Detection-Based Approach Answers Typical Questions

  • Are there any known organisms of interest in my sample?
  • Are there any known functional genes in my sample, such as beta-lactamases?

About CD Genomics Bioinformatics Analysis

The bioinformatics analysis department of CD Genomics provides novel solutions for data-driven innovation aimed at discovering the hidden potential in biological data, tapping new insights related to life science research, and predicting new prospects.


* For Research Use Only. Not for use in diagnostic procedures.