Full-length 16S Amplicon Sequencing Analysis

Introduction

Sequence variation in the 16S ribosomal RNA (rRNA) gene is widely used to characterize the taxonomic diversity present in microbial communities. Short-read platforms that sequence only partial regions of the 16S rRNA gene are most commonly used because they reduce the cost of next-generation sequencing (NGS), but such reads are often too short to resolve sequence similarity, so misclassification at the species level remains a challenge. Full-length 16S amplicon sequencing can overcome the misidentification that arises because sequence similarity differs among the 16S variable regions, as shown by comparisons of identification accuracy[1].

Figure 1 Bacterial 16S rRNA gene sequence composition and primer selection.

Application Field

  • Medical field: the relationship between human microbiome and human health/disease, etc.
  • Animal: rumen and animal health/nutrient digestion, etc.
  • Agronomic field: microbial interactions with plants, etc.

CD Genomics Data Analysis Pipeline

Figure 2 Data Analysis Pipeline.

Bioinformatics Analysis Content

  • Data processing and statistics
  • Feature table construction
  • Species annotation and taxonomic analysis
    Taxonomy distribution histogram of all samples
    Species abundance heatmap
    Classification tree
    Metastats
  • Alpha diversity analysis
    Statistical data of alpha diversity
    Rarefaction curve
    Chao1 curve
    Shannon curve
    Rank abundance
  • Beta diversity analysis
    Distance matrices from different algorithms (Jaccard, Bray-Curtis, weighted UniFrac and unweighted UniFrac)
    PCA analysis
    PCoA analysis
    LEfSe
    ※ Anosim/Adonis analysis
    UPGMA analysis
    PICRUSt2

How It Works

With experienced teams of scientists, researchers, and technicians, we provide fast turnaround and high-quality data reports at competitive prices for customers worldwide. Customers can contact our staff directly, and we will respond promptly. If you are interested in our services, please contact us or submit an online inquiry for more detailed information.

Table 1 The software table

Software       Version       Analysis module      Website
Trimmomatic    0.33          Data processing      https://github.com/usadellab/Trimmomatic
Cutadapt       1.9.1         Data processing      http://cutadapt.readthedocs.org/
USEARCH        10.0.240_i86  Data processing      http://drive5.com/usearch
FASTX Toolkit  0.0.14        Data processing      https://anaconda.org/bioconda/fastx_toolkit
Flash          1.2.11        Data processing      http://ccb.jhu.edu/software/FLASH/index.shtml
Fastp          0.23.1        Data processing      https://github.com/OpenGene/fastp
QIIME2         2020.6.0      --                   https://qiime2.org
DADA2          1.20.0        Denoise              https://benjjneb.github.io/dada2/
KronaTools     2.6           Krona analysis       https://github.com/marbl/Krona/releases
Mothur         1.34.4        Alpha diversity      http://www.mothur.org
Minimap2       2.14          Taxonomy annotation  https://github.com/lh3/minimap2
LEfSe          1.1.1         Difference analysis  https://github.com/SegataLab/lefse/tree/master/lefse
VSEARCH        2.8.1         Sequence clustering  https://github.com/torognes/vsearch
Picrust2       2.3.0         Function prediction  https://huttenhower.sph.harvard.edu/picrust
Bugbase        0.1.0         Function prediction  https://github.com/knights-lab/BugBase
FAPROTAX       1.2.6         Function prediction  http://www.loucalab.com/archive/FAPROTAX/lib/php/
Tax4Fun        0.3.1         Function prediction  http://tax4fun.gobics.de/
Funguild       1.0           Function prediction  http://www.stbates.org/funguild_db.php

Data statistics of the quality control

The number of sequences retained for each sample at each processing stage was counted to evaluate data quality, mainly in terms of sequence number and sequence length. The results for each sample are shown in the following table. Sample ID is the sample name. Raw CCS is the number of CCS reads identified for the sample. Clean CCS is the number of sequences remaining after primers are identified and removed. Effective CCS is the number of sequences used for subsequent analysis after length filtering and chimera removal. AvgLen (bp) is the average sequence length of the sample. Effective (%) is Effective CCS as a percentage of Raw CCS.

Table 2 Sample sequencing data processing results statistics

Sample ID  Raw CCS  Clean CCS  Effective CCS  AvgLen (bp)  Effective (%)
KSFBL      14,692   13,755     13,224         1,455        90.01
KSVBL      12,836   12,791     12,760         1,460        99.41
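
The Effective (%) column can be checked directly from the counts in Table 2. The following is a minimal sketch; the function name `effective_percent` is hypothetical and is not part of the pipeline, and rounding to two decimals is assumed from the table.

```python
def effective_percent(raw_ccs: int, effective_ccs: int) -> float:
    """Effective CCS as a percentage of Raw CCS, rounded to two decimals."""
    if raw_ccs <= 0:
        raise ValueError("raw_ccs must be positive")
    return round(100.0 * effective_ccs / raw_ccs, 2)


# Counts copied from Table 2: (Raw CCS, Clean CCS, Effective CCS)
samples = {
    "KSFBL": (14692, 13755, 13224),
    "KSVBL": (12836, 12791, 12760),
}

for sample_id, (raw, clean, effective) in samples.items():
    # Clean CCS is listed for completeness; only Raw and Effective enter the ratio.
    print(sample_id, effective_percent(raw, effective))
```

Under these assumptions the two samples give 90.01 and 99.41, matching the table.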

Species Annotation and Taxonomic Analysis

In the next sections we explore the taxonomic composition of the samples and relate it to the sample metadata. The first step is to assign taxonomy to the sequences in our QIIME 2 artifact using a pre-trained Naive Bayes classifier and the corresponding QIIME 2 plugin. This classifier was trained on the Silva 138 99% OTUs. We apply it to the sample sequences and generate a visualization of the resulting mapping from sequence to taxonomy.

Figure 3 The taxonomy distribution of all samples at the phylum level. Other classification levels can be found in the taxonomy folder.
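
A phylum-level distribution like the one in Figure 3 comes from summing feature counts that share a phylum label and dividing by the sample total. The sketch below illustrates that aggregation for one sample. It assumes Silva-style taxonomy strings with rank prefixes such as `d__` and `p__`; the feature IDs, counts and helper names are hypothetical and do not come from the report.

```python
from collections import defaultdict


def phylum_of(taxon: str) -> str:
    """Return the phylum name from a 'd__...; p__...; ...' string, or 'Unassigned'."""
    for rank in taxon.split(";"):
        rank = rank.strip()
        if rank.startswith("p__") and len(rank) > 3:
            return rank[3:]
    return "Unassigned"


def phylum_relative_abundance(counts, taxonomy):
    """Collapse per-feature counts of one sample to phylum-level relative abundance."""
    totals = defaultdict(int)
    for feature_id, count in counts.items():
        totals[phylum_of(taxonomy.get(feature_id, ""))] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {phylum: n / grand_total for phylum, n in totals.items()}


# Hypothetical example data (not taken from the report)
counts = {"f1": 60, "f2": 30, "f3": 10}
taxonomy = {
    "f1": "d__Bacteria; p__Firmicutes; c__Bacilli",
    "f2": "d__Bacteria; p__Proteobacteria",
    "f3": "d__Bacteria",  # no phylum assigned
}

result = phylum_relative_abundance(counts, taxonomy)
for phylum, fraction in sorted(result.items()):
    print(f"{phylum}\t{fraction:.2f}")
```

The same collapse can be repeated at other ranks by changing the prefix that is matched.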

Classification Tree

The classification tree is a bifurcating tree that represents a hierarchical clustering of features. The clustering uses Ward's hierarchical method, based on the degree of proportionality between features.

Figure 4 Phylogenetic tree. The legend in the upper right corner is the species name at the phylum level, and the inner circle is the phylogenetic tree. The same phylum in the inner circle shows the same color. The outer circles indicate the relative abundance proportion of the species in different samples/groups.

Alpha Diversity Analysis

Microbial diversity can be assessed within a community (alpha diversity) or between communities across a collection of samples (beta diversity). Four metrics were calculated to assess alpha diversity: Chao1 and ACE estimate the number of species in a community (richness), while Shannon and Simpson account for both the richness and the evenness of a community. Larger Chao1, ACE and Shannon values, together with a smaller Simpson value, indicate greater species diversity.

Table 3 Alpha Diversity Analysis

Sample ID  ACE      Chao1  Simpson  Shannon
KSFBL      52.238   52.0   0.5304   2.2672
KSVBL      91.4318  101.5  0.5584   2.3816
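
To make the indices concrete, here is a minimal sketch of three of them computed from a vector of per-species counts. Several choices are assumptions rather than the pipeline's documented settings: Shannon uses the natural logarithm, Simpson is the dominance form (sum of squared proportions, so smaller means more diverse), and Chao1 is the bias-corrected form. ACE is omitted because it needs an abundance threshold. The example counts are hypothetical, so the output is not expected to reproduce Table 3.

```python
import math


def _check(counts):
    """Reject empty or all-zero count vectors before computing an index."""
    if sum(counts) <= 0:
        raise ValueError("counts must contain at least one positive value")


def shannon(counts):
    """Shannon index H = -sum(p * ln p). Natural log is an assumption."""
    _check(counts)
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def simpson(counts):
    """Simpson dominance D = sum(p^2). Smaller D means higher diversity."""
    _check(counts)
    total = sum(counts)
    return sum((n / total) ** 2 for n in counts if n > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1))."""
    _check(counts)
    observed = sum(1 for n in counts if n > 0)
    singletons = sum(1 for n in counts if n == 1)  # species seen exactly once
    doubletons = sum(1 for n in counts if n == 2)  # species seen exactly twice
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical counts for one sample (not taken from the report)
counts = [10, 5, 3, 2, 1, 1]
print("Chao1  ", chao1(counts))
print("Shannon", round(shannon(counts), 4))
print("Simpson", round(simpson(counts), 4))
```

Chao1 exceeds the observed species count whenever more than one singleton is present, which is why it can be larger than the number of features actually detected in a sample.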

Beta Diversity Analysis

Principal coordinates analysis (PCoA) is an ordination technique similar to PCA. It extracts the main elements and structure from multidimensional data through a series of eigenvalues and eigenvectors. It starts from a similarity or dissimilarity (distance) matrix and assigns each item a location in a low-dimensional space.

Figure 5 PCoA analysis based on weighted UniFrac. Each point represents a sample, colored by group; the X-axis and Y-axis each show one principal coordinate. The percentage on each axis indicates how much of the variation among samples that axis explains.

1. What are the advantages of Full-length 16S amplicon sequencing?

The full-length 16S gene provides better taxonomic resolution, and circular consensus sequencing (CCS) combined with sophisticated denoising algorithms can remove PCR and sequencing errors.

Reference

  1. Johnson, J.S., et al., Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nature Communications, 2019. 10(1).
* For Research Use Only. Not for use in diagnostic procedures.