Sequence variation in the 16S ribosomal RNA (rRNA) gene is widely used to characterize the taxonomic diversity present in microbial communities. Short-read sequencing platforms that read partial regions of the 16S rRNA gene are the most commonly used because they reduce the cost of next-generation sequencing (NGS). However, species-level misclassification remains a challenge, because these reads are too short to resolve sequence similarity between closely related taxa. Full-length 16S amplicon sequencing can overcome the misidentification caused by the differing sequence similarity of the individual 16S variable regions, improving identification accuracy.
Figure 1 Bacterial 16S rRNA gene sequence composition and primer selection.
- Medical field: the relationship between the human microbiome and human health/disease, etc.
- Animal field: the rumen microbiome and animal health/nutrient digestion, etc.
- Agronomic field: microbial interactions with plants, etc.
CD Genomics Data Analysis Pipeline
Figure 2 Data Analysis Pipeline.
Bioinformatics Analysis Content
- Data processing and statistics
- Feature table construction
- Species annotation and taxonomic analysis
  - Taxonomic distribution histogram of all samples
  - Species abundance heatmap
- Alpha diversity analysis
  - Alpha diversity statistics
- Beta diversity analysis
  - Distance matrices computed with different algorithms (Jaccard, Bray-Curtis, weighted UniFrac and unweighted UniFrac)
  - ※ ANOSIM/Adonis analysis
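To make the distance metrics listed above more concrete, here is a minimal Python sketch of two of them, Bray-Curtis dissimilarity and binary Jaccard distance, computed from per-sample feature counts. The function names, the dictionary-of-lists input, and the sample counts are hypothetical illustrations, not the pipeline's code. The UniFrac metrics are omitted because they also require a phylogenetic tree.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / (sum(u) + sum(v))."""
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def jaccard(u, v):
    """Binary Jaccard distance: 1 - shared features / features present in either sample."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    union = present_u | present_v
    if not union:
        return 0.0
    return 1 - len(present_u & present_v) / len(union)


def distance_matrix(samples, metric):
    """Square matrix (list of lists) of pairwise distances, in the dict's sample order."""
    names = list(samples)
    return [[metric(samples[a], samples[b]) for b in names] for a in names]


# Hypothetical feature counts; every sample lists features in the same order.
samples = {"S1": [6, 7, 4], "S2": [10, 0, 6], "S3": [0, 3, 9]}
bc = distance_matrix(samples, bray_curtis)
jc = distance_matrix(samples, jaccard)
```

Bray-Curtis uses abundances, while Jaccard only uses presence/absence. This is why the two matrices can rank sample pairs differently.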
How It Works
Backed by experienced teams of scientists, researchers, and technicians, we provide fast turnaround and high-quality data reports at competitive prices for customers worldwide. Customers can contact our staff directly, and we will respond promptly. If you are interested in our services, please contact us or submit an online inquiry for more detailed information.
Table 1 Software used in the analysis
| Software | Version | Purpose | Source |
|---|---|---|---|
| FASTX Toolkit | 0.0.14 | Data processing | https://anaconda.org/bioconda/fastx_toolkit |
Quality Control Statistics
To evaluate data quality, the number of sequences for each sample was counted at each processing stage. The evaluation mainly tracks the sequence count, sequence length, and related parameters. The results for each sample are shown in Table 2:

- Sample ID is the sample name.
- Raw CCS is the number of circular consensus sequencing (CCS) reads obtained for the sample.
- Clean CCS is the number of sequences remaining after primers were identified and removed.
- Effective CCS is the number of sequences retained for downstream analysis after length filtering and chimera removal.
- AvgLen (bp) is the average sequence length of the sample.
- Effective (%) is the percentage of Effective CCS relative to Raw CCS.
Table 2 Statistics of sequencing data processing for each sample
| Sample ID | Raw CCS | Clean CCS | Effective CCS | AvgLen (bp) | Effective (%) |
|---|---|---|---|---|---|
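A single row of this QC table can be derived with a short sketch like the one below. The sketch rests on several assumptions:

- Primer-trimmed reads are already available as strings.
- The length window of 1,200 to 1,650 bp is hypothetical; the pipeline's actual thresholds are not stated.
- Chimera detection is a caller-supplied placeholder function.
- AvgLen is computed over the effective sequences.

`summarize_sample` and `SampleStats` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SampleStats:
    sample_id: str
    raw_ccs: int
    clean_ccs: int
    effective_ccs: int
    avg_len_bp: float
    effective_pct: float


def summarize_sample(
    sample_id: str,
    raw_ccs: int,
    clean_seqs: List[str],
    min_len: int = 1200,  # hypothetical lower length bound
    max_len: int = 1650,  # hypothetical upper length bound
    is_chimera: Callable[[str], bool] = lambda seq: False,  # placeholder chimera check
) -> SampleStats:
    """Apply length filtering and chimera removal, then build one row of the QC table."""
    effective = [
        s for s in clean_seqs if min_len <= len(s) <= max_len and not is_chimera(s)
    ]
    avg_len = sum(len(s) for s in effective) / len(effective) if effective else 0.0
    return SampleStats(
        sample_id=sample_id,
        raw_ccs=raw_ccs,
        clean_ccs=len(clean_seqs),
        effective_ccs=len(effective),
        avg_len_bp=avg_len,
        # Effective (%) = Effective CCS / Raw CCS * 100
        effective_pct=100.0 * len(effective) / raw_ccs if raw_ccs else 0.0,
    )


row = summarize_sample("S1", 10, ["A" * 1300, "A" * 1500, "A" * 100])
```

In the usage line, 10 raw reads yield 3 clean reads. Only 2 of these fall inside the length window, so Effective (%) is 20.0.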
Species Annotation and Taxonomic Analysis
In the following sections, we explore the taxonomic composition of the samples and relate it to the sample metadata. The first step is to assign taxonomy to the sequences in our QIIME 2 artifact using a pre-trained Naive Bayes classifier and the plugin. This classifier was trained on the SILVA 138 99% OTUs. We apply it to the sample sequences and generate a visualization of the resulting sequence-to-taxonomy mapping.
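As a rough illustration of how a Naive Bayes taxonomy classifier works, the toy sketch below trains a multinomial Naive Bayes model on the k-mer counts of labelled reference sequences. It then assigns a query read to the taxon with the highest posterior. This is not the QIIME 2 implementation. The class name, k-mer length, Laplace smoothing, and the tiny reference set are hypothetical; a real classifier would be trained on a reference database such as SILVA.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


class KmerNaiveBayes:
    """Toy multinomial Naive Bayes taxonomy classifier over k-mer counts."""

    def __init__(self, k=4, alpha=1.0):
        self.k = k          # k-mer length (hypothetical choice)
        self.alpha = alpha  # Laplace (additive) smoothing

    def fit(self, sequences, taxa):
        self.class_counts = Counter(taxa)
        self.kmers = {taxon: Counter() for taxon in self.class_counts}
        for seq, taxon in zip(sequences, taxa):
            self.kmers[taxon].update(kmer_counts(seq, self.k))
        self.vocab = set().union(*self.kmers.values())
        self.totals = {t: sum(c.values()) for t, c in self.kmers.items()}
        return self

    def log_posteriors(self, seq):
        """Unnormalized log posterior of each taxon given the query's k-mers."""
        n_refs = sum(self.class_counts.values())
        v = len(self.vocab)
        query = kmer_counts(seq, self.k)
        scores = {}
        for taxon, n in self.class_counts.items():
            score = math.log(n / n_refs)  # log prior
            for kmer, count in query.items():
                if kmer not in self.vocab:
                    continue  # k-mers never seen in training carry no information
                p = (self.kmers[taxon][kmer] + self.alpha) / (self.totals[taxon] + self.alpha * v)
                score += count * math.log(p)
            scores[taxon] = score
        return scores

    def predict(self, seq):
        scores = self.log_posteriors(seq)
        return max(scores, key=scores.get)


refs = ["ACGTACGTACGTACGT", "TTTTGGGGTTTTGGGG"]
taxa = ["g__Alpha", "g__Beta"]
clf = KmerNaiveBayes(k=4).fit(refs, taxa)
print(clf.predict("ACGTACGTAC"))  # g__Alpha
```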
Figure 3 Taxonomic distribution of all samples at the phylum level. Distributions at other taxonomic levels can be found in the taxonomy folder.
The classification tree is a bifurcating tree that represents a hierarchical clustering of features. It is built with Ward hierarchical clustering based on the degree of proportionality between features.
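The sketch below illustrates clustering features by proportionality. It makes three assumptions:

- The dissimilarity is the variance of log-ratios (VLR) across samples, which is 0 for perfectly proportional features.
- A pseudocount of 1 is added to avoid log(0).
- Merging follows standard Ward linkage via the Lance-Williams update.

The function names and toy counts are hypothetical, and this is not the pipeline's exact code.

```python
import math
from statistics import pvariance


def vlr(x, y, pseudocount=1.0):
    """Variance of log-ratios between two features across samples (0 = perfectly proportional)."""
    return pvariance([math.log((a + pseudocount) / (b + pseudocount)) for a, b in zip(x, y)])


def ward_proportionality_tree(features):
    """Agglomerative Ward clustering of features, using VLR as the dissimilarity.

    features: dict mapping feature name -> counts across the same samples.
    Returns the merge history as (left_label, right_label, height) tuples;
    merged clusters are labelled with nested tuples of their children.
    """
    names = list(features)
    label = {i: name for i, name in enumerate(names)}
    size = {i: 1 for i in range(len(names))}
    dist = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            dist[(i, j)] = vlr(features[names[i]], features[names[j]])

    def d(a, b):
        return dist[(min(a, b), max(a, b))]

    merges = []
    next_id = len(names)
    while len(label) > 1:
        # Merge the closest pair of active clusters.
        (s, t), height = min(dist.items(), key=lambda kv: kv[1])
        merges.append((label[s], label[t], height))
        u = next_id
        next_id += 1
        # Lance-Williams update: Ward distance from the new cluster u to every other cluster v.
        new_dist = {}
        for v in label:
            if v in (s, t):
                continue
            total = size[v] + size[s] + size[t]
            sq = ((size[v] + size[s]) / total * d(v, s) ** 2
                  + (size[v] + size[t]) / total * d(v, t) ** 2
                  - size[v] / total * d(s, t) ** 2)
            new_dist[(v, u)] = math.sqrt(max(sq, 0.0))
        # Drop every distance involving the two merged clusters, then add the new ones.
        dist = {k: val for k, val in dist.items() if s not in k and t not in k}
        dist.update(new_dist)
        label[u] = (label[s], label[t])
        size[u] = size[s] + size[t]
        for old in (s, t):
            del label[old]
            del size[old]
    return merges


counts = {
    "A": [10, 20, 30, 40],
    "B": [20, 40, 60, 80],
    "C": [40, 30, 20, 10],
    "D": [80, 60, 40, 20],
}
merges = ward_proportionality_tree(counts)
```

Features A and B, and likewise C and D, differ only by a constant factor. Each pair is therefore joined at a near-zero height before the two pairs are merged at the root.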
Figure 4 Phylogenetic tree. The legend in the upper right corner lists the taxa at the phylum level. The inner circle shows the phylogenetic tree, with branches from the same phylum drawn in the same color. The outer circles show the relative abundance of each taxon in the different samples/groups.
Alpha Diversity Analysis
Microbial diversity can be assessed within a community (alpha diversity) or between the communities in a collection of samples (beta diversity). Four metrics were calculated to assess alpha diversity. Chao1 and ACE estimate the number of species (richness) in a community, while Shannon and Simpson account for both richness and evenness. Larger Chao1, ACE, and Shannon values, together with a smaller Simpson index value, indicate greater species diversity. The smaller-is-more-diverse reading applies to the dominance form of the Simpson index, the sum of squared relative abundances.
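The four indices can be sketched in Python using common textbook forms:

- Shannon with the natural logarithm.
- Simpson as the dominance index D = Σp_i².
- The classic Chao1 estimator, using the bias-corrected form when there are no doubletons.
- ACE with the conventional rare-abundance threshold of 10.

These choices are assumptions. Tools differ in log base, in the Simpson variant (some report 1 - D, where larger means more diverse), and in bias corrections. The values may therefore not match the pipeline's Table 3 exactly.

```python
import math
from collections import Counter


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) (natural log assumed)."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def simpson(counts):
    """Simpson's dominance index D = sum(p_i^2); smaller D means higher diversity."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def chao1(counts):
    """Chao1 richness: S_obs + F1^2 / (2 F2), or S_obs + F1(F1 - 1) / 2 when F2 == 0."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def ace(counts, rare_threshold=10):
    """Abundance-based Coverage Estimator; features with count <= threshold are 'rare'."""
    freq = Counter(c for c in counts if c > 0)  # abundance -> number of features
    s_abund = sum(v for c, v in freq.items() if c > rare_threshold)
    s_rare = sum(v for c, v in freq.items() if c <= rare_threshold)
    n_rare = sum(c * v for c, v in freq.items() if c <= rare_threshold)
    f1 = freq.get(1, 0)
    if s_rare == 0:
        return float(s_abund)
    if f1 == n_rare:
        raise ValueError("ACE is undefined when all rare features are singletons")
    c_ace = 1 - f1 / n_rare  # sample coverage estimate
    gamma_sq = (s_rare / c_ace) * sum(
        i * (i - 1) * freq.get(i, 0) for i in range(1, rare_threshold + 1)
    ) / (n_rare * (n_rare - 1)) - 1
    gamma_sq = max(gamma_sq, 0.0)  # squared coefficient of variation, floored at 0
    return s_abund + s_rare / c_ace + (f1 / c_ace) * gamma_sq
```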
Table 3 Alpha Diversity Analysis
Beta Diversity Analysis
Principal coordinates analysis (PCoA) is an ordination technique similar to PCA. It extracts the main structure of multidimensional data through a series of eigenvalues and eigenvectors. It starts from a similarity or dissimilarity (distance) matrix and assigns each item a location in a low-dimensional space.
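Classical PCoA can be sketched in a few lines of NumPy. The steps are: square the distances, double-centre the matrix, and take its eigendecomposition. Each eigenvector scaled by the square root of its eigenvalue gives one principal coordinate, and each eigenvalue's share of the total gives the percentage shown on an axis. This minimal sketch clips negative eigenvalues, which non-Euclidean distances can produce, to zero. That choice is an assumption and not necessarily how the pipeline handles them, and the function name `pcoa` is hypothetical.

```python
import numpy as np


def pcoa(distance_matrix):
    """Classical (metric) PCoA of a symmetric distance matrix.

    Returns (coordinates, proportion_explained), with axes sorted by
    decreasing eigenvalue. Negative eigenvalues are clipped to zero.
    """
    d = np.asarray(distance_matrix, dtype=float)
    n = d.shape[0]
    # Gower double-centring of -0.5 * squared distances.
    a = -0.5 * d ** 2
    j = np.eye(n) - np.ones((n, n)) / n
    b = j @ a @ j
    eigvals, eigvecs = np.linalg.eigh(b)  # b is symmetric
    order = np.argsort(eigvals)[::-1]     # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = np.clip(eigvals, 0.0, None)
    coords = eigvecs * np.sqrt(positive)  # column k = principal coordinate k
    proportion = positive / positive.sum()
    return coords, proportion


# Distances between three points lying on a line at positions 0, 1 and 3.
d = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
coords, proportion = pcoa(d)
```

Because the three toy points are collinear, the first axis explains essentially all of the variation. Distances along that axis reproduce the input matrix, up to an arbitrary sign flip.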
Figure 5 PCoA based on weighted UniFrac. Each point represents a sample, plotted with one principal coordinate on the X-axis and another on the Y-axis, and colored by group. The percentage on each axis indicates the proportion of the variation among samples explained by that axis.
1. What are the advantages of full-length 16S amplicon sequencing?
The full-length 16S gene provides better taxonomic resolution. In addition, circular consensus sequencing (CCS) combined with sophisticated denoising algorithms can remove PCR and sequencing errors.
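A toy model helps show why repeated passes over the same molecule reduce errors. The sketch below contains two functions:

- `majority_consensus` calls a per-position majority vote across several pre-aligned, equal-length passes.
- `majority_error_rate` computes the probability that a majority vote over n passes is wrong when each pass errs independently with probability e.

Real CCS uses read alignment and a statistical model that also handles insertions and deletions. Both functions are therefore simplifying assumptions, not the CCS algorithm.

```python
from collections import Counter
from math import comb


def majority_consensus(passes):
    """Per-position majority vote across equal-length, pre-aligned passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be pre-aligned to equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


def majority_error_rate(e, n):
    """P(majority of n independent passes is wrong), each pass wrong with probability e.

    Pessimistic simplification: all wrong passes are assumed to agree on the same base.
    Use odd n to avoid ties.
    """
    return sum(comb(n, k) * e**k * (1 - e) ** (n - k) for k in range(n // 2 + 1, n + 1))


consensus = majority_consensus(["ACGT", "ACGA", "TCGT"])
```

Under this model, a 10% per-pass error rate falls to 2.8% with three passes, and it keeps shrinking as passes are added.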