Bioinformatics Analysis Pipeline for Target Area Sequencing

Introduction to Target Sequencing

A targeted NGS method uses molecular biology techniques to enrich particular genetic sequences, drawing on existing genomic information. This allows scientists to concentrate on the genes or genomic regions most relevant to their studies. It is now feasible to obtain sequence coverage from difficult-to-sequence or limited samples, such as degraded DNA or RNA from clinical specimens or circulating DNA from blood. The two most common targeted sequencing techniques are exome sequencing and target-specific panels.

- Exome sequencing allows researchers to examine all of the genome's exons, or protein-coding regions. The human exome spans about 20,000 genes and accounts for about 1.5 percent of the entire genome. Compared with whole-genome sequencing, this lets teams concentrate on content that is more relevant to disease.

- The most adaptable alternative is to use target-specific NGS panels, which can be designed to sequence any gene or region of interest in a genome and can cover structural and copy number variation as well as RNA transcript analysis. Targeted panels, as opposed to broader methods like exome or whole-genome sequencing, produce smaller and more manageable data sets, making data analysis easier for scientists. Target-specific panels are generally the fastest and most cost-effective NGS technique.

Hybridization capture or amplicon-based enrichment can be used to create targeted NGS libraries enriched for the genomic regions of interest.

Bioinformatics Analysis for Target Sequencing

- QC and cleanup of raw data
- Alignment to a reference genome
- Statistical assessment of data output, sequencing depth analysis, and coverage uniformity assessment
- Identification of SNP mutations
- SNP (single-nucleotide polymorphism) annotation against RefGene
- Evaluation of the SNP dataset
- Pathogenicity assessment and conservation prediction for single-sample SNPs
- SNP distribution statistics across each component of a gene
- Identification of InDel mutations
- RefGene annotation and database assessment of InDels
- InDel distribution statistics across each component of a gene
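
As an illustration of the depth and coverage-uniformity step in the list above, here is a minimal Python sketch. The function name `coverage_summary`, the 20x depth threshold, and the "at least 0.2 times the mean depth" definition of uniformity are assumptions chosen for the example, not values prescribed by this pipeline.

```python
from statistics import mean

def coverage_summary(depths, min_depth=20, uniformity_fraction=0.2):
    """Summarize per-base read depth over the targeted bases.

    depths: one integer depth per targeted base (hypothetical input layout).
    min_depth and uniformity_fraction are illustrative thresholds.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    avg = mean(depths)
    n = len(depths)
    # Fraction of targeted bases covered at or above the minimum depth.
    covered = sum(1 for d in depths if d >= min_depth)
    # Fraction of targeted bases at or above a set fraction of the mean depth.
    uniform = sum(1 for d in depths if d >= uniformity_fraction * avg)
    return {
        "mean_depth": avg,
        "fraction_at_min_depth": covered / n,
        "uniformity": uniform / n,
    }

# Toy depths for five targeted bases.
summary = coverage_summary([100, 120, 80, 10, 90])
```

In practice the per-base depths would come from a depth-reporting tool run over the target intervals; the list above is toy data.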

Raw Data QC and Alignment to a Reference

In the sequencing bioinformatics pipeline, the raw sequence reads from a FASTQ or unaligned BAM (uBAM) file are aligned against the human reference genome. FASTQ stores short reads as plain text, while uBAM stores them in a compressed binary form; both carry metadata such as base quality scores and read identifiers.
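
To make the FASTQ layout and a basic quality check concrete, the sketch below reads four-line FASTQ records and keeps reads whose mean base quality meets a threshold. It assumes Phred+33 quality encoding and unwrapped records; the helper names and the threshold of 20 are illustrative choices, not part of the pipeline described here.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'

def mean_phred(quality_string, offset=33):
    """Decode ASCII qualities (Phred+33 assumed) and return their mean."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores) if scores else 0.0

def passes_qc(quality_string, min_mean_quality=20):
    """Illustrative filter: keep reads with mean quality >= threshold."""
    return mean_phred(quality_string) >= min_mean_quality

# Two toy reads: one high quality ('I' = Q40), one low quality ('!' = Q0).
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n")
kept = [rid for rid, seq, qual in read_fastq(example) if passes_qc(qual)]
```

Production pipelines usually rely on dedicated QC and trimming tools; this only shows what the stored quality metadata looks like once decoded.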

The alignment step assigns each short read a position in the reference genome and produces several metadata fields, such as the alignment features (matches, mismatches, and gaps) encoded in the Compact Idiosyncratic Gapped Alignment Report (CIGAR) format. A Sequence Alignment/Map (SAM), BAM, or CRAM file is used to store the aligned sequences and metadata. Downstream algorithms use the BAM file to detect a variety of genetic changes, such as single nucleotide variants and insertions and deletions (indels), and to estimate measures such as tumor mutation burden.
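
A CIGAR string is a run-length encoding of alignment operations, for example `10M2I5M3D20M`. The following sketch parses one and computes how many reference bases the read spans. The operation letters and which of them consume the reference follow the SAM specification; the function names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference, per the SAM specification.
REF_CONSUMING = set("MDN=X")

def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings with characters the pattern did not account for.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops

def reference_span(cigar):
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)

def indel_bases(cigar):
    """Return (inserted_bases, deleted_bases) recorded in the CIGAR."""
    ops = parse_cigar(cigar)
    inserted = sum(n for n, op in ops if op == "I")
    deleted = sum(n for n, op in ops if op == "D")
    return inserted, deleted

ops = parse_cigar("10M2I5M3D20M")
span = reference_span("10M2I5M3D20M")
ins, dels = indel_bases("10M2I5M3D20M")
```

Note that `M` means an alignment match, which can be either a sequence match or a mismatch; `=` and `X` distinguish the two when the aligner emits them.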

Gene annotation

To retain all non-silent variants, such as those affecting splice sites and exonic indels, annotate variants against Ensembl or RefGene gene models.
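
As a rough sketch of keeping non-silent variants after annotation, the example below filters annotated records by consequence term. The record layout (`id`, `consequences`) is hypothetical, and the consequence labels are Sequence Ontology terms of the kind emitted by common annotators; which terms count as non-silent is an assumption to adapt to your own annotation tool.

```python
# Consequence terms treated as non-silent in this sketch (an assumption).
NON_SILENT = {
    "missense_variant",
    "stop_gained",
    "stop_lost",
    "start_lost",
    "frameshift_variant",
    "inframe_insertion",
    "inframe_deletion",
    "splice_donor_variant",
    "splice_acceptor_variant",
}

def keep_non_silent(variants):
    """Keep variants with at least one non-silent consequence."""
    return [v for v in variants if NON_SILENT & set(v["consequences"])]

# Toy annotated variants in a hypothetical record layout.
annotated = [
    {"id": "v1", "consequences": ["synonymous_variant"]},
    {"id": "v2", "consequences": ["missense_variant"]},
    {"id": "v3", "consequences": ["intron_variant", "splice_donor_variant"]},
    {"id": "v4", "consequences": ["frameshift_variant"]},
]
non_silent = keep_non_silent(annotated)
```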

SNP mutation information detection

Identify variants that are common in the general population. Approximate population frequencies of variants are available in datasets such as dbSNP, the 1000 Genomes Project, the NHLBI GO Exome Sequencing Project (ESP), the Genome Aggregation Database (gnomAD), and ExAC. Variants with a minor allele frequency of more than 1% are excluded because they are less likely to have oncogenic consequences. The filtered variants are then annotated against the COSMIC dataset (a cancer mutation catalog), which allows variants that appear in dbSNP but are already known cancer mutations to be retained.
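
The frequency filter and COSMIC step described above can be sketched as follows. This is one possible reading of the logic: a variant is dropped when its population minor allele frequency exceeds 1%, unless it is listed in a cancer mutation catalog. The variant key layout `(chrom, pos, ref, alt)`, the lookup structures, and all values are made-up toy data; real pipelines would query dbSNP, gnomAD, or COSMIC annotations instead.

```python
def filter_common_variants(variants, population_maf, cosmic_keys, max_maf=0.01):
    """Drop common variants unless they are catalogued cancer mutations.

    variants: iterable of (chrom, pos, ref, alt) keys (hypothetical layout).
    population_maf: dict mapping a key to its minor allele frequency.
    cosmic_keys: set of keys present in a cancer mutation catalog.
    """
    kept = []
    for key in variants:
        maf = population_maf.get(key, 0.0)  # unseen variants are treated as rare
        if maf > max_maf and key not in cosmic_keys:
            continue  # common in the population and not a known cancer mutation
        kept.append(key)
    return kept

# Toy data on a made-up contig "chrT".
calls = [
    ("chrT", 100, "A", "G"),   # rare
    ("chrT", 200, "C", "T"),   # common, not in the catalog
    ("chrT", 300, "G", "A"),   # common, but catalogued as a cancer mutation
]
maf_table = {calls[0]: 0.001, calls[1]: 0.15, calls[2]: 0.02}
catalog = {calls[2]}
retained = filter_common_variants(calls, maf_table, catalog)
```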

InDel mutation information detection

Conventional alignment-based variant calling techniques struggle to identify small insertion and deletion variants (indels) and structural variants (SVs). Local realignment around indels and base quality score recalibration with GATK are used to improve the alignment quality of the filtered reads. The local realignment step improves the alignment of bases around known and suspected indel positions, which reduces false-positive calls.
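
Once indels have been called, they can be told apart from other variant types by comparing allele lengths. The sketch below assumes VCF-style REF and ALT alleles, where an insertion has a longer ALT and a deletion has a longer REF; the function name and the toy alleles are illustrative only.

```python
from collections import Counter

def classify_variant(ref, alt):
    """Classify a variant from VCF-style REF/ALT allele strings."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "MNV"  # same length, more than one base changed

# Toy REF/ALT pairs.
toy_alleles = [("A", "G"), ("A", "ATT"), ("CAG", "C"), ("AC", "GT")]
type_counts = Counter(classify_variant(ref, alt) for ref, alt in toy_alleles)
```

Tallies like `type_counts` are the starting point for the per-gene-component InDel statistics listed earlier.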

About CD Genomics Bioinformatics Analysis

The bioinformatics analysis department of CD Genomics provides novel solutions for data-driven innovation aimed at discovering the hidden potential in biological data, tapping new insights related to life science research, and predicting new prospects.

* For Research Use Only. Not for use in diagnostic procedures.