Targeted Exome Sequencing: Introduction and Bioinformatics Analysis Pipeline

Introduction to Targeted Exome Sequencing

Exome sequencing is a technique for sequencing the exonic regions of a genome, that is, the transcribed portions of the genome that are retained in mature mRNAs, including protein-coding sequences as well as untranslated regions (UTRs).

There are approximately 180,000 exons in humans, with a total length of approximately 30 million base pairs (30 Mb). Although the exome accounts for only about 1% of the human genome, it is thought to contain up to 85% of known disease-causing variants.
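As a quick check of these figures, the worked arithmetic below assumes a haploid human genome size of roughly 3.1 Gb (3,100 Mb). That genome size is an approximation introduced here, not a figure from this page. It shows the exome's share of the genome and the average exon length implied by the counts above:

```latex
\frac{30\ \text{Mb}}{3{,}100\ \text{Mb}} \approx 0.0097 \approx 1\%,
\qquad
\frac{30 \times 10^{6}\ \text{bp}}{180{,}000\ \text{exons}} \approx 167\ \text{bp per exon}
```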

As an alternative to whole-genome sequencing for diagnosing genetic disease, exome sequencing is less expensive, yet it still interrogates far more potential disease-causing variant sites than genotyping arrays. This is especially important for rare genetic diseases, where the causative variants may occur in the human population at frequencies too low for them to be included on genotyping arrays.

Overview of Bioinformatics Workflow

The first steps of the bioinformatics workflow are cleaning the data and aligning the reads to a reference genome. Because the ends of reads are more prone to sequencing errors than their interiors, it is often necessary to trim a number of bases from them.
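The end trimming described above can be sketched as follows. This is a minimal illustration, assuming Phred+33 quality encoding and a hypothetical quality threshold of 20. It only strips low-quality bases from the 5' and 3' ends, roughly in the spirit of the LEADING/TRAILING steps of a trimmer such as Trimmomatic. Real trimmers also handle adapters, sliding windows and minimum read lengths.

```python
def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def trim_low_quality_ends(seq: str, qual: str, min_q: int = 20) -> tuple[str, str]:
    """Remove bases scoring below min_q from both ends of a read.

    Low-quality bases in the interior of the read are left untouched.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have the same length")
    scores = phred_scores(qual)
    start, end = 0, len(scores)
    # Walk inward from the 5' end while bases are below the threshold.
    while start < end and scores[start] < min_q:
        start += 1
    # Walk inward from the 3' end while bases are below the threshold.
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    # Under Phred+33, '#' encodes Q2 and 'I' encodes Q40.
    print(trim_low_quality_ends("ACGTACGTAC", "##IIIIII##"))  # → ('GTACGT', 'IIIIII')
```

Trimming only from the ends keeps the read contiguous, which is what aligners expect; a read trimmed to zero length would normally be discarded.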

Once the data have been cleaned, the reads must be aligned (mapped) to a reference genome. The reference genome does not represent a single person; it is assembled from the genomes of several donors, providing a standard genomic backdrop against which variation in the sequenced data can be detected.
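To illustrate how a read is placed on the reference and how differences from it surface, here is a toy seed-and-verify mapper. It is a teaching sketch, not how production aligners work. Tools such as BWA-MEM use compressed FM-index (Burrows-Wheeler) structures and handle gaps, reverse complements and mapping qualities. The names `build_kmer_index`, `map_read` and `mismatch_positions` are hypothetical helpers introduced for this example.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def map_read(read, reference, index, k, max_mismatches=2):
    """Return (position, mismatches) for each acceptable placement of the read.

    The first k bases of the read act as an exact-match seed; each candidate
    position is then verified base by base over the full read (no gaps).
    """
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


def mismatch_positions(read, reference, pos):
    """List (reference_position, ref_base, read_base) for each mismatch."""
    return [(pos + i, reference[pos + i], base)
            for i, base in enumerate(read) if reference[pos + i] != base]


if __name__ == "__main__":
    reference = "TTGACCGTAGGCTAACGTTAGC"
    read = "CCGTAGGCTTAC"
    k = 4
    index = build_kmer_index(reference, k)
    for pos, mm in map_read(read, reference, index, k):
        # → 4 1 [(13, 'A', 'T')]: one candidate A>T difference at position 13
        print(pos, mm, mismatch_positions(read, reference, pos))
```

The mismatch reported against the shared reference is exactly the kind of signal that later variant calling aggregates across many overlapping reads.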

With the data cleaned and aligned, the main analysis stage can begin. In an exome sequencing project, a bioinformatician will typically focus on three primary objectives: assessing sequencing quality, assessing capture efficiency, and identifying variants.
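Capture efficiency is commonly summarized by how many reads land on the targeted exons and how deeply those targets are covered. The sketch below computes two such summaries from aligned read intervals and a list of target intervals. It assumes 0-based, half-open coordinates and non-overlapping targets, and it uses a simple nested loop. Tools such as Picard's CollectHsMetrics report metrics of this kind, with many refinements, from real BAM and interval files.

```python
def overlap_length(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def capture_metrics(reads, targets):
    """Compute the on-target read fraction and mean depth over the targets.

    reads and targets are lists of (chrom, start, end) tuples in 0-based,
    half-open coordinates. Targets are assumed not to overlap each other.
    """
    on_target_reads = 0
    target_bases_covered = 0
    for chrom, start, end in reads:
        hit = False
        for t_chrom, t_start, t_end in targets:
            if t_chrom != chrom:
                continue
            shared = overlap_length(start, end, t_start, t_end)
            if shared > 0:
                hit = True
                target_bases_covered += shared
        if hit:
            on_target_reads += 1
    target_length = sum(t_end - t_start for _, t_start, t_end in targets)
    return {
        "on_target_fraction": on_target_reads / len(reads) if reads else 0.0,
        "mean_target_depth": target_bases_covered / target_length if target_length else 0.0,
    }


if __name__ == "__main__":
    targets = [("chr1", 100, 200), ("chr1", 500, 600)]
    reads = [("chr1", 90, 140), ("chr1", 150, 250), ("chr1", 300, 350), ("chr1", 550, 650)]
    # → {'on_target_fraction': 0.75, 'mean_target_depth': 0.7}
    print(capture_metrics(reads, targets))
```

For genome-scale data the nested loop would be replaced by sorted intervals or an interval tree, but the quantities being computed are the same.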

Mapped Reads Postprocessing

The most appropriate post-processing steps depend on the variant caller used in the next step. Examples of post-processing steps include:

- Filtering each sample's paired-end reads to keep only the read pairs in which both the forward and reverse reads have been successfully mapped to the reference.

- Marking or removing duplicate reads. Duplicates, which typically arise from PCR over-amplification of genomic fragments during sequencing library preparation, can lead to incorrect genotype calls at variant sites.
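Both post-processing steps listed above can be sketched on a simplified, hypothetical read-pair record. In this sketch, pairs are treated as duplicates when they share chromosome, both mapped positions and orientation, and the copy with the highest base-quality sum is kept. Real tools such as Picard MarkDuplicates or samtools markdup work on BAM records and apply more careful rules (for example, unclipped 5' positions and library information). The `ReadPair` fields and the example data are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Simplified, hypothetical record for one aligned read pair."""
    name: str
    read1_mapped: bool
    read2_mapped: bool
    chrom: str
    read1_pos: int  # leftmost aligned position of read 1
    read2_pos: int  # leftmost aligned position of read 2 (-1 if unmapped)
    read1_reverse: bool  # orientation of read 1
    base_quality_sum: int  # used to choose which duplicate copy to keep


def keep_both_mapped(pairs):
    """Keep only pairs in which both reads were mapped."""
    return [p for p in pairs if p.read1_mapped and p.read2_mapped]


def mark_duplicates(pairs):
    """Return (pair, is_duplicate) tuples; pair names are assumed unique."""
    best = {}
    for p in pairs:
        key = (p.chrom, p.read1_pos, p.read2_pos, p.read1_reverse)
        if key not in best or p.base_quality_sum > best[key].base_quality_sum:
            best[key] = p
    kept = {p.name for p in best.values()}
    return [(p, p.name not in kept) for p in pairs]


# Toy data: "a" and "b" look like PCR copies of one fragment; "d" has an unmapped mate.
EXAMPLE_PAIRS = [
    ReadPair("a", True, True, "chr1", 100, 300, False, 500),
    ReadPair("b", True, True, "chr1", 100, 300, False, 600),
    ReadPair("c", True, True, "chr1", 150, 320, False, 550),
    ReadPair("d", True, False, "chr1", 400, -1, True, 450),
]

if __name__ == "__main__":
    for pair, is_dup in mark_duplicates(keep_both_mapped(EXAMPLE_PAIRS)):
        print(pair.name, "duplicate" if is_dup else "kept")
```

Flagging duplicates rather than deleting them mirrors common practice: downstream callers can ignore flagged reads while the original data stay intact.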

Variant Calling

After all sequenced reads have been mapped and post-processed, the next step is to search for evidence of sequence differences, i.e., variants, between the sequenced samples and the reference genome. Over the last decade this process has become increasingly automated and standardized, and modern variant-calling software hides much of the underlying complexity.
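To give a feel for what a variant caller does at a single site, the sketch below picks the maximum-likelihood genotype from read counts under a simple binomial error model. It assumes a biallelic site, independent reads, a single hypothetical per-base error rate, and a flat genotype prior. Production callers such as GATK HaplotypeCaller or FreeBayes use considerably richer models (per-base qualities, priors, local haplotype reconstruction).

```python
import math

GENOTYPES = ("0/0", "0/1", "1/1")


def genotype_log_likelihoods(depth: int, alt_count: int, error_rate: float = 0.01) -> list[float]:
    """Log-likelihood of the observed alt-read count under 0, 1 or 2 alt alleles.

    Each read is drawn from either chromosome copy with equal probability and
    is misread as the other allele with probability error_rate. The binomial
    coefficient is omitted because it is identical for every genotype.
    """
    if not 0 < error_rate < 0.5:
        raise ValueError("error_rate must be between 0 and 0.5")
    if not 0 <= alt_count <= depth:
        raise ValueError("alt_count must be between 0 and depth")
    log_likelihoods = []
    for alt_alleles in (0, 1, 2):
        frac = alt_alleles / 2
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        log_likelihoods.append(
            alt_count * math.log(p_alt) + (depth - alt_count) * math.log(1 - p_alt)
        )
    return log_likelihoods


def call_genotype(depth: int, alt_count: int, error_rate: float = 0.01) -> str:
    """Maximum-likelihood genotype; './.' when the site has no coverage."""
    if depth == 0:
        return "./."
    lls = genotype_log_likelihoods(depth, alt_count, error_rate)
    return GENOTYPES[max(range(3), key=lambda g: lls[g])]


if __name__ == "__main__":
    print(call_genotype(30, 14))  # → 0/1
```

Comparing likelihoods rather than applying a fixed allele-fraction cutoff lets the call adapt naturally to depth and error rate.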

Variant Annotation and Reporting

A list of variants found across a set of samples is a good start, but extracting biologically or clinically meaningful information from it requires additional tools and data. The variants in the list must be:

- Prioritized according to their likely relevance to the biological or clinical phenotype under investigation
Even with exome sequencing, only a small fraction of the variants found will have an evident effect on protein function (many variants cause silent substitutions or lie in intronic regions that are still covered by the exome-enriched sequencing data). Many will also have been observed previously in healthy individuals, which argues against their playing a significant role in a deleterious phenotype.

- Filtered according to the inheritance pattern expected for a causative variant
At each variant site, a multi-sample VCF file records the most likely genotype of every sample. Knowing which individuals (samples) are affected by the phenotype allows us to rule out variants whose inheritance pattern is inconsistent with the known mode of inheritance of the phenotype.

- Presented in a more human-friendly format
Although the VCF format can convey all the necessary details about any variant, it is difficult for humans to read.
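The filtering and reporting steps above can be combined in a small sketch that reads multi-sample VCF lines, applies a population-frequency cutoff and an autosomal recessive inheritance model, and writes the survivors as a readable tab-separated table. Several assumptions apply: sites are biallelic, `POP_AF` is a hypothetical INFO key standing in for whatever population-frequency annotation a real pipeline adds, and only the recessive model is shown. Dominant, de novo, compound heterozygous or X-linked models would need different genotype rules. The toy VCF lines are invented for illustration.

```python
import csv
import io


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column into key/value pairs (flags map to 'true')."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item and item != ".":
            fields[item] = "true"
    return fields


def recessive_candidates(vcf_lines, affected, max_pop_af=0.01):
    """Yield sites where every affected sample is 1/1 and no unaffected one is."""
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        if line.startswith("#CHROM"):
            samples = line.split("\t")[9:]  # sample names follow FORMAT
            continue
        cols = line.split("\t")
        chrom, pos, _id, ref, alt = cols[:5]
        info = parse_info(cols[7])
        gt_index = cols[8].split(":").index("GT")
        genotypes = {
            s: field.split(":")[gt_index].replace("|", "/")  # treat phased as unphased
            for s, field in zip(samples, cols[9:])
        }
        pop_af = float(info.get("POP_AF", "0"))  # hypothetical INFO key
        if pop_af > max_pop_af:
            continue  # too common in the population to explain a rare disease
        if not all(genotypes[s] == "1/1" for s in affected):
            continue
        if any(genotypes[s] == "1/1" for s in samples if s not in affected):
            continue
        yield {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
               "pop_af": pop_af, "genotypes": genotypes}


def to_table(candidates, samples) -> str:
    """Render candidate variants as a tab-separated table, one column per sample."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["CHROM", "POS", "REF", "ALT", "POP_AF", *samples])
    for c in candidates:
        writer.writerow([c["chrom"], c["pos"], c["ref"], c["alt"], c["pop_af"],
                         *(c["genotypes"][s] for s in samples)])
    return out.getvalue()


# Toy trio: an affected child and two unaffected carrier parents.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild\tmother\tfather",
    "chr1\t1000\t.\tA\tG\t50\tPASS\tPOP_AF=0.001\tGT\t1/1\t0/1\t0/1",
    "chr1\t2000\t.\tC\tT\t50\tPASS\tPOP_AF=0.2\tGT\t1/1\t0/1\t0/1",
    "chr2\t3000\t.\tG\tA\t50\tPASS\tPOP_AF=0.0005\tGT\t0/1\t0/0\t0/1",
]

if __name__ == "__main__":
    hits = recessive_candidates(EXAMPLE_VCF, affected={"child"})
    print(to_table(hits, ["child", "mother", "father"]), end="")
```

In this toy trio only the chr1:1000 site survives: chr1:2000 is too common, and at chr2:3000 the affected child is only heterozygous.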

About CD Genomics Bioinformatics Analysis

The bioinformatics analysis department of CD Genomics provides novel solutions for data-driven innovation, aimed at discovering the hidden potential in biological data, uncovering new insights in life science research, and predicting new prospects.


* For Research Use Only. Not for use in diagnostic procedures.