Whole Exome Data Analysis: Assay Design and Protocol

Introduction to Whole Exome Sequencing

Whole exome sequencing (WES) is a method for finding rare or common variants associated with a disorder or phenotype by sequencing the coding regions (exons) of a genome, typically the human genome. Because sequencing is concentrated on exons, which make up about 2.5 percent of the human genome, many more individuals can be analyzed at lower cost and in less time than with whole genome sequencing. The most prevalent techniques use oligonucleotide probes to capture targeted DNA fragments, which enriches the library for exonic sequence. The targeted sequences comprise exons with well-defined, annotated coding and non-coding sequences. Regions more than 100 bases away from a targeted region are not sequenced, so variants within introns, promoters, or intergenic regions are generally not detected.
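As a rough illustration of the 100-base limit described above, the sketch below checks whether a genomic position falls inside a targeted region or its flank. The function name, the half-open `(start, end)` interval representation, and the example coordinates are assumptions made for this illustration; only the 100-base figure comes from the text.

```python
def is_likely_sequenced(position, targets, flank=100):
    """Return True if `position` lies within a target or its flanking bases.

    `targets` is a list of half-open (start, end) intervals on one chromosome.
    `flank` is the distance beyond a target that may still be sequenced;
    the default of 100 bases follows the figure quoted in the text.
    """
    return any(start - flank <= position < end + flank for start, end in targets)


# Hypothetical exon spanning positions 1000..1199.
exon_targets = [(1000, 1200)]
near_exon = is_likely_sequenced(950, exon_targets)       # inside the flank
deep_intron = is_likely_sequenced(5000, exon_targets)    # far from any target
```

A variant at the second position would not be expected to appear in exome data, which is why intronic and intergenic variants are generally missed.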

Whole Exome Sequencing Data Analysis

To examine the sequenced exome, specialists run the data through a series of analytical steps to determine whether the patient's genome contains a specific disease-causing mutation. These steps rely on bioinformatics, one major goal of which is managing the data that describe a genome, data derived from a large number of DNA sequencing experiments. Preparing the genomic library, enriching the target regions, and sequencing the DNA are the NGS steps that lead up to the bioinformatics stage, where the data are turned into insights and research questions are answered.

Cleaning the data and aligning the reads to a reference genome are the first steps in the bioinformatics workflow. Because the ends of reads are more susceptible to base-calling errors than the interior, it is frequently necessary to trim a number of nucleotides from the read ends.
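The trimming step can be sketched as follows. This is a minimal illustration, not the procedure used by any particular trimming tool: it assumes Phred+33 encoded quality strings (the common FASTQ convention) and a hypothetical threshold of Q20, and it simply removes bases from each end until a base at or above the threshold is reached.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_low_quality_ends(sequence, quality, min_quality=20, offset=33):
    """Trim bases from both ends of a read while their quality is below `min_quality`.

    Returns the trimmed (sequence, quality) pair. The threshold of 20 is an
    illustrative assumption, not a value taken from the text.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    scores = phred_scores(quality, offset)
    start, end = 0, len(scores)
    # Advance from the 5' end past low-quality bases.
    while start < end and scores[start] < min_quality:
        start += 1
    # Retreat from the 3' end past low-quality bases.
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], quality[start:end]


# '#' encodes Q2, '5' encodes Q20, 'I' encodes Q40 under Phred+33.
trimmed_seq, trimmed_qual = trim_low_quality_ends("ACGTACGT", "#5IIII5#")
```

Dedicated trimming tools typically use sliding windows or running-sum algorithms rather than this base-by-base rule, but the intent is the same: discard the unreliable ends before alignment.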

Once the data has been cleaned, the reads must be aligned to a reference genome. This genome does not represent a single person; it is assembled from the genomes of a variety of donors, which allows variation in the sequenced data to be identified against a standard genomic background.

Once it has been cleaned and aligned, the sequencing data is ready for the primary phase of data assessment. Sequencing quality, capture efficiency, and variant detection are the three major assessment targets that a bioinformatician is likely to concentrate on in an exome sequencing workflow.

Variant Detection

Variant identification is a crucial step in many NGS workflows because it shows how the sequenced sample differs from the reference genome. In the case of exome sequencing and our patient, scientists may be attempting to verify the presence of a specific single nucleotide polymorphism or deletion that is associated with early-onset Alzheimer's disease. This type of mutation can be identified as a deviation from the gene's expected nucleotide sequence.

However, before searching for variants, the scientist must review sequencing quality metrics and capture efficiency metrics to ensure that the interpretation of the variant detection analysis is reliable. The scientist must be confident that the variant of interest can be detected at the desired coverage depth.
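The two ideas in this section, a variant as a deviation from the reference and the need for sufficient depth before trusting it, can be combined in a toy sketch. This is not how production variant callers work (they use statistical models over aligned reads); the function name, the pileup representation, and the thresholds below are all assumptions chosen for illustration.

```python
from collections import Counter


def call_snvs(reference, pileups, min_depth=20, min_alt_fraction=0.2):
    """Report positions where the observed bases deviate from the reference.

    `reference` is a string of reference bases; `pileups[i]` is a string of
    the bases observed in reads covering position i. A call is made only when
    depth is at least `min_depth` and the most frequent non-reference base
    reaches `min_alt_fraction` of the reads. Both thresholds are illustrative.
    """
    calls = []
    for position, (ref_base, observed) in enumerate(zip(reference, pileups)):
        depth = len(observed)
        if depth < min_depth:
            continue  # too few reads to make a confident statement
        for base, count in Counter(observed).most_common():
            if base != ref_base:
                fraction = count / depth
                if fraction >= min_alt_fraction:
                    calls.append((position, ref_base, base, depth, fraction))
                break  # only consider the most frequent non-reference base
    return calls


# Position 1 shows half the reads carrying T instead of C (heterozygous-like).
# Position 2 also deviates but has only 10 reads, so it is skipped.
example_calls = call_snvs("ACG", ["A" * 20, "C" * 10 + "T" * 10, "G" * 5 + "A" * 5])
```

The skipped third position shows why depth is checked first: the deviation may be real, but there is not enough evidence to report it.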

Sequencing Quality

The FASTQ file generated by a high-throughput sequencing experiment can be used to extract several useful sequence quality indicators. It records the quality of the sequencing data, such as the likelihood that each base was called correctly by the sequencer. Another significant quality metric is read depth, the number of sequenced fragments that map to a given nucleotide. The read depth considered industry-standard in hybridization capture depends on the type of experiment being conducted. The coverage standard for confidence in exome sequencing experiments is 20x, meaning that 20 sequenced fragments align over a nucleotide of interest. This read depth raises the chances that the variants found in a sequenced specimen are true positives rather than false positives. For other kinds of experiments the standard can differ greatly: when searching for rare mutations it can be as high as 2000x.
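Both metrics in this section can be made concrete with a short sketch. The relation between a Phred quality score Q and the base-calling error probability P is Q = -10 log10(P). The depth functions below assume that aligned reads are available as half-open `(start, end)` intervals on a single chromosome, which is a simplification introduced for this example; the function names are likewise hypothetical.

```python
def error_probability(phred_quality):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-phred_quality / 10)


def depth_at(position, read_intervals):
    """Count the reads whose aligned interval [start, end) covers `position`."""
    return sum(1 for start, end in read_intervals if start <= position < end)


def fraction_at_depth(target_start, target_end, read_intervals, min_depth=20):
    """Fraction of bases in [target_start, target_end) covered by at least `min_depth` reads.

    The default of 20 mirrors the 20x exome standard quoted in the text.
    """
    length = target_end - target_start
    if length <= 0:
        raise ValueError("target interval must be non-empty")
    covered = sum(
        1
        for position in range(target_start, target_end)
        if depth_at(position, read_intervals) >= min_depth
    )
    return covered / length


# Twenty reads over positions 0..9 and five reads over positions 5..14.
reads = [(0, 10)] * 20 + [(5, 15)] * 5
share_at_20x = fraction_at_depth(0, 15, reads)  # 10 of 15 bases reach 20x
```

A Phred score of 20 corresponds to a 1 in 100 chance of a wrong base call, and a score of 30 to 1 in 1000. This per-position loop is quadratic and only suitable for illustration; real pipelines compute depth from sorted alignments.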

Uniformity, off-target capture events, and duplicate rate are all important metrics in an exome sequencing experiment. Uniformity shows how evenly the captured exome has been sequenced, that is, how much of it has been over- or under-covered. Off-target capture events are reads from regions of the genome that were never intended to be captured; they tell researchers about the quality of the capture library design. The duplicate rate measures how frequently the same fragment is sequenced more than once. If a large percentage of the reads for certain targets are duplicates, other targets may be under-sequenced or missed altogether.
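The three metrics can be sketched as simple ratios. The text does not define how each is computed, so the definitions below are assumptions: duplicates are reads sharing the same chromosome, start position, and strand; a read is on target if its interval overlaps any target interval; and uniformity is the fraction of target bases whose depth is at least 0.2 times the mean depth, which is one convention among several.

```python
from collections import Counter


def duplicate_rate(read_keys):
    """Fraction of reads that repeat an earlier (chrom, start, strand) key."""
    if not read_keys:
        return 0.0
    counts = Counter(read_keys)
    duplicates = sum(count - 1 for count in counts.values())
    return duplicates / len(read_keys)


def on_target_rate(reads, targets):
    """Fraction of reads overlapping a target; both are (chrom, start, end), half-open."""
    if not reads:
        return 0.0

    def overlaps_target(read):
        chrom, start, end = read
        return any(
            chrom == t_chrom and start < t_end and t_start < end
            for t_chrom, t_start, t_end in targets
        )

    return sum(1 for read in reads if overlaps_target(read)) / len(reads)


def uniformity(depths, fraction_of_mean=0.2):
    """Fraction of target bases with depth >= fraction_of_mean * mean depth."""
    if not depths:
        return 0.0
    mean_depth = sum(depths) / len(depths)
    if mean_depth == 0:
        return 0.0
    threshold = fraction_of_mean * mean_depth
    return sum(1 for depth in depths if depth >= threshold) / len(depths)


keys = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 200, "-"), ("chr2", 100, "+")]
dup = duplicate_rate(keys)  # one of four reads is a repeat
```

The off-target rate is one minus the on-target rate. A high duplicate rate or low uniformity suggests that sequencing capacity is being spent on a subset of targets at the expense of others.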

About CD Genomics Bioinformatics Analysis

The bioinformatics analysis department of CD Genomics provides novel solutions for data-driven innovation aimed at discovering the hidden potential in biological data, tapping new insights related to life science research, and predicting new prospects.


* For Research Use Only. Not for use in diagnostic procedures.