Terminologies and Key Concepts Related to Bioinformatics

Definition of Bioinformatics

Bioinformatics is an interdisciplinary field that combines biology and computer science and is concerned with acquiring, storing, analyzing, and disseminating biological data, most commonly DNA and amino acid sequences. It uses computational methods to study gene and protein function, evolutionary relationships, and the prediction of protein shape.

Some Terms and Concepts Related to Bioinformatics

Alignment: The result of comparing two or more gene or protein sequences to see how similar their base or amino acid sequences are. Sequence alignments are used to compare two or more genes or gene products for similarity, homology, function, or other degrees of relatedness.
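As an illustration of how a pairwise alignment can be computed, here is a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch approach). The function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example. Real tools use substitution matrices and more refined gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions, not a standard.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

In the example, three matching positions and one gap give a score of 3 - 2 = 1. The "-" marks the position where the shorter sequence lacks a base.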

Annotation: A collection of comments, notations, references, and citations, either in free format or using a controlled vocabulary, that together describe all experimental and inferred details about a gene or protein. Annotations can be used to describe other biological systems as well. One of the essential uses of bioinformatics tools is the automated, batch annotation of large biological sequences.

Assembly: A collection of overlapping sequences from one or more related genes that have been grouped together according to sequence similarity or identity. Sequence assembly can be used to piece together "shotgun" sequencing fragments based on overlapping restriction enzyme digests, or to classify and index novel genes discovered during "single-pass" cDNA sequencing.
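The idea of piecing fragments together by their overlaps can be sketched with a simple greedy merge. This is a toy illustration under stated assumptions, not how production assemblers work: the function names and the minimum overlap of 3 bases are hypothetical, reads are assumed error-free and on the same strand, and reads contained entirely inside another read are not handled.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


fragments = ["ATGGCGT", "GCGTACG", "TACGTTA"]
print(greedy_assemble(fragments))  # ['ATGGCGTACGTTA']
```

Each pair of neighbouring fragments shares four bases ("GCGT" and "TACG"), so the three reads collapse into one contiguous sequence.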

Cluster: A grouping of related objects in a multidimensional space. Clustering is used to create new characteristics that are abstractions of the objects' current attributes. The choice of distance metric in the space has a significant impact on the quality of the clustering. In bioinformatics, clustering is applied to sequences, high-throughput expression data, and other experimental information. Clusters of partial or complete gene sequences can be used to find the complete (contiguous) sequence and to learn more about its function. By clustering expression data, the researcher can identify patterns of co-regulation in groups of genes.
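To make the role of the distance metric concrete, the following sketch groups expression profiles using a correlation-based distance, so that genes whose levels rise and fall together end up in the same cluster. The function names, the cutoff of 0.2, and the example values are assumptions made for illustration. The grouping rule is single linkage at a fixed cutoff, one of many possible clustering methods.

```python
from math import sqrt


def pearson_distance(x, y):
    """1 - Pearson correlation: near 0 for profiles that move together.

    Assumes both profiles have the same length and are not constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def cluster_profiles(profiles, cutoff=0.2):
    """Single-linkage grouping of genes at a fixed distance cutoff.

    Genes connected by a chain of pairwise distances <= cutoff share a cluster.
    """
    names = list(profiles)
    parent = {g: g for g in names}  # union-find forest

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson_distance(profiles[a], profiles[b]) <= cutoff:
                parent[find(a)] = find(b)

    clusters = {}
    for g in names:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical expression levels across four conditions
expression = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [8, 6, 4, 3],
}
print(cluster_profiles(expression))  # [['geneA', 'geneB'], ['geneC', 'geneD']]
```

With a Euclidean distance, geneA and geneC would look close because their values are similar in magnitude. The correlation distance instead separates them because they change in opposite directions.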

Data Cleaning: A method of processing experimental data to remove noise, experimental errors, and other artifacts, using automated or semi-automated algorithms, in order to create and keep high-quality data for later analysis. In high-throughput sequencing, data cleaning is usually required because compression or other experimental artifacts restrict the amount of usable sequence data produced from each sequencing run or "read".
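One common form of data cleaning for sequencing reads is trimming low-quality bases from the end of each read. The sketch below assumes quality strings in the Phred+33 ASCII encoding used by many FASTQ files. The function names and the thresholds (minimum quality 20, minimum length 5) are illustrative assumptions, not values taken from the text above.

```python
def phred_scores(quality, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in quality]


def trim_read(seq, quality, min_q=20, min_len=5):
    """Trim low-quality bases from the 3' end of a read.

    Returns (trimmed_seq, trimmed_quality), or None if fewer than
    min_len bases remain after trimming.
    """
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None  # too short to be useful; discard the read
    return seq[:end], quality[:end]


# 'I' encodes Phred 40 (high quality); '#' encodes Phred 2 (very low)
print(trim_read("ACGTACGTAC", "IIIIIIII##"))  # ('ACGTACGT', 'IIIIIIII')
print(trim_read("ACGT", "I###"))              # None
```

A Phred score of 20 corresponds to an estimated error probability of 1 in 100, which is why it is a frequently used cutoff.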

Deletion Mapping: A technique in which defined deletions are made in a region of DNA and used to map the functionally critical regions of that DNA. For example, systematic deletions in a region of interest can reveal the minimal region of DNA required for a test promoter to function.

Expression Profile: The level and duration of expression of one or more genes from a specific cell or tissue type, as determined by any of several high-throughput methodologies, including sample sequencing, serial analysis, or microarray-based detection.

Functional Genomics: The study of protein structure, function, pathways, and networks using genomic data. In model organisms such as worms, fruit flies, yeast, and mice, function can be determined by "knocking out" or "knocking in" expressed genes.

High-Throughput Screening: The process of screening a large number of compounds against a potential drug target in cell-free or whole-cell assays. These assays are usually performed in 96-well plates using automated, robotic station-based technologies, or on higher-density array ("chip") platforms.

Profile: Sequence profiles are tables of position-specific scores and gap penalties generated from multiple alignments of sequences with a known relationship. Each position in the profile has scores for all possible amino acids, as well as one penalty score for opening a gap and one for extending a gap at that position.
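The position-specific scores of a profile can be illustrated with a small log-odds table built from an ungapped alignment. This is a simplified sketch: the function names are hypothetical, a uniform background frequency and a pseudocount of 1 are assumed, a DNA alphabet is used for brevity (a 20-letter amino acid alphabet can be passed instead), and the position-specific gap penalties described above are left out.

```python
from math import log2


def build_profile(alignment, alphabet="ACGT", pseudocount=1.0):
    """Build position-specific log-odds scores from aligned sequences.

    Returns one dict per alignment column, mapping each symbol to
    log2(observed frequency / background frequency). A uniform
    background is an assumption of this sketch.
    """
    background = 1.0 / len(alphabet)
    profile = []
    for column in zip(*alignment):
        observed = sum(1 for c in column if c in alphabet)
        total = observed + pseudocount * len(alphabet)
        profile.append({
            s: log2(((column.count(s) + pseudocount) / total) / background)
            for s in alphabet
        })
    return profile


def score_sequence(profile, seq):
    """Sum the position-specific scores of seq against the profile."""
    return sum(column[s] for column, s in zip(profile, seq))


aligned = ["ACGT", "ACGA", "ACGT", "TCGT"]
profile = build_profile(aligned)
print(profile[0]["A"])  # 1.0: A is twice as frequent as background here
print(score_sequence(profile, "ACGT") > score_sequence(profile, "TTTT"))  # True
```

A sequence that resembles the aligned family scores higher than an unrelated one, which is how profiles are used to search for new members of a sequence family.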

Visualization: The process of representing abstract scientific data as images that can help in understanding the meaning of the data.

