The development of first-generation sequencing, next-generation sequencing, and various omics technologies has greatly accelerated data generation in the biomedical field, and modern digital medical systems produce massive amounts of data of their own. The prevalence of the internet allows previously isolated data to be exchanged, compared, and updated instantly. As a result, a large number of biomedical databases have come into being. Here, CD Genomics has compiled some of the commonly used biomedical databases.
− Online Mendelian Inheritance in Man (OMIM) is currently one of the most important bioinformatics databases in molecular genetics, with a focus on inherited genetic disorders. It is continuously updated and contains text overviews, reference information, sequence records, maps, and links to other related databases.
− The Catalogue of Somatic Mutations In Cancer (COSMIC) is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer. The database provides information on genes, cancer types, and mutations. Users can also access data from The Cancer Genome Atlas (TCGA) to view the corresponding information for a given region of the human genome.
− The National Cancer Database (NCDB) is a nationally recognized database jointly established by the American College of Surgeons and the American Cancer Society. It is a clinical oncology database built from hospital registry data collected at more than 1,500 cancer programs accredited by the Commission on Cancer. The NCDB can be used to analyze and track the treatment and outcomes of patients with malignant tumors. It covers more than 70% of newly diagnosed cancer cases in the United States and holds more than 34 million historical records.
− The Cancer Genome Atlas (TCGA) was jointly developed by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It currently contains data on 33 cancer types, and for each cancer type it provides a comprehensive, multidimensional map of key genomic changes. TCGA characterizes tumor tissues and matched normal tissues from more than 11,000 patients and has been widely used by researchers. These data have contributed to more than 1,000 cancer studies, both by independent researchers and in publications of the TCGA Research Network.
− Gene Expression Omnibus (GEO) is a public functional genomics data repository that supports data submissions compliant with Minimum Information About a Microarray Experiment (MIAME). It accepts both array-based and sequence-based data, and provides tools that help users query and download experiments and curated gene expression profiles.
− The Database of Genomic Variants (DGV) aims to provide an overview of structural variation in human chromosomes. The database records information on genomic variation and related phenotypes, and this information is continuously updated.
− The Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources (DECIPHER) is currently one of the most important bioinformatics databases in molecular genetics. Users can search it for information on related genetic disorders, including mutation sites and clinical phenotypes, which helps make clinical diagnosis more efficient. DECIPHER contains more than 10,000 cases submitted by more than 200 research centers.
− The Comparative Toxicogenomics Database (CTD) is a powerful, publicly available database designed to advance understanding of how environmental exposures affect human health. It provides information on chemical-gene/protein interactions, chemical-disease relationships, and gene-disease relationships. These data are integrated with functional and pathway data to help develop hypotheses about the mechanisms by which environmental exposures influence disease.
− ClinVar is a public database that collects genetic variants associated with diseases. It was launched in 2013 by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health. ClinVar integrates genetic variation and clinical phenotype data from multiple databases, such as dbSNP, dbVar, PubMed, and OMIM. Its records cover four types of information: variants, clinical phenotypes, supporting evidence, and functional annotation and analysis. These records are progressively refined through expert review. ClinVar has become a standard, credible, and stable database linking genetic variants to clinical phenotypes. So far, it has received clinical annotations for more than 125,000 unique variants from researchers and other databases.