The advent of the big data era has had a significant impact on experimental science. In the biomedical field in particular, research is increasingly data-driven. In the past, experiments were designed to reach a conclusion or to put forward a new hypothesis; now, mining massive data sets for scientific patterns can itself yield hypotheses or support reliable conclusions. Making full use of the information already held in existing databases can therefore make your research faster and more accurate. Here, CD Genomics has compiled some commonly used nucleic acid databases to help with your research.
− The National Center for Biotechnology Information (NCBI) is part of the National Library of Medicine (NLM). It advances science and health by providing access to biomedical and genomic information.
− The European Molecular Biology Laboratory (EMBL) is a core institution for basic molecular biology research, education, and training in Europe. EMBL's research mainly focuses on biochemical experimental technology, mass spectrometry, cell biology, cell biophysics, differentiation, gene expression, structural biology, protein synthesis, nucleic acid sequence data, and a radioactive hybridization database.
− The DNA Data Bank of Japan (DDBJ) was established in 1984. It is one of the three largest DNA databases in the world and, together with NCBI's GenBank and the nucleotide database at EMBL-EBI, forms the International Nucleotide Sequence Database Collaboration (INSDC). DDBJ's data are updated every day.
− Online Mendelian Inheritance in Man (OMIM) is currently one of the most important bioinformatics databases in molecular genetics. It catalogs human genes and the characteristics of human genetic diseases.
− The NCBI Reference Sequence Database (RefSeq) is a comprehensive, integrated, non-redundant, well-annotated set of reference sequences, including genomic, transcript, and protein sequences. It collects the nucleic acid sequences (DNA, RNA) and protein products of major organisms, from viruses and bacteria to eukaryotes.
− The International Genome Sample Resource (IGSR) was established to ensure the ongoing usability of data generated by the 1000 Genomes Project and to extend the data set.
− SNPedia is a wiki investigating human genetics. It shares information about the effects of variations in DNA, citing peer-reviewed scientific publications. There are currently 111,009 SNPs in SNPedia. It is also used by Promethease to create a personal report linking your DNA variations to the information published about them.
− ArrayExpress, the Archive of Functional Genomics Data, stores data from high-throughput functional genomics experiments and provides these data to the research community for reuse. So far, the database includes 73,319 experiments, 2,461,457 assays, and 57.81 TB of archived data.
− Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation, and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function, and collects disease data. Ensembl tools include BLAST, BLAT, BioMart, and the Variant Effect Predictor (VEP) for all supported species.
− BioGPS is a free, extensible, and customizable gene annotation portal. It serves as a resource for learning about gene and protein function and as a gene and protein expression annotation platform.
− The GenBank sequence database incorporates DNA sequences from all available public sources, primarily through the direct submission of sequence data from individual laboratories and from large-scale sequencing projects. Most submitters use the BankIt or Sequin programs to send their sequence data. Data exchange with the EMBL Data Library and the DDBJ helps ensure comprehensive worldwide coverage. GenBank data is accessible through Entrez, NCBI's integrated retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome and protein structure information.
− The Genome Sequence Archive (GSA) is a data repository for archiving raw sequence data. In compliance with the data standards and structures of the International Nucleotide Sequence Database Collaboration (INSDC), GSA organizes data into four data objects (BioProject, BioSample, Experiment, and Run), accepts raw sequence reads produced by a variety of sequencing platforms, stores both the sequence reads and the metadata submitted from all over the world, and makes all these data publicly available to the worldwide scientific community. In the era of big data, GSA is an important complement to the existing INSDC members: it helps relieve the growing burden of handling the sequence data deluge, takes on responsibility for archiving big data globally, and provides free, unrestricted access to all publicly available data in support of research activities throughout the world.
− The Single Nucleotide Polymorphism Database (dbSNP) was established by NCBI in cooperation with the National Human Genome Research Institute. It contains data on SNPs, short indel polymorphisms, microsatellite markers, and short repeats, as well as their sources, detection and verification methods, genotype information, upstream and downstream sequences, population frequencies, and other information.
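As noted in the GenBank entry above, GenBank data is accessible through Entrez. One common programmatic route is NCBI's E-utilities web service. The following is a minimal sketch, not an official client: it assumes the public EFetch endpoint with the `nuccore` database, the helper names `build_efetch_url` and `fetch_record` are made up for illustration, and the accession `U49845` is used only as an example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public EFetch endpoint of the NCBI E-utilities service.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, rettype: str = "fasta") -> str:
    """Build an EFetch URL for one nucleotide record (hypothetical helper).

    rettype "fasta" requests FASTA text; "gb" requests the GenBank flat file.
    """
    params = {
        "db": "nuccore",      # Entrez nucleotide database
        "id": accession,      # accession number to retrieve
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


def fetch_record(accession: str, rettype: str = "fasta") -> str:
    """Download the record as text. Requires network access."""
    with urlopen(build_efetch_url(accession, rettype), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_record("U49845") to actually download.
    print(build_efetch_url("U49845", rettype="gb"))
```

For anything beyond occasional lookups, check NCBI's current usage guidelines (request rate limits, API keys) before scripting against the service.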
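The four data objects that GSA uses for data organization (BioProject, BioSample, Experiment, and Run) can be pictured as a simple hierarchy. The sketch below is only an illustration of that idea: the field names, the strict nesting, and the accession strings are assumptions made here, not GSA's actual schema, and a real archive may link these objects by accession rather than nest them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    accession: str                      # placeholder identifier
    read_files: List[str] = field(default_factory=list)


@dataclass
class Experiment:
    accession: str
    platform: str                       # sequencing platform used
    runs: List[Run] = field(default_factory=list)


@dataclass
class BioSample:
    accession: str
    organism: str
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class BioProject:
    accession: str
    title: str
    samples: List[BioSample] = field(default_factory=list)

    def all_read_files(self) -> List[str]:
        """Collect every read file under this project, in nesting order."""
        return [
            path
            for sample in self.samples
            for experiment in sample.experiments
            for run in experiment.runs
            for path in run.read_files
        ]


# Hypothetical example: one project, one sample, one experiment, two runs.
project = BioProject(
    accession="PROJECT-1",
    title="Example resequencing study",
    samples=[
        BioSample(
            accession="SAMPLE-1",
            organism="Homo sapiens",
            experiments=[
                Experiment(
                    accession="EXPERIMENT-1",
                    platform="example platform",
                    runs=[
                        Run("RUN-1", ["run1_R1.fastq.gz", "run1_R2.fastq.gz"]),
                        Run("RUN-2", ["run2_R1.fastq.gz"]),
                    ],
                )
            ],
        )
    ],
)
```

Calling `project.all_read_files()` walks the hierarchy from project down to run and returns the read file names in order, which mirrors how metadata and raw reads stay tied together in a submission.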
References