Proteins are generally composed of one or more functional regions, commonly termed domains. Different combinations of domains give rise to the diverse range of proteins found in nature, so identifying the domains that occur within a protein can provide insights into its function. Continuous research on protein structure and function has generated a large body of experimental data, and protein databases of different types and purposes have emerged to hold it. These include databases of protein structure and function; resources for visualizing the three-dimensional structure of biological macromolecules; archives of atomic coordinates, references, and structural information for biological macromolecules; databases of crystal structure factors and NMR experimental data; databases of protein-protein and genetic interactions; and databases that provide protein research tools and links. Here, CD Genomics lists some common protein databases as a convenient reference for your protein research.
− UniProt combines information extracted from the literature with computational analysis evaluated by biocurators. It is a manually annotated, non-redundant protein sequence database.
− The Database of Interacting Proteins (DIP) catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. The data stored in DIP are curated both manually, by expert curators, and automatically, using computational approaches that draw on knowledge about protein-protein interaction networks extracted from the most reliable, core subset of the DIP data.
− InterPro provides a convenient platform for functional annotation of protein sequences by integrating multiple protein-related databases, and it supports the prediction of protein families, structural domains, and functional sites. It combines protein signatures from these member databases into a single searchable resource, capitalizing on their individual strengths to produce a powerful integrated database and diagnostic tool.
− MobiDB is a database of protein disorder and mobility annotations.
− neXtProt is a new human protein-centric knowledge platform. Developed at the Swiss Institute of Bioinformatics (SIB), it aims to help researchers answer questions relevant to human proteins.
− The Pfam database is a large collection of protein families that contains 18,259 manually curated entries. Each protein family is represented by multiple sequence alignments and hidden Markov models (HMMs).
− PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterize a protein family.
− PROSITE is a database of protein families, domains and functional sites. It is based on the observation that, while there is a huge number of different proteins, most of them can be grouped, based on the similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor.
− The Protein Information Resource (PIR) is a comprehensive, annotated, non-redundant protein sequence database. It helps researchers identify and interpret protein sequence information, study molecular evolution and functional genomics, and perform bioinformatics analysis.
− SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes. The SUPERFAMILY annotation is based on a collection of hidden Markov models, which represent structural protein domains at the SCOP superfamily level. A superfamily groups together domains which have an evolutionary relationship. The annotation is produced by scanning protein sequences from over 2,478 completely sequenced genomes against the hidden Markov models.
− The Protein Data Bank (PDB) is an international repository for results of macromolecular structural studies. Two classes of information are included in the Data Bank: atomic coordinates and structure factor data. Almost all information included is unpublished and is not available elsewhere. The 167,518 biological macromolecular structures contained in the database enable breakthroughs in research and education.
− ModBase is a database of comparative protein structure models.
− Protein Data Bank in Europe - Knowledge Base (PDBe-KB) is a community-driven resource managed by the PDBe team that collates functional annotations and predictions for structure data in the PDB archive. PDBe-KB contains data contributed by projects such as SIFTS and FunPDBe and aims to place structures from the PDB in their biological context.
− SWISS-MODEL is a fully automated protein structure homology-modelling server, accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer). The purpose of this server is to make protein modelling accessible to all life science researchers worldwide. Every week, SWISS-MODEL models all the sequences for thirteen core species based on the latest UniProtKB proteome.
− The Biological General Repository for Interaction Datasets (BioGRID) is an interaction repository with data compiled through comprehensive curation efforts. The current index is version 3.5.188 and searches 72,826 publications for 1,881,423 protein and genetic interactions, 28,093 chemical associations, and 874,796 post-translational modifications from major model organism species. All data are provided free of charge via the search index and are available for download in standardized formats.
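The fingerprint idea described for PRINTS above, a group of conserved motifs that together characterize a protein family, can be illustrated with a small Python sketch. It is a toy under stated assumptions and does not show how PRINTS itself scores sequences: the motif patterns, the test sequences, and the names `TOY_FINGERPRINT` and `match_fingerprint` are all hypothetical, and simple regular expressions stand in for whatever motif representation a real resource uses.

```python
import re

# Hypothetical fingerprint: an ordered group of short conserved motifs.
# These patterns are invented for illustration and are not taken from PRINTS.
TOY_FINGERPRINT = ["GDS[AG]", "W..K", "HE[LIV]"]


def match_fingerprint(sequence, motifs):
    """Return the start index of each motif if all occur in order, else None."""
    positions = []
    search_from = 0
    for motif in motifs:
        # Look for the next motif only after the end of the previous hit,
        # so the motifs must appear in the listed order without overlapping.
        hit = re.compile(motif).search(sequence, search_from)
        if hit is None:
            return None  # one motif missing: the fingerprint does not match
        positions.append(hit.start())
        search_from = hit.end()
    return positions


if __name__ == "__main__":
    # Made-up sequences: the first contains all three motifs in order,
    # the second lacks the last motif.
    for seq in ("MKGDSAPLWTTKQQHELR", "MKGDSAPLWTTKQQ"):
        print(seq, match_fingerprint(seq, TOY_FINGERPRINT))
```

Requiring every motif to be present, in order, is the simplest possible matching rule and was chosen only to keep the sketch short; a practical tool would typically tolerate partial matches and score them.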