Databases used by the PRC
UniProt is a central repository of protein data created by combining the Swiss-Prot, TrEMBL and PIR-PSD databases. For many UniProt databases we use the complete proteome version, which consists of the set of proteins thought to be expressed by an organism whose genome has been completely sequenced. Annotation available in UniProt, e.g. PTM annotation, can be retrieved by the search tools the PRC employs.
Swiss-Prot is a well-annotated protein sequence database. It was established in 1986 and is today maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI). An advantage of using this database is that our analysis platform is able to utilize the UniProt PTM annotation. Source: http://web.expasy.org/docs/userman.html
TrEMBL (Translated European Molecular Biology Laboratory Nucleotide Sequence Database) is the computer-annotated section of the UniProt Knowledgebase. It contains translations of all coding regions in the DDBJ/EMBL/GenBank nucleotide databases, as well as protein sequences extracted from the literature or submitted to UniProtKB that are not yet integrated into Swiss-Prot. TrEMBL allows these sequences to be made publicly available quickly without diluting the high-quality annotation found in Swiss-Prot.
The information in a TrEMBL entry is initially derived directly from the underlying DDBJ/EMBL/GenBank nucleotide entry, and the quality of the data depends directly on the information provided by the submitter of the nucleotide entry. This information may be enhanced afterward by automatic annotation procedures; if not, it remains as provided by the submitter until the entry is manually annotated and added to Swiss-Prot.
The contaminants database is based on known contaminants that are often found when identifying proteins from cell culture, for example keratins and bovine serum albumin. It combines the MaxQuant contaminant database with a list of proteins identified in an exhaustive analysis of fetal bovine serum (http://www.ncbi.nlm.nih.gov/pubmed/20641139).
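Combining two contaminant lists into one search database, as described above, can be sketched roughly as follows. This is only an illustration and not the procedure the PRC actually used: the function names (parse_fasta, merge_contaminants, to_fasta), the choice to treat two entries as duplicates when their sequences are identical, and the toy headers and sequences are all assumptions made for the example.

```python
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_contaminants(*sources: str) -> List[Tuple[str, str]]:
    """Merge FASTA texts, keeping the first header seen for each distinct sequence.

    Assumption: identical sequences (case-insensitive) count as duplicates.
    """
    seen: Dict[str, str] = {}
    for text in sources:
        for header, seq in parse_fasta(text):
            seq = seq.upper()
            if seq not in seen:
                seen[seq] = header
    # Dicts preserve insertion order, so the output follows first appearance.
    return [(header, seq) for seq, header in seen.items()]


def to_fasta(entries: List[Tuple[str, str]]) -> str:
    """Serialize (header, sequence) pairs back to FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in entries)


if __name__ == "__main__":
    # Made-up records standing in for the two source lists.
    list_a = ">CON_example_keratin\nMSRQ\nFSS\n>CON_example_albumin\nMKWVTF\n"
    list_b = ">FBS_example_albumin\nmkwvtf\n>FBS_example_other\nMAAA\n"
    print(to_fasta(merge_contaminants(list_a, list_b)), end="")
```

Deduplicating on the sequence rather than on the header keeps the merged file free of repeated entries even when the two sources name the same protein differently.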
The NCBInr (National Center for Biotechnology Information non-redundant) database is composed of the non-identical sequences from GenBank, the Protein Data Bank (PDB), Swiss-Prot, the Protein Information Resource (PIR), and the Protein Research Foundation (PRF).
IPI (International Protein Index) is a sequence database that has been used frequently but is no longer supported or updated. It nevertheless remains a reliable database. The last versions of the IPI human, rat and mouse databases (v.3.87) are available on our Mascot server and in Andromeda. An advantage of using IPI databases is that our analysis platform can extract the available PTM annotation.