Somewhere in the Internet ether are repositories of information that spur the advancement of science and medicine by helping biomedical researchers share their findings. One such repository is the Protein Data Bank (PDB), a database managed through an international collaboration called the Worldwide Protein Data Bank (wwPDB).
The Biologically Interesting Molecule Reference Dictionary (BIRD) of the wwPDB contains structures of molecules, such as peptide-like antibiotics and inhibitors, that are considered biologically interesting because of their ability to mimic peptides.
The Protein Data Bank (PDB) is a database of the three-dimensional structures of proteins that have been determined experimentally and deposited by researchers anywhere in the world. It also contains structural information about nucleic acids.
It holds more than 86,000 protein structures, along with information on the function of each protein, the identification code of each entry, literature references, and links to other studies that provide more information about each protein. Similarly, it holds structural information on more than 2,500 nucleic acids and more than 4,300 nucleic acid-protein complexes.
This information represents the life’s work of thousands of scientists. And it’s a beautiful thing, because it’s the embodiment of true science, which involves collaboration in the pursuit of knowledge.
Members
Data is submitted to the database through its member organizations, which are:
» Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), USA
» Biological Magnetic Resonance Data Bank, USA
» Protein Data Bank in Europe (PDBe)
» Protein Data Bank Japan (PDBj)
Structural information is generally recorded in a special text format called the PDB format, and the repository of these PDB files is termed the PDB archive. Data in the PDB format can be submitted through the site of any of the four members. The RCSB, however, oversees and manages the PDB archive as the ‘archive keeper’, so all the members pass the data they receive on to the RCSB. All the information and PDB files can be searched, accessed, and downloaded from any of the individual member sites.
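To give a feel for what a PDB file holds, here is a minimal Python sketch that reads atom coordinates out of a few PDB-format lines. It assumes the fixed-column layout of ATOM and HETATM records described in the published PDB format specification. The three sample lines, the `Atom` type, and the `parse_atoms` function are made up for illustration and do not come from any real PDB entry or official tool.

```python
import math
from typing import List, NamedTuple

# Invented sample records (not from a real PDB entry), laid out in the
# fixed columns that PDB-format ATOM/HETATM records are assumed to use.
SAMPLE_PDB = """\
ATOM      1  N   GLY A   1      10.000  20.000  30.000  1.00 15.00           N
ATOM      2  CA  GLY A   1      11.500  20.000  30.000  1.00 15.00           C
HETATM    3  O   HOH A 101      11.000  12.500  -3.250  1.00 20.00           O
END
"""


class Atom(NamedTuple):
    """Hypothetical container for the fields this sketch extracts."""
    serial: int
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Collect atoms from the ATOM and HETATM lines of PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        # Every other record type (END, HEADER, REMARK, ...) is skipped.
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append(Atom(
            serial=int(line[6:11]),           # columns 7-11
            name=line[12:16].strip(),         # columns 13-16
            residue=line[17:20].strip(),      # columns 18-20
            chain=line[21],                   # column 22
            residue_number=int(line[22:26]),  # columns 23-26
            x=float(line[30:38]),             # columns 31-38
            y=float(line[38:46]),             # columns 39-46
            z=float(line[46:54]),             # columns 47-54
        ))
    return atoms


atoms = parse_atoms(SAMPLE_PDB)
# Distance between the first two atoms, in the same units as the coordinates.
n_ca_distance = math.dist(atoms[0][5:8], atoms[1][5:8])
print(f"{len(atoms)} atoms parsed; N-CA distance: {n_ca_distance:.2f}")
```

In practice, most researchers would reach for an established structural biology library rather than hand-rolled slicing, and newer depositions also use the PDBx/mmCIF format, which this sketch does not handle.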
Funding
The wwPDB is free to use, and is incredibly useful to researchers in all areas of science. These researchers put their money where their mouth is and fund the bank through private donations, and several governments kick in some funds as well. In the US, money comes from the Department of Energy, the National Science Foundation, the National Library of Medicine, the National Cancer Institute, and others.
In Europe, funding comes from the European Union itself, as well as from the U.S. National Institutes of Health, the European Molecular Biology Laboratory (EMBL), and others. In Japan, the Protein Data Bank is funded solely by the National Bioscience Database Center of the Japan Science and Technology Agency.
Need
The database was created because the information it contains is extremely complex and is the result of extensive research. Before the database existed, this information was scattered all over the world, in the libraries of whichever government or institution had conducted the research. The problem is that one scientist’s result is another scientist’s tool, so there had to be a better way to share information.
To that end, the PDB was established in 1971, and its management was transferred to the RCSB in 1999. It was only in 2003, however, that the wwPDB was formed and the PDB became an international database. The wwPDB was set up to fix several issues and to achieve the following:
» Collecting all structural data into one central archive, which can be accessed at will by anybody around the world.
» Establishing a standard format to maintain and provide information so that it is easily understood and accessed. The most important benefit of this is that the data can be directly compared and analyzed to draw inferences.
» Ensuring that all information is validated and annotated with the same degree of scrutiny, and maintaining the quality and authenticity of the data. Without that, anyone could submit anything, and important research could potentially be predicated on incorrect information.
Use
So, why is this information so important? For many scientists, the most important part of the database is the actual 3-D structure of macromolecules. A molecule’s structure can tell you how it will behave, what it will bind to, what it does, and what can cause it to go wrong. It also gives insights into how the macromolecule under study may interact with other macromolecules around it. Beyond this, specialized reference dictionaries, such as the Chemical Component Dictionary and the Biologically Interesting Molecule Reference Dictionary (BIRD), provide specific information about molecular structures.
Scientists use this information to create new drugs, design diagnostics, and explain cellular behavior. In short, biomedical research (and in turn, medical advances) depends upon this information.
In science, you keep things to yourself until you publish. After that, it’s your duty to share what you know with the world. Not just to make a name for yourself, but because whatever you discovered may be the missing link in another researcher’s project, and that project may yield the next big thing. Scientific advancement doesn’t happen in a vacuum, and the Worldwide Protein Data Bank exemplifies that.