E. E. Abola, J. Prilusky, N. O. Manning, J. L. Sussman
Protein Data Bank, Department of Chemistry, Brookhaven National Laboratory, Upton, NY 11973 and Bioinformatics Unit, Weizmann Institute of Science, 76100 Rehovot, Israel.
The Protein Data Bank (PDB) is an archive of experimentally determined three-dimensional structures of proteins, nucleic acids, and other biological macromolecules. PDB has a 25-year history of service to a global community of researchers, educators, and students in a variety of scientific disciplines. This community shares a common interest in accessing information that can relate the biological functions of macromolecules to their three-dimensional structures. We now report the construction of a new relational database, 3DBase, that provides access to knowledge and information on macromolecules using a high-level query language.
The complexity of PDB entries and their use by a multi-disciplinary community required the construction of a database that represents structural, biological, chemical, and bibliographic information. In addition to all coordinate entries found in PDB, the database contains semantic links to entries found in other databases. For example, 3DBase represents the relationships between sequences found in PDB and those in SWISSPROT, GSDB, or GenBank.
3DBase uses Victor Markowitz's Object Protocol Model (OPM) and the SYBASE DBMS engine. OPM's object-oriented view provides a scientifically intuitive representation of the data while SYBASE provides a powerful and robust environment for data management. Two primary objects in 3DBase are oExperiment and oMacroMolecule. These objects describe the experiment and the biologically active molecule, extending the current view found in PDB entries.
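The relationship between the two primary objects can be pictured with a minimal Python sketch. This is not the OPM schema: the class names Experiment, MacroMolecule, and SequenceLink, all of their fields, and the helper linked_databases are hypothetical stand-ins chosen for illustration, and the identifiers in the sample data are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SequenceLink:
    """Hypothetical cross-reference to an entry in an external sequence database."""
    database: str   # e.g. "SWISSPROT", "GSDB", or "GenBank"
    accession: str  # placeholder value, not a real accession


@dataclass
class MacroMolecule:
    """Illustrative stand-in for oMacroMolecule: the biologically active molecule."""
    name: str
    sequence_links: List[SequenceLink] = field(default_factory=list)


@dataclass
class Experiment:
    """Illustrative stand-in for oExperiment: the structure determination experiment."""
    pdb_id: str
    method: str
    molecules: List[MacroMolecule] = field(default_factory=list)


def linked_databases(experiment: Experiment) -> List[str]:
    """Return the sorted, de-duplicated external databases an experiment links to."""
    return sorted(
        {link.database for mol in experiment.molecules for link in mol.sequence_links}
    )


# Placeholder data only; "XXXX" and the accessions are not real identifiers.
example = Experiment(
    pdb_id="XXXX",
    method="X-ray diffraction",
    molecules=[
        MacroMolecule(
            name="example protein",
            sequence_links=[
                SequenceLink("SWISSPROT", "ACC0001"),
                SequenceLink("GenBank", "ACC0002"),
            ],
        )
    ],
)
```

Here an experiment owns its molecules and each molecule carries its own external links, so a question such as "which databases does this entry reach?" becomes a traversal of the object graph rather than a parse of a flat coordinate file.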
Database interoperability is addressed through schema sharing and support for a variety of data interchange formats in query results. 3DBase uses the CitDB schema developed by GDB to store literature references. In the near future the CitDB at PDB will be merged with GDB's data, making available a single CitDB database containing all the references of interest to the genomic community. In addition, 3DBase uses base class objects similar to those found in GDB. One example is GDB's powerful and elegant solution to the problem of attaching user-supplied annotation to individual objects in the database.
Access to 3DBase is primarily through a Web browser constructed using the Genera software package developed by Stan Letovsky. In addition to accessing data stored in 3DBase, the browser provides links to entries in other databases. Graphical views of molecules are provided in the browser by R. Sayle's RasMol program, driven by graphical annotation commands stored in the database. Access to 3DBase via SQL or OPM's QLT language will also be made available to those wishing to pose complex queries not available through the browser.
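As an illustration of the kind of complex query direct SQL access might allow, the sketch below runs a join over a toy schema. Everything in it is an assumption: the table and column names (macromolecule, sequence_link, db_name, accession) are invented for the example and are not the 3DBase schema, the rows are placeholder data, and Python's built-in sqlite3 module stands in for the SYBASE engine only so that the example is self-contained.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE macromolecule (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE sequence_link (
            molecule_id INTEGER REFERENCES macromolecule(id),
            db_name     TEXT NOT NULL,
            accession   TEXT NOT NULL
        );
        INSERT INTO macromolecule VALUES (1, 'molecule A'), (2, 'molecule B');
        INSERT INTO sequence_link VALUES
            (1, 'SWISSPROT', 'ACC0001'),
            (1, 'GenBank',   'ACC0002'),
            (2, 'GSDB',      'ACC0003');
        """
    )
    return conn


def molecules_linked_to(conn: sqlite3.Connection, db_name: str) -> list:
    """Return names of molecules with a sequence link into the given database."""
    rows = conn.execute(
        """
        SELECT m.name
        FROM macromolecule AS m
        JOIN sequence_link AS s ON s.molecule_id = m.id
        WHERE s.db_name = ?
        ORDER BY m.name
        """,
        (db_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

Calling molecules_linked_to(build_demo_db(), "SWISSPROT") returns the molecules that carry a SWISSPROT cross-reference. A query of this shape, joining structural objects to their external links, is the sort of request a form-based browser may not expose directly.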
*The Protein Data Bank is supported by funds from the U. S. National Science Foundation, the U. S. Public Health Service, National Institutes of Health, National Center for Research Resources, National Institute of General Medical Sciences, National Library of Medicine, and the U. S. Department of Energy under contract DE-AC02-76CH00016.