Version 4 of the OPM Data Management Tools: Enhanced Support for Molecular Biology Databases*

Victor M. Markowitz, I-Min A. Chen, Ernest Szeto, and Jia-Lin N. Chen

Information and Computing Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720

Commercial relational database management systems (DBMSs) provide data management facilities that are essential for operating large production molecular biology databases (MBDs), such as the Genome Data Base (GDB), the Genome Sequence Data Base (GSDB), and the Protein Data Bank (PDB). However, developing and maintaining large MBDs with relational DBMSs is a complex, error-prone, and time-consuming process. Furthermore, the large, low-level, DBMS-specific relational definitions of such MBDs are almost incomprehensible to scientists.

The Object-Protocol Model (OPM) provides scientists with high-level, concise, DBMS-independent languages for specifying the structure of, and manipulating data in, MBDs. We have developed a suite of data management tools based on OPM, including a graphical OPM schema editor, an OPM-to-DBMS schema translator, an OPM-based data entry and query tool, an OPM retrofitting tool, and an OPM data loading utility. For MBDs developed with relational DBMSs, the OPM data management tools substantially improve the efficiency of developing, maintaining, and interacting with (e.g., querying and browsing) MBDs [1].

The OPM data management tools are currently used for developing several new MBDs, such as the new versions of GDB and PDB, and for providing object-oriented interfaces on top of existing MBDs, such as GSDB. Interactions with the GDB, PDB, and GSDB groups over the past year revealed the need to extend the OPM tools with new features. Accordingly, version 4 of the OPM data management tools provides facilities for: (1) controlled vocabularies consisting of terms that are codified and associated with detailed descriptions; (2) object versions representing alternative experimental and analysis data and recording historic information; (3) cross-referencing MBDs in order to facilitate molecular biology data exploration across multiple databases; (4) different strategies for efficiently querying and updating MBDs; (5) interactively retrofitting OPM schemas on top of existing MBDs; (6) constructing customized interfaces for browsing, querying, and updating MBDs, via an API; and (7) physical database design (e.g., indexing, segment allocation) for improving the efficiency of data manipulation in large MBDs.

Current work on the OPM data management tools includes developing new OPM tools that will provide facilities for (1) restructuring MBDs as a result of structural changes entailed by the evolution of their underlying applications, and (2) developing MBDs with object-oriented DBMSs.

The OPM tools, documentation, and papers are available via the World Wide Web at http://gizmo.lbl.gov/opm.html.

*Supported by a grant from the Director, Office of Energy Research, Office of Health and Environmental Research, of the U.S. Department of Energy under Contract DE-AC03-76SF00098.

[1] Chen, I.A., and Markowitz, V.M., An Overview of the Object-Protocol Model (OPM) and OPM Data Management Tools, Information Systems, Vol. 20, No. 5 (July 1995), pp. 393-418.


Abstracts scanned from text submitted for January 1996 DOE Human Genome Program Contractor-Grantee Workshop.
