Kenneth H. Fasman, Stanley I. Letovsky, Peter Li, Krishna Palaniappan, Michael A. Chipperfield, Christopher J. Porter, John M. Campbell, Edward W. Kraska, Sue E. Borchardt, and Deborah J. Schneider
Division of Biomedical Information Sciences, Johns Hopkins University School of Medicine, Baltimore MD 21205-2236
GDB 6.0 is a family of interrelated data sets operated by the Genome Data Base project. It consists of HGD (Human Genome Database), the mapping data component; CitDB, which holds literature citations; and the Genome Registry, which contains information on people and organizations in the genome community. We view the separation of this information into multiple databases as a pilot effort toward federating genomic databases across the Internet.
The object class hierarchy for the revised GDB starts with DBObject. This class contains basic attributes pertinent to all significant objects, such as owner, release date, and accession number. The remainder of the important objects in HGD are divided among five core classes:
GenomeObjects: Things making up or associated with genomes, such as GenomeRegions, GeneFamilies, and GeneProducts. This class includes chromosomes, genes, phenotypic markers, cytogenetic landmarks, STSs, and contigs, among others. It is an enhancement of the concept of Locus from previous GDB releases.
MapObjects: Data that describe order and distance relations among regions of the genome, as inferred from mapping experiments. Other classes in this category represent higher-order relationships (i.e., alignments) between maps.
ExperimentObjects: Information about mapping experiments, experimental reagents, and experimental results from observed interactions between reagents.
VariationObjects: Data describing mutations, polymorphisms, population frequencies, etc.
AnnotationObjects: Objects that allow users of the database to comment on other objects in GDB. Literature citations, annotations, and cross-references to other databases may be associated with all user-submitted objects in the database.
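The hierarchy described above can be illustrated with a minimal Python sketch. It is not the GDB 6.0 schema itself, which is defined in OPM. The class names DBObject, GenomeObject, GenomeRegion, MapObject, ExperimentObject, VariationObject, and AnnotationObject follow the text; every attribute name beyond owner, release date, and accession number, as well as the accession format and the sample values, is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DBObject:
    """Root class: attributes pertinent to all significant objects."""
    accession_number: str
    owner: str
    release_date: Optional[date] = None
    # Assumption: annotations are stored as a list on the annotated object.
    annotations: List["AnnotationObject"] = field(default_factory=list)


@dataclass
class GenomeObject(DBObject):
    """Things making up or associated with genomes."""
    name: str = ""  # hypothetical attribute


@dataclass
class GenomeRegion(GenomeObject):
    """One kind of GenomeObject named in the text."""


@dataclass
class MapObject(DBObject):
    """Order and distance relations among regions of the genome."""
    # Hypothetical attribute: regions listed in their inferred order.
    ordered_regions: List[GenomeRegion] = field(default_factory=list)


@dataclass
class ExperimentObject(DBObject):
    """Mapping experiments, reagents, and results."""


@dataclass
class VariationObject(DBObject):
    """Mutations, polymorphisms, population frequencies."""


@dataclass
class AnnotationObject(DBObject):
    """A user comment attached to another object."""
    text: str = ""  # hypothetical attribute


# Usage: create a region, annotate it, and place it on a map.
# Accession numbers and owners below are made-up placeholders.
region = GenomeRegion(accession_number="GDB:0000001", owner="example-owner",
                      name="example-region")
note = AnnotationObject(accession_number="GDB:0000002", owner="example-user",
                        text="Example comment on this region.")
region.annotations.append(note)
example_map = MapObject(accession_number="GDB:0000003", owner="example-owner",
                        ordered_regions=[region])

# The five core classes are the direct subclasses of DBObject.
core_classes = {cls.__name__ for cls in DBObject.__subclasses__()}
```

Because every class derives from DBObject, the shared attributes (owner, release date, accession number) are declared once and inherited, which is the property the hierarchy is meant to provide.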
The new schema was developed using the Object Protocol Model [1]. Databases designed with OPM are easier to create and understand than equivalent relational databases, because the relationships between classes and their attributes, and between pairs of classes, are more explicitly defined. OPM allowed the GDB staff to develop the 6.0 design more quickly than would have been possible using the relational model. More importantly, the new database schema is easier for users of GDB to comprehend, and therefore easier to query.
Detailed documentation on the latest database schema can be obtained from GDB's WWW and anonymous FTP servers. We welcome feedback from the community on this and all aspects of the project's design and operations.
[1] Chen, I.A., and Markowitz, V.M. An overview of the Object-Protocol Model (OPM) and OPM data management tools. http://gizmo.lbl.gov/DM_TOOLS/OPM/doc/OPM_3/Overview.ps
*Supported by the U.S. Department of Energy (DE-FC02-9ER6130), the U.S. National Institutes of Health, and the Science and Technology Agency of Japan, with additional support from the Medical Research Council of the United Kingdom, the INSERM of France, and the European Union.