Improved Map Representation for the Genome Database V6*

Stanley I. Letovsky, Kenneth H. Fasman, and Peter Li

Division of Biomedical Information Sciences, Johns Hopkins University School of Medicine, Baltimore MD 21205-2236

Version 6.0 of the Human Genome Data Base implements a greatly improved representation of genomic maps over previous releases of the database. The goals of the new design were to improve the expressiveness of GDB's map model, to represent genomic regions at multiple resolutions, to produce map renderings easily, to provide explicit representations of the experimental data and inferred spatial relationships underlying maps, to provide more sophisticated querying on order and distance relationships, to allow for ongoing community contribution to maps, to represent alignments of maps, and to provide support for larger, denser maps of the genome.

The core of the new map representation is the GDB Map object, which represents all genomic maps as sets of GenomeRegions, each having both a coordinate position and a specified pair of flanking markers. GenomeRegions may be points or intervals as appropriate for the map's level of resolution. The flanking markers provide order information that can be used to determine the precision of the coordinate assignment. Order-only maps can be accommodated by using arbitrary, ordinal coordinates. A typical map is a combination of fully ordered "framework" regions and other markers placed within specified framework intervals.
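As an illustration only, the Map and GenomeRegion structure described above might be sketched as follows. The class names, field names, and marker names (M1, M2, M3, X) are assumptions made for this example and are not the GDB 6.0 schema. The sketch shows how a pair of flanking markers bounds the uncertainty of a coordinate assignment, and how ordinal coordinates can stand in on an order-only map.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GenomeRegion:
    """Hypothetical region record: a coordinate span plus flanking markers."""
    name: str
    start: float                       # coordinate in the map's own units
    end: float                         # equals start for a point region
    left_flank: Optional[str] = None   # flanking marker on the low side
    right_flank: Optional[str] = None  # flanking marker on the high side

    @property
    def is_point(self) -> bool:
        return self.start == self.end


@dataclass
class GenomeMap:
    """Hypothetical map: a named set of GenomeRegions."""
    name: str
    regions: List[GenomeRegion] = field(default_factory=list)

    def find(self, name: str) -> GenomeRegion:
        for region in self.regions:
            if region.name == name:
                return region
        raise KeyError(name)

    def placement_bounds(self, name: str) -> Tuple[float, float]:
        """Interval between the flanking markers; a missing flank
        (an assumption used here for map ends) leaves that side unbounded."""
        region = self.find(name)
        low = self.find(region.left_flank).end if region.left_flank else float("-inf")
        high = self.find(region.right_flank).start if region.right_flank else float("inf")
        return (low, high)


# Order-only map: the framework markers carry arbitrary ordinal coordinates.
order_only = GenomeMap("example", [
    GenomeRegion("M1", 1.0, 1.0, right_flank="M2"),
    GenomeRegion("M2", 2.0, 2.0, left_flank="M1", right_flank="M3"),
    GenomeRegion("M3", 3.0, 3.0, left_flank="M2"),
    # X is not part of the framework; it is placed within the M1-M2 interval.
    GenomeRegion("X", 1.5, 1.5, left_flank="M1", right_flank="M2"),
])
bounds = order_only.placement_bounds("X")
```

Here the coordinate 1.5 given to X is only a rendering convenience. The order information that matters is that X lies somewhere between M1 and M2, which is what `placement_bounds` reports.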

MapRelations represent the component order and distance relationships from which maps are constructed. These relationships among genome regions come in two types: TwoPointDistances and ThreePointOrders. Each MapRelation is either derived transitively from other MapRelations or inferred directly from ReagentRelations, which are observed experimental relationships between MappingReagents.
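A minimal sketch of how the two relation types and their transitive derivation could look is given below. The field names, the `derived_from` provenance tuple, and both derivation rules are assumptions for illustration; the abstract does not specify the rules GDB 6.0 applies. The distance rule additionally assumes that distances are additive along the map.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ThreePointOrder:
    """Asserts that `middle` lies between `left` and `right`."""
    left: str
    middle: str
    right: str
    derived_from: tuple = ()   # empty when inferred directly from experiment


@dataclass(frozen=True)
class TwoPointDistance:
    """Asserts a distance between regions `a` and `b`."""
    a: str
    b: str
    distance: float
    derived_from: tuple = ()


def chain_orders(first: ThreePointOrder, second: ThreePointOrder) -> List[ThreePointOrder]:
    """From (a, b, c) and (b, c, d), derive (a, b, d) and (a, c, d)."""
    if (first.middle, first.right) != (second.left, second.middle):
        return []
    source = (first, second)
    return [
        ThreePointOrder(first.left, first.middle, second.right, source),
        ThreePointOrder(first.left, first.right, second.right, source),
    ]


def add_distances(order: ThreePointOrder,
                  d1: TwoPointDistance,
                  d2: TwoPointDistance) -> TwoPointDistance:
    """Given order (a, b, c) and distances a-b and b-c, derive a-c.
    Assumes additive distances, which is a simplification."""
    if {d1.a, d1.b} != {order.left, order.middle}:
        raise ValueError("d1 must span the left and middle regions")
    if {d2.a, d2.b} != {order.middle, order.right}:
        raise ValueError("d2 must span the middle and right regions")
    return TwoPointDistance(order.left, order.right,
                            d1.distance + d2.distance, (order, d1, d2))


abc = ThreePointOrder("A", "B", "C")
bcd = ThreePointOrder("B", "C", "D")
derived_orders = chain_orders(abc, bcd)
derived_distance = add_distances(abc,
                                 TwoPointDistance("A", "B", 2.0),
                                 TwoPointDistance("B", "C", 3.5))
```

Keeping the source relations in `derived_from` is one way to preserve the distinction the text draws between derived relationships and those inferred directly from experimental data.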

The GDB 6.0 representation of maps is designed for efficient database searching. It supports queries on position, order, and distance. One can find all maps or genome regions that overlap, contain, or are contained in a specified genome region. The region can be specified as a single marker, a marker plus or minus a distance, or as a range defined by a pair of markers. One can also find all maps consistent with a specified marker order or inter-marker distance range.
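The positional queries described above reduce to simple interval tests, sketched here under stated assumptions. The function names, the in-memory `positions` dictionary, and the marker names are hypothetical and stand in for whatever database query mechanism GDB 6.0 actually uses. The sketch covers the three ways of specifying a query region and the three spatial predicates.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def contains(outer: Interval, inner: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def query_interval(positions: Dict[str, Interval], marker: str,
                   other_marker: Optional[str] = None,
                   plus_minus: float = 0.0) -> Interval:
    """Build a query region from a single marker, a marker plus or minus
    a distance, or a range defined by a pair of markers."""
    start, end = positions[marker]
    if other_marker is not None:
        other_start, other_end = positions[other_marker]
        return (min(start, other_start), max(end, other_end))
    return (start - plus_minus, end + plus_minus)


def find_regions(positions: Dict[str, Interval], query: Interval,
                 mode: str) -> List[str]:
    """Names of regions that overlap, contain, or are contained in the query."""
    tests = {
        "overlaps": lambda region: overlaps(region, query),
        "contains": lambda region: contains(region, query),
        "contained_in": lambda region: contains(query, region),
    }
    test = tests[mode]
    return sorted(name for name, region in positions.items() if test(region))


# Hypothetical positions: three point markers and one interval region.
positions = {
    "M1": (10.0, 10.0),
    "M2": (20.0, 20.0),
    "M3": (35.0, 35.0),
    "R1": (15.0, 30.0),
}
pair_query = query_interval(positions, "M1", other_marker="M2")
padded_query = query_interval(positions, "M2", plus_minus=6.0)
overlapping = find_regions(positions, pair_query, "overlaps")
inside = find_regions(positions, pair_query, "contained_in")
containing = find_regions(positions, query_interval(positions, "M2"), "contains")
```

A linear scan is used only to keep the example short; a real database would presumably index coordinates so that such range tests do not have to touch every region.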

The new map representation is intended to better capture the whole-chromosome genetic and physical maps that are the Human Genome Project's current focus. At the same time, this model sets the stage for better integration of map and sequence data as the emphasis shifts to the production of "sequence-ready" maps.

*Supported by the U.S. Department of Energy (DE-FC02-9ER6130), the U.S. National Institutes of Health, and the Science and Technology Agency of Japan, with additional support from the Medical Research Council of the United Kingdom, the INSERM of France, and the European Union.


Abstracts scanned from text submitted for January 1996 DOE Human Genome Program Contractor-Grantee Workshop.
