A Database System to Support Genome Mapping

Mark Graves

Dept. of Cell Biology, Baylor College of Medicine, Houston, TX 77030.

We have developed a small database system to support physical mapping in genome research laboratories. A database system consists of a collection of data (called a database) and software that manipulates the data (called a database management system). To support genome mapping, we have included additional software that simplifies the development of genome databases: a schema design tool, a data entry tool, and a query-by-example tool. Mapping-specific tools have also been developed, including a tool to record the results of filter hybridization experiments.

Our database management system is based on the storage of graphs as binary relationships. A graph is a collection of vertices, edges, and labels, which together form binary relationships. A binary relationship describes one attribute of an entity, such as the name of a person. An advantage of binary relationships is that they reduce the representation of data to a simpler form that is easier to manipulate. Binary relationships are stored and retrieved using our graph database management system [1,2,3].
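As a minimal sketch of this idea, assuming a simple in-memory representation, each binary relationship can be held as a labeled edge from an entity to a value. The class name, method names, and sample data below are hypothetical illustrations and are not taken from the graph database management system described in [1,2,3].

```python
from collections import defaultdict


class BinaryRelationStore:
    """Hypothetical in-memory store of labeled edges (source, label, target)."""

    def __init__(self):
        # Index by (source, label) so that one attribute of an entity
        # can be found with a single dictionary lookup.
        self._edges = defaultdict(set)

    def add(self, source, label, target):
        """Record the binary relationship source --label--> target."""
        self._edges[(source, label)].add(target)

    def get(self, source, label):
        """Return all targets related to source by label, in sorted order."""
        return sorted(self._edges.get((source, label), set()))


store = BinaryRelationStore()
store.add("person:1", "name", "Ada Example")
store.add("person:1", "phone", "555-0100")
names = store.get("person:1", "name")
```

Because every fact has the same three-part shape, storing and retrieving data needs only these two operations, whatever the type of entity.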

A database schema is a description of the data in a database. The database developer uses our schema design tool to draw the types of data to be included in the database. The schema is a collection of graph templates, and each template defines the binary relationships which are to be created for each data type. For example, a "person" graph template might create binary relationships for the name, phone number, and E-mail address of a person.
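The "person" example can be sketched as follows. This is an illustration under stated assumptions only: the `GraphTemplate` class, its `instantiate` method, and the sample values are hypothetical, and the actual schema design tool is graphical rather than code-based.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphTemplate:
    """Hypothetical template: an entity type and the attribute labels it defines."""
    entity_type: str
    attributes: tuple

    def instantiate(self, entity_id, values):
        """Return one (entity, label, value) relationship per supplied attribute."""
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise ValueError(f"attributes not in template: {sorted(unknown)}")
        # Keep the attribute order defined by the template.
        return [(entity_id, label, values[label])
                for label in self.attributes if label in values]


person = GraphTemplate("person", ("name", "phone", "email"))
edges = person.instantiate(
    "person:1", {"name": "Ada Example", "email": "ada@example.org"}
)
```

Here `edges` holds two binary relationships, one for the name and one for the E-mail address; the phone number is omitted because no value was supplied.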

Our data entry tool uses the graph templates to create a graphical user interface in which the user enters data. The data entry forms are created automatically from the templates in the schema, so if the schema changes, the data entry forms change too. The automatic creation of data entry forms allows the database to be developed rapidly and to be modified when the laboratory process changes, which is common in the rapidly changing field of genetics.
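The way forms follow the schema can be sketched roughly as below. The function `form_fields`, the dictionary layout of a template, and the "clone" attributes are assumptions made for illustration; the real tool builds a graphical interface, whereas this sketch only derives the list of fields such an interface would show.

```python
def form_fields(template):
    """Derive one labeled input field per attribute of a template.

    The template is assumed to be a dict with an "attributes" list.
    """
    return [
        {"key": attr, "label": attr.replace("_", " ").title()}
        for attr in template["attributes"]
    ]


# Two versions of a hypothetical "clone" template: the second adds an attribute.
clone_v1 = {"type": "clone", "attributes": ["name", "library"]}
clone_v2 = {"type": "clone", "attributes": ["name", "library", "insert_size"]}

fields_v1 = form_fields(clone_v1)
fields_v2 = form_fields(clone_v2)
```

Since the fields are computed from the template rather than written by hand, adding `insert_size` to the schema is enough to add the corresponding field to the form.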

Query-by-example is a database query paradigm in which the user enters a query by specifying part of the data in a template, and the query tool fills in the rest of the template from the data in the database. Our query-by-example tool creates a graphical user interface for each graph template, as is done in the data entry tool. The user enters data into part of the template, and the system generates a report of all the data that matches the partially filled-in template.
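The matching step can be sketched as follows, assuming records are flat dictionaries and that an empty field in the example means "any value". The function name and the sample records are hypothetical; the sketch shows the paradigm, not the query tool's actual implementation.

```python
def query_by_example(records, example):
    """Return the records that agree with every filled-in field of the example."""
    # Fields left empty in the example place no constraint on the result.
    filled = {k: v for k, v in example.items() if v not in (None, "")}
    return [
        r for r in records
        if all(r.get(k) == v for k, v in filled.items())
    ]


records = [
    {"name": "cosmid-A", "chromosome": "X"},
    {"name": "cosmid-B", "chromosome": "17"},
    {"name": "cosmid-C", "chromosome": "X"},
]

# The user fills in only the chromosome and leaves the name blank.
matches = query_by_example(records, {"name": "", "chromosome": "X"})
```

Each returned record is a completed copy of the template, so the blank name field is filled in with the names of the matching cosmids.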

These three tools form the core of our system, but additional tools can be added to simplify parts of the process. One such tool records filter hybridizations using a graphical display. Instead of using the data entry tool to enter hybridization data as text, the user enters it through a graphical display that corresponds to the filter. This tool is currently being used at the Baylor College of Medicine Human Genome Center to record cosmid filter hybridization experiments for human chromosomes X and 17.
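One way such a tool might turn selections on the filter display into stored data is sketched below. The grid layout, the `hybridizes_to` label, and all identifiers are assumptions made for illustration and do not describe the tool in use at the Human Genome Center.

```python
def record_hybridizations(probe, positive_positions, layout):
    """Convert selected grid positions into (probe, label, clone) relationships.

    layout maps a (row, column) position on the filter to a clone name.
    """
    edges = []
    for position in sorted(positive_positions):
        clone = layout[position]  # raises KeyError if the position is not on the filter
        edges.append((probe, "hybridizes_to", clone))
    return edges


# A hypothetical 2 x 2 filter with one cosmid spotted at each position.
layout = {
    (0, 0): "cosmid-A", (0, 1): "cosmid-B",
    (1, 0): "cosmid-C", (1, 1): "cosmid-D",
}

# The user marks two positive spots for one probe.
hits = record_hybridizations("probe-1", {(1, 0), (0, 0)}, layout)
```

The user only marks positions; the mapping from position to clone comes from the filter layout, which avoids typing clone names for each experiment.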

*Supported by a DOE Human Genome Distinguished Postdoctoral Fellowship.

1. M. Graves, E.R. Bergeman, C.B. Lawrence. "A Graph-Theoretic Data Model for Genome Mapping Databases". In Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, Vol 5, pp. 32-41. IEEE Press. January, 1995.

2. M. Graves, E.R. Bergeman, C.B. Lawrence. "Graph Database Systems". IEEE Engineering in Medicine and Biology special issue on Genomics. November, 1995.

3. M. Graves, E.R. Bergeman, C.B. Lawrence. "Querying a Genome Database Using Graphs". In Proceedings of The Third International Conference on Bioinformatics and Genome Research. H.A. Lim and C.R. Cantor, eds. World Scientific Publishing Co., Singapore. 1995. (in press)


Abstracts scanned from text submitted for January 1996 DOE Human Genome Program Contractor-Grantee Workshop.
