Amy K. Voltz and Kenneth H. Fasman
Division of Biomedical Information Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205-2236
The World Wide Web (WWW, the "web") provides a user-friendly interface for finding and retrieving genomic data in the Genome Data Base (GDB). Current web access via the "GDB Browser" allows the user to access all data objects in GDB, but does not emphasize data that may be most relevant to biologists interested in specific information regarding genes.
We have developed a new interface which directs the user to particular information about genes. This interface has been designed with specific questions in mind, the answers to which can now be easily found in the database: Is this gene in GDB? Where is the gene located in the genome, and what other genes or markers are located nearby? Are there PCR primers or genomic/cDNA clones for this gene? Who else has information about this gene?
These questions can be asked and answered with ease using the new interface, which returns the requested information in concise, formatted tables. The user can also view detailed information for any object by following the highlighted link on its GDB accession number in these tables.
In addition to the human gene mapping information in GDB, there are many databases currently in existence that store information of relevance to biologists. These include databases of nucleotide sequences (GSDB, GenBank, EMBL, DDBJ), human genetic disease (OMIM), protein sequence and structure (Swiss-Prot, PIR, PDB), and the genomes of model organisms (MGD, FlyBase).
Although the Internet and World Wide Web provide access to many of these databases and include some links that enable users to move from one database to another, most of these data must still be gathered by contacting multiple databases. The databases do not provide an integrated view of biological knowledge; rather, the information is presented as individual entries in each database.
We have developed a prototype system for the integration of biological information. Entries in the Gene Family Database (http://gdbdoc.gdb.org/~avoltz/home.html) begin with a definition of a gene family and descriptions of its members, including links to the databases that compile proteins of similar sequence and motif into functional families (PRODOM, PROSITE, BLOCKS). The entries also include data on map location, nucleotide sequence, gene structure and function, RNA transcripts, protein sequence, structure and expression. There are also hypertext links to information regarding model organisms and human genetic disease.
Future entries are being developed with collaborators, who can check the database information for accuracy and, more importantly, will help to develop the content and presentation of the information in the Gene Family Database. We are working to develop a model for community-based curation of such a resource.
*Supported in part by the National Institutes of Health, National Research Service Award number 1F32 HG00148-01 from the National Center for Human Genome Research, and Department of Energy award number DE-FC02-91ER61230.