The Genome Sequence DataBase (GSDB): Meeting the Challenge of Genome-scale Sequencing*

Gifford Keen, Jillian Burton, David Crowley, Emily Dickinson, Ada Espinosa-Lujan, Ed Franks, Carol Harger, Mo Manning, Shelley March, Mia McLeod, John O'Neill, Alicia Power**, Maria Pumilia, David Rider, John Rorlich, Jolene Schwertferger, Linda Smyth, Nina Thayer, Charles Troup, and Chris Fields

National Center for Genome Resources, 1800 Old Pecos Trail, Santa Fe, NM 87505 USA

The Genome Sequence DataBase (GSDB) is a complete, public relational database of DNA sequences and annotation maintained by the National Center for Genome Resources (NCGR). GSDB provides direct, client-server access to the data for data contributions, community annotation, and SQL queries. A multiplatform graphical user interface, the GSDB Annotator, is freely available. Automatically updated relational replicates of GSDB are also freely available.

GSDB is designed to meet the requirements for a community sequence database outlined by Waterman et al.[1]. GSDB supports complex, ad hoc queries in a standard language, SQL[2]. GSDB represents sequence data produced by any strategy, and supports the contribution of additional sequence data or structural or functional annotation by multiple researchers. GSDB extends the Electronic Data Publishing paradigm[3], in which the database is viewed as a primary publication for data not appearing in the traditional literature, to a model in which the database serves as a multi-user laboratory database for the entire molecular biology community. In this model, multiple sequences from a given region, and structural and functional annotations on sequences, are viewed as independent observations, each with its own authors and unique identifier. GSDB provides multi-user editing capabilities, with the authorship, data security, integrity-checking, and versioning mechanisms needed to ensure that multiple authors do not overwrite each other's work. A mechanism is also provided for individuals or groups to define their own curated views of the data, which include whatever sequences and features they select. Database users may choose to access the entire database, or only the view maintained by an editorial group that imposes particular standards or selects data relevant to a particular set of interests. GSDB therefore functions both as a community laboratory database and as a collection of multiple, virtual, specialty databases.
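The relationship between independent observations and curated views can be illustrated with a small sketch. The schema below is hypothetical and is not the GSDB schema: the table names (sequence, feature, curated_selection), the view name (curated_sequence), the column names, and the sample rows are all assumptions made for illustration, and an in-memory SQLite database stands in for the client-server relational database described above.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT the actual GSDB schema.
SCHEMA = """
CREATE TABLE sequence (
    seq_id   INTEGER PRIMARY KEY,  -- unique identifier of the observation
    author   TEXT NOT NULL,        -- contributor of this sequence
    region   TEXT NOT NULL,        -- genomic region the sequence covers
    residues TEXT NOT NULL
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,  -- annotations are observations too
    seq_id     INTEGER NOT NULL REFERENCES sequence(seq_id),
    author     TEXT NOT NULL,        -- may differ from the sequence author
    kind       TEXT NOT NULL,
    start_pos  INTEGER NOT NULL,
    end_pos    INTEGER NOT NULL
);
CREATE TABLE curated_selection (
    editor TEXT NOT NULL,            -- editorial group maintaining a view
    seq_id INTEGER NOT NULL REFERENCES sequence(seq_id)
);
CREATE VIEW curated_sequence AS
    SELECT c.editor, s.seq_id, s.author, s.region, s.residues
    FROM curated_selection AS c
    JOIN sequence AS s ON s.seq_id = c.seq_id;
"""


def build_demo():
    """Create an in-memory database holding a few made-up observations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Two labs sequence the same region; both records are kept side by side.
    conn.executemany(
        "INSERT INTO sequence VALUES (?, ?, ?, ?)",
        [
            (1, "lab_a", "region_x", "ACGTACGT"),
            (2, "lab_b", "region_x", "ACGTTCGT"),
            (3, "lab_c", "region_y", "GGGCCC"),
        ],
    )
    # Third parties annotate lab_a's sequence without modifying it.
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?, ?, ?, ?)",
        [
            (10, 1, "lab_b", "exon", 1, 4),
            (11, 1, "lab_c", "exon", 2, 5),
        ],
    )
    # An editorial group selects only one of the two region_x sequences.
    conn.execute("INSERT INTO curated_selection VALUES (?, ?)", ("group_1", 2))
    conn.commit()
    return conn


def sequences_for_region(conn, region):
    """Ad hoc query against the entire database."""
    cur = conn.execute(
        "SELECT seq_id, author FROM sequence WHERE region = ? ORDER BY seq_id",
        (region,),
    )
    return cur.fetchall()


def curated_sequences(conn, editor):
    """The same kind of query, restricted to one group's curated view."""
    cur = conn.execute(
        "SELECT seq_id, author FROM curated_sequence "
        "WHERE editor = ? ORDER BY seq_id",
        (editor,),
    )
    return cur.fetchall()
```

In this sketch, querying the full database for region_x returns both contributed sequences, while the view maintained by group_1 exposes only the one that group selected. The underlying records are never altered by the selection, which is the sense in which a curated view acts as a virtual specialty database.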

Information about GSDB and data input and output tools are available at http://www.ncgr.org.

*Supported by Cooperative Agreement 95ER62062 with the U.S. Department of Energy, Office of Health and Environmental Research.

[1] Waterman, M. et al., J. Computational Biology 1, 173-190 (1994).

[2] U.S. Department of Commerce, Federal Information Processing Standard Publication 127-2: Database Language SQL. National Institute of Standards and Technology (1993).

[3] Cinkosky, M. et al., Science 252, 1273-1277 (1991).


Abstracts scanned from text submitted for January 1996 DOE Human Genome Program Contractor-Grantee Workshop.
