Data Acquisition and Curation in the Genome Data Base V6*

Michael A. Chipperfield, Christopher J. Porter, and C. Conover Talbot, Jr.

Division of Biomedical Information Sciences, Johns Hopkins University School of Medicine, Baltimore MD 21205-2236

The release of GDB version 6.0 heralds a significant change in the means by which data enter the database, thereby redefining the role of the Data Acquisition and Curation group. Heretofore, data acquisition and entry formed a major part of the activities of the Data group. The vast majority of GDB data, whether submitted by researchers or acquired from other sources, has been entered into the database by GDB staff.

GDB 6.0 opens editorial access to anyone from the wider genome community who requests it. Researchers can themselves enter and modify their own data, and annotate data submitted by others. The focus of the Data group can now move from acquisition to curation. The group will monitor the incoming data to ensure data integrity and quality within the database.

It is vital that electronic bulk submission tools be available, since 90% of the data submitted to GDB currently arrive in electronic form. Improved bulk submission tools have been developed which better reflect the data in the new GDB schema and which load data through the Object Broker (OB) interface. These tools require the use of a new data submission format. In order to ease the transition to this format, tools will be available to translate GDB 5.x style submissions. This will allow submitters time to convert their local systems to the new format. Alternatively, small submissions can be entered directly through the World Wide Web (WWW) interface.

The new design of GDB expands the use of WWW links to the information stored in other databases. The Data group will explore and develop links to other databases such as GSDB, OMIM and protein databases, as well as links to chromosome- or gene-specific WWW pages. Although the WWW browsing interface is already known to the community, it will take time for users to become familiar with the editorial interface and to begin routinely entering data. We anticipate that some users will continue in the short term to submit their data to GDB for entry. Moreover, the submission of data to GDB is not considered equivalent to the publication of those data in peer-reviewed journals. For these reasons, the Data group will continue its journal scanning activities, while working with HUGO and journal editors to encourage direct submission to GDB.

*Supported by the U.S. Department of Energy (DE-FC02-9ER6130), the U.S. National Institutes of Health, and the Science and Technology Agency of Japan, with additional support from the Medical Research Council of the United Kingdom, the INSERM of France, and the European Union.


Abstracts scanned from text submitted for January 1996 DOE Human Genome Program Contractor-Grantee Workshop.
