The electronic form of this document may be cited in the following style:
Human Genome Program, U.S. Department of Energy, DOE Human Genome Program Contractor-Grantee Workshop IV, 1994.
Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.
Data Acquisition and Curation Operations for the GDB Human Genome Data Base
A. Jamie Cuticchia, Michael A. Chipperfield, Christopher J. Porter, and C. Conover Talbot Jr.
Division of Biomedical Information Sciences, Johns Hopkins University School of Medicine, 2024 E. Monument St., Baltimore MD 21205-2100
Just as the techniques for the discovery of mapping information have evolved with the Human Genome Project, so have the methods for acquiring and curating such data. Historically, the responsibility for collating and presenting the data on the human gene map rested with the HUGO chromosome committees and the Human Gene Mapping Workshops. As the Human Genome Project grew, it became clear that consolidating the data once a year would no longer be possible. The task of collecting and curating these data continuously would instead rest with the public mapping databases.
The data acquisition strategy of the Genome Data Base has evolved from one of assisting the community in preparing data submissions (usually on paper forms) to one that now includes actively scanning the scientific literature for data not previously submitted to the database. Since 1992, when the first system of electronic forms was introduced to the community, 77% of the data collected by the Genome Data Base has arrived electronically. However, nearly 95% of all submissions still use the paper submission forms, so the comparatively few electronic submissions each carry far more data on average. One reason for the continued reliance on paper is the nontrivial investment of resources presently needed to complete an electronic submission. Nearly all electronic submissions have been the product of laboratory databases that produce reports directly in the format of the electronic submission forms. We continue to work with collaborators such as Manfred Zorn of the Lawrence Berkeley Laboratory to produce software that lets researchers submit their data electronically to both the Genome Data Base and the Genome Sequence DataBase. As more tools for creating electronic submissions become available, fewer resources should need to be spent collecting data from anonymous FTP (file transfer protocol) sites and paper forms.
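The two percentages quoted above (77% of the data arriving electronically, while nearly 95% of submissions are on paper) are compatible if electronic submissions are few but large. The following back-of-the-envelope sketch makes that explicit. It rests on assumptions not stated in the abstract: "nearly 95%" is taken as exactly 0.95, "data" is treated as a uniform count of records, and the function name is purely illustrative.

```python
def relative_submission_size(data_share_electronic, submission_share_electronic):
    """Ratio of the average electronic submission size to the average paper one.

    data_share_electronic: fraction of all data that arrived electronically.
    submission_share_electronic: fraction of all submissions that were electronic.
    """
    for share in (data_share_electronic, submission_share_electronic):
        if not 0 < share < 1:
            raise ValueError("shares must be strictly between 0 and 1")
    # Data carried per unit share of submissions, for each route.
    per_electronic = data_share_electronic / submission_share_electronic
    per_paper = (1 - data_share_electronic) / (1 - submission_share_electronic)
    return per_electronic / per_paper


# Figures from the abstract: 77% of data electronic; about 5% of submissions
# electronic (assumed from "nearly 95%" on paper).
ratio = relative_submission_size(0.77, 0.05)
print(f"Average electronic submission is about {ratio:.0f} times the size of a paper one")
```

Under these assumptions the average electronic submission carries roughly 64 times as much data as the average paper submission, which fits the observation that most electronic submissions are bulk reports generated by laboratory databases.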
The HUGO chromosome committees will continue to be responsible for defining the consensus localizations for markers and the consensus maps for each chromosome, as well as for overseeing human gene nomenclature. To assist them in their interaction with the database, several organizations (the U.K. Medical Research Council, the U.S. Department of Energy, France's INSERM, and the European Union) have funded editorial assistant positions. These individuals assist the Genome Data Base by working with the HUGO editors in their geographic region to update and maintain the database. Additionally, several of the editorial assistants help scan the scientific literature, allowing the Genome Data Base to keep pace with the growing amount of data published there. It is hoped that the need for literature scanning will be greatly reduced as the community moves to require researchers to submit mapping data to the database, as it already does for sequence data.