Genome Informatics Section
DOE Human Genome Program Contractor-Grantee Workshop
117. The Genome Sequence DataBase (GSDB): Advances in Data Access, Analysis, and Quality
C.A. Harger, M. Booker, A. Farmer,
W. Huang, J. Inman, D. Kipart, C. Kodira, S. Root, F. Schilkey, J. Schwertfeger,
A. Siepel, M.P. Skupski, D. Stamper, N. Thayer, R. Thompson, J. Wortman,
J.J. Zhuang, and M.M. Harpold
Two primary foci of GSDB (www.ncgr.org/gsdb), located at the National Center for Genome Resources (NCGR) in Santa Fe, NM, are to expand the data access and analysis capabilities provided to researchers and to continue improving and automating data quality assurance procedures. Substantial progress has been made in both areas during the last 18 months.
NCGR recently launched two data utilization tools that significantly enhance data access and analysis capabilities. First, NCGR has begun implementing sequence similarity searching by making the BLAST suite of algorithms available for researchers to search sequences in GSDB. Sequence similarity searching complements the gene localization capabilities, e.g., MarFinder, already provided by NCGR. NCGR plans to expand this analysis capability by making Frame Search, ClustalW, and Smith-Waterman publicly available.
Second, NCGR has introduced Sequence Viewer, a platform-independent graphical viewer for sequence data in GSDB. This tool provides easy visualization of sequences and their associated annotation, together with simple text presentation of non-graphical data. The benefits of Sequence Viewer are augmented by its integration with other GSDB data access tools, such as Maestro, a web-based database query tool. Sequence Viewer significantly improves the ability to retrieve and review sequences and associated annotation from GSDB.
During the last year NCGR has also made important advances in data quality assurance procedures. First, NCGR has improved the suite of programs that automatically acquire data from the International Nucleotide Sequence Database Collaboration (IC) databases. These improvements have significantly reduced the amount of manual curation necessary to ensure the quality and completeness of data acquired from the IC. Second, NCGR has implemented daily curation of several database fields, including source molecule, chromosome, and taxonomic information. The increased data consistency resulting from these efforts allows NCGR to give researchers flexibility in selecting BLAST search sets. For example, these search sets could range from the entire database to a variety of taxonomy-based subsets or to individual human chromosome sets.
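As an illustration of why consistent field values matter for search-set selection, the following minimal Python sketch filters a small collection of records by taxon and chromosome and writes the result as FASTA text. It is not GSDB code: the record layout, the field names, the function names, and the sample accessions and sequences are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SequenceRecord:
    """Hypothetical record layout; not the actual GSDB schema."""
    accession: str
    source_molecule: str
    taxon: str
    chromosome: Optional[str]
    sequence: str


def build_search_set(
    records: Iterable[SequenceRecord],
    taxon: Optional[str] = None,
    chromosome: Optional[str] = None,
) -> List[SequenceRecord]:
    """Return the records matching the requested taxon and chromosome.

    A criterion left as None is not applied, so calling this with no
    criteria returns the whole collection (the "entire database" case).
    Exact string matching only works if the fields are curated to
    consistent values.
    """
    selected = []
    for rec in records:
        if taxon is not None and rec.taxon != taxon:
            continue
        if chromosome is not None and rec.chromosome != chromosome:
            continue
        selected.append(rec)
    return selected


def to_fasta(records: Iterable[SequenceRecord]) -> str:
    """Format records as FASTA text, one header line per accession."""
    return "".join(f">{r.accession}\n{r.sequence}\n" for r in records)


# Made-up sample data for illustration only.
RECORDS = [
    SequenceRecord("EX0001", "DNA", "Homo sapiens", "7", "ACGTACGT"),
    SequenceRecord("EX0002", "DNA", "Homo sapiens", "21", "TTGACCAA"),
    SequenceRecord("EX0003", "mRNA", "Mus musculus", None, "GGCCTTAA"),
]

if __name__ == "__main__":
    chr21 = build_search_set(RECORDS, taxon="Homo sapiens", chromosome="21")
    print(to_fasta(chr21), end="")
```

In this sketch a record whose taxon was entered inconsistently (for example "H. sapiens") would silently drop out of the human subset, which is the kind of problem routine curation of these fields is meant to prevent.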
These enhancements and improvements are designed to make GSDB more accessible to researchers, to extend the rich searching capability already present in GSDB, and to facilitate the integration of sequence data with additional types of biological data.