Genome Informatics Section
DOE Human Genome Program Contractor-Grantee Workshop
78. The Genome Annotation Collaboration:
Jay R. Snoddy, Morey Parang, Sergey Petrov,
Richard Mural, Manesh Shah, Ying Xu, Sheryl Martin, Phil LoCascio, Kim
Worley1, Manfred Zorn2, Sylvia Spengler2,
Donn Davy2, Chris Overton3, Edward C. Uberbacher,
and the Genome Annotation Consortium
The Genome Annotation Consortium is organizing software and database development projects toward a common goal of providing as much value-added annotation as possible on a genome sequence framework. The consortium is applying computational analysis modules and information technologies to the output of genome sequencers. We have developed a prototype system and process that will be presented at the Oakland workshop. We are also interested in forging new collaborations to add value to the genome sequence and annotation framework. Desired collaborations should improve the analysis process or the underlying technologies that are required for this analysis. This basic annotation process includes the following steps:
1. Acquisition of genome sequence data and other data that can be readily attached to genome sequences;
The outputs of our desired process include:
1. An assembled genome sequence framework;
Our current prototype is being applied to the human sequence output of all the large-scale genome sequencing centers. We are adding mouse and microbial genome sequences to our prototype (see abstract of Larimer et al. for microbial analysis). As part of the initial prototype, we have established a data-acquisition component that retrieves data from genome center web sites and GenBank. The acquired data include, for example, clone-contig overlap information that is not always in the GenBank/EMBL/DDBJ entry. We have established a sequence-assembly component that creates a consensus genome sequence framework by assembling the different clone sequences. In addition, we acquire other experimental observations that can be linked to that genome sequence framework during annotation (e.g., ESTs, STSs, cDNAs).
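As a rough illustration of how clone-contig overlap information can drive assembly of a framework sequence, the following minimal Python sketch joins ordered clone sequences using known overlap lengths. It is not the consortium's assembly component; the function name `merge_clones`, its inputs, and the exact-match overlap check are all assumptions made for the example.

```python
def merge_clones(clones, overlaps):
    """Join ordered clone sequences into one framework sequence.

    clones   -- list of sequence strings, in contig order (hypothetical input)
    overlaps -- overlaps[i] is the number of bases shared between
                clones[i] and clones[i + 1] (hypothetical input)
    """
    if not clones:
        return ""
    if len(overlaps) != len(clones) - 1:
        raise ValueError("need exactly one overlap per adjacent clone pair")

    framework = clones[0]
    for seq, ov in zip(clones[1:], overlaps):
        # Simplification: require the overlapping bases to agree exactly.
        # A real assembler would tolerate sequencing errors here.
        if ov > 0 and framework[-ov:] != seq[:ov]:
            raise ValueError("clone sequences disagree in the stated overlap")
        # Append only the part of the clone that extends the framework.
        framework += seq[ov:]
    return framework


if __name__ == "__main__":
    print(merge_clones(["ACGTAC", "TACGGA", "GGATT"], [3, 3]))
```

The exact-match check keeps the sketch short; handling mismatches in the overlap would require a consensus-calling step that is outside what this example shows.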
We have developed a number of analysis modules, including GRAIL-EXP modules (see abstract of Xu et al.). We have integrated these analysis modules in a data-analysis process that creates a comprehensive genome-wide analysis (see abstract of Shah et al.). This comprehensive analysis process will be updated to ensure that new data can be added to the genome sequence framework. We have made progress in adding navigation and summary reports (see abstract of Snoddy et al.).
We also have made progress on the difficult issue of data storage and management for organizing these diverse experimental and computational data (see abstract of Petrov et al.). We have produced different catalogs of genes and proteins, including (1) GenBank-annotated genes, (2) Genscan-predicted genes, and (3) GRAIL-EXP-predicted genes (including a subset of genes that have some EST evidence for expression). We have produced a Java-based interface (the Genome Channel Browser v. 2.0) and an HTML-based data-access method. These interfaces, other planned interfaces, and other progress will be presented at the Oakland meeting.
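One way to picture the separate gene catalogs is as gene models grouped by their source, with EST support recorded as a flag. The Python sketch below is illustrative only: the `GeneModel` record, its fields, and the helper functions are hypothetical and do not describe the consortium's actual data model or storage system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical record for one annotated or predicted gene."""
    gene_id: str
    source: str          # e.g. "GenBank", "Genscan", "GRAIL-EXP"
    start: int           # position on the framework sequence
    end: int
    est_supported: bool = False


def build_catalogs(models):
    """Group gene models into one catalog (list) per source."""
    catalogs = defaultdict(list)
    for model in models:
        catalogs[model.source].append(model)
    return dict(catalogs)


def est_supported_subset(catalog):
    """Return the gene models in a catalog that have EST evidence."""
    return [model for model in catalog if model.est_supported]


if __name__ == "__main__":
    models = [
        GeneModel("g1", "GenBank", 100, 900),
        GeneModel("g2", "Genscan", 150, 880),
        GeneModel("g3", "GRAIL-EXP", 120, 910, est_supported=True),
        GeneModel("g4", "GRAIL-EXP", 2000, 2600),
    ]
    catalogs = build_catalogs(models)
    print([m.gene_id for m in est_supported_subset(catalogs["GRAIL-EXP"])])
```

Keeping the source on each record, rather than in separate tables, makes it easy to compare predictions from different methods over the same region of the framework.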
The analysis modules used in the comprehensive genome-analysis processes also will be available as public servers (see abstract of LoCascio et al.). These servers will permit users to analyze their own new data or subsets of public data. Some of these analysis modules also will be portable and could be applied at a number of sites beyond the consortium member sites, including genome centers. We expect that our data-analysis process and computational infrastructure will also foster other genome-based, large-scale computational biology, including prediction of protein structure and modeling of biological systems.