Nomi L. Harris and Frank H. Eeckman
Human Genome Informatics Group, Lawrence Berkeley National Laboratory, MS 46A/1123, Berkeley CA 94720; nlharris@lbl.gov
Sequencing centers such as the Human Genome Center at LBNL are producing an ever-increasing flood of genetic data. Annotation can greatly enhance the biological value of these sequences. Useful annotations include homologies to known genes, possible gene locations, and gene signals such as promoters.
We are developing a workbench for automatic sequence annotation and annotation viewing and editing. The goal is to run all available sequence analysis tools and display the results in such a way that the various predictions can be compared. Researchers will then be able to examine all of the annotations (for example, the genes predicted by various gene-finding methods) and select the ones that look the best.
Our current prototype annotation workbench automatically runs the following sequence analysis tools:
Homology Searches
Gene Finding
Promoter Prediction
The resulting predictions are filtered and saved in simple data formats such as .ace format. Other sequence analysis tools can also be incorporated.
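As a rough illustration of the filter-and-save step, the following Python sketch keeps predictions at or above a score cutoff and writes them as an .ace-style paragraph. The Prediction record, the cutoff, and the use of a "Feature" tag with this column order are assumptions made for illustration; they are not the workbench's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Prediction:
    """One annotation from an analysis tool (hypothetical record layout)."""
    tool: str     # name of the program that produced the prediction
    start: int    # start coordinate on the sequence
    end: int      # end coordinate on the sequence
    score: float  # tool-specific score; higher is assumed to be better


def filter_predictions(preds: Iterable[Prediction], min_score: float) -> List[Prediction]:
    """Keep only predictions whose score is at least min_score."""
    return [p for p in preds if p.score >= min_score]


def to_ace(sequence_name: str, preds: Iterable[Prediction]) -> str:
    """Render predictions as one .ace-style paragraph for a Sequence object.

    The "Feature" tag and its column order are illustrative assumptions.
    """
    lines = [f'Sequence : "{sequence_name}"']
    for p in preds:
        lines.append(f'Feature "{p.tool}" {p.start} {p.end} {p.score:.2f}')
    # A blank line terminates the paragraph.
    return "\n".join(lines) + "\n\n"
```

Keeping the filter separate from the writer means a different output format could be added without touching the filtering logic.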
The choice of sequence analysis programs is orthogonal to the front end used to view their results. We have developed a prototype annotation browser based on the bioTkperl map display widget written by David Searls and Gregg Helt. Color-coded sequence annotations for both strands are displayed on a canvas that can be scrolled and zoomed. Clicking on an annotation displays additional information about it.
Planned extensions to ACEDB will enable it to serve as an alternative annotation browser.
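The scrolling and zooming behavior of such a map display comes down to a mapping between sequence coordinates and canvas pixels. The sketch below is one minimal way to express that mapping in Python; the function names and the choice to zoom about the center of the view are assumptions, and it does not reflect how the bioTkperl widget is actually implemented.

```python
from typing import Tuple


def base_to_x(position: int, view_start: int, pixels_per_base: float) -> float:
    """Map a sequence coordinate to a canvas x offset for the current view."""
    return (position - view_start) * pixels_per_base


def zoom(view_start: int, view_width_px: int, pixels_per_base: float,
         factor: float) -> Tuple[int, float]:
    """Zoom about the center of the view.

    Returns the new (view_start, pixels_per_base). A factor above 1 zooms in.
    """
    center = view_start + view_width_px / (2 * pixels_per_base)
    new_scale = pixels_per_base * factor
    new_start = center - view_width_px / (2 * new_scale)
    # Never scroll to the left of the first base.
    return max(0, round(new_start)), new_scale
```

Scrolling then amounts to changing view_start, and every annotation is redrawn by passing its start and end coordinates through base_to_x.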
To test our sequence analysis environment, we used it to study the HUM14SP6 region of human chromosome 5 (5q31), which was sequenced at LBNL. (In another abstract[1] at this meeting, the biology group will present their findings on a larger section of 5q31.) Although the 22 kb HUM14SP6 segment belongs to a region that contains many interleukin genes, no genes have yet been identified in HUM14SP6.
HUM14SP6 was found (by BLAST) to have 552 hits (significant regions of homology) with ESTs and 445 hits with sequences in a non-redundant amino acid database (NRDB), many of which were similar or identical to the EST hits. Forty-seven homologies with human repeat sequences covered roughly half of the EST/NRDB hits.
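One way to quantify how much of the hit region is explained by repeats is to merge the hit intervals, merge the repeat intervals, and measure the bases they share. The Python sketch below does this; measuring coverage in bases (rather than by counting hits) and the inclusive coordinate convention are assumptions, since the abstract does not say how coverage was computed.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) coordinates, start <= end


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent inclusive intervals into a sorted list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(hits: List[Interval], repeats: List[Interval]) -> float:
    """Fraction of bases in the merged hits that also lie within a repeat."""
    hit_blocks = merge_intervals(hits)
    repeat_blocks = merge_intervals(repeats)
    total = sum(end - start + 1 for start, end in hit_blocks)
    if total == 0:
        return 0.0
    shared = 0
    for h_start, h_end in hit_blocks:
        for r_start, r_end in repeat_blocks:
            lo, hi = max(h_start, r_start), min(h_end, r_end)
            if lo <= hi:
                shared += hi - lo + 1
    return shared / total
```

Merging first prevents redundant hits, such as an EST hit and an NRDB hit over the same region, from being counted twice.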
The three gene-finding programs we ran each found several possible exons in the complementary strand (Genefinder and GRAIL also found one possible exon in the forward strand). Half of GRAIL's 12 predicted exons overlapped with BLAST hits, as did all but two of Genefinder's 10 predicted exons. Most of xpound's predictions echoed GRAIL's. The promoter predictor found numerous possible promoters in both strands.
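The comparison of predicted exons against homology hits can be sketched as a simple interval overlap count. In the Python sketch below, the Feature record and the requirement that an exon and a hit lie on the same strand are assumptions for illustration; the abstract does not state whether strand was taken into account.

```python
from typing import Iterable, List, NamedTuple


class Feature(NamedTuple):
    start: int   # inclusive start coordinate, start <= end
    end: int     # inclusive end coordinate
    strand: str  # "+" for the forward strand, "-" for the complementary strand


def overlaps(a: Feature, b: Feature) -> bool:
    """True if the two features share at least one base on the same strand."""
    return a.strand == b.strand and a.start <= b.end and b.start <= a.end


def count_supported(exons: Iterable[Feature], hits: List[Feature]) -> int:
    """Count predicted exons that overlap at least one homology hit."""
    return sum(1 for exon in exons if any(overlaps(exon, hit) for hit in hits))
```

With hundreds of hits and a dozen exons per program, this quadratic comparison is adequate; a sorted sweep or interval tree would only be worth the effort on much larger inputs.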
*This work was supported by the U.S. Department of Energy under Contract Number DE-AC03-76SF00098.
[1] Kelly A. Frazer, Yukihiko Uedo, Maria R. Garofalo, Jan-Fan Cheng, and Edward M. Rubin, "Computational and biological analysis of 1.2 Mb of sequence at 5q3," this meeting.