DOE Human Genome Program Contractor-Grantee
59. Annotation of Draft Genomic Sequence Generated at the JGI
Richard Mural1, Miriam Land1, Frank Larimer1, Morey Parang1, Manesh Shah1, Doug Hyatt1, Ed Uberbacher1, P. Folta2, T. Bobo2, Zhengping Huang2, and T. Slezak2
1Computational Biosciences and Toxicology and Risk Analysis Sections, Life Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 and 2Human Genome Center, Lawrence Livermore National Laboratory, Livermore, CA
The JGI is a major player in the effort to complete a 90% draft of the sequence of the human genome within the next few months. Draft sequence poses special problems for the annotation process; however, it is clear that 3 to 5X coverage of genomic DNA can yield large amounts of biologically meaningful data if appropriate analysis methods are applied. A number of features that are useful for further analysis can be located and annotated in draft sequence, including STSs, BAC ends (STCs), and ESTs. These features can be annotated by standard similarity methods, given sufficient computational resources. Gene identification programs, particularly those that incorporate similarity data, such as Grail-Exp, which can use both EST and complete cDNA data, provide another level of analysis of draft data. These analyses not only allow gene identification but can also provide some ordering information for the contigs that make up the clone being analyzed. It is also worth noting that essentially all of the genes that can be found in finished sequence can be identified in draft sequence at about 3X coverage.
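The contig-ordering idea can be illustrated with a minimal sketch. It assumes that similarity hits between draft contigs and cDNA or EST sequences are already available as simple records; the `Hit` record, its field names, and the function `order_contigs_by_cdna` are hypothetical and are not part of Grail-Exp or the JGI pipeline. When one cDNA matches two or more contigs of the same clone, sorting those contigs by the cDNA position they match suggests their relative order along the transcript.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One similarity hit between a draft contig and a cDNA/EST (hypothetical record)."""
    contig: str      # draft contig identifier
    cdna: str        # identifier of the matching cDNA or EST
    cdna_start: int  # start of the matched region, in cDNA coordinates
    cdna_end: int    # end of the matched region, in cDNA coordinates


def order_contigs_by_cdna(hits):
    """For each cDNA that hits two or more contigs, list those contigs
    in the order in which they match along the cDNA."""
    by_cdna = defaultdict(list)
    for hit in hits:
        by_cdna[hit.cdna].append(hit)

    orders = {}
    for cdna, group in by_cdna.items():
        # Record the earliest matched cDNA position for each contig.
        first_pos = {}
        for hit in group:
            previous = first_pos.get(hit.contig, hit.cdna_start)
            first_pos[hit.contig] = min(previous, hit.cdna_start)
        # A cDNA confined to a single contig carries no ordering information.
        if len(first_pos) > 1:
            orders[cdna] = sorted(first_pos, key=first_pos.get)
    return orders


example_hits = [
    Hit("contig3", "mRNA1", 900, 1200),
    Hit("contig1", "mRNA1", 1, 400),
    Hit("contig2", "mRNA1", 401, 899),
    Hit("contig1", "EST7", 10, 200),
]
contig_orders = order_contigs_by_cdna(example_hits)
```

This sketch yields only the relative order of contigs along a transcript. Inferring contig orientation would additionally require the strand of each hit, which is omitted here for brevity.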
To help add biologically valuable information to the draft sequence being generated at the JGI/PSF, a configurable analysis pipeline has been developed. Draft data produced at the JGI/PSF are analyzed, and the results are parsed into the JGI database. The initial annotation of draft sequence is a catalog of the clone contents (STSs, STCs, gene models predicted by Grail-Exp and Genscan, and Blast searches of their translations against the NR protein database), which is provided in tabular form and is accessible from the JGI web page. Further analysis of this information will help to define relationships among draft clones and will allow ordering within and between clones.
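As a rough illustration of what a configurable pipeline producing a tabular clone catalog might look like, the sketch below chains interchangeable analysis steps and renders their combined output as a tab-delimited table. Everything in it is an assumption made for illustration: the function names, the record fields, and the column layout are hypothetical, and the exact-substring marker search is only a stand-in for the similarity searches and gene-finding programs (Blast, Grail-Exp, Genscan) that the actual pipeline runs.

```python
from typing import Callable, Dict, List

Feature = Dict[str, object]
# An analysis step takes (clone_id, sequence) and returns feature records.
Analysis = Callable[[str, str], List[Feature]]


def make_marker_step(markers: Dict[str, str], feature_type: str = "STS") -> Analysis:
    """Build a step that reports exact matches of marker sequences.
    Exact matching is a placeholder for a real similarity search."""
    def step(clone_id: str, sequence: str) -> List[Feature]:
        rows = []
        for name, marker_seq in markers.items():
            pos = sequence.find(marker_seq)
            if pos >= 0:
                rows.append({
                    "clone": clone_id,
                    "type": feature_type,
                    "name": name,
                    "start": pos + 1,               # 1-based start
                    "end": pos + len(marker_seq),   # 1-based inclusive end
                })
        return rows
    return step


def run_pipeline(clone_id: str, sequence: str, steps: List[Analysis]) -> List[Feature]:
    """Run every configured step and merge the results into one catalog."""
    catalog: List[Feature] = []
    for step in steps:
        catalog.extend(step(clone_id, sequence))
    return sorted(catalog, key=lambda row: row["start"])


def to_table(catalog: List[Feature],
             columns=("clone", "type", "name", "start", "end")) -> str:
    """Render the catalog as tab-delimited text with a header row."""
    lines = ["\t".join(columns)]
    for row in catalog:
        lines.append("\t".join(str(row[c]) for c in columns))
    return "\n".join(lines)


steps = [make_marker_step({"stsA": "ACGT", "stsB": "TTTT"})]
catalog = run_pipeline("clone1", "GGACGTGG", steps)
table = to_table(catalog)
```

Because each step shares the same signature, adding or removing an analysis amounts to editing the `steps` list, which is one simple way to make such a pipeline configurable.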
To date, we have analyzed more than 1500 draft clones from human chromosomes 5, 16, and 19. The results of these analyses can be viewed at www.jgi.doe.gov/Data/JGI_finished.html.
(Research sponsored by the Office of Biological and Environmental Research, USDOE under contract number DE-AC05-96OR22464 with Lockheed Martin Energy Research Corp.)
The online presentation of this publication is a special feature of the Human Genome Project Information Web site.