DOE Human Genome Program Contractor-Grantee
55. IMAGEne 3.0: Clustering All Sequences Obtained from I.M.A.G.E. Clones
Peg Folta, Tom Kuczmarski, Tim Harsch, and Christa Prange
Lawrence Livermore National Laboratory, Livermore, CA 94550
To date, over 1.9 million sequences have been submitted to GenBank from the 2.9 million available I.M.A.G.E. [1] clones. This number will increase sharply due to the new Mammalian Gene Collection [2] project. To maximize the value of this information, the IMAGEne [3] product has been extended to group the human sequences into clusters that represent both known genes and "candidate genes". For known genes, clustering eliminates redundancy by providing the best representative clone for a gene. For clusters not associated with a known gene, the results provide evidence of a possible gene discovery.
IMAGEne was first released to the public in April 1998 to provide the user community with known gene clusters of I.M.A.G.E. clones. Since then the product has undergone significant enhancements, including use of NCBI's RefSeq as the basis for the known gene set, indication of sequence-verified clones, repeat masking, enhanced error checking, and faster response times. Version 3.0 is the largest enhancement to date: it extends the functionality by forming clusters from clones not associated with known genes.
Clusters are formed using sequence similarity, clone membership, and internal I.M.A.G.E. project knowledge. The user can query the resulting cluster database and view the cluster members, ranked primarily by size, in a user-friendly Java-based display. Currently, I.M.A.G.E. has clone representatives for 93% of the known genes. The clustering yields 61,083 multi-member candidate gene clusters and over 236,000 singletons. By the conference date, IMAGEne 3.0 will be publicly available on the web.
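The grouping step described above can be illustrated with a minimal sketch. It is not the IMAGEne implementation: it assumes that sequence similarity has already been reduced to a list of matching sequence pairs, that sequences from the same clone always belong together, and that "size" means sequence length. The function name, the input layout, and the example identifiers are all hypothetical, and the internal I.M.A.G.E. project knowledge is not modeled.

```python
from collections import defaultdict

def cluster_sequences(sequences, similar_pairs):
    """Group sequences into clusters with a union-find structure.

    sequences: dict of seq_id -> (clone_id, length)   (hypothetical layout)
    similar_pairs: iterable of (seq_id, seq_id) judged similar upstream
    """
    parent = {seq_id: seq_id for seq_id in sequences}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Clone membership: sequences read from the same clone are merged.
    first_seen = {}
    for seq_id, (clone_id, _length) in sequences.items():
        if clone_id in first_seen:
            union(first_seen[clone_id], seq_id)
        else:
            first_seen[clone_id] = seq_id

    # Sequence similarity: each similar pair is merged.
    for a, b in similar_pairs:
        union(a, b)

    groups = defaultdict(list)
    for seq_id in sequences:
        groups[find(seq_id)].append(seq_id)

    # Assumption: members are ranked by length, longest first.
    clusters = [
        sorted(members, key=lambda s: (-sequences[s][1], s))
        for members in groups.values()
    ]
    # Largest clusters first; ties broken by the top-ranked member's id.
    clusters.sort(key=lambda c: (-len(c), c[0]))
    return clusters

# Made-up example data, for illustration only.
example = {
    "seqA": ("clone1", 450),
    "seqB": ("clone1", 380),
    "seqC": ("clone2", 520),
    "seqD": ("clone3", 300),
}
clusters = cluster_sequences(example, [("seqB", "seqC")])
multi_member = [c for c in clusters if len(c) > 1]
singletons = [c for c in clusters if len(c) == 1]
```

In this toy input, seqA and seqB are joined because they share a clone, seqC joins them through its similarity to seqB, and seqD remains a singleton. Separating multi-member clusters from singletons mirrors the two counts reported above.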
This work was performed by LLNL under the auspices of U.S. DOE, Contract No. W-7405-Eng-48.