TRANSCRIPTOME 2002: From Functional Genomics to Systems
Comparative Genomics Tools: TIGR Gene Indices and TIGR Orthologous Gene Alignments (TOGA)
Yuandan Lee, Jennifer Tsai, Foo Chung, Svetlana Karamycheva, Babak Parvizi, Geo Pertea, Razvan Sultana, Valentine Antonescu, Joseph White, and John Quackenbush. The Institute for Genomic Research, Rockville, MD
While draft sequences of the human genome have been published and other eukaryotic genome sequencing projects are advancing rapidly, identifying and classifying genes remains a significant challenge because experimental evidence is limited and the available gene prediction programs have shortcomings. EST sequencing and analysis remain a primary research tool for identifying and categorizing transcribed genes in a wide variety of species, and an important resource for annotating genomic sequences. ESTs, known genes, predicted gene transcripts, and available mapping and sequencing data have been integrated to build a cross-genome reference database, the TIGR Gene Indices (TGI; <http://www.tigr.org/tdb/tgi.shtml>). The TIGR Gene Indices are a collection of species-specific databases that use a highly refined protocol to analyze EST sequences, with the aim of identifying the genes represented within the EST data and providing structural and functional information about those genes. Each Gene Index is constructed by first comparing, then clustering, and finally assembling the ESTs and annotated gene sequences (ETs) from GenBank for the target species. This process produces a set of unique, high-fidelity virtual transcripts called Tentative Consensus (TC) sequences. There are currently 49 gene indices, containing over 500,000 TCs, for major economically and biologically important eukaryotic organisms, including human, mouse, rat, cattle, pig, Arabidopsis, rice, soybean, yeast, Drosophila, Xenopus, zebrafish, C. elegans, and Plasmodium falciparum. TC sequences can be used to assign putative functional annotation to genes, to link transcripts to mapping and genomic sequence data, to connect orthologous and paralogous genes, and as a resource for comparative sequence analysis.
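The compare, cluster, assemble pipeline can be illustrated with a minimal sketch. It assumes that pairwise comparisons have already produced a list of overlapping sequence pairs (the hypothetical `overlaps` input), groups sequences into clusters by single linkage using a union-find structure, and then derives a consensus by majority vote from reads whose offsets on a shared coordinate system are already known (the hypothetical `placed_reads` input). This is not the TGI protocol itself: a real assembler computes the read layout and handles sequencing errors, orientation, and chimeras, none of which is modeled here. The function names are illustrative.

```python
from collections import Counter


def cluster_sequences(ids, overlaps):
    """Group sequence IDs into clusters by single linkage (union-find).

    `overlaps` is a hypothetical list of (id_a, id_b) pairs that passed some
    pairwise-similarity threshold; how those pairs are found is not shown.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Walk to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two clusters

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


def consensus(placed_reads):
    """Majority-vote consensus from reads at known offsets.

    `placed_reads` is a hypothetical list of (offset, sequence) pairs already
    laid out on one coordinate system. Positions covered by no read become "N".
    """
    length = max(off + len(seq) for off, seq in placed_reads)
    columns = [Counter() for _ in range(length)]
    for off, seq in placed_reads:
        for k, base in enumerate(seq):
            columns[off + k][base] += 1
    return "".join(c.most_common(1)[0][0] if c else "N" for c in columns)


if __name__ == "__main__":
    print(cluster_sequences(["e1", "e2", "e3", "e4"], [("e1", "e2"), ("e2", "e3")]))
    # [['e1', 'e2', 'e3'], ['e4']]
    print(consensus([(0, "ACGTA"), (2, "GTACC"), (2, "GAACC")]))
    # ACGTACC
```

Single linkage is used because it is the simplest transitive grouping: e1 and e3 land in the same cluster through e2 even without a direct match. The disagreement at the fourth position (T, T, A) is resolved by the majority, which is the intuition behind a consensus sequence being more reliable than any single EST.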
In particular, we have compared TC sequences across all 49 gene indices to construct the TIGR Orthologous Gene Alignment (TOGA) database. TOGA uses a transitive association algorithm to link tentative orthologues and has identified more than 30,000 potential orthologous groups in eukaryotes. These data provide a unique opportunity for phylogenetic and functional analysis of genes, and for annotating genes used in microarray analysis.
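Transitive association of tentative orthologues can be sketched as follows, under assumptions that are not taken from this abstract: each TC is identified by a (species, TC id) tuple, a hypothetical precomputed table `best_hit` maps (query TC, target species) to the best-matching TC in that species, two TCs are linked only if each is the other's best hit (reciprocal best hits), and linked TCs are then joined transitively into groups. The `min_species` cutoff is a hypothetical filter; TOGA's actual matching scores and grouping criteria may differ.

```python
def orthologous_groups(best_hit, min_species=3):
    """Link reciprocal best hits transitively into candidate orthologous groups.

    best_hit: hypothetical dict {(query_tc, target_species): target_tc},
    where every TC is a (species, tc_id) tuple.
    min_species: hypothetical cutoff on the number of species in a group.
    """
    # 1. Keep only reciprocal best hits (A's best hit is B and B's is A).
    edges = set()
    for (query, _target_species), target in best_hit.items():
        if best_hit.get((target, query[0])) == query:
            edges.add(tuple(sorted((query, target))))

    # 2. Join linked TCs transitively with union-find.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)

    # 3. Report groups that span enough distinct species.
    result = [sorted(m) for m in groups.values()
              if len({species for species, _ in m}) >= min_species]
    return sorted(result)


if __name__ == "__main__":
    hits = {
        (("human", "TC1"), "mouse"): ("mouse", "TC7"),
        (("mouse", "TC7"), "human"): ("human", "TC1"),
        (("mouse", "TC7"), "rat"): ("rat", "TC3"),
        (("rat", "TC3"), "mouse"): ("mouse", "TC7"),
        (("human", "TC2"), "mouse"): ("mouse", "TC9"),
        (("mouse", "TC9"), "human"): ("human", "TC5"),  # not reciprocal
    }
    print(orthologous_groups(hits))
    # [[('human', 'TC1'), ('mouse', 'TC7'), ('rat', 'TC3')]]
```

In this toy input the human and rat TCs end up in one group through the mouse TC, even though no human-rat match was supplied; that is the transitive step. The human TC2 / mouse TC9 pair is discarded because the match is not reciprocal.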
This site produced by the Human Genome Management Information System of Oak Ridge National Laboratory.