Microbial Genome Program Abstracts
DOE Human Genome Program
150. Archaeal Proteomics
Carol S. Giometti1, Sandra
L. Tollaksen1, Xiaoli Liang1, Michael W. W. Adams2,
James F. Holden2, Angeli Menon2, Gerti Schut2,
Claudia I. Reich3, Gary J. Olsen3, and John Yates,
The genomes of several Archaea are either partially or completely sequenced, revealing the presumptive sequences of encoded proteins. However, the functions of these proteins can only be inferred from sequence similarity with known proteins, and the mechanisms by which the expression and function of most of the proteins are regulated remain unknown. The goal of the Archaeal Proteomics Project is to identify archaeal proteins and regulatory pathways relevant to bioremediation and energy technology, processes of interest to the U.S. Department of Energy. We are using two-dimensional gel electrophoresis (2DE) to purify and quantitate proteins expressed in Archaea grown under a variety of conditions designed to modulate specific metabolic pathways. The compartmentalization of archaeal proteins is being determined by 2DE of subcellular fractions. Proteins are identified on the basis of similarities between observed peptide masses for tryptic digests generated from proteins in the 2DE gels and calculated peptide masses for the proteins encoded in the genome sequences. We are obtaining peptide masses by using matrix-assisted laser desorption ionization mass spectrometry. Initial work is focused on the proteomes of Pyrococcus furiosus and Methanococcus jannaschii, both hyperthermophilic Archaea with growth temperatures near 100 °C and enzymatic capabilities that promise to be of value in bioremediation reactions, energy conversion, and chemical processing systems. Whereas many of the enzymatic activities associated with primary metabolic pathways have been characterized in P. furiosus, the metabolic capabilities of M. jannaschii have only been inferred from gene sequence information. Thus far, the most abundant proteins in the 2DE patterns of M. jannaschii and P. furiosus lysates have been identified using peptide mass searches, membrane and cytosolic proteins from P. furiosus have been compared, and quantitative changes in M. jannaschii proteins under several different growth conditions have been analyzed. These preliminary results are the foundation for the M. jannaschii and P. furiosus proteome databases. This work is supported by the U.S. Department of Energy, Office of Biological and Environmental Research, under Contract No. W-31-109-Eng-38.
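The peptide mass matching described in this abstract can be illustrated with a minimal sketch. It is not the project's actual search software. It assumes complete tryptic cleavage (after K or R, except before P), neutral monoisotopic masses, and a fixed absolute tolerance. The function names, the tolerance value, and the toy protein sequences are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def rank_proteins(observed_masses, proteome, tol=0.5):
    """Count, per protein, the observed masses within tol Da of a predicted peptide."""
    scores = []
    for name, sequence in proteome.items():
        predicted = [peptide_mass(p) for p in tryptic_digest(sequence)]
        hits = sum(
            any(abs(obs - calc) <= tol for calc in predicted)
            for obs in observed_masses
        )
        scores.append((name, hits))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy example with made-up sequences standing in for genome-encoded proteins.
proteome = {"protA": "MKAARPGKLL", "protB": "GGGGKWWWWR"}
observed = [peptide_mass("AARPGK"), peptide_mass("MK")]
ranking = rank_proteins(observed, proteome)
```

A real search would also have to handle missed cleavages, chemical modifications, the charge state of the measured ions, and the statistical significance of a match, none of which this sketch attempts.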
The submitted manuscript has been created by the University of Chicago as Operator of Argonne National Laboratory ('Argonne') under Contract No. W-31-109-ENG-38 with the U.S. Department of Energy. The U.S. Government retains for itself, and others acting on its behalf, a paid-up, nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.
151. Microbial Genome Sequencing and Analysis at TIGR
William C. Nierman, Tamara Feldblyum,
Rebecca A. Clayton, Robert D. Fleischmann, Owen White, Claire M. Fraser,
and J. Craig Venter
Advances in automated DNA sequence analysis and a whole-genome shotgun strategy pioneered by The Institute for Genomic Research (TIGR) resulted in the first complete genome sequence for a free-living bacterium, Haemophilus influenzae, in 1995. Since then, the TIGR microbial sequencing program has expanded to include 8 completed bacterial genomes representing 12.3 Mb of sequence data and another 16 genomes in progress. Analysis of the sequenced organisms indicates that, on average, about 46% of the genes have no assigned biological function and 26% of those genes are unique to a particular species. The number of genes with unknown function and the number of genes unique to an organism indicate the wide variety and adaptability of the bacterial world, their ability to adjust to extreme living conditions, their ability to metabolize a variety of chemical compounds as sources of energy, and our rudimentary knowledge of the scope of the physiology and metabolism of earth's microbes.
The availability of complete microbial genome sequence data opens new ways of investigating these organisms. Analysis of TIGR-sequenced microbial genomes has provided new and exciting insights into the phylogenetic relatedness of organisms, novel metabolic pathways, biochemical strategies of pathogenic microbes, the functional identification of genes, and the minimal gene content of free-living organisms.
152. Genomics and Engineering of a Radioresistant Bacterium
Kenneth W. Minton, Kira S. Makarova,
Michael J. Daly, Eugene V. Koonin, Hassan Brim, L. Aravind, and Ajay Sharma
The eubacterium Deinococcus radiodurans is the most DNA damage-resistant organism discovered to date. It is therefore of intrinsic interest to study its DNA repair mechanisms, and toward this end the full genomic sequence of this organism has recently been obtained by TIGR. We have fully annotated this sequence with special attention to properties that might render this organism radioresistant. Features noted to date include a novel enzyme that combines potential repair domains from three independent repair proteins. The gene encoding this enzyme is currently being knocked out of the deinococcal genome, and properties of the null mutant will be reported. Similarly, desiccation-resistance proteins similar to those seen in plants have been discovered in the Deinococcus radiodurans genome. This is of particular significance, as there is a known positive correlation between deinococcal desiccation resistance and radioresistance. The properties of the knockout mutants will be reported.
In addition, an expansion of several protein families, including phosphatases, proteases, acyl transferases, the mutT family of pyrophosphatases, and thioredoxins, has been noted. The Deinococcus radiodurans genome is extraordinarily rich in repeated sequences, suggesting a mechanism of repair that will be presented. It is also the first sequenced bacterium found to have multiple chromosomes (three). Engineering of this versatile organism for organopollutant degradation in radioactive mixed waste environments and engineering of heavy metal resistance in this organism will be described. Finally, current attempts to obtain large amounts of, and to crystallize, the extraordinary and highly toxic Deinococcus RecA protein will be described.
153. Functional Analysis of Deinococcus radiodurans Genomes by Targeted Mutagenesis
Kwong-Kwok Wong, William B. Chrisler,
Lye Meng Markillie, and Richard D. Smith
D. radiodurans strain R1, previously known as Micrococcus radiodurans, has extreme resistance to genotoxic chemicals, oxidative damage, high levels of ionizing and UV radiation, and desiccation. The ability to survive such extreme environments is attributed in part to a unique DNA repair system in combination with its chromosome copy number and structure, as well as factors affecting the survival of other cellular components. There is evidence suggesting that the carotenoids that cause red pigmentation in D. radiodurans may act as free radical scavengers, thus increasing resistance to DNA damage by hydroxyl radicals. High levels of two oxygen toxicity defense enzymes, superoxide dismutase and catalase, are also found in D. radiodurans. In addition, the deinococcal outer membrane lipids are complex and distinct from those found in the rest of the bacterial world, and it has been suggested that they, together with the plasma membrane, may also be involved in stress resistance. However, the genetic basis for this stress resistance is still not clear. With the genomic sequence information of D. radiodurans R1, we have developed a simple and general targeted mutagenesis method to perform a genome-wide analysis of putative genes involved in stress resistance. We have generated mutations in katA (catalase) and sodA (superoxide dismutase). Analysis of the mutants shows that both katA and sodA are required for the extreme ionizing radiation resistance. Several other mutations have been generated and are being analyzed for their roles in stress resistance.
154. Complete Genome Sequence of Deinococcus radiodurans
Owen White, John Heidelberg, Claire
Fraser, and J. Craig Venter
Deinococcus radiodurans is a non-pathogenic, non-sporulating, red-pigmented Gram-positive bacterium. D. radiodurans was originally found in radiation-sterilized food that underwent spoilage. It is remarkable in that it is the most radioresistant organism ever isolated (Moseley, 1983). An important component of this resistance is the ability to repair damage to chromosomal DNA. D. radiodurans cultures exposed to 1.5 Mrad of radiation displayed a reduction in the size of genomic DNA fragments corresponding to approximately 100 double-stranded breaks (DSBs) per genome. (Typically, most prokaryotic and eukaryotic organisms cannot tolerate more than 5 double-stranded breaks per genome without reduced survival.) Remarkably, within eight to ten hours after exposure, D. radiodurans genomic fragment lengths are restored to size ranges seen in non-treated cells. During this repair time, cellular replication of D. radiodurans is arrested (Daly et al., 1994); however, after this eight to ten hour interval, the cells display 100% survival with no detectable mutagenesis of their completely restored genomes. The genome sequence of Deinococcus is complete, and we have determined that the genome is composed of 3 chromosomes and a small plasmid; a number of unique sequence elements have been identified. The content of the genome, along with experimental results, will be discussed in the context of this organism's unique ability to withstand gamma radiation.
155. Complete Genome Sequencing of Shewanella putrefaciens
Rebecca A. Clayton, John Heidelberg,
Kenneth Nealson, Eric Gaidos, Alexandre I. Tsapin, James Scott, J. Craig
Venter, and Claire M. Fraser
We are midway through the closure process in the complete genome sequencing of Shewanella putrefaciens. Random sequencing was completed in July 1998, and closure began in August 1998. We present preliminary annotation of the genome, with whole-genome comparisons with other completed microbial genomes. Shewanella putrefaciens has high 16S rRNA sequence similarity to Escherichia coli and also to Vibrio cholerae, a genome in the final stages of closure at TIGR. Approximately half of its open reading frames have high sequence similarity to E. coli. About 15% of the S. putrefaciens genome has high sequence similarity with V. cholerae but not with E. coli K12. Analysis of the assemblies suggests that the completed genome size will be approximately 5 Mb.
156. Whole Genome Sequence and Structural Proteomics of Pyrobaculum aerophilum
Sorel Fitz-Gibbon1, Ung-Jin
Kim2, Heidi Ladner1, Elizabeth Conzevoy1,
Gigi Park2, Karl Stetter3, Jeffrey H. Miller1,
and Melvin I. Simon2
Pyrobaculum aerophilum is a hyperthermophilic archaeon, isolated from a boiling marine water hole, that is capable of growth at 104 °C. This microorganism can grow microaerobically, unlike most of its thermophilic relatives, making it amenable to a variety of experimental manipulations and a candidate as a model organism for studying archaeal and thermophilic microbiology. We have sequenced the entire genome using a random shotgun approach (3.5X genomic coverage) followed by oligonucleotide primer-directed sequencing, guided by our fosmid map. The 2.2 Mb genome codes for more than 2000 proteins, 30% of which have been identified by their sequence similarities to proteins of known function. Only 15% of the Pyrobaculum aerophilum proteins have related high-resolution structures. In collaboration with the DOE/UCLA Laboratories and Los Alamos National Laboratories, we have initiated a project to express and purify proteins for structure determination by NMR or X-ray diffraction. The three-dimensional structures of the Pyrobaculum aerophilum proteins will provide the power to understand and manipulate protein function and are crucial to fully exploiting the information in the genome. At this time several proteins have been cloned, expressed in E. coli, and purified, and crystals that diffract to high resolution have been obtained.
157. The Genome Sequence of a Hyperthermophilic Archaeon: Pyrococcus furiosus
Robert B. Weiss1, Diane
Dunn1, Mark Stump1, Raymond Yeh1, Joshua
Cherry1, and Frank T. Robb2
Pyrococcus furiosus is a strictly anaerobic archaeon that grows optimally at 100 °C by a fermentative-type metabolism in which complex peptide mixtures such as yeast extract and Tryptone, and also certain sugars, are oxidized to organic acids, H2 and CO2. The organism was isolated from geothermal marine sediment in shallow waters off Vulcano Island, Italy. We have determined the complete sequence of this organism's genome. It is 1,908,253 base pairs in length, with a GC content of 40.8%. Recently, the complete sequence of a distantly related species, Pyrococcus horikoshii, has been determined by a group in Japan (www.bio.nite.go.jp). This species was isolated from a hydrothermal vent at a depth of 1395 meters in the Sea of Japan. Comparative analysis is revealing complex gene rearrangements and changes in gene content between these two Pyrococcus species.
The genome content and organization reveal many potential operons, one rRNA operon, 46 tRNAs, 22 insertion elements, 7 SR elements, and 14 inteins. 50% of the ORFs are of unknown function, and of this class 19% are in common between the two Pyrococcus species. The 22 putative insertion elements in the genome of P. furiosus are not found in the P. horikoshii genome.
The genome of P. furiosus is 170 kb larger than that of P. horikoshii, with about 70% DNA identity conserved within open reading frames. Genome-to-genome dot plot alignment reveals the remnant of a conserved diagonal. The longest co-linear segment between the two genomes is 70 kb, and much of the remnant diagonal is interspersed with inversions, deletions and insertions. The major gene clusters present in P. furiosus but not in P. horikoshii include: maltose/trehalose transport, phosphate uptake system, major parts of the urea and TCA cycle, and amino acid metabolism of tryptophan, aromatics, arginine, and isoleucine/valine. The maltose/trehalose transport operon is within a 17 kb segment flanked by putative insertion elements. This segment is also found in Thermococcus litoralis, another isolate from a Mediterranean marine geothermal location. The high identity (>99%) between these two segments suggests a recent lateral transfer event between T. litoralis and P. furiosus.
158. The Chlorobium tepidum Genome Sequencing Program at TIGR
Karen A. Ketchum, Matthew D. Cotton,
Cheryl Bowman, M. Brook Craven, Tanya Mason, Terrence Shea, William Nierman,
and Claire M. Fraser
The genus Chlorobium is placed in the taxonomic group of green sulfur bacteria (Chlorobiaceae). They are formally classified as Gram-negative organisms. Members of this genus are photoautotrophs that can generate chemical energy through an electron transport chain in the cytoplasmic membrane that is associated with a light-harvesting complex housed in a specialized organelle called the chlorosome. The components of this light-harvesting apparatus and some of its organizational structure are reminiscent of photosystems found in plant chloroplasts and, therefore, the evolutionary relationship of these prokaryotes to eukaryotic organelles is of interest. Chlorobium species can also fix CO2, although the biochemical pathway used by these prokaryotes is distinct from the Calvin cycle found in higher plants.
C. tepidum was initially identified from a hot spring in New Zealand (Wahlund et al., 1991). This species is thermophilic, with an optimum growth temperature of approximately 47 °C. It has a genome size of 2.1 Mb (Naterstad et al., 1995) with a G + C content of 56.5 mol%. C. tepidum was nominated for sequencing by the DOE because it has a prominent role in global carbon cycling and an interesting phylogenetic position in the Eubacterial kingdom.
The C. tepidum genome project was initiated in March of 1998. Genomic DNA was generously supplied by Dr. Donald A. Bryant, Earnest C. Pollard Professor of Biotechnology and Professor of Biochemistry and Molecular Biology at The Pennsylvania State University. Random sequencing of a small insert (1.6 - 2.5 kb) plasmid library began in May and is now complete. We obtained 32,246 sequencing reads for 8X coverage of the genome. The overall success rate was 82%. We are now sequencing a large insert lambda library which will provide linking information for our contigs. The current genome assembly has 41 groups with 38 sequencing gaps and 80 physical gaps. A progress report on the C. tepidum project will be presented.
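The relationship between read count, genome size, and coverage quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the mean read length is not given in the abstract, so the value computed here is only what the stated figures imply, and it is unclear whether the 32,246 reads are counted before or after the 82% success rate. The uncovered-fraction estimate assumes idealized, uniformly random sampling, which real clone libraries do not achieve. All function names are hypothetical.

```python
import math


def coverage(n_reads, mean_read_length, genome_size):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * mean_read_length / genome_size


def implied_read_length(target_coverage, genome_size, n_reads):
    """Mean read length needed for n_reads to reach the stated coverage."""
    return target_coverage * genome_size / n_reads


def expected_uncovered_fraction(depth):
    """Fraction of bases left unsequenced under idealized random sampling."""
    return math.exp(-depth)


# Figures quoted in the abstract: 32,246 reads, 8X coverage, 2.1 Mb genome.
read_length = implied_read_length(8, 2.1e6, 32_246)      # about 521 bases
uncovered_bases = expected_uncovered_fraction(8) * 2.1e6  # about 700 bases
```

Under these idealized assumptions almost every base would be sampled at 8X, so the remaining gaps reported in the abstract presumably reflect non-random factors such as cloning bias and repeats, which is consistent with the use of a second, large-insert library for linking.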
159. Searching for Synteny: A Whole-Genome Comparison of Caenorhabditis elegans with Saccharomyces cerevisiae
Karen L. Diemer and Kelly A. Frazer
Characterizing the syntenic relationships of genes in different species has been a valuable tool for deciphering a variety of biological phenomena. The completely sequenced genome of the yeast Saccharomyces cerevisiae and the extensively sequenced genome of the worm Caenorhabditis elegans provide us with the opportunity to search on a genome-wide basis for conservation of gene order between these distantly related eukaryotic organisms. The yeast and worm genomes diverged approximately 965 million years ago (Doolittle et al., 1996); therefore, any conservation of gene order is likely due to biological forces dictating genome organization rather than a lack of shuffling of genes that were neighbors in the last common ancestor. We compared protein translations of the 6221 yeast ORFs to the available worm sequence data (85% of total) to determine whether any paired genes, that is, loci that are consecutive (neighbors) in both organisms, exist. Ten pairs of adjacent yeast ORFs were identified that have significant matches (TBLASTN expect values <1e-21) adjacent to each other in the worm genome. Four of these paired ORFs consist of genes encoding different core histones, three consist of genes that encode proteins of no known related function, and three consist of ORFs that are part of the same gene in yeast but had not yet been identified as such. These data indicate that the study of conserved gene pairs in distantly related eukaryotes may provide insights into the selective pressures governing the clustering of certain genes as well as serve to facilitate the assignment of putative ORFs into protein-encoding units.
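The neighbor-pair search described in this abstract can be sketched as follows. This is an illustration, not the authors' pipeline. It assumes the TBLASTN results have already been reduced to one significant best match per yeast ORF, recorded as a worm chromosome (or contig) and the index of the matched gene along it. The function name, the input layout, and the ORF labels are hypothetical.

```python
def conserved_neighbor_pairs(yeast_order, worm_position, max_gap=1):
    """Return adjacent yeast ORF pairs whose worm matches are also neighbors.

    yeast_order   -- ORF names in chromosomal order for one yeast chromosome
    worm_position -- dict mapping a yeast ORF to (worm chromosome, gene index)
                     for its significant best match; ORFs without a match
                     are simply absent from the dict
    max_gap       -- largest index difference still counted as adjacent
    """
    pairs = []
    for left, right in zip(yeast_order, yeast_order[1:]):
        if left not in worm_position or right not in worm_position:
            continue
        chrom_l, idx_l = worm_position[left]
        chrom_r, idx_r = worm_position[right]
        # A difference of 0 means both ORFs hit the same worm gene; this
        # sketch excludes that case rather than reporting it as a pair.
        if chrom_l == chrom_r and 0 < abs(idx_l - idx_r) <= max_gap:
            pairs.append((left, right))
    return pairs


# Toy input with made-up ORF labels and positions.
yeast_order = ["Y1", "Y2", "Y3", "Y4"]
worm_position = {"Y1": ("I", 10), "Y2": ("I", 11), "Y3": ("II", 5), "Y4": ("I", 12)}
pairs = conserved_neighbor_pairs(yeast_order, worm_position)
```

Pairs in which both yeast ORFs match the same worm gene are worth reporting separately, since that pattern could indicate two ORFs that belong to a single gene.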
160. Microbial Genome Sequencing and Comparative Analysis
D.R. Smith, M. Ayers, R. Bashirzadeh,
H. Bochner, M. Boivin, G. Breton, S. Bross, A. Caron, A. Caruso, R. Cook,
P. Daggett, L. Doucette-Stamm, J. Dubois, J. Egan, D. Ellston, J. Ezedi,
T. Ho, K. Holtham, P. Joseph, M. LaPlante, H-M. Lee, R. Gibson, K. Gilbert,
J. Guerin, D. Harrison, J. Hitti, P. Keagle, J. Kozlovsky, G. LeBlanc,
W. Lumm, P. Mank, A. Majeski, J. Nölling, D. Patwell, J. Phillips,
B. Pothier, S. Prabhakar, D. Qiu, J.N. Reeve1,
M. Rossetti, M. Sachdeva, P. Snell, P. Soucaille2,
L. Spitzer, R. Vicaire, K. Wall, Y. Wang, L. Wong, A. Wonsey,
K. Weinstock, Q. Xu, and L. Zhang
This project is applying automated sequencing technology and bioinformatics tools to the analysis of microbial genomes with potential applications in energy production and bioremediation. Efforts have focused on two genomes in particular, those of Methanobacterium thermoautotrophicum strain ΔH and Clostridium acetobutylicum strain ATCC 824.
Methanobacterium thermoautotrophicum strain ΔH is a thermophilic archaeon that grows at temperatures from 40 to 70 °C, and was isolated in 1971 from sewage sludge. The complete 1,751,377 bp sequence of the genome of M. thermoautotrophicum was determined by a whole-genome shotgun sequencing approach. The results of extensive comparative and functional analysis work were published last year in the Journal of Bacteriology, Volume 179, 7135-7155.
C. acetobutylicum strain ATCC 824 has a 4.2 Mb, AT-rich genome, and is one of the best-studied solventogenic clostridia (it has been used commercially to produce acetone, butanol and ethanol). The shotgun sequencing phase has been completed, with 4.9 Mb of multiplex and 21.3 Mb of ABI raw sequence reads (6.3-fold total redundancy) that produced 551 contigs spanning 4,030,725 bases when assembled using PHRAP with quality scores. The genome has been finished to 27 ordered contigs, with quality enhancement, at the time of this writing.
Physical mapping of the C. acetobutylicum genome by P. Soucaille and coworkers (INSA, Toulouse) has shown that this strain harbors a large plasmid, designated pSOL1, of about 210 kb in size. Further studies by the same group revealed that loss of this plasmid coincides with the loss of the capacity to produce acetone and butanol and that the genes involved in solvent formation reside on pSOL1. We now have the complete 203 kb sequence of this plasmid.
C. acetobutylicum contains a variety of genes involved in the utilization of polysaccharides such as starch, hemicellulose and cellulose. The potential to degrade cellulose, indicated by the presence of an entire set of genes predicted to code for a cellulose-hydrolysing multienzyme complex termed the cellulosome, is surprising, as cellulolytic activity is unknown for C. acetobutylicum. In addition, a gene similar to the toxin A-encoding gene from the pathogenic clostridium C. difficile is present in the non-pathogenic C. acetobutylicum, coding for a polypeptide of ~2800 residues, the majority of which is organized in ~125 repeats of 20 amino acids each. The genome of C. acetobutylicum ATCC 824 seems to be nearly devoid of mobile genetic elements. Only a single copy of a transposase gene, belonging to the Tn3 family and located on plasmid pSOL1, could be identified. Two gene clusters of four genes each show similarity to bacteriophage-like elements. There are 11 ribosomal operons. The data are available in GenBank and on our Web page (http://www.genomecorp.com).
161. Genome Sequencing and Analysis
G. J. Olsen, C. I. Reich, N. C. Kyrpides,
J. H. Badger, D. E. Graham, P. J. Haney, L. K. McNeil, G. M. Colón
González, A. A. Best, B. P. Kaine, and C. R. Woese
Our work is directed toward the sequencing and interpretation of selected microbial genomes, and has several components.
To study the sequence basis of thermal adaptation, we have been comparing the proteins encoded in the genome of Methanococcus jannaschii (an extreme thermophile) with those of related mesophiles. We have documented specific amino acid changes correlated with the difference in organismal growth temperatures, as well as systematic changes in amino acid properties. These trends are recurring themes; they are observed in 82-93% of all complete protein sequences analyzed. To generate more data for this comparative analysis, we have prepared sequencing quality genomic DNA libraries from Methanococcus maripaludis, and have started partially sequencing clones from this library (this sequencing is supported by NASA).
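One way to look for amino acid changes that correlate with growth temperature is to tally substitutions between aligned mesophile and thermophile proteins and ask whether each substitution runs predominantly in one direction. The sketch below illustrates that idea only. The abstract does not describe the authors' statistical procedure, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def substitution_counts(aligned_pairs):
    """Count (mesophile residue, thermophile residue) mismatches.

    aligned_pairs -- iterable of (mesophile_seq, thermophile_seq) strings of
                     equal length, with '-' marking alignment gaps
    """
    counts = Counter()
    for meso, thermo in aligned_pairs:
        for a, b in zip(meso, thermo):
            if a != b and a != "-" and b != "-":
                counts[(a, b)] += 1
    return counts


def directional_bias(counts):
    """Score each unordered residue pair from -1 to +1.

    For the key (a, b) with a < b alphabetically, +1 means every change was
    mesophile a to thermophile b, and -1 means every change ran the other way.
    """
    bias = {}
    for a, b in {tuple(sorted(pair)) for pair in counts}:
        forward, reverse = counts[(a, b)], counts[(b, a)]
        bias[(a, b)] = (forward - reverse) / (forward + reverse)
    return bias


# Toy alignments with made-up sequences, not real protein data.
aligned_pairs = [("AKLG", "ARLG"), ("KKQ-", "RKEA")]
counts = substitution_counts(aligned_pairs)
bias = directional_bias(counts)
```

With real data each score would need a significance test, since pairs observed only a handful of times will show extreme values by chance.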
To ensure the availability of data from key (diverse) eukaryotic microorganisms, we have prepared sequencing quality genomic DNA libraries for Giardia lamblia. These libraries are being used by Mitchell L. Sogin (Marine Biology Laboratory) to generate a nearly complete genome sequence for this organism (this sequencing is supported by the NIH). These data will be critical to understanding the origins of eukaryotes and their unique cellular organization.
The sequence data resulting from our participation in the Microbial Genome Initiative has stimulated additional research in our laboratories. Specifically:
1. We have continued to make new gene identifications in the sequenced genomes;
2. We have begun an experimental verification of the function of some novel RNA methylase genes;
3. We have cloned and expressed RNA polymerase genes and transcription initiation factors from the Archaea and have experimentally identified new protein-protein interactions in the transcription apparatus; and
4. We have entered into a collaboration with Carol Giometti (Argonne National Laboratory) and Michael Adams (University of Georgia, Athens) to study the proteomes of Methanococcus jannaschii and Pyrococcus furiosus.
162. Use of Suppressive Subtractive Hybridization to Identify Genomic Differences among Enteropathogenic Strains of Yersinia enterocolitica and Yersinia pseudotuberculosis
Lyndsay Radnedge, Peter Agron, Lisa
Glover, and Gary Andersen
A comparison of genomic sequences among closely related species is likely to reveal unique DNA regions that define the genetic basis for their phenotypic differences. Two closely related human pathogens that differ in their ability to colonize animal hosts and in their persistence in the environment are the enteropathogenic bacteria Yersinia enterocolitica and Y. pseudotuberculosis. Of the two pathogens, Y. enterocolitica is more often associated with human infection, especially in day-care centers where the disease is transmitted through infected food, water or soil. Although less frequently diagnosed, infection with Y. pseudotuberculosis is most commonly transmitted through contact with infected birds or mammals. A large percentage of Y. pseudotuberculosis infections are subclinical with no observable symptoms in the exposed individuals. Unlike Y. enterocolitica, Y. pseudotuberculosis may enter the bloodstream of predisposed individuals, causing a lethal septicemia.
We used a PCR-based subtractive hybridization method termed suppressive subtractive hybridization (SSH) (Diatchenko et al., 1996. Proc. Natl. Acad. Sci. 93:6025-6030) to define the differences between the genomes of Y. enterocolitica and Y. pseudotuberculosis. This technique uses PCR amplification to enrich for unique segments of restricted DNA and simultaneously limits non-target amplification by suppression PCR. The bacterial genome of interest in this comparison is called the tester DNA and the comparison genome is called the driver DNA. Using pair-wise comparisons among four strains of Y. enterocolitica and four strains of Y. pseudotuberculosis, our initial aim was to identify tester-specific sequences in the type strains of both species. Control subtractions yielded no PCR products, indicating that the protocol effectively subtracts identical DNA sequences. We have optimized the reaction conditions for the subtraction experiments. Subtracted DNAs were successfully cloned into the pGEM-T Easy plasmid vector (Promega); almost 100% of resulting white colonies contained an insert. The clones so far characterized contain inserts that range in size from 200 bp to 1700 bp. The band size distribution of the cloned products represents the distribution of the amplified subtractive hybridization products. Plasmids containing an insert were sequenced on an ABI 377 using dye terminator chemistry.
BLAST searches of tester-specific DNA sequences reveal homologies to known bacterial genes (including genes involved in pathogenicity) and eleven novel DNA sequences. Included in the regions unique to the Y. enterocolitica type-strain is a difference product with homology to the response-regulator phoP, which has been associated with virulence in Salmonella typhimurium. 92% of oligonucleotide probes designed using these tester-specific DNA sequences distinguished genomic DNA isolated from Y. enterocolitica, while 100% of such probes were specific for Y. pseudotuberculosis. Furthermore, SSH has proved to be sensitive enough to design probes against tester-specific DNA sequence that have been shown to discriminate between genomic DNA isolated from two strains of Y. enterocolitica (58% of the probes successfully discriminate tester DNA).
Streamlining, automating the steps, and increasing the throughput of this technique should enable large-scale genomic comparison among closely related strains and the generation of strain-specific oligonucleotide probes for molecular epidemiology studies.
Jizhong Zhou1, Douglas
Lies2, Gary Li1, Rebecca Clayton3, Kenneth
H. Nealson2, Claire Fraser3, James M. Tiedje4
The goal of this project is to explore whole genome sequence information to understand the genetic structure, functions, regulatory networks and mechanisms of dissimilatory metal reduction pathways. The following objectives will be pursued: (1) To identify the genes involved in dissimilatory metal reduction pathways in MR-1; (2) To generate and characterize deletion mutants for defining the functions of the unknown genes expressed under metal reducing conditions; (3) To understand the metabolic and genetic control of gene expression at the genome level under iron reducing conditions; and (4) To explore genetic diversity of the dissimilatory iron reduction pathways in selected thermophilic and psychrophilic iron-reducing bacteria. To achieve these objectives, we will construct microarrays consisting of all ORFs from MR-1, and use them to monitor gene expression patterns under different growth conditions for identifying the genes involved in dissimilatory metal reduction and for defining the putative functions of unknown ORFs. We will also use them to compare the gene expression patterns when MR-1 is shifted from aerobic to anaerobic iron-reducing conditions, and to compare the gene expression patterns between wild type and specific regulatory mutants for understanding the metabolic control and regulatory networks of iron reduction pathways. In addition, we will generate and characterize specific deletion mutants for defining the functions of unknown ORFs. Finally, we will use the microarrays to assay the genetic diversity of iron reduction pathways at the genomic level among representative thermophilic and psychrophilic iron-reducing bacteria.
To optimize the conditions for microarray hybridization, we are constructing prototype microarrays containing genes involved in anaerobic metabolism to understand how these genes are regulated under anaerobic conditions. As a part of this project, nine psychrophilic iron-reducing bacteria have also been isolated from Siberian and Alaskan permafrost soils, deep marine sediments and Hawaiian deep sea water. These bacteria are also able to reduce cobalt and chromium at low temperature. Phylogenetic analysis showed that they are closely related to Shewanella and Vibrio species. In addition, we are using sequences of genes known from mutational studies to be involved in metal reduction (mirAB) as hybridization probes to search for homologues in additional Shewanella species and other metal-reducing bacteria.
164. Identification, Isolation, and Genome Amplification of Abundant Non-Cultured Bacteria from Novel Phylogenetic Kingdoms in Two Extreme Surface Environments
Cheryl R. Kuske, Susan M. Barns,
John D. Dunbar, Jody A. Davis, and Greg Fisher
Microbial genome sequencing projects have produced a wealth of information on microbial genetics, biochemistry, and evolution with important medical, environmental, agricultural and industrial applications, but have focused primarily on species we can easily culture. Cultured bacteria are only a small fraction of the total bacterial diversity present in the environment. Non-cultured organisms of considerable genetic and biochemical diversity are present in arid and extreme surface environments. Microbial processes in these environments are of critical importance to the biosphere and the non-cultured bacteria residing there are a valuable resource for novel genomic information.
We have initiated a project to (1) identify novel bacterial kingdoms in 16S ribosomal RNA gene (rDNA) libraries from two extreme surface environments to expand our current understanding of the scope of environmental bacterial diversity, (2) determine the abundance and activity of these novel organisms in the environment by using rRNA-targeted fluorescent probes, and (3) collect cells of novel phylogenetic groups that are abundant and active in the environment by flow cytometry and isolate or amplify DNA from the pools of bacterial cells.
Our preliminary results, based on sequence analysis of 60 clones from two 16S rDNA libraries, demonstrate extensive bacterial diversity in two extreme terrestrial surface environments. RFLP fingerprint analysis of 800 clones in the two libraries demonstrated that diversity in the remaining clones was also great and that so far only 5% of them have fingerprints similar to those of clones already sequenced. In fact, almost every clone produced a unique pattern, and there was very little overlap in patterns between the two environments. Preliminary RFLP and 16S rDNA sequence analyses indicate that most of the bacterial species represented in the clone libraries fall outside the known, previously described bacterial taxa. Thus these libraries are a rich resource for identifying and isolating novel bacterial species with novel genes. We are developing fluorescently tagged oligonucleotide probes for detection and collection of some of these bacterial groups from environmental samples.
The pooled DNA of isolated, non-cultured bacteria will be a valuable resource of genetic material for comparative analyses of conserved and novel gene families, and for targeted genome sequencing. Identification and sequence analysis of genes from abundant bacterial species will greatly enhance our understanding of their functional roles in the environment and will significantly expand the set of unique genes and proteins available for DOE missions, as well as for medical, industrial and agricultural applications.
165. WIT System: Advantages of Parallel Analysis of Multiple Genomes
Ross Overbeek, Mark D'Souza, Gordon Pusch,
Natalia Maltsev, and Evgeni Selkov
The WIT system (http://wit.mcs.anl.gov/WIT2/wit.html) was designed and implemented to support genetic sequence analysis and comparative analysis of sequenced genomes, as well as metabolic reconstruction from the sequence data. It now contains data from 34 distinct genomes, although a few of the genomes are quite incomplete. It provides access to thoroughly annotated genomes within a framework of metabolic reconstructions connected to the sequence data; to data on regulatory patterns, protein alignments, and phylogenetic trees; and to data on gene clusters and functional domains. We believe that the parallel analysis of a large number of phylogenetically diverse genomes can add a great deal to our understanding of higher-level functional subsystems and major physiological designs. We recently developed a method that uses conserved clusters of genes on the chromosome from numerous genomes to predict functional coupling between genes in those genomes¹. The results obtained by applying this method to the 34 genomes in the WIT collection were very encouraging. We were able to predict major portions of most of the pathways of central metabolism (e.g., glycolysis, purine and pyrimidine biosynthesis, signal transduction, and transmembrane transport pathways). Our results agree well with the functional connections between genes previously described in the literature. We believe that the precision of prediction and the amount of accessible functional coupling will increase dramatically as more genomes are included in the analysis. As the number of genomes increases, this class of data may well become a significant resource for establishing the functions of hypothetical proteins, understanding the functions of paralogous genes, and reconstructing the functional connections in higher-level functional subsystems.
¹Ross Overbeek, Michael Fonstein, Mark D'Souza, Gordon D. Pusch, and Natalia Maltsev (June 1998). Use of Contiguity on the Chromosome to Predict Functional Coupling. In Silico Biol. 1, 0009.
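The idea behind the gene-cluster approach in abstract 165 can be illustrated with a deliberately simplified sketch. It is not the WIT implementation: it assumes each genome is already reduced to an ordered list of gene-family labels, and the function name `coupled_pairs`, the parameters `max_distance` and `min_genomes`, and the labels `famA`, `famB`, and so on are all hypothetical. A pair of families is reported as a candidate for functional coupling when its members lie close together on the chromosome in several genomes.

```python
from collections import Counter


def coupled_pairs(genomes, max_distance=1, min_genomes=2):
    """Return family pairs that are chromosomal neighbors in many genomes.

    genomes      -- dict mapping a genome name to its gene families in
                    chromosomal order (an assumed, simplified input format)
    max_distance -- how many positions apart two genes may be and still
                    count as "clustered"
    min_genomes  -- minimum number of genomes that must show the pair
    """
    support = Counter()
    for gene_order in genomes.values():
        pairs_here = set()  # count each pair at most once per genome
        for i, family in enumerate(gene_order):
            for other in gene_order[i + 1 : i + 1 + max_distance]:
                if other != family:
                    pairs_here.add(frozenset((family, other)))
        support.update(pairs_here)
    return {pair: n for pair, n in support.items() if n >= min_genomes}


# Toy input: three made-up genomes described by made-up family labels.
toy_genomes = {
    "genome1": ["famA", "famB", "famX", "famC"],
    "genome2": ["famY", "famB", "famA", "famZ"],
    "genome3": ["famA", "famQ", "famR", "famB"],
}
candidates = coupled_pairs(toy_genomes)
```

In this toy input only the pair famA/famB is adjacent in two genomes, so it is the only candidate returned. The sketch also shows why adding genomes helps: each new genome is another independent chance to confirm or fail to confirm a pair.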
166. Microbial Protein and Regulatory Function Analysis and Database Program
Temple F. Smith
We will have completed the first of three planned years on this project in December 1998 and have already made significant progress. Our first goal was to construct two preliminary profile databases. The first has been generated as part of the functional analyses of the various bacterial and archaeal genes (ORFs) that showed sequence similarity to probable yeast mitochondrial genes. We have generated profile-defining sequence sets from a broad set of functional families. We have carefully studied the set of S. cerevisiae mitochondrial proteins and their homologs in the completed genomes and used them to create 367 profiles in which we have confidence and that cover a broad set of biological functions. We have created 855 profiles from the Pfam protein family database by reviewing protein sequences in SWISS-PROT using a set of Hidden Markov Model-derived similarity families. We have also worked to automate the generation of profile-defining sets from the BLAST-defined similar ORF families in the complete genomes; this has produced an initial set of 807 profiles. Current efforts are centered on creating a method for automatically determining a set of disjoint profiles, that is, a set of profiles that are not redundant. We have also begun to investigate, in collaboration with Julio Collado-Vides (CIFN, Mexico), the potential for coordinate regulation among genes that are neighbors in various biochemical pathways. Here we began with sets of genes in E. coli or some other bacteria or archaea that are organized in operons. Next, each of the operon sets is being examined in yeast and C. elegans for shared regulatory words. The initial work here led to the identification of two different types of operon-equivalent organization in yeast and to our recent publication (Zhang and Smith, Microb. & Comp. Genomics, 1998).
As part of our microbial genome comparisons, we developed a set of integral functions to examine nucleotide base composition along the entire length of the genome. Instead of examining base composition over a fixed window, excess plot values are calculated cumulatively at each position in the genome, adding one if the next nucleotide shares the property of interest, and subtracting one for each nucleotide with the converse property. These "excess plots" describe the abundance of a nucleotide property over its opposite property. The minima of the Purine Excess plots correlate with the origins of replication for seven bacterial genomes (Escherichia coli, Bacillus subtilis, Mycoplasma pneumoniae, Mycoplasma genitalium, Helicobacter pylori, Haemophilus influenzae, and Synechocystis PCC6803), while the maxima of the Purine Excess plots track with the three known replication termini (from E. coli, B. subtilis and H. influenzae). Keto Excess minima and maxima track the same replicative features in four of the nine bacterial genomes available at the time of study. Additionally, there is a strong correlation between purine excess and coding strand excess, evidenced most remarkably by E. coli and Methanococcus jannaschii, an archaeon. It is an ongoing effort to track and analyze excess plots for each new microbial genome, as well as the multiple chromosomes of the available eukaryotic genomes.
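The cumulative calculation described above can be sketched in a few lines of Python. This is an illustration of the stated rule (add one for a base with the property, subtract one for a base with the converse property), not the authors' code. The function names are hypothetical, the keto/amino grouping (G, T versus A, C) is the standard IUPAC convention rather than something stated in the abstract, and the choice to leave the running total unchanged at ambiguous bases such as N is an assumption.

```python
from itertools import accumulate

PURINES, PYRIMIDINES = set("AG"), set("CT")
KETO, AMINO = set("GT"), set("AC")  # standard IUPAC grouping (assumed here)


def excess_plot(sequence, property_bases, converse_bases):
    """Return the cumulative excess of one base class over its converse."""

    def step(base):
        if base in property_bases:
            return 1
        if base in converse_bases:
            return -1
        return 0  # assumption: ambiguous bases do not change the total

    return list(accumulate(step(base) for base in sequence.upper()))


def extrema_positions(values):
    """Return the 0-based indices of the first minimum and first maximum."""
    positions = range(len(values))
    return (min(positions, key=values.__getitem__),
            max(positions, key=values.__getitem__))


# Toy sequence: a pyrimidine-rich stretch followed by a purine-rich one.
purine_excess = excess_plot("CCTTAGGA", PURINES, PYRIMIDINES)
# purine_excess is [-1, -2, -3, -4, -3, -2, -1, 0]
lowest, highest = extrema_positions(purine_excess)
# lowest is 3, highest is 7
```

On a real genome the positions returned by `extrema_positions` would be the candidates to compare with known replication origins and termini. A circular chromosome would also need a decision about where the sequence starts, which this sketch ignores.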
Freeman, J.M., Plasterer, T.N., Smith, T.F. and Mohr, S.C. (1998). Patterns of genome organization in bacteria. Science 279, 1827.
Zhang, Xiaolin and Smith, Temple F. (1998). Yeast "operons". Microbial and Comparative Genomics 3(2), 133-140.
167. Annotation of Microbial Genomes
Frank Larimer, Richard Mural, Morey
Parang, Manesh Shah, Victor Olman, Inna Vokler, Jay Snoddy, and Edward
Because of their completeness, sequenced microbial genomes present a number of challenges and opportunities not yet fully addressed by genomics. Conventional annotation is inherently single gene-protein centered, yet the operon and regulon organization of microbial genomes immediately accentuates the incompleteness of this simple gene-protein model. Additionally, few attempts have been made to represent regulatory features. Complete genomes require that regulatory and coding elements as well as global and local structural detail be addressed. Although less than a third of the major bacterial taxa have been sampled, a lack of comprehensive tools for representing evolutionary relationships and the richness of microbial diversity is already evident. Finally, the rapid proliferation of completed genomes emphasizes the need for regular updates to annotation.
We are developing microbial annotation systems to address these needs within the context of the Genome Channel and the Genome Annotation Consortium. In cooperation with The Institute for Genomic Research, we currently have views of the various complete microbial genomes sequenced by TIGR in the Genome Channel. Other complete genomes will be added shortly, and views of genomes in progress will be developed. Among the features being implemented are the following:
168. Insights into Evolution from the Thermotoga maritima Genome
K.E. Nelson, R.A. Clayton, O. White,
J.C. Venter, and C.M. Fraser
Thermotoga maritima is the most extreme thermophilic organotrophic bacterium known, and one of the earliest branching Eubacteria. This obligate anaerobe is capable of utilizing various carbohydrates, including glucose, maltose, starch, cellulose, and xylan, as energy sources. In an attempt to further understand T. maritima, a whole-genome random shotgun sequencing project was initiated at The Institute for Genomic Research (TIGR). The 1,860,725 bp T. maritima genome contains 1872 predicted coding regions, 54% (1005) of which have functional assignments, and 46% (867) of which are of unknown function. Of the sequenced Eubacteria, T. maritima has the highest percentage (24%) of genes that are most similar to archaeal genes. Eighty-one of these genes are clustered in regions of the genome that range in size from 4 to 20 kb. Five of these regions have a composition substantially different from the rest of the genome, suggesting that lateral gene transfer has occurred between the thermophilic Archaea and Eubacteria. In addition to repeat structures that can be identified only in thermophiles, there are 108 genes in the T. maritima genome that have orthologues only in the genomes of other thermophilic Eubacteria and Archaea. Along with a range of pathways for the degradation of both simple and complex carbohydrates, the T. maritima genome is revealing genes whose thermostable products may be useful for industrial processes. The genome sequence is also revealing similarities between the thermophilic Archaea and Eubacteria, and allowing us to address existing theories on evolution. The findings from an analysis of the complete genome sequence will be presented.