Microbial Genome Project Section
DOE Human Genome Program Contractor-Grantee Workshop
166. Microbial Protein and Regulatory Function Analysis and Database Program
Temple F. Smith
The first of three planned years on this project will be completed in December 1998, and we have already made significant progress. Our first goal was to construct two preliminary profile databases. The first has been generated as part of the functional analyses of the various bacterial and archaeal genes (ORFs) that showed sequence similarity to probable yeast mitochondrial genes. We have generated profile-defining sequence sets from a broad range of functional families. We carefully studied the set of S. cerevisiae mitochondrial proteins and their homologs in the completed genomes and used them to create 367 high-confidence profiles that cover a broad set of biological functions. We have created 855 profiles from the Pfam protein family database by reviewing protein sequences in SWISS-PROT using a set of Hidden Markov Model-derived similarity families. We have also worked to automate the generation of profile-defining sets from the BLAST-defined families of similar ORFs in the complete genomes; an initial set of 807 profiles has been derived this way. Current efforts center on a method for automatically determining a set of disjoint profiles, that is, a set of profiles that are not redundant.
We have also begun to investigate, in collaboration with Julio Collado-Vides (CIFN, Mexico), the potential for coordinate regulation among genes that are neighbors in various biochemical pathways. We began with sets of genes that are organized in operons in E. coli or other bacteria or archaea. Each of these operon sets is then being examined in yeast and C. elegans for shared regulatory words. The initial work identified two different types of eukaryotic operon-equivalent organization in yeast and led to our recent publication (Zhang and Smith, Microb. & Comp. Genomics, 1998).
As part of our microbial genome comparisons, we developed a set of integral functions to examine nucleotide base composition along the entire length of a genome. These functions, which we call "excess plots", describe the abundance of a nucleotide property over its opposite property. Instead of examining base composition over a fixed window, excess plot values are calculated cumulatively at each position in the genome, adding one if the next nucleotide has the property of interest and subtracting one if it has the converse property. The minima of the Purine Excess plots correlate with the origins of replication for seven bacterial genomes (Escherichia coli, Bacillus subtilis, Mycoplasma pneumoniae, Mycoplasma genitalium, Helicobacter pylori, Haemophilus influenzae, and Synechocystis PCC6803), while the maxima of the Purine Excess plots track with the three known replication termini (from E. coli, B. subtilis and H. influenzae). Keto Excess minima and maxima track the same replicative features in four of the nine bacterial genomes available at the time of study. Additionally, there is a strong correlation between purine excess and coding strand excess, evidenced most remarkably by E. coli and Methanococcus jannaschii, an archaebacterium. We continue to track and analyze excess plots for each new microbial genome, as well as for the multiple chromosomes of the available eukaryotic genomes.
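The cumulative calculation described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the code used in the study: the function names `excess_plot` and `extrema`, the toy sequence, and the choice to score ambiguous symbols such as N as zero are all hypothetical. Purine excess scores A and G as +1 and C and T as -1; keto excess scores G and T as +1 and A and C as -1.

```python
from itertools import accumulate

# Property sets for the two excess plots mentioned in the text.
PURINES, PYRIMIDINES = frozenset("AG"), frozenset("CT")
KETO, AMINO = frozenset("GT"), frozenset("AC")


def excess_plot(sequence, plus, minus):
    """Cumulative excess along a sequence.

    Adds 1 for each base in `plus`, subtracts 1 for each base in `minus`.
    Assumption: any other symbol (e.g. N) contributes 0.
    """
    steps = (1 if base in plus else -1 if base in minus else 0
             for base in sequence.upper())
    return list(accumulate(steps))


def extrema(values):
    """Return 0-based positions of the first minimum and first maximum."""
    lowest = min(range(len(values)), key=values.__getitem__)
    highest = max(range(len(values)), key=values.__getitem__)
    return lowest, highest


# Toy sequence (hypothetical): pyrimidine-rich start, purine-rich end.
toy = "CCTTCAGGAGAA"
purine_excess = excess_plot(toy, PURINES, PYRIMIDINES)
keto_excess = excess_plot(toy, KETO, AMINO)
print(purine_excess)
print(extrema(purine_excess))
```

On a real genome, the position of the purine excess minimum would be the candidate to compare with the annotated origin of replication, and the maximum with the terminus. Because the values are cumulative, no window size needs to be chosen.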
Freeman, J.M., Plasterer, T.N., Smith, T.F. and Mohr, S.C. (1998). Patterns of genome organization in bacteria. Science 279, 1827.
Zhang, X. and Smith, T.F. (1998). Yeast "operons". Microbial and Comparative Genomics 3(2), 133-140.