Bioinformatics Section 

DOE Human Genome Program Contractor-Grantee Workshop VIII
February 27-March 2, 2000  Santa Fe, NM


93. Genome-Scale Protein Structure Prediction in Prochlorococcus europae Genome

Ying Xu, Dong Xu, Oakley H. Crawford, J. Ralph Einstein, and Ed Uberbacher

Computational Biosciences Section, Life Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6480

xyn@ornl.gov

The goal of this pilot project is to assign as much structural information as possible to the proteins computationally identified from the genes of the Prochlorococcus europae genome, using a combination of existing methods. Proteins are first classified into four categories: (1) proteins having a high level (> 40%) of sequence similarity with their homologs in PDB, as identified by BLAST searches; (2) proteins having a medium level (25-40%) of sequence similarity with their homologs in PDB, as detected by PSI-BLAST and (super-)family-specific profiles such as hidden Markov models (HMMs); (3) proteins having a low level (< 25%) of sequence similarity with their homologs in PDB, as detected by threading methods; and (4) proteins having no homologs in PDB, as determined by threading and statistical analysis.

For each protein of the first class, our prediction system applies MODELLER and SWISS-MODEL to generate a few all-atom structure models. Structure models are generated similarly for proteins of the second class, after the BLAST-generated alignment has been refined using information extracted from HMMs, active site/motif search results, residue-residue contact patterns, etc. The initial alignments for proteins of the third class are generated by threading methods, including our own program PROSPECT, and are refined in a similar fashion. Loop regions are first modeled using mini-threading methods; all-atom models are then generated using MODELLER, SWISS-MODEL, and CNS, based on the threading alignments and the modeled loop regions.

A combined method of threading and statistical analysis is used to determine whether a protein has a new structural fold. Instead of attempting to generate full 3D structures for proteins of class 4, our prediction system searches for possible active sites and predicts structural motifs using the local threading option of PROSPECT.

For each prediction, the system assigns a confidence value based on our performance analysis on a benchmark data set. Preliminary prediction results will be presented.
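
The four-way classification described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, its arguments, and the handling of the boundary values (exactly 25% and exactly 40% are placed in class 2) are assumptions made for illustration, and the per-class notes only restate the tools named in the abstract.

```python
from typing import Optional

# Tools named in the abstract for each class (restated as plain text only).
CLASS_NOTES = {
    1: "BLAST alignment; models from MODELLER and SWISS-MODEL",
    2: "PSI-BLAST/HMM-refined alignment; models from MODELLER and SWISS-MODEL",
    3: "threading alignment (e.g. PROSPECT); models from MODELLER, SWISS-MODEL, CNS",
    4: "no full 3D model; active-site search and local threading for motifs",
}


def classify_protein(best_identity_pct: Optional[float], threading_hit: bool) -> int:
    """Return the class (1-4) of a protein.

    best_identity_pct: hypothetical input, the percent sequence similarity to
        the best PDB homolog found by sequence search, or None if none found.
    threading_hit: hypothetical input, True if threading found a PDB fold.
    """
    if best_identity_pct is not None:
        if best_identity_pct > 40.0:
            return 1  # high similarity (> 40%)
        if best_identity_pct >= 25.0:
            return 2  # medium similarity (25-40%)
    if threading_hit:
        return 3  # low similarity (< 25%), homolog detected by threading
    return 4  # assumed: no detectable homolog in PDB


if __name__ == "__main__":
    for pct, hit in [(62.0, False), (31.5, False), (18.0, True), (None, False)]:
        cls = classify_protein(pct, hit)
        print(pct, hit, cls, CLASS_NOTES[cls])
```

In this sketch a protein with a weak sequence match but no threading hit falls into class 4; the abstract determines that case by threading combined with statistical analysis, which is not modeled here.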

(Research sponsored by the Office of Biological and Environmental Research, USDOE under contract number DE-AC05-96OR22464 with Lockheed Martin Energy Research Corp.)

 


The online presentation of this publication is a special feature of the Human Genome Project Information Web site.