DOE Human Genome Program Contractor-Grantee
93. Genome-Scale Protein Structure Prediction in Prochlorococcus europae Genome
Ying Xu, Dong Xu, Oakley H. Crawford, J. Ralph Einstein, and Ed Uberbacher
Computational Biosciences Section, Life Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6480
The goal of this pilot project is to assign as much structural information as possible to the proteins of the Prochlorococcus europae genome, as computationally identified from its genes, by combining a number of existing methods. Proteins are first classified into four categories: (1) proteins having a high level (> 40%) of sequence similarity with their homologs in PDB, as identified by BLAST searches; (2) proteins having a medium level (25-40%) of sequence similarity with their homologs in PDB, as detected by PSI-BLAST and (super-)family-specific profiles such as hidden Markov models (HMMs); (3) proteins having a low level (< 25%) of sequence similarity with their homologs in PDB, as detected by threading methods; and (4) proteins having no homologs in PDB, as determined by threading and statistical analysis. For each protein of the first class, our prediction system applies MODELLER and SWISS-MODEL to generate a few all-atom structure models. Structure models are generated similarly for proteins of the second class, after the BLAST-generated alignment has been refined using information extracted from HMMs, active site/motif search results, residue-residue contact patterns, etc. The initial alignments for proteins of the third class are generated by threading methods, including our own program PROSPECT, and are refined in a similar fashion. Loop regions are first modeled using mini-threading methods; all-atom models are then generated using MODELLER, SWISS-MODEL, and CNS, based on the threading alignments and the modeled loop regions. A combined method of threading and statistical analysis is used to determine whether a protein has a new structural fold. Instead of attempting to generate full 3D structures for proteins of the fourth class, our prediction system searches for possible active sites and predicts structural motifs using the local threading option of PROSPECT.
For each prediction, the system assigns a confidence value based on our performance analysis on a benchmark data set. Preliminary prediction results will be presented.
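The four-way classification described above can be illustrated with a minimal sketch. The similarity thresholds (> 40%, 25-40%, < 25%) and the tool names per class are taken from the abstract; everything else is an assumption made for illustration, including the names `StructureClass`, `MODELING_STEPS`, and `classify`, the use of `None` to mean "no homolog found in PDB", and the handling of the exact boundary values 40% and 25%. In the system described, class membership also depends on which search method detects the homolog, which this sketch does not model.

```python
from enum import Enum
from typing import Optional


class StructureClass(Enum):
    """The four protein categories named in the abstract."""
    HIGH = 1        # > 40% similarity to a PDB homolog (found by BLAST)
    MEDIUM = 2      # 25-40% similarity (found by PSI-BLAST / family profiles)
    LOW = 3         # < 25% similarity (found by threading)
    NO_HOMOLOG = 4  # no homolog in PDB


# Modeling steps per class, paraphrased from the abstract (not an actual API).
MODELING_STEPS = {
    StructureClass.HIGH: ["MODELLER", "SWISS-MODEL"],
    StructureClass.MEDIUM: ["alignment refinement", "MODELLER", "SWISS-MODEL"],
    StructureClass.LOW: ["threading alignment", "alignment refinement",
                         "loop mini-threading", "MODELLER", "SWISS-MODEL", "CNS"],
    StructureClass.NO_HOMOLOG: ["active-site search", "local threading for motifs"],
}


def classify(similarity_pct: Optional[float]) -> StructureClass:
    """Map the best similarity to a PDB homolog onto one of the four classes.

    Assumption: None means no homolog was detected by any method.
    Assumption: exactly 40% and exactly 25% both fall in the medium class.
    """
    if similarity_pct is None:
        return StructureClass.NO_HOMOLOG
    if not 0.0 <= similarity_pct <= 100.0:
        raise ValueError("similarity must be a percentage between 0 and 100")
    if similarity_pct > 40.0:
        return StructureClass.HIGH
    if similarity_pct >= 25.0:
        return StructureClass.MEDIUM
    return StructureClass.LOW


if __name__ == "__main__":
    for pct in (62.5, 31.0, 18.0, None):
        cls = classify(pct)
        print(pct, cls.name, MODELING_STEPS[cls])
```

Routing on a single number keeps the sketch short; a fuller version would carry the detecting method alongside the similarity value.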
(Research sponsored by the Office of Biological and Environmental Research, USDOE under contract number DE-AC05-96OR22464 with Lockheed Martin Energy Research Corp.)