Bioinformatics Section 

DOE Human Genome Program Contractor-Grantee Workshop VIII
February 27-March 2, 2000, Santa Fe, NM



66. Finding Remote Protein Homologs

Kevin Karplus

University of California, Santa Cruz, Baskin School of Engineering, Santa Cruz, CA 95064

karplus@cse.ucsc.edu

Since spring 1996, the bioinformatics group at UCSC has been developing methods to find and align homologs of proteins, even when their sequences are highly divergent. Our main approach uses hidden Markov models with Dirichlet mixture regularizers for both search and alignment. The method uses only sequence information, not structural information, so it can be applied to proteins whose structure is still unknown.
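
A Dirichlet mixture regularizer turns the residue counts observed in one alignment column into a smoothed probability estimate. Under the standard formulation, each component j (weight q_j, parameters alpha_j) is weighted by its posterior P(j | n), which is proportional to q_j B(alpha_j + n) / B(alpha_j). The estimate is the weighted average of (n_i + alpha_ji) / (|n| + |alpha_j|). The minimal sketch below illustrates this general technique. It is not SAM code. The function names are hypothetical, and the toy three-letter alphabet and mixture parameters are invented for the example. Real mixtures cover the 20 amino acids, use many components, and are estimated from large alignment collections.

```python
import math
from typing import List, Sequence


def log_beta(alpha: Sequence[float]) -> float:
    """Log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_mixture_estimate(
    counts: Sequence[float],
    weights: Sequence[float],
    alphas: Sequence[Sequence[float]],
) -> List[float]:
    """Posterior-mean residue probabilities for one alignment column (hypothetical helper).

    counts  -- observed residue counts n_i in the column
    weights -- mixture coefficients q_j (assumed to sum to 1)
    alphas  -- Dirichlet parameter vectors alpha_j, one per component
    """
    # Posterior probability of each component given the counts:
    # P(j | n) is proportional to q_j * B(alpha_j + n) / B(alpha_j).
    # The multinomial coefficient is the same for every j and cancels.
    log_post = []
    for q, alpha in zip(weights, alphas):
        shifted = [a + n for a, n in zip(alpha, counts)]
        log_post.append(math.log(q) + log_beta(shifted) - log_beta(alpha))
    top = max(log_post)  # subtract the max before exponentiating, for stability
    post = [math.exp(x - top) for x in log_post]
    z = sum(post)
    post = [p / z for p in post]

    # Mix the per-component posterior means (n_i + alpha_ji) / (|n| + |alpha_j|).
    total = sum(counts)
    estimate = [0.0] * len(counts)
    for p_j, alpha in zip(post, alphas):
        denom = total + sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            estimate[i] += p_j * (n + a) / denom
    return estimate


if __name__ == "__main__":
    # Toy alphabet with two invented components:
    # one favoring letter 0, one roughly uniform.
    weights = [0.5, 0.5]
    alphas = [[5.0, 0.5, 0.5], [1.0, 1.0, 1.0]]
    print(dirichlet_mixture_estimate([3, 0, 0], weights, alphas))
```

Letters 1 and 2 are never observed in the example column, yet they still receive nonzero probability. As more counts accumulate, the component that better explains the column dominates the estimate. This is the usual intuition for why such regularizers help when a model is built from only a few divergent sequences.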

The main tests of the method are fold-recognition and alignment tests. We search for and align proteins whose structures are known (though the structures are not used in the search or alignment), then compare the results with structural alignments. In a comparison against other sequence-based search and alignment methods (including PSI-BLAST and ISS), our SAM-T98 method found more true homologs (as defined by SCOP) than the other methods at every level of accepted errors.
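
One common way to tally a comparison of "true homologs found at each level of accepted errors" is to work from a ranked hit list. The sketch below is a hypothetical illustration, not the actual evaluation code. It assumes each hit has already been labeled true or false from the SCOP classification, and the function name and input format are invented for the example.

```python
from typing import Iterable, List, Tuple


def true_positives_at_errors(
    hits: Iterable[Tuple[float, bool]], max_errors: int
) -> List[int]:
    """Count true homologs found when k false positives are accepted (hypothetical helper).

    hits -- (score, is_true_homolog) pairs; a higher score means a stronger match.
    Returns a list whose k-th entry is the number of true homologs ranked
    above the (k+1)-th false positive, for k = 0..max_errors.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    result: List[int] = []
    tp = 0
    for _, is_true in ranked:
        if is_true:
            tp += 1
        else:
            # A threshold just above this false positive accepts tp true hits
            # and len(result) earlier false positives.
            result.append(tp)
            if len(result) > max_errors:
                break
    # If the list ends before enough false positives appear, the remaining
    # error levels keep the final true-positive count.
    while len(result) <= max_errors:
        result.append(tp)
    return result


if __name__ == "__main__":
    hits = [(9.0, True), (8.0, True), (7.0, False), (6.0, True), (5.0, False), (4.0, True)]
    print(true_positives_at_errors(hits, 2))
```

Under this scheme, two methods are compared entry by entry: one method "finds more true homologs at every level of accepted errors" if each of its entries is larger. Ties in score are broken arbitrarily here. A careful benchmark would handle them explicitly.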

A common use of remote homologs is to predict protein structure. We participated in both the CASP2 and CASP3 experiments on blind prediction of protein structure. In both, we were among the top six groups for fold recognition and alignment (the groups invited to contribute to the special issue of Proteins). In CASP3, our alignments for the comparative-modeling (homology) targets were consistently among the best (roughly in the top three), even though we used no structural information.

For CASP3, we also tested a neural-network secondary-structure predictor whose input is the SAM-T98 multiple alignments produced by our fold-recognition method. This predictor ranked second among the 31 participating groups, and we have since improved it.

We have set up an automatic Web server that takes a sequence (or seed alignment) and produces three results: a multiple alignment of similar sequences from NCBI's non-redundant protein database, search results against proteins with known structures in the PDB, and secondary-structure predictions.

For more information about our projects, see the Web.


The online presentation of this publication is a special feature of the Human Genome Project Information Web site.