Genome Informatics Section 

DOE Human Genome Program Contractor-Grantee Workshop VII 
January 12-16, 1999, Oakland, CA


96. Hidden Markov Models in Biosequence Analysis: Recent Results and New Methods 

Christian Barrett, Mark Diekhans, Richard Hughey, Tommi Jaakkola, Kevin Karplus, David Kulp, Stephen Winters-Hilt, and David Haussler 
Computer Science Department, University of California, Santa Cruz, CA 95064 haussler@cse.ucsc.edu 

There is currently an acute need for effective methods for locating genes in DNA sequences, along with their splice sites and regulatory binding sites, and for classifying new proteins by their predicted structure or function. Hidden Markov Models (HMMs) have proven to be useful tools for these tasks. We have recently extended the HMM-based gene-finding system Genie so that it can simultaneously incorporate protein homology and EST information to improve gene finding. We have also built a new library of HMMs for protein families and compared our HMM-based approach with other methods for detecting remote homologies between proteins, in a large-scale experiment conducted at the Laboratory of Molecular Biology in Cambridge. The results showed our method to be superior to the others tested, including PSI-BLAST, the nearest competitor. Finally, we have developed a new method of biosequence classification called the Fisher kernel method. Here an HMM (or any parametric generative model for a family of biosequences) is used to embed the sequences into a linear space with a natural inner product defined using the Fisher information matrix. One can then employ a variety of classification methods, for example support vector machines, to discriminate members of the family from nonmembers. We present experiments on the protein superfamily classification problem showing that the Fisher kernel method is superior to existing HMM approaches and to simpler methods such as BLAST. In particular, the method is better at finding remote homologs in nearly all of the 33 protein families we tested, including G proteins, retroviral proteases, interferons, and many others.
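As an illustration of the embedding step, the following is a minimal sketch and not the implementation used in the experiments. It assumes a much simpler generative model than an HMM: each residue is drawn independently from a categorical distribution over a toy four-letter alphabet, parameterized by softmax logits. Each sequence is mapped to its Fisher score (the gradient of its log-likelihood with respect to the model parameters), and the inner product is weighted by the inverse of an empirical, diagonal approximation to the Fisher information matrix. The alphabet, the function names, the diagonal approximation, and the ridge term are all assumptions made for this example.

```python
from collections import Counter

# Toy alphabet, assumed for illustration; a protein model would use 20 residues.
ALPHABET = "ACGT"


def fisher_score(seq, probs):
    """Fisher score of seq under an i.i.d. categorical model.

    With p_a = softmax(theta)_a, the gradient of log P(seq | theta) with
    respect to theta_a is count_a - len(seq) * p_a.
    """
    counts = Counter(seq)
    n = len(seq)
    return [counts.get(a, 0) - n * probs[a] for a in ALPHABET]


def diagonal_fisher_information(scores, ridge=1e-6):
    """Empirical diagonal approximation of the Fisher information.

    Uses the mean squared score per parameter over a sample of sequences.
    The ridge keeps every entry strictly positive (an assumption of this
    sketch, not part of the method's definition).
    """
    dim = len(scores[0])
    return [sum(s[i] ** 2 for s in scores) / len(scores) + ridge
            for i in range(dim)]


def fisher_kernel(u, v, info_diag):
    """Inner product u^T I^{-1} v with a diagonal information matrix I."""
    return sum(a * b / d for a, b, d in zip(u, v, info_diag))


if __name__ == "__main__":
    uniform = {a: 0.25 for a in ALPHABET}
    sample = ["AAAC", "GGGT", "ACGT", "AACC"]
    scores = [fisher_score(s, uniform) for s in sample]
    info = diagonal_fisher_information(scores)
    # Kernel value between the first two sequences in the sample.
    print(fisher_kernel(scores[0], scores[1], info))
```

In a full treatment the scores would be gradients with respect to the HMM's emission and transition parameters, and the resulting kernel values would be passed to a discriminative classifier such as a support vector machine.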


 