Archive Edition
Sponsored by the U.S. Department of Energy Human Genome Program
Santa Fe, New Mexico, November 13-17, 1994
The electronic form of this document may be cited in the following style: Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.
Prediction of Coding Regions in Genomic DNA: Optimal and Suboptimal Parses
Eric E. Snyder and Gary D. Stormo
We have developed an approach for predicting coding regions in genomic DNA that utilizes multiple types of evidence, combines them into a single scoring function, and then returns both optimal and ranked suboptimal solutions using that scoring function. The current version of the program predicts four classes of sequence: introns and three types of exons (first, last, and internal). It uses a variety of statistical tests for these different classes, including tests for the signals that define their ends and for biases in their contained sequences. A neural network is used to weight the different types of statistical tests to optimize performance, which we find to be as good as or better than that of other published methods when tested on new examples. However, we find that one of the most important features of this system is the ability to examine multiple solutions, which is provided by the dynamic programming approach. These multiple, ranked solutions often indicate which portions of the predictions are most reliable, and in cases where the highest scoring prediction is not correct, the correct one can often be found in a high ranking suboptimal solution. Furthermore, alternative splicing patterns can often be found among the high ranking suboptimal solutions. We have performed tests of the robustness of the method when there are sequencing errors in the data, and have shown that the system can be trained to optimize performance for data with specified error rates. We are now exploring methods for reliably predicting other classes of sequence regions, especially promoters. These include approaches based on minimal length encoding algorithms and on Sequence Landscape methods. Recent results from these approaches will be described.
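The idea of returning ranked suboptimal parses from a dynamic program can be illustrated with a small sketch. This is not the authors' program: the transition table `NEXT`, the function `k_best_parses`, the candidate list `SEGMENTS`, and all scores below are hypothetical and chosen only for illustration. The sketch assumes that every candidate segment already carries a single combined score (in the abstract, that role is played by the neural-network-weighted statistics) and that a valid parse tiles the sequence as first exon, intron, any number of internal exon and intron pairs, then last exon.

```python
from collections import defaultdict

# Assumed toy grammar: which segment class may follow which.
NEXT = {
    "start": {"first"},
    "first": {"intron"},
    "intron": {"internal", "last"},
    "internal": {"intron"},
}


def k_best_parses(length, segments, k=3):
    """Return up to k (score, parse) pairs, best first.

    segments: (start, end, label, score) tuples with end exclusive and
    end > start. A parse is a list of abutting segments that covers
    [0, length), obeys NEXT, and ends with a 'last' exon.
    """
    by_start = defaultdict(list)
    for seg in segments:
        by_start[seg[0]].append(seg)

    # best[(pos, label)] holds the k best partial parses that end at
    # position pos with a segment of class label.
    best = defaultdict(list)
    best[(0, "start")] = [(0, [])]

    for pos in range(length):
        for label in NEXT:
            partials = best.get((pos, label))
            if not partials:
                continue
            for seg in by_start[pos]:
                _, end, seg_label, seg_score = seg
                if seg_label not in NEXT[label]:
                    continue
                cell = best[(end, seg_label)]
                for score, parse in partials:
                    cell.append((score + seg_score, parse + [seg]))
                # Keep only the k highest-scoring partial parses.
                cell.sort(key=lambda item: -item[0])
                del cell[k:]
    return best[(length, "last")]


# Hypothetical candidate segments for a 10-base toy sequence.
SEGMENTS = [
    (0, 3, "first", 20), (0, 4, "first", 15),
    (3, 7, "intron", 10), (4, 7, "intron", 12), (3, 6, "intron", 5),
    (3, 5, "intron", 2), (5, 6, "internal", 2), (6, 7, "intron", 2),
    (7, 10, "last", 20), (6, 10, "last", 10),
]

if __name__ == "__main__":
    for score, parse in k_best_parses(10, SEGMENTS, k=3):
        print(score, [(s, e, label) for s, e, label, _ in parse])
```

Keeping the k best partial parses per (position, class) cell, rather than only the single best, is what lets the final cell report ranked alternatives. In this toy input the top-ranked parse has no internal exon, while a lower-ranked one inserts a short internal exon, loosely mirroring how an alternative exon structure can surface among suboptimal solutions.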
Last modified: Wednesday, October 29, 2003
Base URL: www.ornl.gov/hgmis
Site sponsored by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program