Archive Edition
Sponsored by the U.S. Department of Energy Human Genome Program
Santa Fe, New Mexico, November 13-17, 1994
The electronic form of this document may be cited in the following style: Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.
A Syntactic Pattern Recognition System for DNA Sequences
David Searls

Formal language theory views languages as sets of strings over some alphabet, and specifies potentially infinite languages with concise sets of rules called grammars. Grammars are an exceptionally well-studied methodology, familiar to all computer scientists, for the description of complex, higher-order structures embodied in strings of symbols. Moreover, they can be given as input to general-purpose programs called parsers, which determine whether a given string satisfies the rules of the grammar. Parser technology is also extensively developed, and has also been applied to the problem of searching for complex patterns specified by grammars in large amounts of data, in a technique known as syntactic pattern recognition.

We have studied DNA sequences from the perspectives of both formal language theory and practical pattern recognition tasks using linguistic tools. On the formal side we have presented a number of results concerning the mathematical linguistic "complexity" of the language of DNA, e.g. its position on the Chomsky hierarchy of languages, and the relationship between syntactic structure and secondary structure. We have also defined and characterized a novel grammar formalism, called String Variable Grammar, that is particularly well-suited to the representational needs of DNA.

The practical side entails the development and use of a syntactic pattern recognition system for DNA sequences, called GenLang, that takes advantage of structural and/or hierarchical aspects of a domain by using rule-based methods to describe and discriminate such structures. The GenLang system has been used successfully to specify and search for tRNA genes, group I introns, and most recently, protein-encoding genes, achieving results comparable to other, procedural systems.

This work was funded by the DOE Genome Program (DE-FG02-92ER61371).

[1] Searls, D.B. (1988) "Representing Genetic Information with Formal Grammars" Proceedings of the Seventh National Conference of the American Association for Artificial Intelligence, AAAI/Morgan Kaufmann, pp. 386-391.
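As a rough illustration of what it means to describe a DNA feature with a grammar and search for it with a parser, the following Python sketch recognizes a simple stem-loop (a run of bases, a loop, then the reverse complement of the run). It is not GenLang and does not use GenLang's rule syntax. The toy grammar, the function names `parse_stem_loop` and `find_stem_loops`, and the parameters `min_stem`, `min_loop`, and `length` are all assumptions made up for this example.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def parse_stem_loop(seq, min_stem=3, min_loop=3):
    """Return True if seq can be derived from the toy grammar

        S -> x S x'   (x is any base, x' is its complement)
        S -> L        (L is any run of at least min_loop bases)

    using the first rule at least min_stem times.
    This grammar is hypothetical and only illustrates the idea.
    """
    def derive(i, j, stem):
        # seq[i:j] is the span that must still be derived from S.
        if stem >= min_stem:
            # Enough base pairs found: the remainder is the loop (S -> L).
            return j - i >= min_loop
        # Otherwise the outermost bases must pair (S -> x S x').
        if j - i >= 2 and COMPLEMENT.get(seq[i]) == seq[j - 1]:
            return derive(i + 1, j - 1, stem + 1)
        return False

    return derive(0, len(seq), 0)


def find_stem_loops(seq, length, min_stem=3, min_loop=3):
    """Slide a fixed-length window over seq and report every window
    accepted by the recognizer, as (start, substring) pairs."""
    hits = []
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        if parse_stem_loop(window, min_stem, min_loop):
            hits.append((start, window))
    return hits


if __name__ == "__main__":
    # Stem GATCC, loop AAAA, then GGATC (reverse complement of the stem),
    # embedded in flanking T bases.
    dna = "TTTT" + "GATCCAAAAGGATC" + "TTTT"
    print(find_stem_loops(dna, length=14))
```

The nested pairing rule `S -> x S x'` is what makes this a context-free rather than a regular pattern, which is the kind of structural dependency the abstract refers to when relating syntactic structure to secondary structure. A practical system would also need variable-length matches, mismatch tolerance, and scoring, none of which are attempted here.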
Last modified: Wednesday, October 29, 2003