Bioinformatics Section 

DOE Human Genome Program Contractor-Grantee Workshop VIII
February 27-March 2, 2000, Santa Fe, NM


71. Automatic Discovery of Sub-Molecular Sequence Domains in Multi-Aligned Sequences: A Dynamic Programming Algorithm for Multiple Alignment Segmentation

Eric Poe Xing, Ilya Muchnik¹, Denise Wolf, Inna Dubchak, Casimir Kulikowski¹, Manfred Zorn, and Sylvia Spengler

Center for Bioinformatics and Computational Genomics, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 and ¹DIMACS, Rutgers University, Piscataway, NJ 08855

EPXing@lbl.gov

Automatic identification of substructures in multiple sequence alignments is important for effective and objective annotation of structural and functional domains, for phylogenetic tree construction, and for many other types of molecular analysis. We present a segmentation algorithm that uses dynamic programming and progressive optimization to partition a given multiple alignment optimally into a set of potentially biologically meaningful segments. The partition is based on the statistical profile of the alignment's sequence composition, such as gap frequency and character heterogeneity. Using this algorithm, we analyzed a large multiple alignment of eukaryotic 16S rRNA. Three types of sequence patterns (shared conserved domains, shared variable motifs, and rare signature sequences) were identified automatically in a fraction of the time required for manual annotation, and the results were consistent with the patterns identified through independent phylogenetic approaches. This algorithm could help automate sequence-based analyses of sub-molecular structure and evolution through statistical modeling and high-performance computation.
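The abstract does not give the authors' scoring function, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the published algorithm. It assumes each alignment column is summarized by two features: its gap frequency, and the Shannon entropy of its non-gap characters (one possible stand-in for "character heterogeneity"). It also assumes a segmentation into a user-chosen number k of contiguous segments is scored by the within-segment sum of squared deviations from each segment's mean. The function names `column_profile`, `segment_cost`, and `segment_alignment` are hypothetical, and the toy alignment is invented for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

GAP = "-"

def column_profile(alignment: Sequence[str]) -> List[Tuple[float, float]]:
    """Per-column (gap frequency, Shannon entropy in bits of non-gap characters)."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    profile = []
    for j in range(n_cols):
        column = [row[j] for row in alignment]
        gap_freq = column.count(GAP) / len(column)
        residues = Counter(c for c in column if c != GAP)
        total = sum(residues.values())
        entropy = 0.0
        for count in residues.values():
            p = count / total
            entropy -= p * math.log2(p)
        profile.append((gap_freq, entropy))
    return profile

def segment_cost(profile: Sequence[Tuple[float, float]], i: int, j: int) -> float:
    """Assumed cost: sum of squared deviations from the mean, columns i..j-1, both features."""
    seg = profile[i:j]
    cost = 0.0
    for d in range(len(seg[0])):
        values = [col[d] for col in seg]
        mean = sum(values) / len(values)
        cost += sum((v - mean) ** 2 for v in values)
    return cost

def segment_alignment(profile, k: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Optimal split of the columns into k contiguous, non-empty segments by dynamic programming."""
    n = len(profile)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of columns")
    inf = float("inf")
    # best[m][j]: minimal total cost of splitting columns 0..j-1 into m segments
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            # the last segment covers columns i..j-1
            for i in range(m - 1, j):
                if best[m - 1][i] == inf:
                    continue
                c = best[m - 1][i] + segment_cost(profile, i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i
    # Trace the stored split points back from the full alignment
    bounds = []
    j = n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    return best[k][n], bounds[::-1]

# Toy example: conserved block, a rare insertion, a variable block, conserved block
alignment = [
    "ACGT--ACGTTT",
    "ACGT--TGCATT",
    "ACGTAAGATCTT",
    "ACGT--CCGGTT",
]
profile = column_profile(alignment)
cost, segments = segment_alignment(profile, 4)
print(segments)  # [(0, 4), (4, 6), (6, 10), (10, 12)]
```

Here the four segments separate the conserved columns, the gap-rich insertion carried by a single sequence, and the variable columns. Recomputing each segment cost makes this sketch O(k·n³) in the number of columns n. Prefix sums of each feature and of its square would make each cost O(1) and the whole dynamic program O(k·n²). How k is chosen and how the two features are weighted are left open here.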

 


The online presentation of this publication is a special feature of the Human Genome Project Information Web site.