DOE Human Genome Program Contractor-Grantee
71. Automatic Discovery of Sub-Molecular Sequence Domains in Multi-Aligned Sequences: A Dynamic Programming Algorithm for Multiple Alignment Segmentation
Eric Poe Xing, Ilya Muchnik¹, Denise Wolf, Inna Dubchak, Casimir Kulikowski¹, Manfred Zorn, and Sylvia Spengler
Center for Bioinformatics and Computational Genomics, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 and ¹DIMACS, Rutgers University, Piscataway, NJ 08855
Automatic identification of sub-structures in multiple sequence alignments is important for effective and objective structural and functional domain annotation, phylogenetic tree construction, and many other types of molecular analysis. We present a segmentation algorithm that optimally partitions a given multiple alignment into a set of potentially biologically meaningful segments. The partition is based on a statistical profile of the alignment's sequence composition, such as gap frequency and character heterogeneity, and is computed through dynamic programming and progressive optimization. Using this algorithm, we analyzed a large multiple alignment of eukaryotic 16S rRNA. Three types of sequence pattern (shared conserved domains, shared variable motifs, and rare signature sequences) were identified automatically, in far less time than manual annotation requires, and the results were consistent with the patterns identified through independent phylogenetic approaches. The algorithm could facilitate the automation of sequence-based sub-molecular structural and evolutionary analyses through statistical modeling and high-performance computing.
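The abstract does not give the scoring function or the recurrence, so the following Python sketch only illustrates the general idea of segmenting an alignment by dynamic programming. Everything specific in it is an assumption made for illustration: the per-column statistic (Shannon entropy with the gap character counted as a symbol, standing in for "gap frequency and character heterogeneity"), the segment cost (within-segment squared deviation from the segment mean), the fixed number of segments `k`, and the names `column_entropy` and `segment_alignment`. It does not attempt the progressive optimization step.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; gaps count as a symbol."""
    n = len(column)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(column).values())


def segment_alignment(alignment, k):
    """Split the columns into k contiguous segments that minimize the total
    within-segment squared deviation of column entropy (an assumed cost,
    not necessarily the objective used by the authors).

    Returns a list of half-open (start, end) column ranges.
    """
    scores = [column_entropy(col) for col in zip(*alignment)]
    n = len(scores)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of columns")

    # Prefix sums of x and x^2 give the cost of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(scores):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        """Sum of squared deviations from the mean over columns i..j-1."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[s][j]: minimal cost of splitting the first j columns into s segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):
                candidate = best[s - 1][i] + cost(i, j)
                if candidate < best[s][j]:
                    best[s][j] = candidate
                    back[s][j] = i

    # Trace the stored split points back from the last column.
    bounds = []
    j = n
    for s in range(k, 0, -1):
        i = back[s][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1]


# Toy alignment: a conserved block, a fully variable block, a conserved block.
toy = [
    "AAAAACGTCCCC",
    "AAAACGTACCCC",
    "AAAAGTACCCCC",
    "AAAATACGCCCC",
]
print(segment_alignment(toy, 3))  # [(0, 4), (4, 8), (8, 12)]
```

On the toy input the conserved columns have entropy 0 and the variable columns have entropy 2, so the three-segment partition with zero cost falls exactly on the block boundaries. The triple loop runs in O(k·n²) time for n columns; choosing `k` automatically, or replacing the cost with a likelihood-based score, would be natural extensions but are not described in the abstract.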