DOE Human Genome Program Contractor-Grantee
72. Classification of Multi-Aligned Sequence Using Monotone Linkage Clustering and Alignment Segmentation
Eric Poe Xing, Ilya Muchnik¹, Manfred Zorn, and Sylvia Spengler
Center for Bioinformatics and Computational Genomics, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 and ¹DIMACS, Rutgers University, Piscataway, NJ 08855
Optimal clustering of a set of sequences under an arbitrary set function generally has exponential complexity. In this paper, a low-order polynomial procedure, based on the quasi-concavity of a special class of objective functions, was developed to cluster multi-aligned sequences separately on each of the segments produced by the alignment segmentation process. It groups sequences according to their degree of similarity to a pre-specified reference pattern (e.g., a consensus sequence or the sequence of a chosen organism). Combining such clusterings across multiple segments yields a fairly fine-grained classification of all sequences in the alignment. The resulting pattern is reminiscent of the branching order of the corresponding phylogenetic tree, but it also carries additional information bearing on the assumption of modular evolution. The algorithm can be applied to a broad range of molecular sequence analysis tasks, such as constructing or recognizing phylogenetic subtrees and updating and labeling trees, and it can serve as a framework for organizing sequence data in an efficient, easily searchable form.
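The abstract does not give the objective function, so the following is only a minimal sketch under stated assumptions of how a quasi-concave, monotone-linkage clustering can run in polynomial time. It assumes a hypothetical linkage π(i, H) equal to sequence i's percent identity to the reference plus its summed identity to the other members of H. Because identities are non-negative, π can only grow as H grows, which makes F(H) = min over i in H of π(i, H) quasi-concave. Dropping the weakest element one at a time and keeping the best set seen then finds the largest maximizer of F using O(n²) linkage evaluations, instead of enumerating all 2ⁿ subsets. Peeling off successive maximizers gives per-segment layers, and sequences whose layer labels agree on every segment are grouped into one class. The functions `identity`, `linkage`, `kernel`, `layer_labels`, and `combine`, as well as the toy sequences, are hypothetical illustrations and not the authors' implementation.

```python
def identity(a: str, b: str) -> float:
    """Fraction of aligned columns where a and b share the same non-gap residue."""
    if len(a) != len(b):
        raise ValueError("sequences in a segment must be aligned to equal length")
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return same / len(a) if a else 0.0


def linkage(i, members, seqs, ref):
    # Hypothetical monotone linkage pi(i, H): similarity of sequence i to the
    # reference plus its summed similarity to the other members of H.
    # Adding members never decreases it, so pi is monotone in H.
    return identity(seqs[i], ref) + sum(
        identity(seqs[i], seqs[j]) for j in members if j != i
    )


def kernel(indices, seqs, ref):
    """Largest subset H of `indices` maximising F(H) = min_{i in H} pi(i, H).

    Greedy chain: repeatedly drop the element with the weakest linkage and keep
    the first best set seen. For a monotone linkage this reaches the global
    maximum of F with O(n^2) linkage evaluations.
    """
    current = list(indices)
    best_set, best_val = list(current), float("-inf")
    while current:
        scores = {i: linkage(i, current, seqs, ref) for i in current}
        weakest = min(current, key=scores.__getitem__)
        if scores[weakest] > best_val:  # strict: ties keep the larger, earlier set
            best_val, best_set = scores[weakest], list(current)
        current.remove(weakest)
    return best_set, best_val


def layer_labels(seqs, ref):
    """Peel off successive kernels; layer 0 is the most cohesive, reference-like group."""
    remaining = list(range(len(seqs)))
    labels = [0] * len(seqs)
    layer = 0
    while remaining:
        core, _ = kernel(remaining, seqs, ref)
        for i in core:
            labels[i] = layer
        remaining = [i for i in remaining if i not in core]
        layer += 1
    return labels


def combine(per_segment_labels):
    """Group sequences whose tuple of per-segment labels is identical."""
    classes = {}
    for idx, key in enumerate(zip(*per_segment_labels)):
        classes.setdefault(key, []).append(idx)
    return list(classes.values())


# Toy data (hypothetical): two alignment segments of four sequences each,
# with one reference string (e.g. a consensus) per segment.
segments = [
    ["ACGTAC", "ACGTAA", "TTGTAC", "TTTTTT"],
    ["GGCA", "GGCA", "GACA", "GACA"],
]
refs = ["ACGTAC", "GGCA"]
labels = [layer_labels(seg, ref) for seg, ref in zip(segments, refs)]
print(combine(labels))  # → [[0, 1, 2], [3]]
```

The strict `>` comparison keeps the earliest, and therefore largest, set among ties. The greedy chain relies on monotonicity, so any replacement linkage function should also be non-decreasing in H.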
The online presentation of this publication is a special feature of the Human Genome Project Information Web site.