Genome Informatics Section
DOE Human Genome Program Contractor-Grantee Workshop VII
109. Multiple Sequence Alignment with Confidence Estimates
David J. States
Multiple sequence alignment (MSA) is the basis for many aspects of molecular sequence analysis, including phylogenetics, motif detection, and molecular modeling. Because the space of possible multiple sequence alignments is very large and the information accessible through sequence data is limited, there are often regions of a multiple sequence alignment that are not well determined. Here we develop a theory for assessing the confidence of a multiple sequence alignment, describe software that implements the resulting algorithm, and discuss the application of these methods. A hierarchical approach to MSA is used in which each constituent sequence is related to the full alignment as a leaf in a tree of nearest-neighbor relationships. The algorithm uses a progressive strategy for building the multiple alignment. Hidden Markov Models (HMMs) are used to describe each sequence or collection of sequences. At each phase of the alignment calculation, all current models are compared with each other by dynamic programming to find the maximum-scoring local alignment. A new HMM is derived from the pair of models with the highest alignment score, and this new model replaces both of the previous models. The iteration is repeated until only a single HMM remains.
A site-specific confidence estimate, C, for pairwise alignments is calculated by comparing the likelihood of the optimal alignment passing through a pair of residues with the sum of the likelihoods of all alternative pairings of either the query or the target residue, where 0 < C < 1. The overall confidence for a site in the multiple sequence alignment is calculated as the product of the confidences of all the pairwise alignments making up the full MSA.
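The confidence calculation can be illustrated with a minimal sketch. The exact formula is not given in this text, so the normalization below is an assumption: the likelihood of the chosen pairing is divided by itself plus the likelihoods of all alternative pairings of either residue. The matrix `lik` and the function names are hypothetical, introduced only for illustration.

```python
import math


def pairwise_confidence(lik, i, j):
    """Confidence that query residue i pairs with target residue j.

    lik[a][b] is assumed to hold the likelihood of the best alignment that
    pairs query residue a with target residue b (precomputed elsewhere).
    The ratio used here is an assumed form, not the author's stated formula.
    """
    paired = lik[i][j]
    # Alternative pairings of the query residue i with other target residues.
    alt_query = sum(lik[i][k] for k in range(len(lik[i])) if k != j)
    # Alternative pairings of the target residue j with other query residues.
    alt_target = sum(lik[k][j] for k in range(len(lik)) if k != i)
    return paired / (paired + alt_query + alt_target)


def site_confidence(pairwise_confidences):
    """Overall confidence for an MSA site: product of pairwise confidences."""
    return math.prod(pairwise_confidences)
```

With strictly positive likelihoods and at least one alternative pairing, this ratio stays strictly between 0 and 1, which matches the stated range of C. Because the site confidence is a product, a single poorly determined pairwise alignment lowers the confidence of every MSA site that depends on it.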
The algorithm provides an efficient way to build HMMs for large families of unaligned sequences. A web site providing access to this tool is available at http://www.ibc.wustl.edu/service/msa.
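The progressive strategy described in the abstract (compare all current models, merge the best-scoring pair, repeat until one model remains) can be sketched as follows. This is only an outline of the control flow: `progressive_merge`, `score`, and `merge` are hypothetical names, and the caller is assumed to supply the HMM comparison and HMM construction steps, which are not specified here.

```python
from itertools import combinations


def progressive_merge(models, score, merge):
    """Greedily merge the best-scoring pair of models until one remains.

    score(a, b) stands in for the dynamic programming comparison of two
    models; merge(a, b) stands in for deriving a new model from the pair.
    Returns the final model and the list of merged pairs, in order.
    """
    models = list(models)
    if not models:
        raise ValueError("at least one model is required")
    order = []
    while len(models) > 1:
        # Compare all current models with each other and pick the best pair.
        a, b = max(
            combinations(range(len(models)), 2),
            key=lambda p: score(models[p[0]], models[p[1]]),
        )
        merged = merge(models[a], models[b])
        order.append((models[a], models[b]))
        # The new model replaces both of the models it was derived from.
        models = [m for k, m in enumerate(models) if k not in (a, b)]
        models.append(merged)
    return models[0], order
```

A toy call, using character sets in place of HMMs, shared characters as the score, and set union as the merge:

```python
seqs = [frozenset("ACGT"), frozenset("ACGA"), frozenset("TTTT")]
final, order = progressive_merge(
    seqs,
    score=lambda a, b: len(a & b),
    merge=lambda a, b: a | b,
)
```

The recorded merge order corresponds to the tree of nearest-neighbor relationships in which each input sequence is a leaf.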